3
\$\begingroup\$

I've implemented a simple program to sort an input of unique numbers given as decimals on a separate line. The idea is, since all numbers are unique, to use a (long) bitfield where a 1 on index n represents that the number n is in the sorted list. The algorithm thus consists of these phases (as described in the book):

/* phase 1: initialize set to empty */
 for i = [0, n)
 bit[i] = 0
/* phase 2: insert present elements into the set */
 for each i in the input file
 bit[i] = 1
/* phase 3: write sorted output */
 for i = [0, n)
 if bit[i] == 1
 write i on the output file

In order to (re-)freshen my Rust, I chose to implement this in Rust, to be an efficient replacement for sort(1) when given the -n option, assuming unique input. Thus the program can be called as:

$ bitfield # sorts stdin, writes to stdout
$ bitfield foo # sorts foo, writes to stdout
$ bitfield foo - bar # sorts foo, stdin, bar, writes to stdout

As with the exercise, I assume n (MAXNUM) to be 10,000,000:

use std::io::*;
use std::fs::File;
use std::env;
const MAXNUM: usize = 10_000_000;
const _MAXLINES: usize = 1_000_000;
fn fill(src: impl BufRead, bitfield: &mut[u8; MAXNUM >> 3]) -> Result<()> {
 for line in src.lines() {
 let num = line?.parse::<usize>().unwrap();
 if num > MAXNUM {
 return Err(Error::new(ErrorKind::Other, "number too large!"));
 }
 bitfield[num >> 3] |= 1 << (num & 0b111);
 }
 return Ok(());
}
fn print_sorted(bitfield: &[u8; MAXNUM >> 3], dest: impl Write) -> Result<()> {
 let mut dest = BufWriter::new(dest);
 for (i, byte) in bitfield.iter().enumerate() {
 for bit in 0..=0b111 {
 if byte & (1 << bit) != 0 {
 write!(&mut dest, "{}\n", (i << 3) | bit)?;
 }
 }
 }
 return Ok(());
}
fn main() -> Result<()> {
 let mut bitfield: [u8; MAXNUM >> 3] = [0; MAXNUM >> 3];
 let args: Vec<_> = env::args().skip(1).collect();
 if args.is_empty() {
 fill(stdin().lock(), &mut bitfield)?;
 }
 for arg in args {
 if arg == "-" {
 fill(stdin().lock(), &mut bitfield)?;
 } else {
 let f = match File::open(&arg) {
 Err(e) => {
 eprintln!("{}: {}", &arg, e);
 continue;
 }
 Ok(f) => BufReader::new(f),
 };
 fill(f, &mut bitfield)?;
 }
 }
 print_sorted(&bitfield, stdout())?;
 return Ok(());
}

To test, I used GNU/shuf(1), time(1), and cmp(1):

$ shuf -i 1-10000000 -n 1000000 > foo
$ time ./target/release/bitsort foo > bar
0.19s user 0.03s system 97% cpu 0.221 total
$ time sort -n foo > bar2
1.56s user 0.06s system 291% cpu 0.556 total
$ cmp bar bar2

Ideas?

asked Feb 12, 2021 at 21:24
\$\endgroup\$

1 Answer 1

2
\$\begingroup\$

use declarations

use std::env;
use std::fs::File;
use std::io::*;

That's a sneaky way to import io::{Error, Result}. For the sake of readability, it may be beneficial to spell out io:: each time. Also see the Error handling section below.

use declarations can be condensed. For example:

use std::{env, fs::File, io};

Allocation

let args: Vec<_> = env::args().skip(1).collect();

This is an unnecessary allocation. The main function can be rewritten in terms of iterators using peekable:

fn main() -> Result<()> {
 // ...
 let mut args = env::args().skip(1).peekable();
 if args.peek().is_none() {
 // read from stdin
 }
 for arg in args {
 // handle arg
 }
 // ...
}

Error handling

Instead of unwrapping the result of parse, you can propagate the error.

For a larger program, I would suggest using the anyhow crate for error handling, which reduces a lot of boilerplate.

Miscellaneous

Don't write return Ok(()); at the end of a function — just use Ok(()).

I think you meant >= here: if num > MAXNUM.

In print_sorted, you wrap the writer in a BufWriter. What if the writer is already buffered? I would leave the decision to the caller.

You can use let mut bitfield = [0_u8; MAXNUM >> 3]; for less duplication.

Using a crate like bit-set might be good for readability.

answered Feb 13, 2021 at 6:31
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Tyvm! Regarding import, I do need a bit more (ErrorKind, stdin, stdout, BufRead, Write, BufReader, BufWriter) unfortunately. I do like peek though, although it needs args to become mut then, as you did. I now propagate the error and also used expressions instead of explicit returns. I originally chose to use BufWriter, since I also used the BufRead trait and to be consistent. I will keep bit-set in mind, when I have to "real work" programs. Thanks again! \$\endgroup\$ Commented Feb 13, 2021 at 12:32

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.