
I wrote a barebones version of wc in Rust. wc is a program that counts the number of lines, words, and characters in a file and prints those values to the command line. (Strictly speaking, GNU wc reports bytes by default and counts characters only with -m.) Here is an example of the output:

```
 9 25 246 Cargo.toml
 52 163 1284 src/main.rs
 61 188 1530 total
```
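
Because the third column can mean either bytes or characters, here is a small sketch of the difference in Rust terms. str::len counts UTF-8 bytes (what plain wc reports), while str::chars().count() counts Unicode scalar values (closer to wc -m in a UTF-8 locale). The helper name byte_and_char_counts is hypothetical.

```rust
/// Hypothetical helper: returns (bytes, chars) for a UTF-8 string.
/// `len()` counts UTF-8 bytes; `chars().count()` counts Unicode scalar values.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    // ASCII text: one byte per character, so both counts agree.
    println!("{:?}", byte_and_char_counts("hello\n")); // prints "(6, 6)"
    // U+00E9 ('é') takes two bytes in UTF-8, so the counts differ.
    println!("{:?}", byte_and_char_counts("caf\u{e9}\n")); // prints "(6, 5)"
}
```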

My version currently lacks proper output alignment and doesn't print the total; it also lacks the command-line options, and it panics when given a directory. But I would like to get some feedback before I go any further.
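
For the missing total and alignment, a minimal sketch, assuming the per-file counts have already been computed. Counts and report are hypothetical names, and every column is right-aligned to the width of the largest number; GNU wc's exact width rule may differ, so this only approximates its layout.

```rust
/// Hypothetical per-file counts; field names mirror the question's InputFile.
#[derive(Clone, Copy, Default)]
struct Counts {
    lines: usize,
    words: usize,
    characters: usize,
}

/// Builds one output line per file, plus a "total" line when there is more
/// than one file, with all numbers right-aligned to a common width.
fn report(files: &[(Counts, &str)]) -> Vec<String> {
    // Sum every file's counts into a running total.
    let mut total = Counts::default();
    for (c, _) in files {
        total.lines += c.lines;
        total.words += c.words;
        total.characters += c.characters;
    }
    let mut rows: Vec<(Counts, &str)> = files.to_vec();
    if files.len() > 1 {
        rows.push((total, "total"));
    }
    // The totals are at least as large as any single file's counts,
    // so their widest number determines the column width.
    let width = [total.lines, total.words, total.characters]
        .iter()
        .map(|n| n.to_string().len())
        .max()
        .unwrap_or(1);
    rows.iter()
        .map(|(c, name)| {
            format!(
                "{:>w$} {:>w$} {:>w$} {}",
                c.lines, c.words, c.characters, name, w = width
            )
        })
        .collect()
}

fn main() {
    let files = [
        (Counts { lines: 9, words: 25, characters: 246 }, "Cargo.toml"),
        (Counts { lines: 52, words: 163, characters: 1284 }, "src/main.rs"),
    ];
    for line in report(&files) {
        println!("{}", line);
    }
    // Prints:
    //    9   25  246 Cargo.toml
    //   52  163 1284 src/main.rs
    //   61  188 1530 total
}
```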

```rust
use std::env;
use std::fs::read_to_string;

struct InputFile {
    words: u32,
    lines: u32,
    characters: u32,
    name: String,
}

impl InputFile {
    fn new(name: &String) -> Self {
        let content = read_to_string(name).unwrap();
        let (mut characters, mut words, mut lines) = (0, 0, 0);
        let mut spaced: bool = false;
        for c in content.chars() {
            if c as u8 != 0 {
                characters += 1;
            }
            if c != ' ' && c != '\n' {
                spaced = false
            }
            if c == '\n' {
                lines += 1;
                if !spaced {
                    words += 1;
                    spaced = true;
                }
            }
            if c == ' ' && !spaced {
                words += 1;
                spaced = true;
            }
        }
        Self { lines, words, characters, name: name.to_string() }
    }
}

impl std::fmt::Display for InputFile {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{} {} {} {}",
            self.lines, self.words, self.characters, self.name
        )
    }
}

fn main() {
    let files: Vec<String> = env::args().collect();
    for f in &files[1..] {
        println!("{}", InputFile::new(f));
    }
}
```
asked Apr 11, 2021 at 1:12
  • Glad I am not the only one doing this ;-) github.com/jonasthewolf/wc (commented Jul 26, 2021 at 9:55)

1 Answer


InputFile doesn't represent a file so much as statistics about one; a better name might be Statistics.

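usize is conventionally used for indexes and sizes instead of u32. On 64-bit targets it also avoids overflowing on inputs with more than u32::MAX (about 4.3 billion) characters.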

Don't take an argument by &String; &str accepts both &String and string literals. Since the algorithm works not only for files but for other streams of characters as well, consider taking a BufRead argument.
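
To illustrate what a BufRead parameter buys, here is a std-only sketch with a hypothetical count_lines function. Because &[u8], io::Cursor, StdinLock, and BufReader<File> all implement BufRead, the same function works on files, standard input, and in-memory test data. Like wc -l, it counts newline characters.

```rust
use std::io::{self, BufRead};

/// Hypothetical counter that takes any buffered reader instead of a file name.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut buf = Vec::new();
    let mut lines = 0;
    // read_until keeps the delimiter, so a chunk ending in '\n' is a full line.
    while reader.read_until(b'\n', &mut buf)? != 0 {
        if buf.last() == Some(&b'\n') {
            lines += 1;
        }
        buf.clear(); // reuse the allocation for the next line
    }
    Ok(lines)
}

fn main() -> io::Result<()> {
    // An in-memory byte slice implements BufRead, which keeps tests simple.
    println!("{}", count_lines(&b"a\nb\nc"[..])?); // prints "2"
    // The same function also accepts, for example:
    //   count_lines(std::io::BufReader::new(std::fs::File::open("Cargo.toml")?))?
    //   count_lines(io::stdin().lock())?
    Ok(())
}
```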

Reading the whole file into memory isn't efficient for large files; an alternative is to read through a BufReader and use the utf8-chars crate to decode characters as they are read.
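
If you'd rather avoid an extra dependency, a std-only alternative is sketched below. It is a different technique from utf8-chars: BufRead::read_line reuses a single String buffer, so only one line is held in memory at a time, and invalid UTF-8 comes back as an io::Error instead of a panic. The function name is hypothetical. A single enormous line would still be buffered in full.

```rust
use std::io::{self, BufRead};

/// Hypothetical streaming counter: returns (characters, lines).
fn count_chars_and_lines<R: BufRead>(mut reader: R) -> io::Result<(usize, usize)> {
    let mut line = String::new();
    let (mut characters, mut lines) = (0, 0);
    loop {
        line.clear(); // reuse the same allocation for every line
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        characters += line.chars().count();
        if line.ends_with('\n') {
            lines += 1;
        }
    }
    Ok((characters, lines))
}

fn main() -> io::Result<()> {
    let counts = count_chars_and_lines("caf\u{e9}\nb".as_bytes())?;
    println!("{:?}", counts); // prints "(6, 1)"
    Ok(())
}
```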

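c as u8 != 0 is better written as c != '\0'. Besides being clearer, it is also correct: as u8 keeps only the low 8 bits, so a character such as 'Ā' (U+0100) also casts to 0 and would be skipped.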
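
A quick demonstration of the truncation. The two function names are hypothetical; they just wrap the question's test and the suggested one.

```rust
/// The question's test: casts the char to u8, keeping only its low byte.
fn counted_by_cast(c: char) -> bool {
    c as u8 != 0
}

/// The suggested test: compares the whole code point with NUL.
fn counted_by_compare(c: char) -> bool {
    c != '\0'
}

fn main() {
    let c = '\u{100}'; // 'Ā', code point 256, whose low byte is 0
    println!("cast: {}", counted_by_cast(c)); // prints "cast: false"
    println!("compare: {}", counted_by_compare(c)); // prints "compare: true"
}
```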

It took me a while to figure out what spaced does; in_word, with the meaning inverted, might be a better name.

char::is_whitespace checks for all kinds of whitespace (tabs, carriage returns, and other Unicode whitespace), not just ' ' and '\n'.
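
To see the difference, split the same text on the question's two separators and then on char::is_whitespace. The question uses a state machine rather than str::split; splitting here only isolates the choice of separator set, and the function names are hypothetical.

```rust
/// Words separated only by ' ' and '\n', as in the question's code.
fn naive_word_count(text: &str) -> usize {
    text.split(|c: char| c == ' ' || c == '\n')
        .filter(|w| !w.is_empty())
        .count()
}

/// Words separated by any whitespace, as char::is_whitespace defines it.
fn whitespace_word_count(text: &str) -> usize {
    text.split(char::is_whitespace)
        .filter(|w| !w.is_empty())
        .count()
}

fn main() {
    let text = "one\ttwo\r\nthree";
    // The naive split treats "one\ttwo\r" as a single word.
    println!("{}", naive_word_count(text)); // prints "2"
    println!("{}", whitespace_word_count(text)); // prints "3"
}
```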

In main, Iterator::skip lets you skip the first argument (the program name) without collecting the arguments into a Vec.


Here's my version:

```rust
use {
    anyhow::Result,
    parse_display::Display,
    std::{
        env,
        fs::File,
        io::{BufRead, BufReader},
    },
    utf8_chars::BufReadCharsExt,
};

// Same column order as wc and the original program: lines, words, characters.
#[derive(Clone, Debug, Display)]
#[display("{lines} {words} {characters}")]
struct Stats {
    characters: usize,
    words: usize,
    lines: usize,
}

impl Stats {
    fn new<R: BufRead>(mut reader: R) -> Result<Self> {
        let mut stats = Stats {
            characters: 0,
            words: 0,
            lines: 0,
        };
        let mut in_word = false;
        for c in reader.chars_raw() {
            let c = c?;
            if c != '\0' {
                stats.characters += 1;
            }
            if !c.is_whitespace() {
                in_word = true;
            } else if in_word {
                stats.words += 1;
                in_word = false;
            }
            if c == '\n' {
                stats.lines += 1;
            }
        }
        Ok(stats)
    }
}

fn main() -> Result<()> {
    for path in env::args().skip(1) {
        let file = BufReader::new(File::open(&path)?);
        let stats = Stats::new(file)?;
        println!("{} {}", stats, path);
    }
    Ok(())
}
```
answered Apr 18, 2021 at 11:09
  • Nitpicking: what if the input doesn't end with \n or a space? In that case the number of words or the number of lines comes out wrong. It's typical for these kinds of counting problems to need an extra step at the very end to finish the last word and line. (commented Apr 19, 2021 at 15:10)
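
One way to address the word half of this comment is to count a word when it starts rather than when it ends, so a trailing word with no whitespace after it is still counted. The sketch below is std-only (read_line in place of utf8-chars), keeps the answer's rule of not counting '\0', and uses the hypothetical name compute_stats. Lines are left as a count of '\n' characters, which is what wc -l reports, so a final line without a newline is not counted by wc either.

```rust
use std::io::{self, BufRead};

#[derive(Debug, Default, PartialEq)]
struct Stats {
    lines: usize,
    words: usize,
    characters: usize,
}

/// Hypothetical std-only variant of the answer's Stats::new.
fn compute_stats<R: BufRead>(mut reader: R) -> io::Result<Stats> {
    let mut stats = Stats::default();
    let mut in_word = false;
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        for c in line.chars() {
            if c != '\0' {
                stats.characters += 1;
            }
            if c.is_whitespace() {
                in_word = false;
            } else if !in_word {
                // Count on entering a word, so no final step is needed.
                stats.words += 1;
                in_word = true;
            }
            if c == '\n' {
                stats.lines += 1;
            }
        }
    }
    Ok(stats)
}

fn main() -> io::Result<()> {
    // No trailing whitespace: the last word is still counted.
    println!("{:?}", compute_stats(&b"hello world"[..])?);
    // prints "Stats { lines: 0, words: 2, characters: 11 }"
    Ok(())
}
```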
