Count characters, words, and lines in files (wc in rust)

Question 1

I wrote a barebones version of wc in Rust. wc is a program that counts the number of characters, words, and lines in a file and prints those values to the command line. Here is an example of the output:

   9   25  246 Cargo.toml
  52  163 1284 src/main.rs
  61  188 1530 total

My version currently lacks the proper output alignment, and it doesn't print the total (it also lacks the command line options, and it panics when fed a directory). But I would like to get some feedback before I go any further.

use std::env;
use std::fs::read_to_string;

struct InputFile {
    words: u32,
    lines: u32,
    characters: u32,
    name: String,
}

impl InputFile {
    fn new(name: &String) -> Self {
        let content = read_to_string(name).unwrap();
        let (mut characters, mut words, mut lines) = (0, 0, 0);
        let mut spaced: bool = false;
        for c in content.chars() {
            if c as u8 != 0 {
                characters += 1;
            }
            if c != ' ' && c != '\n' {
                spaced = false
            }
            if c == '\n' {
                lines += 1;
                if !spaced {
                    words += 1;
                    spaced = true;
                }
            }
            if c == ' ' && !spaced {
                words += 1;
                spaced = true;
            }
        }
        Self { lines, words, characters, name: name.to_string() }
    }
}

impl std::fmt::Display for InputFile {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{} {} {} {}",
            self.lines, self.words, self.characters, self.name
        )
    }
}

fn main() {
    let files: Vec<String> = env::args().collect();
    for f in &files[1..] {
        println!("{}", InputFile::new(f));
    }
}
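
The alignment and the total row that the question lists as missing could be approached roughly as follows. This is only a sketch: `Row` and `format_rows` are names invented for this example, and the column width is simply the width of the largest total, which is a simplification and not necessarily how any real wc picks its widths.

```rust
/// One row of output: (lines, words, characters, name).
/// `Row` is a hypothetical alias used only in this sketch.
type Row = (usize, usize, usize, String);

/// Formats rows with right-aligned columns and appends a "total" row
/// when there is more than one input.
fn format_rows(rows: &[Row]) -> Vec<String> {
    // Sum each column across all rows.
    let total = rows.iter().fold((0usize, 0usize, 0usize), |acc, r| {
        (acc.0 + r.0, acc.1 + r.1, acc.2 + r.2)
    });
    // Assumption: use the digit count of the largest total as the column width.
    let width = total.0.max(total.1).max(total.2).to_string().len();
    let mut out: Vec<String> = rows
        .iter()
        .map(|(l, w, c, name)| format!("{l:>width$} {w:>width$} {c:>width$} {name}"))
        .collect();
    if rows.len() > 1 {
        let (l, w, c) = total;
        out.push(format!("{l:>width$} {w:>width$} {c:>width$} total"));
    }
    out
}
```

Each returned string can be printed with println!; collecting the rows first is what makes it possible to know the column width before anything is written.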

Question 2

Glad I am not the only one doing this ;-) github.com/jonasthewolf/wc

Question 3

InputFile doesn't appear to be a file — a better name might be Statistics.

usize is conventionally used for indexes and sizes instead of u32.

Don't take an argument by &String. Since the algorithm works not only for files but also for other streams of characters, consider taking a BufRead argument.

Reading the whole file into memory isn't efficient — an alternative is to use the utf8-chars crate to identify characters.
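
As a rough illustration of the two points about BufRead and memory use, the sketch below counts lines from any reader while holding only one line in memory at a time. It uses the standard library only; `count_lines` is a hypothetical helper written for this example and is not part of the code under review.

```rust
use std::io::{self, BufRead};

/// Counts newline-terminated lines in anything that implements `BufRead`.
/// `count_lines` is a made-up name used only for illustration.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut lines = 0;
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // `read_until` returns Ok(0) once the end of the input is reached.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        // Only count chunks that actually end in a newline.
        if buf.last() == Some(&b'\n') {
            lines += 1;
        }
    }
    Ok(lines)
}
```

Because byte slices and std::io::Cursor implement BufRead, the same function can be exercised with in-memory data such as "a\nb\n".as_bytes(), while a file would be passed as BufReader::new(File::open(path)?).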

c as u8 != 0 is just c != '\0'.

It took me a while to figure out what spaced does — in_word might be a better name.

char::is_whitespace checks for all kinds of whitespace.
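
A small comparison may make the difference concrete. The sketch below counts words twice, once with only the two separators the original loop checks (' ' and '\n') and once with char::is_whitespace. Both function names are invented for this example.

```rust
/// Returns true only for the two separators the original loop checks for.
/// Hypothetical helper for illustration.
fn is_separator_original(c: char) -> bool {
    c == ' ' || c == '\n'
}

/// Counts non-empty pieces of `text` split by the given separator test.
/// Hypothetical helper for illustration.
fn count_words(text: &str, is_sep: fn(char) -> bool) -> usize {
    text.split(is_sep).filter(|w| !w.is_empty()).count()
}
```

For the input "one\ttwo\r\nthree", the first test sees two words because the tab and the carriage return are treated as part of a word, while char::is_whitespace separates all three.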

In main, Iterator::skip allows you to skip one argument without allocating a Vec.

Here's my version:

use {
    anyhow::Result,
    parse_display::Display,
    std::{
        env,
        fs::File,
        io::{BufRead, BufReader},
    },
    utf8_chars::BufReadCharsExt,
};

#[derive(Clone, Debug, Display)]
#[display("{characters} {words} {lines}")]
struct Stats {
    characters: usize,
    words: usize,
    lines: usize,
}

impl Stats {
    fn new<R: BufRead>(mut reader: R) -> Result<Self> {
        let mut stats = Stats {
            characters: 0,
            words: 0,
            lines: 0,
        };
        let mut in_word = false;
        for c in reader.chars_raw() {
            let c = c?;
            if c != '\0' {
                stats.characters += 1;
            }
            if !c.is_whitespace() {
                in_word = true;
            } else if in_word {
                stats.words += 1;
                in_word = false;
            }
            if c == '\n' {
                stats.lines += 1;
            }
        }
        Ok(stats)
    }
}

fn main() -> Result<()> {
    for path in env::args().skip(1) {
        let file = BufReader::new(File::open(&path)?);
        let stats = Stats::new(file)?;
        println!("{} {}", stats, path);
    }
    Ok(())
}

Question 4

Nitpicking: what if the input doesn't end with \n, or with a space? In that case the number of words or the number of lines comes out wrong. It's typical for these kinds of counting problems to need one extra step at the very end, to finish the last word and the last line.
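
One way to handle the extra step at the end is sketched below, using an in-memory string and the standard library only. `Counts` and `count` are names made up for this example. Two assumptions are worth stating: the sketch counts every character, including NUL, and it counts lines as newline characters, which is what wc itself does, so an unterminated last line still adds nothing to the line count.

```rust
/// Line, word, and character counts for one input.
/// `Counts` is a hypothetical type used only in this sketch.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
    characters: usize,
}

/// Counts over a string; a word is closed either by whitespace
/// or by the end of the input.
fn count(text: &str) -> Counts {
    let mut counts = Counts::default();
    let mut in_word = false;
    for c in text.chars() {
        counts.characters += 1;
        if c == '\n' {
            counts.lines += 1;
        }
        if !c.is_whitespace() {
            in_word = true;
        } else if in_word {
            counts.words += 1;
            in_word = false;
        }
    }
    // Extra step after the loop: finish a word that runs up to the end.
    if in_word {
        counts.words += 1;
    }
    counts
}
```

With this check, "hello world" and "hello world\n" both report two words; without it, the first input would report one.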

L. F. · Answer 1 · 2021-04-18 11:09:48Z
