I wrote a barebones version of wc in Rust. wc is a program that counts the number of characters, words, and lines in a file and outputs those values to the command line. Here is an example of the output:
9 25 246 Cargo.toml
52 163 1284 src/main.rs
61 188 1530 total
My version currently lacks the proper output alignment, and it doesn't print the total (it also lacks the command line options, and it panics when fed a directory). But I would like to get some feedback before I go any further.
use std::env;
use std::fs::read_to_string;

struct InputFile {
    words: u32,
    lines: u32,
    characters: u32,
    name: String,
}

impl InputFile {
    fn new(name: &String) -> Self {
        let content = read_to_string(name).unwrap();
        let (mut characters, mut words, mut lines) = (0, 0, 0);
        let mut spaced: bool = false;
        for c in content.chars() {
            if c as u8 != 0 {
                characters += 1;
            }
            if c != ' ' && c != '\n' {
                spaced = false
            }
            if c == '\n' {
                lines += 1;
                if !spaced {
                    words += 1;
                    spaced = true;
                }
            }
            if c == ' ' && !spaced {
                words += 1;
                spaced = true;
            }
        }
        Self { lines, words, characters, name: name.to_string() }
    }
}

impl std::fmt::Display for InputFile {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "{} {} {} {}",
            self.lines, self.words, self.characters, self.name
        )
    }
}

fn main() {
    let files: Vec<String> = env::args().collect();
    for f in &files[1..] {
        println!("{}", InputFile::new(f));
    }
}
Glad I am not the only one doing this ;-) github.com/jonasthewolf/wc – Jonas Wolf, Jul 26, 2021 at 9:55
1 Answer
InputFile doesn't appear to be a file; a better name might be Statistics.
usize is conventionally used for indexes and sizes instead of u32.
Don't take an argument by &String; &str is more general and accepts a &String as well. Since the algorithm works not only for files but for other streams of characters too, consider taking a BufRead argument.
Reading the whole file into memory isn't efficient; an alternative is to use the utf8-chars crate to identify characters.
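If an extra crate is not wanted, roughly the same effect can be had with the standard library alone by reading one line at a time into a reused buffer. This is only a sketch under a few assumptions: the helper name count and its tuple return type are made up for illustration, the input must be valid UTF-8 (read_line returns an error otherwise), a single very long line is still buffered in full, and every char is counted, including NUL.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: counts (lines, words, characters) from any BufRead
/// without loading the whole input into memory at once.
fn count<R: BufRead>(mut reader: R) -> io::Result<(usize, usize, usize)> {
    let (mut lines, mut words, mut chars) = (0, 0, 0);
    let mut buf = String::new();
    loop {
        buf.clear();
        // read_line appends up to and including '\n'; 0 bytes read means EOF.
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        // Only newline-terminated lines are counted, as wc does.
        if buf.ends_with('\n') {
            lines += 1;
        }
        // A word never spans two lines, because '\n' is whitespace.
        words += buf.split_whitespace().count();
        chars += buf.chars().count();
    }
    Ok((lines, words, chars))
}

fn main() -> io::Result<()> {
    // A byte slice implements BufRead, which makes the function easy to try out.
    let (lines, words, chars) = count("one two\nthree".as_bytes())?;
    println!("{} {} {}", lines, words, chars);
    Ok(())
}
```

Because the function only asks for BufRead, the same code works on a BufReader<File>, on a locked stdin, or on an in-memory byte slice in tests.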
c as u8 != 0 is more clearly written as c != '\0'.
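One caveat: a cast from char to u8 keeps only the low 8 bits of the code point, so the cast-based test and the comparison with NUL are not strictly equivalent. The sketch below shows a character other than NUL that the cast-based test would also skip (low_byte_is_zero is a hypothetical helper introduced only for this demonstration):

```rust
/// Hypothetical helper: the negation of the question's `c as u8 != 0` test.
fn low_byte_is_zero(c: char) -> bool {
    // `as u8` truncates the code point to its low 8 bits.
    c as u8 == 0
}

fn main() {
    // U+0100 has a zero low byte, so the cast-based test treats it like NUL.
    for c in ['\0', 'a', '\u{100}'].iter().copied() {
        println!(
            "{:?}: low byte zero = {}, is NUL = {}",
            c,
            low_byte_is_zero(c),
            c == '\0'
        );
    }
}
```

Comparing against the NUL character directly avoids silently dropping characters such as U+0100 from the count.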
It took me a while to figure out what spaced does; in_word might be a better name.
char::is_whitespace checks for all kinds of whitespace.
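To illustrate the difference, the following sketch compares the separator test from the question (space or newline only) with char::is_whitespace on a few characters. The helper is_separator_original is a made-up name that simply restates the question's condition.

```rust
/// Hypothetical helper restating the question's separator test.
fn is_separator_original(c: char) -> bool {
    c == ' ' || c == '\n'
}

fn main() {
    // Tab, carriage return, and ideographic space (U+3000) are whitespace,
    // but the space-or-newline test does not treat them as separators.
    for c in ['\t', '\r', '\u{3000}', ' ', 'a'].iter().copied() {
        println!(
            "{:?}: original = {}, is_whitespace = {}",
            c,
            is_separator_original(c),
            c.is_whitespace()
        );
    }
}
```

With the original test, a tab-separated file would be counted as one long word per line.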
In main, Iterator::skip allows you to skip one argument without allocating a Vec.
Here's my version:
use {
    anyhow::Result,
    parse_display::Display,
    std::{
        env,
        fs::File,
        io::{BufRead, BufReader},
    },
    utf8_chars::BufReadCharsExt,
};

#[derive(Clone, Debug, Display)]
// Same column order as wc: lines, words, characters.
#[display("{lines} {words} {characters}")]
struct Stats {
    characters: usize,
    words: usize,
    lines: usize,
}
impl Stats {
    fn new<R: BufRead>(mut reader: R) -> Result<Self> {
        let mut stats = Stats {
            characters: 0,
            words: 0,
            lines: 0,
        };
        // True while the most recent character was part of a word.
        let mut in_word = false;
        for c in reader.chars_raw() {
            let c = c?;
            if c != '\0' {
                stats.characters += 1;
            }
            if !c.is_whitespace() {
                in_word = true;
            } else if in_word {
                // A word ends at the first whitespace character after it.
                stats.words += 1;
                in_word = false;
            }
            if c == '\n' {
                stats.lines += 1;
            }
        }
        Ok(stats)
    }
}

fn main() -> Result<()> {
    // skip(1) drops the program name.
    for path in env::args().skip(1) {
        let file = BufReader::new(File::open(&path)?);
        let stats = Stats::new(file)?;
        println!("{} {}", stats, path);
    }
    Ok(())
}
nitpicking: what if the input doesn't end with \n, or with a space? In that case the number of words or the number of lines comes out wrong. It's typical for these kinds of counting problems to need an extra step at the very end, to finish the words and the lines. – Roland Illig, Apr 19, 2021 at 15:10
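A minimal sketch of the extra final step described in the comment, written against a &str so it can be tried without any crates. The helper name count_str is hypothetical. Only the word count gets a fix-up here; the line count is left as the number of newline characters, which is also what wc reports for an unterminated last line.

```rust
/// Hypothetical helper: counts (lines, words, characters) in `text`.
fn count_str(text: &str) -> (usize, usize, usize) {
    let (mut lines, mut words, mut chars) = (0, 0, 0);
    let mut in_word = false;
    for c in text.chars() {
        chars += 1;
        if c == '\n' {
            lines += 1;
        }
        if !c.is_whitespace() {
            in_word = true;
        } else if in_word {
            // Whitespace after a word closes that word.
            words += 1;
            in_word = false;
        }
    }
    // Final step: a word that runs up to the end of the input is still a word.
    if in_word {
        words += 1;
    }
    (lines, words, chars)
}

fn main() {
    let (lines, words, chars) = count_str("hello world");
    println!("{} {} {}", lines, words, chars);
}
```

Without the check after the loop, the input "hello world" would be reported as containing one word instead of two.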