This is my implementation of the Pig Latin exercise recommended in The Rust Programming Language book. I am using the `unicode-segmentation` crate to split the string into words while also keeping the delimiters. Any pointers on making this code more idiomatic or more efficient?
```rust
use unicode_segmentation::UnicodeSegmentation;

#[allow(overlapping_patterns)]
fn translate_word(s: &str) -> String {
  let mut it = s.chars();
  let first = it.next().unwrap();
  match first.to_ascii_lowercase() {
    'a' | 'e' | 'i' | 'o' | 'u' => format!("{}-hay", s),
    'a'..='z' => format!("{}-{}ay", it.collect::<String>(), first),
    _ => s.to_string(),
  }
}

pub fn translate(s: &str) -> String {
  s.split_word_bounds()
    .map(translate_word)
    .collect::<Vec<_>>()
    .join("")
}
```
The code is inside a module named `pig_latin`.
Answer
Be aware that Rust conventionally uses 4-space indentation
It's fine if you consistently use 2 spaces (especially if you override it with `tab_spaces = 2` in `rustfmt.toml`), but be aware that the standard is different.
Collect directly to a String
Instead of collecting into a `Vec` and then joining it into a new `String` (which copies every piece a second time into the `String`'s internal buffer), collect into a `String` directly:
```rust
pub fn translate(s: &str) -> String {
    s.split_word_bounds().map(translate_word).collect()
}
```
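To make the difference concrete, here is a std-only sketch that compares the two approaches. The segment list is hard-coded and `shout` is a hypothetical stand-in for `translate_word` (any `&str -> String` function behaves the same way), so it runs without the `unicode-segmentation` crate. Both produce the same `String`; the second relies on `String` implementing `FromIterator<String>` and skips the intermediate `Vec`.

```rust
/// Hypothetical stand-in for `translate_word`: any `&str -> String` works here.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Original approach: build a `Vec<String>`, then `join` it into a new `String`.
fn via_join(segments: &[&str]) -> String {
    segments.iter().copied().map(shout).collect::<Vec<_>>().join("")
}

/// Suggested approach: `String: FromIterator<String>`, so each piece is
/// appended to one growing buffer and no intermediate `Vec` is built.
fn via_collect(segments: &[&str]) -> String {
    segments.iter().copied().map(shout).collect()
}

fn main() {
    let segments = ["pig", " ", "latin", "!"];
    println!("{}", via_join(&segments));
    println!("{}", via_collect(&segments)); // prints "PIG LATIN!"
}
```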
Use Chars::as_str
When you call `str::chars`, the specific iterator it returns is called `Chars`. It has a handy method, `as_str`, that returns the not-yet-consumed remainder as a `&str` borrowed from the original string, so you don't need to allocate a new `String`:
```rust
'a'..='z' => format!("{}-{}ay", it.as_str(), first),
```
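In isolation, the pattern looks like the sketch below. `split_first_char` is a hypothetical helper, not part of the original code; it also uses `?` on `next()` so an empty string yields `None` instead of panicking like `unwrap` would. The returned slice points into the original string, so nothing is copied.

```rust
/// Hypothetical helper: splits off the first `char` and returns it together
/// with a borrowed slice of the rest of `s`.
fn split_first_char(s: &str) -> Option<(char, &str)> {
    let mut chars = s.chars();
    // `next()?` returns `None` early for an empty string instead of panicking.
    let first = chars.next()?;
    // `as_str()` views the bytes of `s` that `chars` has not consumed yet; no allocation.
    Some((first, chars.as_str()))
}

fn main() {
    let (first, rest) = split_first_char("latin").unwrap();
    println!("{rest}-{first}ay"); // prints "atin-lay"
}
```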
Use Cow
This is a somewhat advanced optimization that you don't need to make; the Book doesn't mention it at all.
Currently, you allocate a new `String` even when the output is identical to the input. Instead, return `Cow<str>`: if the first character isn't a letter, you can return `Cow::Borrowed(s)`, which points to the existing `&str`. If it does start with a letter, return `Cow::Owned(format!(...))`, which has the same overhead as before. Here, I'm using `.into()` instead of writing `Cow::Owned` and `Cow::Borrowed` explicitly; you can do either.
```rust
use std::borrow::Cow;

fn translate_word(s: &str) -> Cow<'_, str> {
    let mut it = s.chars();
    let first = it.next().unwrap();
    match first.to_ascii_lowercase() {
        'a' | 'e' | 'i' | 'o' | 'u' => format!("{}-hay", s).into(),
        'a'..='z' => format!("{}-{}ay", it.as_str(), first).into(),
        _ => s.into(),
    }
}
```
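The following self-contained sketch shows which segments end up borrowed and which owned. The segment list is hand-written and only assumed to resemble what `split_word_bounds` yields for "Hello, apple"; it is hard-coded so the example runs without the `unicode-segmentation` crate. Because `String` also implements `FromIterator<Cow<str>>`, the `collect` in `translate` keeps working unchanged with the `Cow`-returning `translate_word`.

```rust
use std::borrow::Cow;

/// Cow-returning version of `translate_word`; assumes `s` is non-empty.
fn translate_word(s: &str) -> Cow<'_, str> {
    let mut it = s.chars();
    let first = it.next().unwrap();
    match first.to_ascii_lowercase() {
        'a' | 'e' | 'i' | 'o' | 'u' => format!("{}-hay", s).into(),
        'a'..='z' => format!("{}-{}ay", it.as_str(), first).into(),
        // Punctuation and whitespace are returned as-is, without allocating.
        _ => s.into(),
    }
}

fn main() {
    // Assumed segmentation of "Hello, apple" (hard-coded for illustration).
    let segments = ["Hello", ",", " ", "apple"];

    for seg in segments {
        let out = translate_word(seg);
        let kind = if matches!(out, Cow::Borrowed(_)) { "borrowed" } else { "owned" };
        println!("{seg:?} -> {out:?} ({kind})");
    }

    // `String: FromIterator<Cow<str>>`, so the pieces can be collected directly.
    let sentence: String = segments.iter().copied().map(translate_word).collect();
    println!("{sentence}"); // prints "ello-Hay, apple-hay"
}
```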