I am a beginner to Rust, started learning because of Advent of Code 2022. I solved Day 7's problem by implementing a tree data structure and traversing it. TLDR, the problem provides a log of terminal inputs and outputs, consisting of either ls
or cd <dir>
commands, such as
$ cd /
$ ls
dir a
1 b
$ cd a
$ ls a
2 a
3 b
And here is my solution:
use std::cell::RefCell;
use std::default::Default;
use std::fmt::Debug;
use std::ops::Add;
use std::rc::Rc;
const ITER_ERR: &str = "err: iterator is empty";
const PARSE_ERR: &str = "err: can't parse int";
const SPLIT_ERR: &str = "err: can't split string";
struct Solver {
content: String,
}
trait Solvable {
fn solve(self) -> (usize, usize);
}
// Blanket implementation
trait NodeValTrait<T>: Add<Output = T> + Default + Copy + Debug {}
impl<T> NodeValTrait<T> for T where T: Add<Output = T> + Default + Copy + Debug {}
#[derive(Debug)]
struct TreeNode<T: NodeValTrait<T>> {
val: Option<T>,
name: String,
children: Vec<Rc<RefCell<TreeNode<T>>>>,
}
impl<T: NodeValTrait<T>> TreeNode<T> {
fn new(s: impl ToString) -> TreeNode<T> {
TreeNode {
val: None,
name: s.to_string(),
children: Vec::new(),
}
}
fn val(&mut self, val: T) {
self.val = Some(val);
}
fn push(&mut self, node: TreeNode<T>) {
self.children.push(Rc::new(RefCell::new(node)));
}
fn insert_and_find(&mut self, name: impl ToString) -> Rc<RefCell<TreeNode<T>>> {
let name = name.to_string();
for child in &self.children {
if child.borrow().name == name {
return Rc::clone(child);
}
}
let child = TreeNode::new(name);
self.push(child);
Rc::clone(&self.children[&self.children.len() - 1])
}
fn sum(&self) -> T {
let mut res: T = Default::default();
if let Some(val) = &self.val {
res = res + *val;
}
for child in &self.children {
res = res + child.borrow().sum();
}
res
}
fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
F: Fn(&TreeNode<T>) -> T2,
{
let mut res = Vec::new();
res.push(f(self));
for child in &self.children {
// workaround for infinite recursion
let f = &f as &dyn Fn(&TreeNode<T>) -> T2;
res.extend(child.borrow_mut().dfs(f));
}
res
}
}
impl Solvable for Solver {
fn solve(self) -> (usize, usize) {
let root: Rc<RefCell<TreeNode<usize>>> = Rc::new(RefCell::new(TreeNode::new("")));
let mut cur_path = Vec::new();
cur_path.push(Rc::clone(&root));
// each $ represents new command input / output group
for cmds in self.content.split("$ ") {
let cur = &cur_path[cur_path.len() - 1];
let lines: Vec<&str> = cmds.trim().lines().collect();
if lines.is_empty() {
continue;
}
let cmd: Vec<&str> = lines[0].split(" ").collect();
if cmd[0] == "ls" {
// handles `ls` commands
for &line in &lines[1..] {
let (prefix, dir) = line.split_once(" ").expect(SPLIT_ERR);
let child = cur.borrow_mut().insert_and_find(dir);
match prefix {
"dir" => {
cur.borrow_mut().insert_and_find(dir);
}
_ => {
let prefix = prefix.parse().expect(PARSE_ERR);
child.borrow_mut().val(prefix);
}
}
}
} else {
// handles `cd` commands
match cmd[1] {
".." => {
cur_path.pop();
}
dir => {
let child = cur.borrow_mut().insert_and_find(dir);
cur_path.push(child);
}
}
}
}
let func = |node: &TreeNode<usize>| (node.val.is_some(), node.sum());
let binding = root.borrow_mut().dfs(func);
let vals = binding
.iter()
.filter(|(is_file, _)| !is_file)
.map(|(_, sum)| sum);
// part 1
let part1 = vals.clone().filter(|&sum| *sum <= 100000).sum();
// part 2
let freeup = root.borrow_mut().sum() - 40000000;
let part2 = vals
.clone()
.filter(|&sum| *sum >= freeup)
.min()
.expect(ITER_ERR);
(part1, *part2)
}
}
fn main() {
let solver = Solver {
content: include_str!("../input").to_string(),
};
let (part1, part2) = solver.solve();
println!("Part 1: {part1:?}");
println!("Part 2: {part2:?}");
}
If there are any suggestions on how to simplify this code, or how to make it more Rust-like, then it would be great.
In particular, the implementation for
dfs
just seems... weird, especially the "hack" with&f as &dyn ...
to avoid infinite recursion at compile time.Also, is it possible to modify
dfs
such that it returns sayVec<(&TreeNode<T>, T2)>
or similar? I tried doing it naively by replaceres.push(f(self))
withres.push((self, f(self)))
, but it fails horribly with tons of lifetime problems.Another question - is it encouraged to use
match
anditer.method().method().method()
chains over other options likeif let
? It seems that especially when usingmatch
, the indentation level goes quite deep.Finally, are there other options to
Rc<RefCell<...>>
? It is working great but keeping track of the types are annoying.
3 Answers 3
In particular, the implementation for dfs just seems... weird, especially the "hack" with &f as &dyn ... to avoid infinite recursion at compile time.
Let's take a look:
fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
F: Fn(&TreeNode<T>) -> T2,
{
let mut res = Vec::new();
res.push(f(self));
for child in &self.children {
// workaround for infinite recursion
let f = &f as &dyn Fn(&TreeNode<T>) -> T2;
res.extend(child.borrow_mut().dfs(f));
}
res
}
The fundamental problem here is that dfs takes ownership of f
, but you don't want to pass ownership of f
when calling it recursively. The easiest solution is to never take ownership. Specifically, take a borrow of f
:
fn dfs<F, T2>(&self, f: &F) -> Vec<T2>
where
F: Fn(&TreeNode<T>) -> T2,
{
let mut res = Vec::new();
res.push(f(self));
for child in &self.children {
res.extend(child.borrow_mut().dfs(f));
}
res
}
This will require updating calling sites to pass a borrow.
However, there is another problem with this implementation. It will create and destroy a lot of Vecs. It'd be more efficient to put everything directly into a single target vec. I'd split it into two functions:
fn dfs_impl<F, T2>(&self, res: &mut Vec<T2>, f: &F)
where
F: Fn(&TreeNode<T>) -> T2,
{
res.push(f(self));
for child in &self.children {
child.borrow_mut().dfs_impl(res, f);
}
}
fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
F: Fn(&TreeNode<T>) -> T2,
{
let mut res = Vec::new();
self.dfs_impl(&mut res, &f);
res
}
Another question - is it encouraged to use match and iter.method().method().method() chains over other options like if let? It seems that especially when using match, the indentation level goes quite deep.
In cases where there is a binary decisions (if this else that) then I recommend using if let over a match. Matches makes sense if you have several different paths to choose from. I also usually prefer if let over methods but that's more situations specific.
Also, is it possible to modify dfs such that it returns say Vec<(&TreeNode, T2)> or similar? I tried doing it naively by replace res.push(f(self)) with res.push((self, f(self))), but it fails horribly with tons of lifetime problems.
As it stands, you shouldn't. Your code works with Rc<RefCell<
and so you should return that and not simple borrows. You can do this by defining static methods that operate on Rc<RefCell<
.
fn dfs_impl<F, T2>(me: &Rc<RefCell<Self>>, res: &mut Vec<(Rc<RefCell<Self>>, T2)>, f: &F)
where
F: Fn(&TreeNode<T>) -> T2,
{
res.push((Rc::clone(me), f(&me.borrow())));
for child in &me.borrow().children {
TreeNode::dfs_impl(&child, res, f);
}
}
fn dfs<F, T2>(me: &Rc<RefCell<Self>>, f: F) -> Vec<(Rc<RefCell<Self>>, T2)>
where
F: Fn(&TreeNode<T>) -> T2,
{
let mut res = Vec::new();
TreeNode::dfs_impl(me, &mut res, &f);
res
}
But this leads us to your last question.
Finally, are there other options to Rc<RefCell<...>>? It is working great but keeping track of the types are annoying.
The alternative is to use the types directly. Instead of Vec<Rc<RefCell<TreeNode>>
just use Vec<TreeNode>
. If you just do that with your current code you'll end up in a morass of lifetime errors. The particular problem that you'll find is cur_path
in Solver::solve
. Right now it is a vector of Rc
s which is fine, but a Vec of mutable borrows to different parts of the same tree structure and that won't work very well since Rust's borrow checker can't ensure that you use that safely.
One approach would be reimplement the solution recursively. In particular, anytime you need to move down a layer in the tree, you should call a method on that tree node to take care of it. It should return when it leaves tree (i.e. sees a cd ..) so that the directory above can continue processing.
Another approach would be to use indexes instead of references. The idea here is that your tree structure has methods to manipulate the tree and takes indexes to determine which node is being manipulated instead of operating on references or Rc
s. This avoids issues with the borrow checker and avoids the annoyance of dealing with Rc<RefCell
. The drawback is that Rust borrow checker can't ensure that you use the indexes correctly.
-
\$\begingroup\$ Thank you for your detailed feedback. The trick(? I think it's quite standard, I have seen it before) with
dfs
callingdfs_impl
is pretty nice, I will keep that in mind! Also FYI, you have a "something like: " without code following, but I think I get what you mean. As for your second suggested approach of using indices, does it mean e.g.struct Node
storingVec<usize>
as it's children instead, and having an additionalstruct NodeStorage
storingVec<Node>
? Which I imagine is similar to howtyped_arena
does it, which I used in my answer. Thanks once again! \$\endgroup\$ketsi– ketsi2022年12月11日 03:43:22 +00:00Commented Dec 11, 2022 at 3:43 -
\$\begingroup\$ Oh, and another question. The reason why I implemented my
Solver::solve
as it is right now (maintaining a list ofcur_path
) is that I need to traverse to the parent node. Hence, I am thinking if there is a way to add aparent
reference to the definition ofNode
? As it is right now, I am pretty sure it would be a no, since that would meanroot
referencingchild
referencingnode
. Is it possible though? \$\endgroup\$ketsi– ketsi2022年12月11日 04:01:57 +00:00Commented Dec 11, 2022 at 4:01 -
\$\begingroup\$ @GarethMa, Yes, a plausible approach would be to have each node storing a
Vec<usize>
of children which would index into aVec<TreeNode>
on a baseTree
struct. Its kinda similar to whattyped_arena
does except that it just uses indexes everywhere instead of handing out references. \$\endgroup\$Winston Ewert– Winston Ewert2022年12月11日 04:03:52 +00:00Commented Dec 11, 2022 at 4:03 -
\$\begingroup\$ @GarethMa, Parent node references are possible. You can use weak pointers (see std::Rc::Weak), or indexes, or references from
typed_arena
. Or you can write the algorithm in a way that doesn't require navigating up like that. For example, the recursive approach avoids it because the pointers to parent nodes are held further up the stack. \$\endgroup\$Winston Ewert– Winston Ewert2022年12月11日 04:07:48 +00:00Commented Dec 11, 2022 at 4:07
To simplify the code, you can remove the unused ITER_ERR
, PARSE_ERR
, and SPLIT_ERR
constants and the Debug
trait bounds for the NodeValTrait
trait. You can also remove the lines
variable and directly iterate over the cmds
split by "$ "
. You can also use match statements instead of if statements for the cmd[0] comparisons.
-
\$\begingroup\$ Thank you for your answer. However, do you have any suggestions on the 4 questions I posed? I spent most of my time thinking about that instead of looking at the other parts of my code :" \$\endgroup\$ketsi– ketsi2022年12月08日 12:10:53 +00:00Commented Dec 8, 2022 at 12:10
Here is a temporary solution that I settled with (in particular for question 2 and 4), as well as some self-reviews after learning Rust for 3 more days!
Most of my reviews are already mentioned by Winston Ewert in his answer. However, (削除) some (削除ここまで) one other coding style review I have is:
Prefer if-else if-else
for few branches with long codes
For example, the code
match prefix {
"dir" => {
cur.borrow_mut().insert_and_find(dir);
}
_ => {
let prefix = prefix.parse().expect(PARSE_ERR);
child.borrow_mut().val(prefix);
}
}
Only has two branches, and should be replaced by
if prefix == "dir" {
// ...
} else {
// ...
}
This way, the intention of the code is clearer, and overall reduces indentation.
And now to answer my own questions:
Question 1: Answered by Winston Ewert - essentially, pass in &f
instead so that dfs
does not take ownership of f
.
Question 2: One option is to use an arena
, such as the typed-arena
crate or just a Vec<TreeNode>
. They function as a local memory allocators. Whenever a TreeNode
allocation is required, the arena
can do that and return a reference, either a direct &TreeNode
reference or an uint
index to an internal buffer. This also means the lifetime of nodes are tied to the arena instead of to their parents. Further discussion under Winston Ewert's answer.
Question 3: Answered above.
Question 4: Answered by Winston Ewert - essentially, use TreeNode<T>
directly and rewrite the algorithms around the data structure, or use the arena method mentioned above.
You can see my implementation at this commit.
-
3\$\begingroup\$ If you asked for review of that code, I'd have to object to using unsafe code to get around Rusts rules. \$\endgroup\$Winston Ewert– Winston Ewert2022年12月10日 16:07:32 +00:00Commented Dec 10, 2022 at 16:07
-
\$\begingroup\$ @WinstonEwert Hmm, is using unsafe code really not recommended? I thought it would be fine here especially since all unsafe code is within the struct impl i.e. the user does not have to write any
unsafe {node.method()}
when calling methods on the TreeNode. I will read your detailed answer now :) \$\endgroup\$ketsi– ketsi2022年12月11日 03:20:23 +00:00Commented Dec 11, 2022 at 3:20 -
4\$\begingroup\$ @GarethMa, yes, using unsafe code is pretty heavily discouraged for most Rust development. Many crates make a big point of how they don't contain any unsafe code. There are exceptions (for example if implementing a data structure) but generally the practice is to avoid unsafe unless you really have to use it. \$\endgroup\$Winston Ewert– Winston Ewert2022年12月11日 04:10:05 +00:00Commented Dec 11, 2022 at 4:10