Tree implementation in Rust

Question 1

I am a beginner to Rust, started learning because of Advent of Code 2022. I solved Day 7's problem by implementing a tree data structure and traversing it. TLDR, the problem provides a log of terminal inputs and outputs, consisting of either ls or cd <dir> commands, such as

$ cd /
$ ls
dir a
1 b
$ cd a
$ ls a
2 a
3 b

And here is my solution:

use std::cell::RefCell;
use std::default::Default;
use std::fmt::Debug;
use std::ops::Add;
use std::rc::Rc;
const ITER_ERR: &str = "err: iterator is empty";
const PARSE_ERR: &str = "err: can't parse int";
const SPLIT_ERR: &str = "err: can't split string";
struct Solver {
 content: String,
}
trait Solvable {
 fn solve(self) -> (usize, usize);
}
// Blanket implementation
trait NodeValTrait<T>: Add<Output = T> + Default + Copy + Debug {}
impl<T> NodeValTrait<T> for T where T: Add<Output = T> + Default + Copy + Debug {}
#[derive(Debug)]
struct TreeNode<T: NodeValTrait<T>> {
 val: Option<T>,
 name: String,
 children: Vec<Rc<RefCell<TreeNode<T>>>>,
}
impl<T: NodeValTrait<T>> TreeNode<T> {
 fn new(s: impl ToString) -> TreeNode<T> {
 TreeNode {
 val: None,
 name: s.to_string(),
 children: Vec::new(),
 }
 }
 fn val(&mut self, val: T) {
 self.val = Some(val);
 }
 fn push(&mut self, node: TreeNode<T>) {
 self.children.push(Rc::new(RefCell::new(node)));
 }
 fn insert_and_find(&mut self, name: impl ToString) -> Rc<RefCell<TreeNode<T>>> {
 let name = name.to_string();
 for child in &self.children {
 if child.borrow().name == name {
 return Rc::clone(child);
 }
 }
 let child = TreeNode::new(name);
 self.push(child);
 Rc::clone(&self.children[&self.children.len() - 1])
 }
 fn sum(&self) -> T {
 let mut res: T = Default::default();
 if let Some(val) = &self.val {
 res = res + *val;
 }
 for child in &self.children {
 res = res + child.borrow().sum();
 }
 res
 }
 fn dfs<F, T2>(&self, f: F) -> Vec<T2>
 where
 F: Fn(&TreeNode<T>) -> T2,
 {
 let mut res = Vec::new();
 res.push(f(self));
 for child in &self.children {
 // workaround for infinite recursion
 let f = &f as &dyn Fn(&TreeNode<T>) -> T2;
 res.extend(child.borrow_mut().dfs(f));
 }
 res
 }
}
impl Solvable for Solver {
 fn solve(self) -> (usize, usize) {
 let root: Rc<RefCell<TreeNode<usize>>> = Rc::new(RefCell::new(TreeNode::new("")));
 let mut cur_path = Vec::new();
 cur_path.push(Rc::clone(&root));
 // each $ represents new command input / output group
 for cmds in self.content.split("$ ") {
 let cur = &cur_path[cur_path.len() - 1];
 let lines: Vec<&str> = cmds.trim().lines().collect();
 if lines.is_empty() {
 continue;
 }
 let cmd: Vec<&str> = lines[0].split(" ").collect();
 if cmd[0] == "ls" {
 // handles `ls` commands
 for &line in &lines[1..] {
 let (prefix, dir) = line.split_once(" ").expect(SPLIT_ERR);
 let child = cur.borrow_mut().insert_and_find(dir);
 match prefix {
 "dir" => {
 cur.borrow_mut().insert_and_find(dir);
 }
 _ => {
 let prefix = prefix.parse().expect(PARSE_ERR);
 child.borrow_mut().val(prefix);
 }
 }
 }
 } else {
 // handles `cd` commands
 match cmd[1] {
 ".." => {
 cur_path.pop();
 }
 dir => {
 let child = cur.borrow_mut().insert_and_find(dir);
 cur_path.push(child);
 }
 }
 }
 }
 let func = |node: &TreeNode<usize>| (node.val.is_some(), node.sum());
 let binding = root.borrow_mut().dfs(func);
 let vals = binding
 .iter()
 .filter(|(is_file, _)| !is_file)
 .map(|(_, sum)| sum);
 // part 1
 let part1 = vals.clone().filter(|&sum| *sum <= 100000).sum();
 // part 2
 let freeup = root.borrow_mut().sum() - 40000000;
 let part2 = vals
 .clone()
 .filter(|&sum| *sum >= freeup)
 .min()
 .expect(ITER_ERR);
 (part1, *part2)
 }
}
fn main() {
 let solver = Solver {
 content: include_str!("../input").to_string(),
 };
 let (part1, part2) = solver.solve();
 println!("Part 1: {part1:?}");
 println!("Part 2: {part2:?}");
}

If there are any suggestions on how to simplify this code, or how to make it more Rust-like, then it would be great.

In particular, the implementation for dfs just seems... weird, especially the "hack" with &f as &dyn ... to avoid infinite recursion at compile time.
Also, is it possible to modify dfs such that it returns say Vec<(&TreeNode<T>, T2)> or similar? I tried doing it naively by replace res.push(f(self)) with res.push((self, f(self))), but it fails horribly with tons of lifetime problems.
Another question - is it encouraged to use match and iter.method().method().method() chains over other options like if let? It seems that especially when using match, the indentation level goes quite deep.
Finally, are there other options to Rc<RefCell<...>>? It is working great but keeping track of the types are annoying.

Question 2

In particular, the implementation for dfs just seems... weird, especially the "hack" with &f as &dyn ... to avoid infinite recursion at compile time.

Let's take a look:

fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 res.push(f(self));
 for child in &self.children {
 // workaround for infinite recursion
 let f = &f as &dyn Fn(&TreeNode<T>) -> T2;
 res.extend(child.borrow_mut().dfs(f));
 }
 res
}

The fundamental problem here is that dfs takes ownership of f, but you don't want to pass ownership of f when calling it recursively. The easiest solution is to never take ownership. Specifically, take a borrow of f:

fn dfs<F, T2>(&self, f: &F) -> Vec<T2>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 res.push(f(self));
 for child in &self.children {
 res.extend(child.borrow_mut().dfs(f));
 }
 res
}

This will require updating calling sites to pass a borrow.

However, there is another problem with this implementation. It will create and destroy a lot of Vecs. It'd be more efficient to put everything directly into a single target vec. I'd split it into two functions:

fn dfs_impl<F, T2>(&self, res: &mut Vec<T2>, f: &F)
where
 F: Fn(&TreeNode<T>) -> T2,
{
 res.push(f(self));
 for child in &self.children {
 child.borrow_mut().dfs_impl(res, f);
 }
}
fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 self.dfs_impl(&mut res, &f);
 res
}

Another question - is it encouraged to use match and iter.method().method().method() chains over other options like if let? It seems that especially when using match, the indentation level goes quite deep.

In cases where there is a binary decisions (if this else that) then I recommend using if let over a match. Matches makes sense if you have several different paths to choose from. I also usually prefer if let over methods but that's more situations specific.

Also, is it possible to modify dfs such that it returns say Vec<(&TreeNode, T2)> or similar? I tried doing it naively by replace res.push(f(self)) with res.push((self, f(self))), but it fails horribly with tons of lifetime problems.

As it stands, you shouldn't. Your code works with Rc<RefCell< and so you should return that and not simple borrows. You can do this by defining static methods that operate on Rc<RefCell<.

fn dfs_impl<F, T2>(me: &Rc<RefCell<Self>>, res: &mut Vec<(Rc<RefCell<Self>>, T2)>, f: &F)
where
 F: Fn(&TreeNode<T>) -> T2,
{
 res.push((Rc::clone(me), f(&me.borrow())));
 for child in &me.borrow().children {
 TreeNode::dfs_impl(&child, res, f);
 }
}
fn dfs<F, T2>(me: &Rc<RefCell<Self>>, f: F) -> Vec<(Rc<RefCell<Self>>, T2)>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 TreeNode::dfs_impl(me, &mut res, &f);
 res
}

But this leads us to your last question.

Finally, are there other options to Rc<RefCell<...>>? It is working great but keeping track of the types are annoying.

The alternative is to use the types directly. Instead of Vec<Rc<RefCell<TreeNode>> just use Vec<TreeNode>. If you just do that with your current code you'll end up in a morass of lifetime errors. The particular problem that you'll find is cur_path in Solver::solve. Right now it is a vector of Rcs which is fine, but a Vec of mutable borrows to different parts of the same tree structure and that won't work very well since Rust's borrow checker can't ensure that you use that safely.

One approach would be reimplement the solution recursively. In particular, anytime you need to move down a layer in the tree, you should call a method on that tree node to take care of it. It should return when it leaves tree (i.e. sees a cd ..) so that the directory above can continue processing.

Another approach would be to use indexes instead of references. The idea here is that your tree structure has methods to manipulate the tree and takes indexes to determine which node is being manipulated instead of operating on references or Rcs. This avoids issues with the borrow checker and avoids the annoyance of dealing with Rc<RefCell. The drawback is that Rust borrow checker can't ensure that you use the indexes correctly.

Question 3

Thank you for your detailed feedback. The trick(? I think it's quite standard, I have seen it before) with dfs calling dfs_impl is pretty nice, I will keep that in mind! Also FYI, you have a "something like: " without code following, but I think I get what you mean. As for your second suggested approach of using indices, does it mean e.g. struct Node storing Vec<usize> as it's children instead, and having an additional struct NodeStorage storing Vec<Node>? Which I imagine is similar to how typed_arena does it, which I used in my answer. Thanks once again!

Question 4

Oh, and another question. The reason why I implemented my Solver::solve as it is right now (maintaining a list of cur_path) is that I need to traverse to the parent node. Hence, I am thinking if there is a way to add a parent reference to the definition of Node? As it is right now, I am pretty sure it would be a no, since that would mean root referencing child referencing node. Is it possible though?

Question 5

@GarethMa, Yes, a plausible approach would be to have each node storing a Vec<usize> of children which would index into a Vec<TreeNode> on a base Tree struct. Its kinda similar to what typed_arena does except that it just uses indexes everywhere instead of handing out references.

Question 6

@GarethMa, Parent node references are possible. You can use weak pointers (see std::Rc::Weak), or indexes, or references from typed_arena. Or you can write the algorithm in a way that doesn't require navigating up like that. For example, the recursive approach avoids it because the pointers to parent nodes are held further up the stack.

Question 7

To simplify the code, you can remove the unused ITER_ERR, PARSE_ERR, and SPLIT_ERR constants and the Debug trait bounds for the NodeValTrait trait. You can also remove the lines variable and directly iterate over the cmds split by "$ ". You can also use match statements instead of if statements for the cmd[0] comparisons.

Question 8

Thank you for your answer. However, do you have any suggestions on the 4 questions I posed? I spent most of my time thinking about that instead of looking at the other parts of my code :"

Question 9

Here is a temporary solution that I settled with (in particular for question 2 and 4), as well as some self-reviews after learning Rust for 3 more days!

Most of my reviews are already mentioned by Winston Ewert in his answer. However, ~~(削除) some (削除ここまで)~~ one other coding style review I have is:

Prefer `if-else if-else` for few branches with long codes

For example, the code

match prefix {
 "dir" => {
 cur.borrow_mut().insert_and_find(dir);
 }
 _ => {
 let prefix = prefix.parse().expect(PARSE_ERR);
 child.borrow_mut().val(prefix);
 }
}

Only has two branches, and should be replaced by

if prefix == "dir" {
 // ...
} else {
 // ...
}

This way, the intention of the code is clearer, and overall reduces indentation.

And now to answer my own questions:

Question 1: Answered by Winston Ewert - essentially, pass in &f instead so that dfs does not take ownership of f.

Question 2: One option is to use an arena, such as the typed-arena crate or just a Vec<TreeNode>. They function as a local memory allocators. Whenever a TreeNode allocation is required, the arena can do that and return a reference, either a direct &TreeNode reference or an uint index to an internal buffer. This also means the lifetime of nodes are tied to the arena instead of to their parents. Further discussion under Winston Ewert's answer.

Question 3: Answered above.

Question 4: Answered by Winston Ewert - essentially, use TreeNode<T> directly and rewrite the algorithms around the data structure, or use the arena method mentioned above.

You can see my implementation at this commit.

Question 10

If you asked for review of that code, I'd have to object to using unsafe code to get around Rusts rules.

Question 11

@WinstonEwert Hmm, is using unsafe code really not recommended? I thought it would be fine here especially since all unsafe code is within the struct impl i.e. the user does not have to write any unsafe {node.method()} when calling methods on the TreeNode. I will read your detailed answer now :)

Question 12

@GarethMa, yes, using unsafe code is pretty heavily discouraged for most Rust development. Many crates make a big point of how they don't contain any unsafe code. There are exceptions (for example if implementing a data structure) but generally the practice is to avoid unsafe unless you really have to use it.

Winston Ewert Winston Ewert 30.7k4 gold badges52 silver badges79 bronze badges · Accepted Answer · 2022-12-10 00:06:04Z

In particular, the implementation for dfs just seems... weird, especially the "hack" with &f as &dyn ... to avoid infinite recursion at compile time.

Let's take a look:

fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 res.push(f(self));
 for child in &self.children {
 // workaround for infinite recursion
 let f = &f as &dyn Fn(&TreeNode<T>) -> T2;
 res.extend(child.borrow_mut().dfs(f));
 }
 res
}

The fundamental problem here is that dfs takes ownership of f, but you don't want to pass ownership of f when calling it recursively. The easiest solution is to never take ownership. Specifically, take a borrow of f:

fn dfs<F, T2>(&self, f: &F) -> Vec<T2>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 res.push(f(self));
 for child in &self.children {
 res.extend(child.borrow_mut().dfs(f));
 }
 res
}

This will require updating calling sites to pass a borrow.

However, there is another problem with this implementation. It will create and destroy a lot of Vecs. It'd be more efficient to put everything directly into a single target vec. I'd split it into two functions:

fn dfs_impl<F, T2>(&self, res: &mut Vec<T2>, f: &F)
where
 F: Fn(&TreeNode<T>) -> T2,
{
 res.push(f(self));
 for child in &self.children {
 child.borrow_mut().dfs_impl(res, f);
 }
}
fn dfs<F, T2>(&self, f: F) -> Vec<T2>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 self.dfs_impl(&mut res, &f);
 res
}

Another question - is it encouraged to use match and iter.method().method().method() chains over other options like if let? It seems that especially when using match, the indentation level goes quite deep.

In cases where there is a binary decisions (if this else that) then I recommend using if let over a match. Matches makes sense if you have several different paths to choose from. I also usually prefer if let over methods but that's more situations specific.

Also, is it possible to modify dfs such that it returns say Vec<(&TreeNode, T2)> or similar? I tried doing it naively by replace res.push(f(self)) with res.push((self, f(self))), but it fails horribly with tons of lifetime problems.

As it stands, you shouldn't. Your code works with Rc<RefCell< and so you should return that and not simple borrows. You can do this by defining static methods that operate on Rc<RefCell<.

fn dfs_impl<F, T2>(me: &Rc<RefCell<Self>>, res: &mut Vec<(Rc<RefCell<Self>>, T2)>, f: &F)
where
 F: Fn(&TreeNode<T>) -> T2,
{
 res.push((Rc::clone(me), f(&me.borrow())));
 for child in &me.borrow().children {
 TreeNode::dfs_impl(&child, res, f);
 }
}
fn dfs<F, T2>(me: &Rc<RefCell<Self>>, f: F) -> Vec<(Rc<RefCell<Self>>, T2)>
where
 F: Fn(&TreeNode<T>) -> T2,
{
 let mut res = Vec::new();
 TreeNode::dfs_impl(me, &mut res, &f);
 res
}

But this leads us to your last question.

Finally, are there other options to Rc<RefCell<...>>? It is working great but keeping track of the types are annoying.

The alternative is to use the types directly. Instead of Vec<Rc<RefCell<TreeNode>> just use Vec<TreeNode>. If you just do that with your current code you'll end up in a morass of lifetime errors. The particular problem that you'll find is cur_path in Solver::solve. Right now it is a vector of Rcs which is fine, but a Vec of mutable borrows to different parts of the same tree structure and that won't work very well since Rust's borrow checker can't ensure that you use that safely.

One approach would be reimplement the solution recursively. In particular, anytime you need to move down a layer in the tree, you should call a method on that tree node to take care of it. It should return when it leaves tree (i.e. sees a cd ..) so that the directory above can continue processing.

Another approach would be to use indexes instead of references. The idea here is that your tree structure has methods to manipulate the tree and takes indexes to determine which node is being manipulated instead of operating on references or Rcs. This avoids issues with the borrow checker and avoids the annoyance of dealing with Rc<RefCell. The drawback is that Rust borrow checker can't ensure that you use the indexes correctly.

Thank you for your detailed feedback. The trick(? I think it's quite standard, I have seen it before) with dfs calling dfs_impl is pretty nice, I will keep that in mind! Also FYI, you have a "something like: " without code following, but I think I get what you mean. As for your second suggested approach of using indices, does it mean e.g. struct Node storing Vec<usize> as it's children instead, and having an additional struct NodeStorage storing Vec<Node>? Which I imagine is similar to how typed_arena does it, which I used in my answer. Thanks once again!
Oh, and another question. The reason why I implemented my Solver::solve as it is right now (maintaining a list of cur_path) is that I need to traverse to the parent node. Hence, I am thinking if there is a way to add a parent reference to the definition of Node? As it is right now, I am pretty sure it would be a no, since that would mean root referencing child referencing node. Is it possible though?
@GarethMa, Yes, a plausible approach would be to have each node storing a Vec<usize> of children which would index into a Vec<TreeNode> on a base Tree struct. Its kinda similar to what typed_arena does except that it just uses indexes everywhere instead of handing out references.
@GarethMa, Parent node references are possible. You can use weak pointers (see std::Rc::Weak), or indexes, or references from typed_arena. Or you can write the algorithm in a way that doesn't require navigating up like that. For example, the recursive approach avoids it because the pointers to parent nodes are held further up the stack.

Stack Exchange Network

Tree implementation in Rust

3 Answers 3

Prefer `if-else if-else` for few branches with long codes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Tree implementation in Rust

3 Answers 3

Prefer if-else if-else for few branches with long codes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Prefer `if-else if-else` for few branches with long codes