-
Notifications
You must be signed in to change notification settings - Fork 326
Conversation
@newpavlov
newpavlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked into the implementation and the algorithm itself yet, so for now I have only the following comments.
IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it. Also I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If TTH can not be implemented without allocations, then it should be feature-gated, i.e. you should introduce a new feature in Cargo.toml (e.g. tree-hash) and add #[cfg(feature = "tree-hash")] where appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to write it like this:
let s = hex!("deadbeef"); let mut content = Vec::with_capacity(s.len() * N); for _ in 0..N { content.extend(&s); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's embarrassing, I meant to clean that up before submitting the PR. Will do
baodrate
commented
Jul 5, 2023
I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.
That's a good point. I opted for the easier implementation to get some feedback, but I think the constant-memory version shouldn't be too hard. I would have to give it a input limit. Assuming 2^64B is a reasonable size limit, that's gives a max tree height of 54 (with 1024B blocks) and a minimum space requirement of Tiger::OutputSize * 54 ~= 1.3KiB. Which seems reasonable?
IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it.
I'm a little unsure about this suggestion. Do you mean this library should depend on rayon? Would multi-threading behavior be configurable by the user? Or is this something we would allow the user to handle by e.g. exposing hash_nodes, so the user can calculate the partial trees in parallel and pass the collection of hashes to hash_nodes to get the hash of the full tree?
newpavlov
commented
Jul 5, 2023
I think the constant-memory version shouldn't be too hard.
You can continue to use dynamic allocation, the point is to reduce amount of used memory.
Do you mean this library should depend on rayon?
Yes, but rayon can be optional, disabled-by-default dependency.
Would multi-threading behavior be configurable by the user?
No need for separate configuration knobs, rayon defaults should be fine.
BTW, can you add link(s) for TTH test vector source?
baodrate
commented
Jul 5, 2023
You can continue to use dynamic allocation, the point is to reduce amount of used memory
I've updated the algorithm to reduce memory usage. I've also swapped out Vec for arrayvec as per my previous comment. If this isn't acceptable I can switch back to using Vec.
BTW, can you add link(s) for TTH test vector source?
TTH tests were calculated by hand by hashing individual blocks with the tiger implementation in this repo, and confirmed with rhash and tthsum. I can look for more authoritative tests. But TTH doesn't seem to be well-documented (rather, it seems to be the consensus of various P2P file-sharing software).
Instead of storing every leaf and processing the tree once at the end, process the tree every time a leaf is added, so that a maximum of N nodes need to be stored, for a tree of height N. Then a final walk of the nodes is performed at the end to promote and merge any single-child leaves.
I gave integrating rayon a shot, but I couldn't get a measurable difference in performance. edit: I've profiled the hasher and it does seem like there should be room for rayon to do its thing, I just can't seem to get it to parallelize properly. Will keep working on it
note to self: the blake3 crate (https://github.com/BLAKE3-team/BLAKE3) implements digest and supports (optionally) rayon multithreading
edit: added rayon-multithreading modeled after the blake3 crate (-SIMD stuff) and I still can't beat the performance of the naive implementation. I'm so bad at this dudes 😭
TTH/reference/TigerTreeCore
time: [3.7648 μs 3.7824 μs 3.8038 μs]
thrpt: [514.47 MiB/s 517.37 MiB/s 519.81 MiB/s]
TTH/serial/compress_subtree<Serial>
time: [3.8919 μs 3.9154 μs 3.9416 μs]
thrpt: [496.48 MiB/s 499.81 MiB/s 502.83 MiB/s]
TTH/rayon/compress_subtree<Rayon>/1
time: [20.298 μs 20.656 μs 21.018 μs]
thrpt: [93.110 MiB/s 94.742 MiB/s 96.408 MiB/s]
TTH/rayon/compress_subtree<Rayon>/2
time: [14.996 μs 15.198 μs 15.420 μs]
thrpt: [126.91 MiB/s 128.77 MiB/s 130.49 MiB/s]
TTH/rayon/compress_subtree<Rayon>/4
time: [3.6958 μs 3.7084 μs 3.7219 μs]
thrpt: [525.79 MiB/s 527.71 MiB/s 529.51 MiB/s]
TTH/rayon/compress_subtree<Rayon>/8
time: [3.7162 μs 3.7411 μs 3.7685 μs]
thrpt: [519.29 MiB/s 523.10 MiB/s 526.59 MiB/s]
TTH/rayon/compress_subtree<Rayon>/16
time: [3.6799 μs 3.6958 μs 3.7153 μs]
thrpt: [526.73 MiB/s 529.50 MiB/s 531.80 MiB/s]
Implements the Tiger Tree Hash algorithm as mentioned here.
I'm a rust newbie, so implementation (and general Rust) advice is welcome.
The digest crate doesn't seem to be happy with block sizes > 128, but TTH operates on 1024K data blocks, so I emulated it internally. Doesn't feel like the best way to do this but I couldn't figure out a more elegant method.
I didn't attempt any optimization so performance isn't great (although much better than I expected), approx 20-30% slower than https://github.com/rhash/RHash.