tiger: implement TTH algorithm#494

Draft

baodrate wants to merge 2 commits into

RustCrypto:master from

baodrate:tth

Draft

tiger: implement TTH algorithm #494
baodrate wants to merge 2 commits into
RustCrypto:master from
baodrate:tth

Conversation

@baodrate

@baodrate baodrate commented Jul 4, 2023

Copy link

Copy Markdown

Implements the Tiger Tree Hash algorithm as mentioned here.

I'm a rust newbie, so implementation (and general Rust) advice is welcome.

The digest crate doesn't seem to be happy with block sizes > 128, but TTH operates on 1024K data blocks, so I emulated it internally. Doesn't feel like the best way to do this but I couldn't figure out a more elegant method.

I didn't attempt any optimization so performance isn't great (although much better than I expected), approx 20-30% slower than https://github.com/rhash/RHash.

@baodrate


 tiger: implement TTH hash algorithm

55b0596

newpavlov

newpavlov reviewed

Jul 5, 2023

View reviewed changes

@newpavlov newpavlov left a comment

Copy link

Copy Markdown

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into the implementation and the algorithm itself yet, so for now I have only the following comments.

IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it. Also I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.

tiger/src/lib.rs Outdated

#![forbid(unsafe_code)]

#![warn(missing_docs, rust_2018_idioms)]

extern crate alloc;

@newpavlov newpavlov Jul 5, 2023

Copy link

Copy Markdown

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If TTH can not be implemented without allocations, then it should be feature-gated, i.e. you should introduce a new feature in Cargo.toml (e.g. tree-hash) and add #[cfg(feature = "tree-hash")] where appropriate.

tiger/tests/mod.rs

#[test]

fn tth_two_blocks() {

let content = hex!("deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef");

@newpavlov newpavlov Jul 5, 2023

Copy link

Copy Markdown

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to write it like this:

let s = hex!("deadbeef");
let mut content = Vec::with_capacity(s.len() * N);
for _ in 0..N {
 content.extend(&s);
}

@baodrate baodrate Jul 5, 2023

Copy link

Copy Markdown

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's embarrassing, I meant to clean that up before submitting the PR. Will do

@baodrate

baodrate commented Jul 5, 2023

Copy link

Copy Markdown

Author

I think it should be possible to implement this algorithm without storing all leaves, i.e. you could collapse the tree and only store number of nodes proportional to the tree height.

That's a good point. I opted for the easier implementation to get some feedback, but I think the constant-memory version shouldn't be too hard. I would have to give it a input limit. Assuming 2^64B is a reasonable size limit, that's gives a max tree height of 54 (with 1024B blocks) and a minimum space requirement of Tiger::OutputSize * 54 ~= 1.3KiB. Which seems reasonable?

IIUC the main reason to use TTH is to be able to parallelize processing of data blocks. It may be worth to use rayon for it.

I'm a little unsure about this suggestion. Do you mean this library should depend on rayon? Would multi-threading behavior be configurable by the user? Or is this something we would allow the user to handle by e.g. exposing hash_nodes, so the user can calculate the partial trees in parallel and pass the collection of hashes to hash_nodes to get the hash of the full tree?

@newpavlov

newpavlov commented Jul 5, 2023

Copy link

Copy Markdown

Member

I think the constant-memory version shouldn't be too hard.

You can continue to use dynamic allocation, the point is to reduce amount of used memory.

Do you mean this library should depend on rayon?

Yes, but rayon can be optional, disabled-by-default dependency.

Would multi-threading behavior be configurable by the user?

No need for separate configuration knobs, rayon defaults should be fine.

BTW, can you add link(s) for TTH test vector source?

@baodrate

baodrate commented Jul 5, 2023

Copy link

Copy Markdown

Author

You can continue to use dynamic allocation, the point is to reduce amount of used memory

I've updated the algorithm to reduce memory usage. I've also swapped out Vec for arrayvec as per my previous comment. If this isn't acceptable I can switch back to using Vec.

BTW, can you add link(s) for TTH test vector source?

TTH tests were calculated by hand by hashing individual blocks with the tiger implementation in this repo, and confirmed with rhash and tthsum. I can look for more authoritative tests. But TTH doesn't seem to be well-documented (rather, it seems to be the consensus of various P2P file-sharing software).

@baodrate


 tiger: avoid alloc for calculating TTH

a3555de

Instead of storing every leaf and processing the tree once at the end,
process the tree every time a leaf is added, so that a maximum of N
nodes need to be stored, for a tree of height N. Then a final walk of
the nodes is performed at the end to promote and merge any single-child
leaves.

@baodrate baodrate force-pushed the tth branch from f7034fe to a3555de Compare

July 5, 2023 23:05

@baodrate

baodrate commented Jul 5, 2023 •

edited

Loading

Copy link

Copy Markdown

Author

I gave integrating rayon a shot, but I couldn't get a measurable difference in performance. edit: I've profiled the hasher and it does seem like there should be room for rayon to do its thing, I just can't seem to get it to parallelize properly. Will keep working on it

note to self: the blake3 crate (https://github.com/BLAKE3-team/BLAKE3) implements digest and supports (optionally) rayon multithreading

edit: added rayon-multithreading modeled after the blake3 crate (-SIMD stuff) and I still can't beat the performance of the naive implementation. I'm so bad at this dudes 😭

TTH/reference/TigerTreeCore
	time: [3.7648 μs 3.7824 μs 3.8038 μs]
	thrpt: [514.47 MiB/s 517.37 MiB/s 519.81 MiB/s]
TTH/serial/compress_subtree<Serial>
	time: [3.8919 μs 3.9154 μs 3.9416 μs]
	thrpt: [496.48 MiB/s 499.81 MiB/s 502.83 MiB/s]
TTH/rayon/compress_subtree<Rayon>/1
	time: [20.298 μs 20.656 μs 21.018 μs]
	thrpt: [93.110 MiB/s 94.742 MiB/s 96.408 MiB/s]
TTH/rayon/compress_subtree<Rayon>/2
	time: [14.996 μs 15.198 μs 15.420 μs]
	thrpt: [126.91 MiB/s 128.77 MiB/s 130.49 MiB/s]
TTH/rayon/compress_subtree<Rayon>/4
	time: [3.6958 μs 3.7084 μs 3.7219 μs]
	thrpt: [525.79 MiB/s 527.71 MiB/s 529.51 MiB/s]
TTH/rayon/compress_subtree<Rayon>/8
	time: [3.7162 μs 3.7411 μs 3.7685 μs]
	thrpt: [519.29 MiB/s 523.10 MiB/s 526.59 MiB/s]
TTH/rayon/compress_subtree<Rayon>/16
	time: [3.6799 μs 3.6958 μs 3.7153 μs]
	thrpt: [526.73 MiB/s 529.50 MiB/s 531.80 MiB/s]

@baodrate baodrate marked this pull request as draft

July 18, 2023 20:02

Labels

None yet

2 participants

@baodrate @newpavlov

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

tiger: implement TTH algorithm#494

tiger: implement TTH algorithm #494
baodrate wants to merge 2 commits into
RustCrypto:master from
baodrate:tth

Conversation

@baodrate baodrate commented Jul 4, 2023

Uh oh!

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

Uh oh!

@newpavlov newpavlov Jul 5, 2023

Choose a reason for hiding this comment

Uh oh!

@newpavlov newpavlov Jul 5, 2023

Choose a reason for hiding this comment

Uh oh!

@baodrate baodrate Jul 5, 2023

Choose a reason for hiding this comment

Uh oh!

baodrate commented Jul 5, 2023

Uh oh!

newpavlov commented Jul 5, 2023

Uh oh!

baodrate commented Jul 5, 2023

Uh oh!

baodrate commented Jul 5, 2023 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

@baodrate baodrate commented Jul 4, 2023

Uh oh!

@newpavlov newpavlov left a comment

Choose a reason for hiding this comment

Uh oh!

@newpavlov newpavlov Jul 5, 2023

Choose a reason for hiding this comment

Uh oh!

@newpavlov newpavlov Jul 5, 2023

Choose a reason for hiding this comment

Uh oh!

@baodrate baodrate Jul 5, 2023

Choose a reason for hiding this comment

Uh oh!

baodrate commented Jul 5, 2023

Uh oh!

newpavlov commented Jul 5, 2023

Uh oh!

baodrate commented Jul 5, 2023

Uh oh!

baodrate commented Jul 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

baodrate commented Jul 5, 2023 •

edited

Loading