Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

perf: improve performance of 2018 day 2 part 2 #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
maneatingape merged 4 commits into maneatingape:main from Rolv-Apneseth:2018_day2_p2_perf
Jul 12, 2025

Conversation

@Rolv-Apneseth
Copy link
Contributor

@Rolv-Apneseth Rolv-Apneseth commented Jul 12, 2025

Hi again. This is a quick one for improving the performance of 2018 day 2 part 2 (85μs => 15μs for me). Let me know if you want any changes, and no hard feelings if you don't want the PR at all.

maneatingape reacted with heart emoji
Copy link
Owner

Let me know if you want any changes, and no hard feelings if you don't want the PR at all.

I really appreciate anyone taking the time to make performance improvements or showing how solutions could be better, so all PR are much appreciated!

buffer[0..width].copy_from_slice(id);
buffer[column] = b'*';
let mut diff = false;
for (a, b) in id1.iter().zip(id2) {
Copy link
Owner

@maneatingape maneatingape Jul 12, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a performance regression on my input, 87 μs to 129 μs.

Two nested loops have worst case quadratic complexity O(n2).
Hashing is a constant time operation so the original solution worst case complexity is O(1) * 26 * O(n) = O(n).

So it looks like you're getting lucky, while I'm getting unlucky with the input order.

Copy link
Contributor Author

@Rolv-Apneseth Rolv-Apneseth Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no expert in O(N) notation, but wouldn't the N here be the number of IDs, so it should be lower than O(N^2) (second loop starts at the index of the first loop)?

Anyway, fair enough about the performance being worse. Just luck based that it's so much faster for my input then. I'll close the PR if that's alright with you, but out of curiosity would mind sharing your input too?

Copy link
Owner

@maneatingape maneatingape Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am no expert in O(N) notation, but wouldn't the N here be the number of IDs, so it should be lower than O(N^2) (second loop starts at the index of the first loop)?

Say you have 10 ids. First inner loop iteration will check 9 items, next iteration 8 and so on...
9 + 8 + 7 + ... + 2 + 1 = n(n + 1) / 2 = (1/2)(n2 + n)
So you save half the comparisons but it's still overall quadratic.

Anyway, fair enough about the performance being worse. Just luck based that it's so much faster for my input then. I'll close the PR if that's alright with you, but out of curiosity would mind sharing your input too?

If you re-order the elements (for example swap the first and second halves) you can change the performance drastically. With a few random swaps I got benchmarks from 33 μs to 250 μs.

Copy link
Owner

@maneatingape maneatingape Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it might be possible to improve the performance by taking advantage of the special structure of the input (always 26 lower case ASCII characters).

Copy link
Contributor Author

@Rolv-Apneseth Rolv-Apneseth Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you save half the comparisons but it's still overall quadratic.

I seeee, thank you for helping me understand.

If you re-order the elements (for example swap the first and second halves) you can change the performance drastically

Ah yes, can confirm moving one of the solutions can make it much worse. I really did get quite lucky.

I still think it might be possible to improve the performance by taking advantage of the special structure of the input (always 26 lower case ASCII characters).

I'll have a look at it again later and see if I can come up with anything (unless you do first)

Copy link
Owner

BTW I welcome PRs (even if they don't always work out). It's always an interesting discussion and could spark an improvement either in this or another solution.

Copy link
Contributor Author

BTW I welcome PRs (even if they don't always work out). It's always an interesting discussion and could spark an improvement either in this or another solution.

Sure yeah. Thank you for being so understanding.

Copy link
Contributor Author

I tried a couple different (more complicated) approaches without much luck. Went back to the original approach and changed it from storing the ID with a character replaced in the set, to just storing the prefix and suffix around the character, and I believe there is some performance gained from that

Copy link
Contributor Author

If you're interested @maneatingape, it was just a simple change but for me it's giving around 30-45% speed up:

pub fn part2(input: &[&[u8]]) -> String {
 let width = input[0].len();
 let mut seen = FastSet::with_capacity(input.len());
 // Use a set to check for duplicates by comparing the prefix and suffix of IDs excluding one
 // column at a time.
 for column in 0..width {
 for &id in input {
 let prefix = &id[..column];
 let suffix = &id[column + 1..];
 if !seen.insert([prefix, suffix]) {
 // Convert to String
 return prefix.iter().chain(suffix).cloned().map(char::from).collect();
 }
 }
 seen.clear();
 }
 unreachable!()
}

Copy link
Owner

@maneatingape maneatingape left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat! Faster and more elegant.

79 μs => 49 μs.

@maneatingape maneatingape merged commit a8a3007 into maneatingape:main Jul 12, 2025
4 of 6 checks passed
Copy link
Contributor Author

Missed that lint - not getting these from clippy locally but yeah cloned -> copied makes sense

@Rolv-Apneseth Rolv-Apneseth deleted the 2018_day2_p2_perf branch July 12, 2025 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@maneatingape maneatingape maneatingape approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /