Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Alternative optimisations for insertion #980

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
HuwCampbell wants to merge 4 commits into haskell:master
base: master
Choose a base branch
Loading
from HuwCampbell:perf/lazy-insert-ptr-removal

Conversation

Copy link

@HuwCampbell HuwCampbell commented Jan 11, 2024
edited
Loading

This is to open the conversation. I haven't looked at the strict version or gone 100% at this stage.

I'm not convinced that the (unsafe) optimisations for identical pointers is worth the complexity, and it hurts performance for inserts of new elements and the replacement of elements compared to this version.

Here, instead of checking pointer equality, we instead pass upwards whether a rebalance is required.

Old
 insert absent: OK (5.11s)
 309 μs ± 9.5 μs
 insert present: OK (2.05s)
 249 μs ± 21 μs
 insert alternate: OK (2.09s)
 253 μs ± 21 μs
 insertWith absent: OK (2.61s)
 313 μs ± 16 μs
 insertWith present: OK (2.16s)
 260 μs ± 22 μs
New
 insert absent: OK (2.90s)
 351 μs ± 14 μs
 insert present: OK (3.20s)
 194 μs ± 16 μs
 insert alternate: OK (3.18s)
 192 μs ± 11 μs
 insertWith absent: OK (1.43s)
 345 μs ± 31 μs
 insertWith present: OK (1.66s)
 200 μs ± 14 μs

So inserting when the key is missing (common) or a new item at the same key (also common), are faster, while only literally inserting the same identical item (i.e., a no-op, probably less common) is slower.

I've redone the benchmarks, adding some extra calls to rnf in the suite. It looks like insert is about 22% faster if the key is matched and 14% slower if it's not.

I've also done the same trick for insertWith, where it's closer to 25% faster and 10% slower respectively.

For insertR, I've instead manually built the continuation, and just return the top level map if the key is found. I'm not sure if this helps much at this stage; union uses it, but it doesn't lean on it too hard.

The current logic uses unsafe pointer equality to determine if
a rebalance is required, but this case is niche, and we can instead
pass this information upwards.
This performs well in benchmarks.
@HuwCampbell HuwCampbell changed the title (削除) Perf/lazy insert ptr removal (削除ここまで) (追記) Alternative optimisations for insertion, (追記ここまで) Jan 11, 2024
@HuwCampbell HuwCampbell changed the title (削除) Alternative optimisations for insertion, (削除ここまで) (追記) Alternative optimisations for insertion (追記ここまで) Jan 11, 2024
The pointer equality is used to check if we can not do anything when
we retraverse back up the chain, but, if we turn it around and use
an explicit continuation instead, we can just choose to not run it.
Copy link
Contributor

treeowl commented Jan 12, 2024

I'm surprised you're getting good results with explicit continuations. Do you know how GHC is transforming those?

Copy link
Author

HuwCampbell commented Jan 12, 2024
edited
Loading

Yeah I couldn't really see any differences in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect that it is slower when there is a value inserted.

I haven't looked at the core yet.

I also considered just throwing a sentinel exception if EQ is found, then catching and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross.

(Actually, delimited continuations are in GHC now, could use those).

@@ -84,6 +86,9 @@ main = do
, bench "mapMaybeWithKey" $ whnf (M.mapMaybeWithKey (const maybeDel)) m
, bench "lookupIndex" $ whnf (lookupIndex keys) m
, bench "union" $ whnf (M.union m_even) m_odd
, bench "union_identical" $ whnf (M.union m_even) m_even
, bench "union_sparse" $ whnf (M.union m_even) m_sparse
, bench "union_into_sparse" $ whnf (M.union m_sparse) m_even
Copy link
Author

@HuwCampbell HuwCampbell Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more with m_sparse and m_odd would show if there's a slowdown in insertR.

Copy link
Contributor

treeowl commented Jan 12, 2024 via email

I would expect exceptions to be slow. I think I might have tried? I don't remember for sure. If you try, use unsafeDupablePerformIO to catch the exception. Also consider using raise# and catch# manually, to allow an exception type other than SomeException (this may require reallyUnsafePtrEquality# to determine whether you got the not-present exception).
...
On Thu, Jan 11, 2024, 10:31 PM Huw Campbell ***@***.***> wrote: Yeah I couldn't really see any differences in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect that it is slower when there is a value inserted. I haven't looked at the core yet. I also considered just throwing a sentinel exception if EQ is found, then catching and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross. — Reply to this email directly, view it on GitHub <#980 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOOF7I45DIO6Q6DACGP4PLYOCVARAVCNFSM6AAAAABBWC3ULWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGM3TQNRTGA> . You are receiving this because you commented.Message ID: ***@***.***>

Copy link
Author

I'm not as concerned with insertR as insert and insertWith.

I guess the question is do we want a 10-15% sacrifice on insertion when the key in not already present but a 20-30% improvement when it is?

How would that decision be made?

Copy link
Contributor

meooow25 commented Oct 6, 2024
edited
Loading

I'm not convinced that the (unsafe) optimisations for identical pointers is worth the complexity, and it hurts performance for inserts of new elements and the replacement of elements compared to this version.

So inserting when the key is missing (common) or a new item at the same key (also common), are faster, while only literally inserting the same identical item (i.e., a no-op, probably less common) is slower.

This sounds reasonable. The current code seems optimized for a very rare case. We have a comment acknowledging the same:

-- Unlike insertR, we only get sharing here
-- when the inserted value is at the same address
-- as the present value. We try anyway; this condition
-- seems particularly likely to occur in 'union'.

But why is it likely in union? It would happen when unioning two maps with a large overlap in shared entries. That again seems like a rare situation.


There are a few variants of insert we can try:

A. Pointer equality (current)
B. Return a bool which controls rebalancing (this PR)
C. No tricks
D. (削除) Both A and B (削除ここまで) (let's not go there)

Curiously, strict map uses C instead of A:

case compare kx ky of
LT -> balanceL ky y (go kx x l) r
GT -> balanceR ky y l (go kx x r)
EQ -> Bin sz kx x l r

And there are 3 situations that can occur:

  1. It's a new key. C is the best here, A and B waste work.
  2. It's an existing key but the value is new. B is the best here. How likely is this? Perhaps not too unlikely, shared key objects might be around.
  3. It's an existing key and existing value. C is the best here, and B should also help. This seems rather unlikely.

So, there's a bunch of benchmarking involved, if you want to figure this out.
For union benchmarks you can use the set-operations-map, it covers various sets of inputs.
Related to the history of the current code: #315, #416. Though I don't see any new information that is not already documented.
Also, let's set aside insertR for now. That one doesn't seem too problematic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers
No reviews
Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /