improve replace performance take 2 (this time I have confirmed substantial improvements in the profiler) #1051
jedwards1211 wants to merge 1 commit into atom:master
Conversation
There was one test failure I haven't figured out. I do believe that the profiling results still give an accurate impression of what the speedup would be, though obviously I would need to fix the test failure if it's decided to approve this new approach.
jedwards1211
commented
Sep 11, 2018
Is there some kind of bug with the V8 profiler Atom is using? I don't understand why it's splitting up the one call to replaceAll into a dozen apparently separate calls in the timeline view.
maxbrunsfeld
commented
Sep 11, 2018
Hey, nice job on this investigation @jedwards1211! This is a useful starting point.
When I changed replace to use plain JS string operations and then called setTextInBufferRange once to replace the entire buffer text at the end, the same 60000 replacements took about 5.5 seconds
I get the entire text of the buffer, build up the replacement buffer text with native JS string operations, and replace the entire buffer text
I think it's important that we use individual setTextInRange calls for two reasons:
- We need to update cursors, selections, and other markers according to the deltas, rather than losing all of their positions.
- Atom can deal with files that are too large to fit in JavaScript strings. We need to avoid algorithms that require manifesting the entire buffer's text as a String.
The problem is that the BufferSearch's search results are also stored as markers, and there is a cost associated with updating markers when editing the buffer. I think that what replaceAll should do is:
- Get the ranges of every marker.
- Destroy the results marker layer, so that it won't need to be updated during subsequent `setTextInRange` calls.
- Iterate the ranges from step 1 in reverse, and update the buffer. Iterating the ranges in reverse is important so that as we update the buffer, the subsequent ranges don't become stale.
- Recompute the marker layer by calling `findAndMarkAllInRangeSync`, because for a regex search, it's possible that replacing all the matches will create new matches.
☝️ All of this only applies to when replacing all markers though. When replacing just a single marker, we probably want to keep things as they are. Maybe we need to separate those two code paths instead of having them call a common method.
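The reverse-iteration idea in the steps above can be sketched with plain strings and `[start, end]` index ranges standing in for Atom's markers (a simplification for illustration; the real implementation would go through the TextBuffer and marker-layer APIs):

```javascript
// Apply a set of replacements back-to-front. Because edits are made from
// the end of the text toward the beginning, the start/end offsets of the
// not-yet-applied ranges never shift and never become stale.
function replaceRangesInReverse(text, ranges, replacement) {
  // Copy and sort descending by start so the caller's array isn't mutated.
  const sorted = [...ranges].sort((a, b) => b[0] - a[0]);
  for (const [start, end] of sorted) {
    text = text.slice(0, start) + replacement + text.slice(end);
  }
  return text;
}

const text = 'foo bar foo baz foo';
// Ranges of every "foo" match, gathered up front (step 1 above).
const ranges = [[0, 3], [8, 11], [16, 19]];
console.log(replaceRangesInReverse(text, ranges, 'hello'));
// hello bar hello baz hello
```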
jedwards1211
commented
Sep 11, 2018
Yeah I figured there are good reasons we wouldn't want to operate on the whole buffer as plain strings.
Do you think anything could be gained from adding the ability to set text in multiple ranges with a single call to text-buffer? Especially considering that there might be many replacements within a given chunk of text (if my understanding of the basic structure of the text buffer is correct)
And any idea why splice is taking so long?
Also
Atom can deal with files that are too large to fit in JavaScript strings. We need to avoid algorithms that require manifesting the entire buffer's text as a String.
That's good...for such a file though, assuming Atom is also capable of handling too many markers to fit in memory, do you think it would make more sense to do the replacements first and rebuild the markers later?
jedwards1211
commented
Oct 8, 2018
@maxbrunsfeld sorry to bother again but do you have any idea why splice would take so long?
@jedwards1211 splice is expensive because it's called a huge number of times and each time, it's updating a huge number of markers - the markers that represent all the search results. But we don't need to update these markers because we're about to invalidate them all by replacing the text that they're marking. That's why I made the suggestion above.
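A toy model makes the cost visible: in the naive case, every `splice` has to visit every marker to shift positions past the edit point, so total work scales with edits times markers. (Atom's real marker index is much smarter than this, but the point stands that work is proportional to the number of markers that need updating per edit.)

```javascript
// Naive marker store: each splice walks every marker to adjust positions.
class NaiveMarkerStore {
  constructor(positions) {
    this.positions = [...positions];
    this.updates = 0; // counts marker-position checks, for illustration
  }
  splice(start, removedLength, insertedLength) {
    const delta = insertedLength - removedLength;
    for (let i = 0; i < this.positions.length; i++) {
      this.updates++;
      // Markers at or past the end of the removed region shift by delta.
      if (this.positions[i] >= start + removedLength) {
        this.positions[i] += delta;
      }
    }
  }
}

// 3 markers, 2 edits: every edit touches every marker.
const store = new NaiveMarkerStore([5, 10, 20]);
store.splice(0, 3, 5);   // replace 3 chars at offset 0 with 5 chars
store.splice(12, 3, 5);  // replace 3 chars at offset 12 with 5 chars
console.log(store.updates); // 6 = 2 edits * 3 markers
```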
jedwards1211
commented
Oct 8, 2018
Oh I see. What about other kinds of markers though -- I'm guessing error underlines are another marker layer that would slow down calls to splice even if the search results markers are destroyed?
And probably git diff plugins use markers as well?
jedwards1211
commented
Oct 8, 2018
For this reason I'm wondering if we can get even better performance by devising a way to pass multiple changes to the text buffer at once, so that we only have to recompute the markers for each chunk of text once, rather than once for each marker being replaced within that chunk.
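A batch API along these lines might look something like the following sketch; `setTextInRanges` is an invented name for discussion, not a real text-buffer method, and it operates on a plain string rather than the real buffer structure:

```javascript
// Hypothetical batch edit: apply many {start, end, text} changes in one
// left-to-right pass over the text, so marker bookkeeping (not modeled
// here) would only need to run once per batch instead of once per change.
// Assumes the ranges do not overlap.
function setTextInRanges(text, changes) {
  const sorted = [...changes].sort((a, b) => a.start - b.start);
  let result = '';
  let cursor = 0;
  for (const { start, end, text: replacement } of sorted) {
    // Copy the untouched span before this change, then the replacement.
    result += text.slice(cursor, start) + replacement;
    cursor = end;
  }
  return result + text.slice(cursor);
}

console.log(setTextInRanges('foo bar foo', [
  { start: 0, end: 3, text: 'hello' },
  { start: 8, end: 11, text: 'hello' },
]));
// hello bar hello
```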
maxbrunsfeld
commented
Oct 8, 2018
Yes, those features all use markers as well. It should be fine, though; updating markers is already really efficient. In this case there's just a huge number of markers that don't need to be updated.
Well it's a bit absurd to imagine, but if for whatever reason there were 60000 diff markers, it would cause the same performance issue as 60000 search result markers, right?
I'd like to eventually try to make a PR to solve this but I'd prefer to fix the problem once and for all...
@maxbrunsfeld I did some more digging on this. The way changes are communicated to the marker layer doesn't seem to be the main culprit in my profiling results; instead I'm seeing `splice` calls from within `setTextInBufferRange` taking the most time.
I haven't figured out why each `splice` operation is taking 0.5 to 1 millisecond, but it means that if 60000 markers get replaced, the `splice`s consume a combined total of around 45 seconds, which is indeed at least how long it took for the editor to respond (I stopped the profiler after about 40 seconds).
When I changed `replace` to use plain JS string operations and then called `setTextInBufferRange` once to replace the entire buffer text at the end, the same 60000 replacements took about 5.5 seconds, of which at least 4 seconds are probably mostly due to the destroy-marker notifications.
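The experiment described above can be sketched as follows; the `buffer` object is a minimal stand-in for Atom's TextBuffer, not the real API surface:

```javascript
// Instead of one setTextInBufferRange call per match, build the new text
// with plain JS string operations and write it back in a single call.
function replaceAllInOneCall(buffer, regex, replacement) {
  const oldText = buffer.getText();
  // The `g` flag on the regex is required so that every match is rewritten.
  const newText = oldText.replace(regex, replacement);
  if (newText !== oldText) buffer.setText(newText);
  return newText;
}

// Minimal stand-in buffer for demonstration.
const buffer = {
  text: 'foo bar baz qux\nfoo bar baz qux',
  getText() { return this.text; },
  setText(t) { this.text = t; },
};
replaceAllInOneCall(buffer, /foo/g, 'hello');
console.log(buffer.getText());
// hello bar baz qux
// hello bar baz qux
```

As noted in the discussion, doing this on the real buffer would lose marker, cursor, and selection positions, which is exactly the drawback this PR acknowledges.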
Description of the Change
Instead of calling `getTextInBufferRange` and `setTextInBufferRange` for each individual replacement, I get the entire text of the buffer, build up the replacement buffer text with native JS string operations, and replace the entire buffer text with a single call to `setTextInBufferRange` at the end.
Alternate Designs
This is really just an experiment right now to contribute to the discussion of `text-buffer` performance.
Benefits
In my profiling test case (replacing "foo" with "hello" in a buffer with 60000 lines of "foo bar baz qux"), the time to replace is reduced from over 45 seconds to 5.5 seconds.
Possible Drawbacks
Unfortunately, replacing the entire buffer text clears the selection, so I doubt you will want to merge this PR as-is.
Applicable Issues
#653