Optimize WebFlux multipart upload performance #35366
Improve AbstractNestedMatcher by using a thread-local buffer and chunked scanning to reduce allocations and speed up multipart boundary detection.
Closes spring-projectsgh-34651
Signed-off-by: Nabil Fawwaz Elqayyim <master@nabilfawwaz.com>
Thanks for raising this, @xyraclius, but I'm not sure the approach is valid.
For WebFlux applications, there is no assumption about the processing of a single request. Unlike Servlet applications, where a request is processed on a single thread, reactive apps can schedule work on many different threads. Isn't using a ThreadLocal likely to create concurrency issues if many multipart requests happen in parallel?
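To make the concern concrete, here is a minimal sketch of one way a thread-local scratch buffer can go wrong. It is not the code from this pull request: the class `ThreadLocalScratchDemo` and its methods are hypothetical, and it assumes that some state is left in the buffer between two processing steps of the same request. When an event-loop thread interleaves two requests, the second one overwrites what the first one stored.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names come from the pull request.
final class ThreadLocalScratchDemo {

    // One scratch buffer per thread, shared by every request handled on that thread.
    private static final ThreadLocal<byte[]> SCRATCH =
            ThreadLocal.withInitial(() -> new byte[8]);

    // Step 1 of a request: stash a partial match in the thread's scratch buffer.
    static void stash(String partial) {
        byte[] src = partial.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, SCRATCH.get(), 0, src.length);
    }

    // Step 2 of a request, run later on the same thread: read the stashed bytes back.
    static String resume(int length) {
        return new String(SCRATCH.get(), 0, length, StandardCharsets.US_ASCII);
    }

    // Simulates one event-loop thread interleaving two requests.
    static String interleaveOnOneThread() {
        stash("--AAA");   // request A, step 1
        stash("--BBB");   // request B, step 1 runs before A resumes
        return resume(5); // request A, step 2 now sees B's bytes
    }

    public static void main(String[] args) {
        System.out.println(interleaveOnOneThread()); // prints "--BBB"
    }
}
```

A thread-local buffer that is filled and fully consumed inside a single synchronous call does not have this problem; the hazard appears only when its contents must survive until a later signal.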
Hi @bclozel,
I initially used ThreadLocal to reduce per-call allocations and improve CPU/memory usage. That said, I now understand that in WebFlux a single request can run across multiple threads, so using ThreadLocal could be unsafe.
The safest approach would be to switch to a per-request local buffer, like:
final byte[] chunk = new byte[8 * 1024];
This avoids any concurrency issues while keeping allocations reasonable. I can implement this change and test the performance to make sure we maintain the improvements.
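As a rough sketch of that idea (not the actual change in this PR), the matcher below owns its own 8 KB chunk, bulk-copies bytes into it, and carries the match position across buffers. The class name `ChunkedDelimiterMatcher`, the use of `java.nio.ByteBuffer` in place of Spring's `DataBuffer`, and the KMP fallback table are all assumptions made to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; class and method names are not from the pull request.
final class ChunkedDelimiterMatcher {

    private final byte[] delimiter;
    private final int[] fallback;                    // KMP failure table
    private final byte[] chunk = new byte[8 * 1024]; // owned by this instance only
    private int matched;                             // delimiter bytes matched so far

    ChunkedDelimiterMatcher(byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        this.delimiter = delimiter.clone();
        this.fallback = new int[delimiter.length];
        for (int i = 1, k = 0; i < delimiter.length; i++) {
            while (k > 0 && delimiter[i] != delimiter[k]) {
                k = fallback[k - 1];
            }
            if (delimiter[i] == delimiter[k]) {
                k++;
            }
            fallback[i] = k;
        }
    }

    /**
     * Consumes bytes until the delimiter completes or the buffer is exhausted.
     * Returns the buffer index of the delimiter's last byte, or -1 if it has
     * not completed yet. Partial matches carry over to the next call.
     */
    int match(ByteBuffer buffer) {
        while (buffer.hasRemaining()) {
            int start = buffer.position();
            int n = Math.min(buffer.remaining(), chunk.length);
            buffer.get(chunk, 0, n); // one bulk copy instead of n single-byte reads
            for (int i = 0; i < n; i++) {
                byte b = chunk[i];
                while (matched > 0 && b != delimiter[matched]) {
                    matched = fallback[matched - 1];
                }
                if (b == delimiter[matched]) {
                    matched++;
                }
                if (matched == delimiter.length) {
                    matched = 0;
                    buffer.position(start + i + 1); // leave the rest unread
                    return start + i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChunkedDelimiterMatcher matcher =
                new ChunkedDelimiterMatcher("\r\n--b".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer first = ByteBuffer.wrap("data\r\n-".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer second = ByteBuffer.wrap("-bmore".getBytes(StandardCharsets.US_ASCII));
        System.out.println(matcher.match(first));  // prints -1 (delimiter only partly seen)
        System.out.println(matcher.match(second)); // prints 1 (delimiter ends at index 1)
    }
}
```

Because the chunk is a field of the matcher, each request that creates its own matcher gets its own buffer, and nothing is shared between threads or requests.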
Yeah please refine accordingly and we will review it.
- Replace ThreadLocal buffer with a per-instance reusable buffer
- Improves memory locality and reduces ThreadLocal overhead
- Update Javadoc for clarity, performance notes, and subclassing guidance
Closes spring-projectsgh-34651
Signed-off-by: Nabil Fawwaz Elqayyim <master@nabilfawwaz.com>
73a2f0a to 1bf232d
It is great to see the performance benchmarks clearly showing the significant improvements in WebFlux multipart upload performance, especially for large files. The real-world file tests also show that the optimisation can handle large-scale uploads seamlessly.
Also, nice work making the PR so clear and easy to follow! It's evident that a lot of thought went into it.
🚀 Overview
This PR improves performance in WebFlux multipart upload processing by optimizing how AbstractNestedMatcher scans DataBuffer instances for delimiters.
🔥 Motivation
Multipart uploads in WebFlux currently suffer from slower performance compared to Spring MVC, especially with large files. A significant bottleneck was found in the delimiter matching logic, which processed buffers one byte at a time and caused unnecessary overhead.
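For illustration, a byte-at-a-time scan has roughly the following shape. This is a simplified sketch rather than the framework's actual matcher: the class `ByteAtATimeScan` is hypothetical, it uses `java.nio.ByteBuffer` instead of `DataBuffer`, and its fallback on mismatch assumes the delimiter's first byte does not recur inside the delimiter (as with a CRLF-prefixed multipart boundary).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical baseline; not the framework's matcher.
final class ByteAtATimeScan {

    /**
     * Returns the index of the last byte of the first delimiter occurrence,
     * or -1 if the delimiter does not occur. Assumes a non-empty delimiter
     * whose first byte does not appear again inside the delimiter.
     */
    static int indexOfDelimiterEnd(ByteBuffer buffer, byte[] delimiter) {
        int matched = 0;
        for (int i = buffer.position(); i < buffer.limit(); i++) {
            byte b = buffer.get(i); // one bounds-checked call per byte
            if (b == delimiter[matched]) {
                matched++;
                if (matched == delimiter.length) {
                    return i;
                }
            } else {
                // Restart; the mismatching byte may itself begin a new match.
                matched = (b == delimiter[0]) ? 1 : 0;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] delimiter = "\r\n--x".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer body = ByteBuffer.wrap("ab\r\n--xcd".getBytes(StandardCharsets.US_ASCII));
        System.out.println(indexOfDelimiterEnd(body, delimiter)); // prints 6
    }
}
```

Every byte costs one accessor call here, which is the kind of per-byte overhead that copying a block into a plain byte array and scanning it in a tight loop is meant to avoid.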
🔧 Changes
- … AbstractNestedMatcher to use:
  - (deleted) A thread-local buffer (LOCAL_BUFFER) to avoid per-call allocations. (end of deletion)
  - Replaced with an instance-local buffer (localBuffer) to simplify buffer management.
- … (processChunk, findNextCandidate, updateMatchIndex) to reduce complexity and improve readability.
✅ Benefits
- … DataBuffer instances.
- … existing unit tests.
📈 Performance Impact
- MultipartFile (Spring MVC): ~700 ms
- FilePart (WebFlux): ~4.1 s
- PartEvent (WebFlux): ~4.2 s
After this change, large multipart uploads in WebFlux no longer suffer from excessive overhead in delimiter scanning.
📊 Benchmark Results
Before Optimization
After Optimization
Example
Related Issue
Closes #34651
Checklist