Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Optimize WebFlux multipart upload performance #35366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
xyraclius wants to merge 4 commits into spring-projects:main
base: main
Choose a base branch
Loading
from xyraclius:34651

Conversation

Copy link

@xyraclius xyraclius commented Aug 21, 2025
edited
Loading

🚀 Overview

This PR improves performance in WebFlux multipart upload processing by optimizing how AbstractNestedMatcher scans DataBuffer instances for delimiters.


🔥 Motivation

Multipart uploads in WebFlux currently suffer from slower performance compared to Spring MVC, especially with large files. A significant bottleneck was found in the delimiter matching logic, which processed buffers one byte at a time and caused unnecessary overhead.


🔧 Changes

  • Reworked AbstractNestedMatcher to use:
    • (削除) A thread-local buffer (LOCAL_BUFFER) to avoid per-call allocations. (削除ここまで)
      Replaced with an instance-local buffer (localBuffer) to simplify buffer management.
    • Chunk-based scanning instead of byte-by-byte iteration.
  • Extracted helper methods (processChunk, findNextCandidate,
    updateMatchIndex) to reduce complexity and improve readability.
  • Preserved existing API and behavior for backward compatibility.
  • Added clear Javadoc to explain usage and limitations.

✅ Benefits

  • Reduced allocation overhead during large multipart uploads.
  • Improved throughput when scanning large DataBuffer instances.
  • Better maintainability through clearer structure and documentation.
  • No breaking changes. Existing API and behavior remain intact.
  • No new tests required. Code is already fully covered (100%) by
    existing unit tests.

📈 Performance Impact

  • 200MB file via MultipartFile (Spring MVC): ~700 ms
  • 200MB file via FilePart (WebFlux): ~4.1 s
  • 200MB file via PartEvent (WebFlux): ~4.2 s

After this change, large multipart uploads in WebFlux no longer suffer from excessive overhead in delimiter scanning.

📊 Benchmark Results

Before Optimization

curl -w "Elapsed: %{time_total}s\n" \
 -F file="@200mb.pdf" \
 http://localhost:8080/file-part \
 -o /dev/null -s
# Elapsed: 4.177509s
curl -w "Elapsed: %{time_total}s\n" \
 -F file="@200mb.pdf" \
 http://localhost:8080/part-event \
 -o /dev/null -s
# Elapsed: 4.224732s

After Optimization

curl -w "Elapsed: %{time_total}s\n" \
 -F file="@200mb.pdf" \
 http://localhost:8080/file-part \
 -o /dev/null -s
# Elapsed: 0.437359s
curl -w "Elapsed: %{time_total}s\n" \
 -F file="@200mb.pdf" \
 http://localhost:8080/part-event \
 -o /dev/null -s
# Elapsed: 0.317705s
#Test 19.53GB file
curl -w "\nElapsed: %{time_total}s\n" \
 -F file="@game.rar" \
 http://localhost:8080/file-part \
 -o /dev/null -s
# Elapsed: 27.586772s
curl -w "Elapsed: %{time_total}s\n" \
 -F file="@game.rar" \
 http://localhost:8080/part-event \
 -o /dev/null -s
# Elapsed: 18.328275s
#Test Multiple files
curl -w "Elapsed: %{time_total}s\n" \
 -F file="@200mb.pdf" \
 -F file="@game.rar" \
 http://localhost:8080/file-part \
 -o /dev/null -s
# Elapsed: 23.678042s
curl -w "Elapsed: %{time_total}s\n" \
 -F file="@200mb.pdf" \
 -F file="@game.rar" \
 http://localhost:8080/part-event \
 -o /dev/null -s
# Elapsed: 17.001363s

Example

@PostMapping(path = "/file-part", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public Mono<Void> uploadFilePart(@RequestPart("file") FilePart file) {
 log.info("Received file part request {}", file.filename());
 return Mono.empty();
}
@PostMapping(path = "/part-event", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public Mono<Void> uploadPartEvent(@RequestBody Flux<PartEvent> allPartsEvents) {
 return allPartsEvents.windowUntil(PartEvent::isLast)
 .concatMap(p -> p.switchOnFirst((signal, partEvents) -> {
 if (signal.hasValue()) {
 PartEvent event = signal.get();
 if (event instanceof FormPartEvent formEvent) {
 return Mono.error(new RuntimeException("Unexpected event: " + event));
 } else if (event instanceof FilePartEvent fileEvent) {
 log.info("Received part event request {}", fileEvent.filename());
 return partEvents
 .ofType(FilePartEvent.class)
 .doOnNext(file -> DataBufferUtils.release(file.content()))
 .then(Mono.empty());
 } else {
 return Mono.error(new RuntimeException("Unexpected event: " + event));
 }
 } else {
 return partEvents;
 }
 })).then();
}

Related Issue

Closes #34651


Checklist

Improve AbstractNestedMatcher by using a thread-local buffer and chunked
scanning to reduce allocations and speed up multipart boundary detection.
Closes spring-projectsgh-34651
Signed-off-by: Nabil Fawwaz Elqayyim <master@nabilfawwaz.com>
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Aug 21, 2025
Copy link
Member

bclozel commented Aug 21, 2025

Thanks for raising this @xyraclius , but I'm not sure the approach is valid.

For WebFlux applications, there is no assumption about the processing of a single request. Unlike Servlet applications where the processing of a request is happening on a single thread, reactive apps can schedule work on many different threads. Isn't using a ThreadLocal likely to create concurrency issues if many multipart requests happen in parallel?

@bclozel bclozel added the status: waiting-for-feedback We need additional information before we can continue label Aug 21, 2025
Copy link
Author

Hi @bclozel,

I initially used ThreadLocal to reduce per-call allocations and improve CPU/memory usage. That said, I now understand that in WebFlux a single request can run across multiple threads, so using ThreadLocal could be unsafe.

The safest approach would be to switch to a per-request local buffer, like:

final byte[] chunk = new byte[8 * 1024];

This avoids any concurrency issues while keeping allocations reasonable. I can implement this change and test the performance to make sure we maintain the improvements.

bclozel and sdeleuze reacted with thumbs up emoji

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Aug 21, 2025
@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Aug 21, 2025
Copy link
Contributor

Yeah please refine accordingly and we will review it.

Copy link
Author

Hi @bclozel @sdeleuze,
Refined the change to use a per-instance buffer instead of ThreadLocal, as suggested.
✅ Verified correctness and perf still look good.

Please let me know if further adjustments are needed. Thanks for the guidance! 🙏

sdeleuze reacted with thumbs up emoji

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Aug 22, 2025
- Replace ThreadLocal buffer with a per-instance reusable buffer
- Improves memory locality and reduces ThreadLocal overhead
- Update Javadoc for clarity, performance notes, and subclassing guidance
Closes spring-projectsgh-34651
Signed-off-by: Nabil Fawwaz Elqayyim <master@nabilfawwaz.com>
@bclozel bclozel self-assigned this Aug 25, 2025
Copy link
Contributor

@DhruvTheDev1 DhruvTheDev1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is great to see the performance benchmarks clearly showing the significant improvements in WebFlux multipart upload performance especially for large files. The real-world file tests also show the optimisation can handle large-scale uploads seamlessly.

Also nice on making the PR so clear and easy to follow! It's evident that a lot of thought went into it.

xyraclius reacted with thumbs up emoji
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers
1 more reviewer

@DhruvTheDev1 DhruvTheDev1 DhruvTheDev1 left review comments

Reviewers whose approvals may not affect merge requirements
Labels
status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged or decided on
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Slow webflux multipart upload

AltStyle によって変換されたページ (->オリジナル) /