Stream big uploaded files directly to the file system · pallets/quart · Discussion #399

martinkirch
Dec 19, 2024

I'm writing an application that lets a user upload big files: more than 10 MB, often more than 100 MB, at most 1 GB.

I noticed that using request.files loads the whole file into memory. Not only does this consume a lot of memory, it also blocks the worker while it writes the file to disk. The quart-uploads extension does not offer a better method, as it relies on await request.files too. I expected an async framework to be able to write such big files directly to disk while they are being uploaded. FastAPI has UploadFile, for example.

Using another multipart parser, I came up with the following solution. It essentially relies on async for chunk in quart.request.body to write each chunk to disk if it belongs to an uploaded file (that happens at current_file.write(result)).

from contextlib import ExitStack
from pathlib import Path
from typing import Optional

import quart
from multipart import PushMultipartParser, MultipartSegment, parse_options_header, MultiDict


async def stream_form(base_path: Optional[Path] = None) -> MultiDict:
    header = quart.request.headers.get('Content-Type')
    content_type, options = parse_options_header(header)
    boundary = options['boundary']
    form_fields = MultiDict()
    current_form_field = None
    current_file = None
    # The ExitStack closes the parser and every opened file on exit
    with ExitStack() as stack:
        parser = PushMultipartParser(boundary, strict=True)
        stack.enter_context(parser)
        async for chunk in quart.request.body:
            for result in parser.parse(chunk):
                if isinstance(result, MultipartSegment):
                    if result.filename:
                        # Start of a file part (result.filename is client-supplied)
                        target_path = base_path / result.filename
                        current_file = open(target_path, 'wb')
                        stack.enter_context(current_file)
                        form_fields[result.name] = {
                            'filename': result.filename,
                            'content_type': result.content_type,
                            'uploaded_to': target_path,
                        }
                    else:
                        # Start of a regular form field
                        current_form_field = result.name
                elif result:  # Result is a non-empty bytearray
                    if current_form_field:
                        form_fields[current_form_field] = result.decode()
                    else:
                        current_file.write(result)
                else:  # end of segment
                    current_form_field = None
                    current_file = None
            if parser.closed:
                break
    return form_fields

(Excerpt from the complete app)
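The chunk-by-chunk write that the function relies on can be tried in isolation, without Quart or a multipart parser. The following is only a minimal sketch: `fake_body` is a hypothetical stand-in for `quart.request.body`, and `save_stream` and `demo` are illustrative names that do not come from the app.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import AsyncIterator


async def fake_body(total: int, chunk_size: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for quart.request.body: yields a payload in chunks."""
    sent = 0
    while sent < total:
        size = min(chunk_size, total - sent)
        yield b"x" * size
        sent += size
        # Hand control back to the event loop between chunks
        await asyncio.sleep(0)


async def save_stream(body: AsyncIterator[bytes], target: Path) -> int:
    """Write each chunk to disk as it arrives and return the byte count."""
    written = 0
    with open(target, "wb") as out:
        async for chunk in body:
            # Only the current chunk is held in memory
            out.write(chunk)
            written += len(chunk)
    return written


def demo() -> tuple[int, int]:
    """Stream 1,000,000 bytes to a temporary file; return (written, size on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "upload.bin"
        written = asyncio.run(save_stream(fake_body(1_000_000, 64 * 1024), target))
        return written, target.stat().st_size
```

Between two chunks the coroutine is suspended at `async for`, which is when the event loop can serve other requests.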

Using this function, even while an 800 MB upload was in progress, the worker could still process 75 read-update DB requests per second on another endpoint (it otherwise handles ~80 req/s), while consuming less than 100 MB of memory.

This function could be made more generic by letting the caller provide a function to compute target_path. I'm wondering whether that could be an interesting addition to Quart itself? A new extension? Or is it just too specific?
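As a rough idea of what such a caller-provided function might look like, here is a minimal sketch. `PathResolver` and `flat_resolver` are hypothetical names, not part of Quart or of the multipart library; this example resolver keeps only the last component of the client-supplied filename, so every upload lands directly under the base directory.

```python
from pathlib import Path, PureWindowsPath
from typing import Callable

# Hypothetical callback type: takes the client-supplied filename and
# returns the path the upload should be written to.
PathResolver = Callable[[str], Path]


def flat_resolver(base_path: Path) -> PathResolver:
    """Build a resolver that stores every upload directly under base_path."""

    def resolve(filename: str) -> Path:
        # PureWindowsPath treats both "/" and "\" as separators, so .name
        # drops any directory part regardless of the client's platform.
        name = PureWindowsPath(filename).name
        if name in ("", ".", ".."):
            raise ValueError(f"unusable filename: {filename!r}")
        return base_path / name

    return resolve
```

Inside stream_form, the line computing target_path would then call the resolver with result.filename instead of joining base_path and the filename itself.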

In any case: thanks for Quart! It's the only framework that lets me implement a very weird architecture :)
