Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Stream big uploaded files directly to the file system #399

martinkirch started this conversation in Ideas
Discussion options

I'm writing an application that lets an user upload big files: more than 10Mb, often more that 100Mb, at most 1Gb.

I noticed that using request.files puts the whole file in memory. Not only this consumes a lot of memory, it also blocks the worker while it writes the files to disk. The quart-uploads extension does not provide a better method, as it relies on await request.files too. My expectations for an async framework were that it should be able to write such big files directly to disk, while they're uploaded. FastAPI has UploadFile, for example.

Using another multipart parser, I came up with the following solution. It essentially relies on async for chunk in quart.request.body to write each chunk to disk if it belongs to an uploaded file (that happens at current_file.write(result)).

from multipart import PushMultipartParser, MultipartSegment, parse_options_header, MultiDict
async def stream_form(base_path:Path=None) -> Coroutine[None, None, MultiDict]:
 header = quart.request.headers.get('Content-Type')
 content_type, options = parse_options_header(header)
 boundary = options['boundary']
 form_fields = MultiDict()
 current_form_field = None
 current_file = None
 with ExitStack() as stack:
 parser = PushMultipartParser(boundary, strict=True)
 stack.enter_context(parser)
 async for chunk in quart.request.body:
 for result in parser.parse(chunk):
 if isinstance(result, MultipartSegment):
 if result.filename:
 target_path = base_path / result.filename
 current_file = open(target_path, 'wb')
 stack.enter_context(current_file)
 form_fields[result.name] = {
 'filename': result.filename,
 'content_type': result.content_type,
 'uploaded_to': target_path,
 }
 else:
 current_form_field = result.name
 elif result: # Result is a non-empty bytearray
 if current_form_field:
 form_fields[current_form_field] = result.decode()
 else:
 current_file.write(result)
 else: # end of segment
 current_form_field = None
 current_file = None
 if parser.closed:
 break
 return form_fields

(Excerpt from the complete app)

Using this function, even while sending 800Mb the worker could still process 75 read-update DB requests per second on another endpoint (otherwise it handles ~80req/s), consuming less than 100Mb of memory.

This function could be more generic by letting the caller provide a function to compute target_path. I'm wondering if that could be an interesting addition to Quart itself ? A new extension ? or just too specific ?

In any case: thanks for Quart ! It's the only framework that lets me implement a very weird architecture :)

You must be logged in to vote

Replies: 0 comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Ideas
Labels
None yet
1 participant

AltStyle によって変換されたページ (->オリジナル) /