Stream big uploaded files directly to the file system #399
I'm writing an application that lets a user upload big files: more than 10 MB, often more than 100 MB, at most 1 GB.
I noticed that using `request.files` puts the whole file in memory. Not only does this consume a lot of memory, it also blocks the worker while it writes the files to disk. The quart-uploads extension does not provide a better method, as it relies on `await request.files` too. My expectation for an async framework was that it should be able to write such big files directly to disk while they're being uploaded. FastAPI has `UploadFile`, for example.
Using another multipart parser, I came up with the following solution. It essentially relies on `async for chunk in quart.request.body` to write each chunk to disk if it belongs to an uploaded file (that happens at `current_file.write(result)`).
```python
from contextlib import ExitStack
from pathlib import Path

import quart
from multipart import PushMultipartParser, MultipartSegment, parse_options_header, MultiDict


async def stream_form(base_path: Path) -> MultiDict:
    header = quart.request.headers.get('Content-Type')
    content_type, options = parse_options_header(header)
    boundary = options['boundary']
    form_fields = MultiDict()
    current_form_field = None
    current_file = None

    # The ExitStack closes the parser and every opened file on exit.
    with ExitStack() as stack:
        parser = PushMultipartParser(boundary, strict=True)
        stack.enter_context(parser)
        async for chunk in quart.request.body:
            for result in parser.parse(chunk):
                if isinstance(result, MultipartSegment):
                    if result.filename:
                        # Start of a file part: open the target and record its metadata.
                        target_path = base_path / result.filename
                        current_file = open(target_path, 'wb')
                        stack.enter_context(current_file)
                        form_fields[result.name] = {
                            'filename': result.filename,
                            'content_type': result.content_type,
                            'uploaded_to': target_path,
                        }
                    else:
                        # Start of a regular form field.
                        current_form_field = result.name
                elif result:  # Result is a non-empty bytearray
                    if current_form_field:
                        form_fields[current_form_field] = result.decode()
                    else:
                        current_file.write(result)
                else:  # end of segment
                    current_form_field = None
                    current_file = None
            if parser.closed:
                break
    return form_fields
```
(Excerpt from the complete app)
Using this function, even while receiving an 800 MB upload, the worker could still process 75 read-update DB requests per second on another endpoint (versus ~80 req/s with no upload in progress), consuming less than 100 MB of memory.
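For anyone who wants to try the chunk-by-chunk write pattern without Quart or a multipart parser, here is a minimal stdlib-only sketch. `fake_body` is a hypothetical stand-in for `quart.request.body`, and `stream_to_disk` and `demo` are names made up for this example. Unlike the function above, this variant hands each write to a worker thread with `asyncio.to_thread` (Python 3.9+); that is an assumption about what might help on slow disks, not something I measured.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import AsyncIterator


async def fake_body(total_size: int, chunk_size: int) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for quart.request.body: yields chunks of bytes."""
    sent = 0
    while sent < total_size:
        size = min(chunk_size, total_size - sent)
        yield b"x" * size
        sent += size
        await asyncio.sleep(0)  # give other tasks a chance to run


async def stream_to_disk(body: AsyncIterator[bytes], target: Path) -> int:
    """Write each chunk as it arrives, so only one chunk is held at a time."""
    written = 0
    with open(target, "wb") as f:
        async for chunk in body:
            # Run the blocking write in a worker thread (assumption: not in
            # the original function, which calls write() directly).
            await asyncio.to_thread(f.write, chunk)
            written += len(chunk)
    return written


def demo() -> tuple[int, int]:
    """Stream 1,000,000 bytes in 64 KiB chunks; return (written, size on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "upload.bin"
        body = fake_body(1_000_000, 64 * 1024)
        written = asyncio.run(stream_to_disk(body, target))
        return written, target.stat().st_size
```

Calling `demo()` returns the number of bytes written and the resulting file size, which should match.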
This function could be more generic by letting the caller provide a function to compute `target_path`. I'm wondering whether that could be an interesting addition to Quart itself? A new extension? Or is it just too specific?
In any case: thanks for Quart! It's the only framework that lets me implement a very weird architecture :)
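To make the "caller provides a function" idea concrete, here is a rough sketch of what such a callback could look like. `TargetPathResolver` and `make_flat_resolver` are hypothetical names, not part of Quart or the multipart library. Since the filename comes from the client, this example also keeps only its last path component, which the function above does not do.

```python
from pathlib import Path, PureWindowsPath
from typing import Callable

# Hypothetical callback type: (field name, client filename) -> destination path.
TargetPathResolver = Callable[[str, str], Path]


def make_flat_resolver(base_path: Path) -> TargetPathResolver:
    """Build a resolver that stores every upload directly under base_path."""

    def resolve(field_name: str, filename: str) -> Path:
        # PureWindowsPath splits on both "/" and "\", so this keeps only the
        # final component whichever separator the client used.
        name = PureWindowsPath(filename).name
        if name in ("", ".", ".."):
            raise ValueError(f"unusable filename for field {field_name!r}")
        return base_path / name

    return resolve


resolver = make_flat_resolver(Path("/srv/uploads"))
safe = resolver("document", "report.pdf")
traversal = resolver("document", "../../etc/passwd")
windows = resolver("document", "C:\\Users\\me\\photo.png")
```

Inside the upload function, `target_path = base_path / result.filename` would then become something like `target_path = resolve(result.name, result.filename)`, with the resolver passed in as a parameter.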