Uploading file takes more than expected #126
-
Hi there
First, thanks for this awesome plugin. Works like a charm and is (almost) exactly what I need.
So maybe I'm misunderstanding something or need to configure something, but I have the following scenario:
- I have a simple upload form where the user can upload a big file (let's say up to a couple of hundred MBs)
- The upload is done via django-s3file and the progress loader works like a charm
- Then the POST method is called at the end of the JS direct upload to S3
- My code receives the temporary file and on save transforms the key that then is stored in the DB
The relevant part of my code to handle the upload looks like this:
```python
form = UploadForm(request.POST, request.FILES)
if form.is_valid():
    upload = request.FILES['upload']
    obj = Obj()
    obj.save()
    # The upload_to needs to get the object id, so it's done in a second save call
    obj.upload = upload
    obj.save()
```
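As a side note, the `upload_to` pattern mentioned in the comment above can be sketched like this (the function name and path layout are hypothetical, not taken from the actual project):

```python
def upload_to(instance, filename):
    # Build the storage key from the object's primary key. This is why
    # the object must be saved once before the file can be assigned:
    # instance.pk would still be None on an unsaved object.
    return f"uploads/{instance.pk}/{filename}"
```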
So this works. But the thing is that the last step apparently takes more time at `obj.upload = upload` the bigger the file is (a couple of hundred MBs already took something like 20-30 s). So when users click "upload", they see the upload progress but then still have to wait quite some time until the view actually loads. I assume this is because the file is "moved" on S3. But it probably is not a move but a copy, which would explain why it takes longer the bigger the file is.
So somehow this can't be the expected behaviour: the file is uploaded directly to S3, but the advantage is only partial, as the user now has to wait almost twice as long (the upload itself plus the time it takes to copy the file).
Is this intended? Is there a good workaround, or can I configure something in django-s3file or django-storages so that saving doesn't take that much time?
Thanks!
-
Hi @christiankf,
Thanks for reaching out. This is an interesting case. I believe the delay you are experiencing is due to the app server loading the file from S3 and maybe even doing something with it. What causes the file to be pulled into your application's memory depends on your model. If you share it with me, I might be able to help. If this happens to be an `ImageField` and you use dimension fields, that could be one cause, but other behavior might cause this as well.
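For illustration, a dimension-field setup like the following (model and field names are hypothetical) would force Django to open the image on save to read its width and height, pulling the whole file back from S3:

```python
from django.db import models

class Obj(models.Model):
    # width_field/height_field make Django open the uploaded image on
    # save to measure it, which downloads the file from remote storage.
    upload = models.ImageField(
        upload_to='uploads/',
        width_field='upload_width',
        height_field='upload_height',
    )
    upload_width = models.IntegerField(null=True)
    upload_height = models.IntegerField(null=True)
```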
I hope that helps you a bit.
Best,
Joe
-
BTW, I hope your app server is in the same AWS region as your S3 bucket. If those are in different geographic locations, the I/O can become painfully slow.
-
Hi Joe
Thanks for the response! (btw: I'm the same person as in the post above; I just used the wrong login).
I played around further and found the main issue (I believe). The thing was that the `save` method of the storage actually streams the file through the server making the call (I haven't checked the network, but it took about the same amount of time as the upload itself). I fixed it for my case by using the boto `copy` command instead of the `upload_fileobj` command (which also involves a seek operation).
So in general the `save` method does what it's supposed to do, and it wouldn't make a difference if the file weren't already on S3. But since your library uploads it directly, I kind of expected a `move` command somewhere. S3 doesn't even have such a command, but I hoped that a copy within the bucket works in a way that doesn't require streaming the file content through my server, and it seems to work like that.
So all in all, I now use my own subclass for the storage, like this, and it works as expected.
```python
from storages.backends.s3boto3 import S3Boto3Storage


class AWSS3Storage(S3Boto3Storage):  # pylint: disable=abstract-method
    default_acl = 'private'
    file_overwrite = True
    custom_domain = False

    def _save(self, name, content):
        # Basically copy the implementation of _save of S3Boto3Storage
        # and replace the obj.upload_fileobj with a copy function
        cleaned_name = self._clean_name(name)
        name = self._normalize_name(cleaned_name)
        params = self._get_write_parameters(name, content)
        if (self.gzip and  # pylint: disable=no-member
                params['ContentType'] in self.gzip_content_types and  # pylint: disable=no-member
                'ContentEncoding' not in params):
            content = self._compress_content(content)
            params['ContentEncoding'] = 'gzip'
        obj = self.bucket.Object(name)
        # content.seek(0, os.SEEK_SET)  # Disable unnecessary seek operation
        # obj.upload_fileobj(content, ExtraArgs=params)  # Disable upload function
        # Copy the file instead of uploading, so S3 handles it server-side
        obj.copy({'Bucket': self.bucket.name, 'Key': content.obj.key}, ExtraArgs=params)
        return cleaned_name
```
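To wire the subclass in, point Django's file-storage setting at it (the module path below is a hypothetical example; adjust it to wherever the class actually lives):

```python
# settings.py -- hypothetical module path for the custom storage class
DEFAULT_FILE_STORAGE = 'myproject.storages.AWSS3Storage'
```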
Of course, there's still the possibility that I missed some configuration in your library, but with my own storage implementation it now works for me, and it makes sense that the commented-out parts led to the issue I had, even though I hadn't done anything else with the file that could cause a download of it (the code above really was all I had done with it up to that point).
Regarding your PS: I can't control that completely, but I think the same region should be doable. Regardless, it should only be an issue when initially uploading the file; afterwards there is basically only URL signing, which should be quick. There should be no user-facing I/O happening.
Thanks again for your answer, help and of course the neat library!
Best,
Christian
-
Hi @drakon, interesting. That's an excellent find. Maybe we should include your solution in this library; what do you think? Would you be interested in providing a patch? Best, Joe
-
Hi Joe, yeah, sure, happy to try it. Honestly, I'm not a very experienced OSS contributor, but I'm happy to give it a go. Just give me some time (I'm on holiday right now) and some patience when I put up the PR. :)
-
Sure, take your time. Consider this a great start to get into OSS. And I've been doing this for a while; patience is what you need the most ;)