Question
I am experiencing what looks like a memory leak with django-storages using the S3Boto backend when running default_storage.exists().
I'm following the docs here: http://django-storages.readthedocs.org/en/latest/backends/amazon-S3.html
Here is the relevant part of my settings file:
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'
Here is what I do to repeat the issue:
./manage.py shell
from django.core.files.storage import default_storage
# Check default storage is right
default_storage.connection
>>> S3Connection:s3.amazonaws.com
# Check I can write to a file
file = default_storage.open('storage_test_2014', 'w')
file.write("does this work?")
file.close()
file2 = default_storage.open('storage_test_2014', 'r')
file2.read()
>>> 'does this work?'
# Run the exists command
default_storage.exists("asdfjkl") # This file doesn't exist - but the same thing happens no matter what I put here - even if I put 'storage_test_2014'
# Memory usage of the python process creeps up over the next 45 seconds, until it nears 100%
# iPython shell then crashes
>>> Killed
The only potential issue I've thought of is that my S3 bucket has 93,000 items in it - I'm wondering if .exists is just downloading the whole list of files in order to check? If this is the case, surely there must be another way? Unfortunately sorl-thumbnail uses this .exists() function when generating a new thumbnail, which causes thumbnail generation to be extremely slow.
Answer 1:
Update (Jan 23, 2017)
To avoid this, you can simply pass preload_metadata=False when creating a Storage, or set AWS_PRELOAD_METADATA = False in settings.
Thanks @r3mot for this suggestion in the comments.
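As a sketch, either of the following should disable the eager listing (both options are part of the django-storages S3Boto backend; the settings module shown is just the usual Django layout):

```python
# settings.py -- option 1: global setting read by the S3Boto backend
AWS_PRELOAD_METADATA = False
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

# option 2: per-instance, when constructing a storage object yourself
# from storages.backends.s3boto import S3BotoStorage
# storage = S3BotoStorage(preload_metadata=False)
```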
Original Answer
In fact, it's because S3BotoStorage.exists makes a call to S3BotoStorage.entries, which is as follows:
@property
def entries(self):
    """
    Get the locally cached files for the bucket.
    """
    if self.preload_metadata and not self._entries:
        self._entries = dict((self._decode_name(entry.key), entry)
                             for entry in self.bucket.list(prefix=self.location))
    return self._entries
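To see why a single exists() call can exhaust memory, here is a minimal, self-contained sketch of that caching behavior. FakeBucket and FakeKey are stand-ins for boto's bucket and key objects, and CachingStorage mirrors only the entries/exists logic above, not the real backend:

```python
class FakeKey:
    """Stand-in for a boto Key: just carries the key name."""
    def __init__(self, key):
        self.key = key

class FakeBucket:
    """Stand-in for a boto Bucket with n objects in it."""
    def __init__(self, n):
        self.n = n

    def list(self, prefix=""):
        # boto's bucket.list() iterates over ALL keys matching the prefix
        return (FakeKey("%s%07d" % (prefix, i)) for i in range(self.n))

class CachingStorage:
    """Mirrors the entries/exists logic of S3BotoStorage (simplified)."""
    def __init__(self, bucket, preload_metadata=True, location=""):
        self.bucket = bucket
        self.preload_metadata = preload_metadata
        self.location = location
        self._entries = {}

    @property
    def entries(self):
        # One lookup pulls the entire bucket listing into a dict
        if self.preload_metadata and not self._entries:
            self._entries = dict((entry.key, entry)
                                 for entry in self.bucket.list(prefix=self.location))
        return self._entries

    def exists(self, name):
        if self.entries:
            return name in self.entries
        return False  # the real backend would issue a single HEAD request here

storage = CachingStorage(FakeBucket(93000))
storage.exists("asdfjkl")          # checking ONE name lists everything
print(len(storage._entries))       # the cache now holds all 93,000 keys
```

With 93,000 keys this toy dict is harmless, but the real cache holds full boto Key objects with their metadata, which is where the memory goes.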
The best way to handle this situation would be to subclass S3BotoStorage
as follows:
from storages.backends.s3boto import S3BotoStorage, parse_ts_extended

class MyS3BotoStorage(S3BotoStorage):
    def exists(self, name):
        name = self._normalize_name(self._clean_name(name))
        k = self.bucket.new_key(self._encode_name(name))
        return k.exists()

    def size(self, name):
        name = self._normalize_name(self._clean_name(name))
        return self.bucket.get_key(self._encode_name(name)).size

    def modified_time(self, name):
        name = self._normalize_name(self._clean_name(name))
        k = self.bucket.get_key(self._encode_name(name))
        return parse_ts_extended(k.last_modified)
You'll just have to put this subclass in one of your app's modules and reference it via its dotted path in your settings module. The only drawback to this subclass is that each call to any of the three overridden methods results in a web request, which might not be a big deal.
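For example, if the subclass lives in a module such as myapp/storage.py (the module path here is just a placeholder for wherever you put it), the settings entry would look like:

```python
# settings.py -- point Django's default storage at the subclass;
# 'myapp.storage' is a placeholder module path
DEFAULT_FILE_STORAGE = 'myapp.storage.MyS3BotoStorage'
```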
Source: https://stackoverflow.com/questions/21120825/why-does-default-storate-exists-with-django-storages-with-s3boto-backend-cause