How does one use magic to verify file type in a Django form clean method?

陌清茗 2021-01-13 14:00

I have written an email form class in Django with a FileField. I want to check the type of the uploaded file by inspecting its MIME type. Subsequently, I want to limit file t

5 Answers
  • 2021-01-13 14:04

    Why not try something like this in your view:

    import magic

    m = magic.Magic(mime=True)
    mime_type = m.from_buffer(request.FILES['my_file_field'].read())
    

    Or use request.FILES in place of form.cleaned_data if django.forms.Form is really not an option.
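
    If reading the whole upload into memory is a concern, one possible variation is to hand only the beginning of the file to from_buffer. The sketch below assumes the upload behaves like an ordinary binary stream. read_head and HEAD_SIZE are hypothetical names, and the 2048-byte figure follows the python-magic README's advice to use at least that many bytes. An io.BytesIO object stands in for request.FILES['my_file_field'].

```python
import io

HEAD_SIZE = 2048  # hypothetical constant; assumed large enough for detection


def read_head(uploaded, size=HEAD_SIZE):
    """Return at most `size` bytes from the start of a file-like object."""
    uploaded.seek(0)
    head = uploaded.read(size)
    uploaded.seek(0)  # leave the stream at the start for later readers
    return head


# Stand-in for request.FILES['my_file_field'] (assumption for illustration).
fake_upload = io.BytesIO(b"%PDF-1.4\n" + b"0" * 10_000)
head = read_head(fake_upload)
```

    The returned bytes could then be passed to m.from_buffer(...) in place of the full read().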

  • 2021-01-13 14:05

    You can use the django-safe-filefield package to validate that an uploaded file's extension matches its MIME type.

    from django import forms
    from safe_filefield.forms import SafeFileField
    
    class MyForm(forms.Form):
    
        attachment = SafeFileField(
            allowed_extensions=('xls', 'xlsx', 'csv')
        )
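
    To illustrate the general idea of comparing an extension with a detected MIME type, here is a rough stand-alone sketch using Python's standard mimetypes module. It is not how django-safe-filefield is implemented. extension_matches_mime is a hypothetical helper, and detected_mime is assumed to come from something like magic.from_buffer(..., mime=True).

```python
import mimetypes

# Use only Python's built-in extension table, so the result does not
# depend on mime.types files installed on the host system.
_TYPES = mimetypes.MimeTypes()


def extension_matches_mime(filename, detected_mime):
    """Return True if the MIME type implied by the extension equals the detected one."""
    expected, _encoding = _TYPES.guess_type(filename)
    return expected is not None and expected == detected_mime


png_ok = extension_matches_mime("photo.png", "image/png")
png_spoofed = extension_matches_mime("photo.png", "application/pdf")
```

    Bear in mind that libmagic may report a type that differs from the extension table for some formats, so a strict equality check like this one can be too rigid in practice.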
    
  • 2021-01-13 14:16

    If you are handling a file upload and only care about images, Django will set content_type for you (or rather for itself?):

    from django.core.files import File
    from django.db import models
    from django.forms import ModelForm

    class MyPhoto(models.Model):
        # photo_upload_to is assumed to be defined elsewhere
        photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)

    class MyForm(ModelForm):
        class Meta:
            model = MyPhoto
            fields = ['photo']

    # Wrap a local image in File to simulate an upload
    photo = File(open('1.jpeg', 'rb'))
    form = MyForm(files={'photo': photo})
    if form.is_valid():
        # content_type has been updated from the image data during validation
        print(form.instance.photo.file.content_type)
    

    It doesn't rely on content type provided by the user. But django.db.models.fields.files.FieldFile.file is an undocumented property.

    Actually, initially content_type is set from the request, but when the form gets validated, the value is updated.

    Regarding non-images, calling request.FILES['name'].read() seems fine to me. First, that's what Django itself does. Second, by default files larger than 2.5 MB are stored on disk rather than in memory. So let me point you to the other answer here.


    For the curious, here's the call chain that leads to updating content_type:

    django.forms.forms.BaseForm.is_valid: self.errors
    django.forms.forms.BaseForm.errors: self.full_clean()
    django.forms.forms.BaseForm.full_clean: self._clean_fields()
    django.forms.forms.BaseForm._clean_fields: field.clean()
    django.forms.fields.FileField.clean: super().clean()
    django.forms.fields.Field.clean: self.to_python()
    django.forms.fields.ImageField.to_python

  • 2021-01-13 14:23

    Stan described a good variant using a buffer. Unfortunately, its weakness is that it reads the whole file into memory. Another option is to use a temporarily stored file:

    import tempfile
    import magic

    with tempfile.NamedTemporaryFile() as tmp:
        for chunk in form.cleaned_data['file'].chunks():
            tmp.write(chunk)
        tmp.flush()  # make sure buffered data is on disk before libmagic reads it
        print(magic.from_file(tmp.name, mime=True))
    

    Also, you might want to check the file size:

    if form.cleaned_data['file'].size < ...:
        print(magic.from_buffer(form.cleaned_data['file'].read(), mime=True))
    else:
        # store to disk (the code above)
        pass
    

    Additionally, the Python documentation for tempfile.NamedTemporaryFile notes:

    Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

    So you might want to handle it like so:

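    import os
    import tempfile
    import magic

    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in form.cleaned_data['file'].chunks():
            tmp.write(chunk)
        # Close first: this flushes the data and lets Windows reopen the file by name
        tmp.close()
        print(magic.from_file(tmp.name, mime=True))
    finally:
        tmp.close()  # no-op if already closed
        os.unlink(tmp.name)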
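
    The same write, close, inspect, delete sequence could be wrapped in a small context manager. This is only a sketch using the standard library. chunks_on_disk is a hypothetical name, and a plain list of byte strings stands in for form.cleaned_data['file'].chunks().

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def chunks_on_disk(chunks):
    """Write an iterable of byte chunks to a temp file and yield its path.

    The file is closed before the path is yielded, so another library can
    reopen it by name on any platform. It is removed on exit.
    """
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in chunks:
            tmp.write(chunk)
        tmp.close()  # flushes buffered data and releases the handle
        yield tmp.name
    finally:
        tmp.close()  # no-op if already closed
        os.unlink(tmp.name)


# Stand-in for form.cleaned_data['file'].chunks() (assumption for illustration).
with chunks_on_disk([b"hello ", b"world"]) as path:
    with open(path, "rb") as fh:
        content = fh.read()
    existed_inside = os.path.exists(path)
existed_after = os.path.exists(path)
```

    With python-magic installed, the body of the with block would instead call magic.from_file(path, mime=True).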
    

    Also, you might want to seek(0) after read():

    if hasattr(f, 'seek') and callable(f.seek):
        f.seek(0)
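
    To see why the rewind matters, here is a small stand-alone illustration. It uses io.BytesIO as a stand-in for the uploaded file, which is an assumption made so the snippet runs without Django.

```python
import io

f = io.BytesIO(b"GIF89a example payload")

first = f.read()    # consumes the stream up to the end
second = f.read()   # nothing is left to read at this position

if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)       # rewind so later consumers see the full content

third = f.read()
```

    Without the seek(0), whatever reads the file after the type check (for example, code that saves or attaches it) would receive empty content.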
    

    See also "Where uploaded data is stored" in the Django documentation on file uploads.

  • 2021-01-13 14:25

    import magic

    mime = magic.Magic(mime=True)
    
    attachment = form.cleaned_data['attachment']
    
    if hasattr(attachment, 'temporary_file_path'):
        # the file was written to a temporary location on disk, so we can use its full path
        mime_type = mime.from_file(attachment.temporary_file_path())
    else:
        # the file is held in memory
        mime_type = mime.from_buffer(attachment.read())
    

    Also, you might want to seek(0) after read():

    if hasattr(f, 'seek') and callable(f.seek):
        f.seek(0)
    

    This pattern is taken from Django's own code, where it is performed for image fields during validation.
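
    As a rough sketch of how the disk/memory branch above could feed a type restriction, the logic can be factored into plain functions. detect_mime, check_allowed and ALLOWED_MIME_TYPES are hypothetical names, and the detectors are passed in as callables so the snippet runs without libmagic. In real code they would be mime.from_file and mime.from_buffer.

```python
import io

ALLOWED_MIME_TYPES = {"application/pdf", "image/png"}  # hypothetical whitelist


def detect_mime(attachment, from_file, from_buffer):
    """Choose the detection strategy based on where the upload lives."""
    if hasattr(attachment, "temporary_file_path"):
        # Upload was written to disk: let the detector read it by path.
        return from_file(attachment.temporary_file_path())
    # Upload is in memory: read it, then rewind for later consumers.
    data = attachment.read()
    attachment.seek(0)
    return from_buffer(data)


def check_allowed(mime_type, allowed=ALLOWED_MIME_TYPES):
    """Return the MIME type if it is whitelisted, otherwise raise ValueError."""
    if mime_type not in allowed:
        raise ValueError(f"Unsupported file type: {mime_type}")
    return mime_type


# Fake detectors standing in for mime.from_file / mime.from_buffer.
def fake_from_buffer(data):
    return "application/pdf" if data.startswith(b"%PDF") else "application/octet-stream"


def fake_from_file(path):
    return "image/png"


class FakeDiskUpload:
    """Mimics an upload that exposes temporary_file_path()."""

    def temporary_file_path(self):
        return "/tmp/upload.png"  # hypothetical path, never opened here


in_memory = io.BytesIO(b"%PDF-1.7 sample")
memory_type = detect_mime(in_memory, fake_from_file, fake_from_buffer)
disk_type = detect_mime(FakeDiskUpload(), fake_from_file, fake_from_buffer)
```

    Inside a form, a clean_attachment method could call detect_mime(self.cleaned_data['attachment'], mime.from_file, mime.from_buffer), pass the result to check_allowed, turn the ValueError into a forms.ValidationError, and return the attachment.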
