How to find the mime type of a file in python?

前端未结

关注

 19  967

Let\'s say you want to save a bunch of files somewhere, for instance in BLOBs. Let\'s say you want to dish these files out via a web page and have the client automatically o

相关标签:

19条回答

长发绾君心

2020-11-22 15:41
I 've tried a lot of examples but with Django mutagen plays nicely.

Example checking if files is mp3
```
from mutagen.mp3 import MP3, HeaderNotFoundError  

try:
    audio = MP3(file)
except HeaderNotFoundError:
    raise ValidationError('This file should be mp3')
```
The downside is that your ability to check file types is limited, but it's a great way if you want not only check for file type but also to access additional information.
0 讨论(0)
发布评论:

提交评论
- 加载中...

有刺的猬

2020-11-22 15:42

This seems to be very easy

>>> from mimetypes import MimeTypes
>>> import urllib 
>>> mime = MimeTypes()
>>> url = urllib.pathname2url('Upload.xml')
>>> mime_type = mime.guess_type(url)
>>> print mime_type
('application/xml', None)

Please refer Old Post

Update - In python 3+ version, it's more convenient now:

import mimetypes
print(mimetypes.guess_type("sample.html"))

0 讨论(0)

逝去的感伤

2020-11-22 15:44

I'm surprised that nobody has mentioned it but Pygments is able to make an educated guess about the mime-type of, particularly, text documents.

Pygments is actually a Python syntax highlighting library but is has a method that will make an educated guess about which of 500 supported document types your document is. i.e. c++ vs C# vs Python vs etc

import inspect

def _test(text: str):
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(text)
    mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
    print(mimetype)

if __name__ == "__main__":
    # Set the text to the actual defintion of _test(...) above
    text = inspect.getsource(_test)
    print('Text:')
    print(text)
    print()
    print('Result:')
    _test(text)

Output:

Text:
def _test(text: str):
    from pygments.lexers import guess_lexer
    lexer = guess_lexer(text)
    mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
    print(mimetype)


Result:
text/x-python

Now, it's not perfect, but if you need to be able to tell which of 500 document formats are being used, this is pretty darn useful.

0 讨论(0)

闹比i

2020-11-22 15:46

In Python 3.x and webapp with url to the file which couldn't have an extension or a fake extension. You should install python-magic, using

pip3 install python-magic

For Mac OS X, you should also install libmagic using

brew install libmagic

Code snippet

import urllib
import magic
from urllib.request import urlopen

url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.readline())
print(mime_type)

alternatively you could put a size into the read

import urllib
import magic
from urllib.request import urlopen

url = "http://...url to the file ..."
request = urllib.request.Request(url)
response = urlopen(request)
mime_type = magic.from_buffer(response.read(128))
print(mime_type)

0 讨论(0)

生来不讨喜

2020-11-22 15:47
2017 Update

No need to go to github, it is on PyPi under a different name:
```
pip3 install --user python-magic
# or:
sudo apt install python3-magic  # Ubuntu distro package
```
The code can be simplified as well:
```
>>> import magic

>>> magic.from_file('/tmp/img_3304.jpg', mime=True)
'image/jpeg'
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
暖寄归人

2020-11-22 15:47
@toivotuo 's method worked best and most reliably for me under python3. My goal was to identify gzipped files which do not have a reliable .gz extension. I installed python3-magic.
```
import magic

filename = "./datasets/test"

def file_mime_type(filename):
    m = magic.open(magic.MAGIC_MIME)
    m.load()
    return(m.file(filename))

print(file_mime_type(filename))
```
for a gzipped file it returns: application/gzip; charset=binary

for an unzipped txt file (iostat data): text/plain; charset=us-ascii

for a tar file: application/x-tar; charset=binary

for a bz2 file: application/x-bzip2; charset=binary

and last but not least for me a .zip file: application/zip; charset=binary
0 讨论(0)
发布评论:

提交评论
- 加载中...