How to find the MIME type of a file in Python?

猫巷女王i 2020-11-22 15:19

Let's say you want to save a bunch of files somewhere, for instance in BLOBs. Let's say you want to dish these files out via a web page and have the client automatically o

19 answers
  • 2020-11-22 15:41

    I've tried a lot of examples, but mutagen is the one that plays nicely with Django.

    Example checking whether a file is an MP3:

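    # Assumption: ValidationError is Django's, since the answer targets Django
    from django.core.exceptions import ValidationError
    from mutagen.mp3 import MP3, HeaderNotFoundError

    try:
        # `file` is the uploaded file (path or file object) being validated
        audio = MP3(file)
    except HeaderNotFoundError:
        # mutagen found no MPEG audio header, so this is not an MP3
        raise ValidationError('This file should be mp3')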
    

    The downside is that mutagen only recognizes audio formats, so your ability to check file types is limited. It is a good fit, though, if you want not only to check the file type but also to access additional information about the file.

  • 2020-11-22 15:42

    This seems to be very easy (Python 2):

    >>> from mimetypes import MimeTypes
    >>> import urllib 
    >>> mime = MimeTypes()
    >>> url = urllib.pathname2url('Upload.xml')
    >>> mime_type = mime.guess_type(url)
    >>> print mime_type
    ('application/xml', None)
    

    Please refer to the old post.

    Update: in Python 3 it is more convenient:

    import mimetypes
    print(mimetypes.guess_type("sample.html"))
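
As a small illustration of what `guess_type` returns: it gives a `(type, encoding)` tuple and works from the file name alone, so it never opens the file. The helper name and the file names below are made up for this sketch.

```python
import mimetypes


def guess_from_name(filename):
    """Return (mime_type, encoding) guessed purely from the file name."""
    return mimetypes.guess_type(filename)


# Hypothetical file names; none of these files need to exist.
html_guess = guess_from_name("sample.html")
tarball_guess = guess_from_name("backup.tar.gz")
no_ext_guess = guess_from_name("README")

print(html_guess)     # type only, no encoding
print(tarball_guess)  # the .gz suffix is reported as the encoding
print(no_ext_guess)   # nothing to go on without an extension
```

Because only the name is inspected, a renamed or extension-less file will be misreported or not recognized at all, which is where the content-based approaches in the other answers come in.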
    
  • 2020-11-22 15:44

    I'm surprised that nobody has mentioned it, but Pygments is able to make an educated guess about the MIME type of a document, particularly a text document.

    Pygments is actually a Python syntax highlighting library, but it has a method that will make an educated guess about which of 500 supported document types your document is, e.g. C++ vs. C# vs. Python.

    import inspect
    
    def _test(text: str):
        from pygments.lexers import guess_lexer
        lexer = guess_lexer(text)
        mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
        print(mimetype)
    
    if __name__ == "__main__":
        # Set the text to the actual definition of _test(...) above
        text = inspect.getsource(_test)
        print('Text:')
        print(text)
        print()
        print('Result:')
        _test(text)
    

    Output:

    Text:
    def _test(text: str):
        from pygments.lexers import guess_lexer
        lexer = guess_lexer(text)
        mimetype = lexer.mimetypes[0] if lexer.mimetypes else None
        print(mimetype)
    
    
    Result:
    text/x-python
    

    Now, it's not perfect, but if you need to be able to tell which of 500 document formats is being used, this is pretty darn useful.

  • 2020-11-22 15:46

    In Python 3.x, a web app may only have a URL to a file that has no extension or a fake extension. For that case you should install python-magic, using

    pip3 install python-magic
    

    For Mac OS X, you should also install libmagic using

    brew install libmagic
    

    Code snippet

    import magic
    from urllib.request import Request, urlopen

    url = "http://...url to the file ..."
    request = Request(url)
    response = urlopen(request)
    # mime=True returns a MIME type (e.g. 'image/jpeg') rather than a text description
    mime_type = magic.from_buffer(response.readline(), mime=True)
    print(mime_type)
    

    Alternatively, you can read a fixed number of bytes instead of a single line:

    import magic
    from urllib.request import Request, urlopen

    url = "http://...url to the file ..."
    request = Request(url)
    response = urlopen(request)
    # The python-magic README suggests at least the first 2048 bytes;
    # shorter buffers can be misidentified
    mime_type = magic.from_buffer(response.read(2048), mime=True)
    print(mime_type)
    
  • 2020-11-22 15:47

    2017 Update

    No need to go to GitHub; it is on PyPI under a different name:

    pip3 install --user python-magic
    # or:
    sudo apt install python3-magic  # Ubuntu distro package
    

    The code can be simplified as well:

    >>> import magic
    
    >>> magic.from_file('/tmp/img_3304.jpg', mime=True)
    'image/jpeg'
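
A minimal sketch of how the two approaches in this thread might be combined, assuming you want content-based detection when python-magic (and libmagic) is available and a name-based guess otherwise. The function name `detect_mime` and the demo file are hypothetical, not part of either library.

```python
import mimetypes
import os
import tempfile

try:
    import magic  # python-magic; also needs the libmagic shared library
except ImportError:
    magic = None


def detect_mime(path):
    """Prefer content sniffing via python-magic, else guess from the name."""
    if magic is not None and hasattr(magic, "from_file"):
        return magic.from_file(path, mime=True)
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to the generic binary type when the name gives no hint
    return guessed or "application/octet-stream"


# Demo on a throwaway HTML file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as tmp:
    tmp.write("<!DOCTYPE html>\n<html><head><title>demo</title></head>"
              "<body></body></html>\n")
    demo_path = tmp.name

detected = detect_mime(demo_path)
os.remove(demo_path)
print(detected)
```

The `hasattr` check is there because more than one package installs a module called `magic`, and not all of them provide `from_file`.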
    
  • 2020-11-22 15:47

    @toivotuo's method worked best and most reliably for me under Python 3. My goal was to identify gzipped files that do not have a reliable .gz extension. I installed python3-magic.

    import magic
    
    filename = "./datasets/test"
    
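    def file_mime_type(filename):
        # MAGIC_MIME asks libmagic for "type; charset=..." instead of a description
        m = magic.open(magic.MAGIC_MIME)
        m.load()  # load the default magic database
        return m.file(filename)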
    
    print(file_mime_type(filename))
    

    For a gzipped file it returns: application/gzip; charset=binary

    For an unzipped txt file (iostat data): text/plain; charset=us-ascii

    For a tar file: application/x-tar; charset=binary

    For a bz2 file: application/x-bzip2; charset=binary

    And last but not least for me, a .zip file: application/zip; charset=binary
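
If installing libmagic is not an option, the same idea can be approximated for a handful of formats with the standard library alone. The following is a hand-rolled signature check, not what libmagic does internally. It covers only gzip, bzip2 and non-empty zip archives, using their well-known leading bytes; `SIGNATURES` and `sniff_compressed` are names invented for this sketch.

```python
import gzip
import os
import tempfile

# Well-known leading bytes ("magic numbers") of a few compressed formats.
SIGNATURES = [
    (b"\x1f\x8b", "application/gzip"),
    (b"BZh", "application/x-bzip2"),
    (b"PK\x03\x04", "application/zip"),  # non-empty zip archives
]


def sniff_compressed(path):
    """Return a MIME type based on the first bytes of the file, or None."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for prefix, mime in SIGNATURES:
        if head.startswith(prefix):
            return mime
    return None


# Demo: a gzipped file without a .gz extension, and a plain text file
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "test")
with gzip.open(gz_path, "wb") as fh:
    fh.write(b"iostat data\n")
txt_path = os.path.join(tmp_dir, "plain")
with open(txt_path, "wb") as fh:
    fh.write(b"iostat data\n")

gz_result = sniff_compressed(gz_path)
txt_result = sniff_compressed(txt_path)
print(gz_result, txt_result)
```

Tar is left out on purpose: its `ustar` marker sits at offset 257 rather than at the start of the file, so a prefix check like this would not find it.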
