Get size of a file before downloading in Python

名媛妹妹 2020-12-02 06:56

I'm downloading an entire directory from a web server. It works OK, but I can't figure out how to get the file size before downloading, so that I can check whether a file was updated on the server.

8 Answers
  • 2020-12-02 07:16

    @PabloG Regarding the local/server filesize difference

    The following is a high-level, illustrative explanation of why it may occur:

    The size on disk is sometimes different from the actual size of the data. It depends on the underlying file system and how it stores data. As you may have seen in Windows, when formatting a flash drive you are asked to provide a 'block/cluster size', which varies (for example, 512 B to 8 KB). When a file is written to disk, it is stored in a 'sort-of linked list' of disk blocks. Once a block is used to store part of a file, no other file's contents will be stored in that block, so even if the chunk does not occupy the entire block, the rest of the block is unusable by other files.

    Example: When the file system is divided into 512 B blocks and we need to store a 600 B file, two blocks will be occupied. The first block will be fully utilized, while the second block will have only 88 B utilized; the remaining (512 - 88) = 424 B will be unusable, resulting in a 'file size on disk' of 1024 B. This is why Windows has different notations for 'file size' and 'size on disk'.
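    The arithmetic in that example can be sketched in a few lines of Python. This is a simplified model that assumes space is allocated in whole fixed-size blocks; the function name `size_on_disk` is made up for illustration, and real file systems may behave differently (for instance, by storing very small files inline).

```python
def size_on_disk(file_size, block_size):
    """Bytes allocated for a file when space is handed out in whole blocks."""
    # Round the number of blocks up using integer arithmetic.
    blocks = (file_size + block_size - 1) // block_size
    return blocks * block_size


allocated = size_on_disk(600, 512)
print(allocated)        # 1024: two 512-byte blocks
print(allocated - 600)  # 424: unused bytes in the second block
```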

    NOTE: Smaller and larger file-system block sizes each come with their own pros and cons, so do more research before changing your file system's settings.

  • 2020-12-02 07:21

    The size of the file is sent as the Content-Length header. Here is how to get it with urllib (Python 2):

    >>> import urllib
    >>> site = urllib.urlopen("http://python.org")
    >>> meta = site.info()
    >>> print meta.getheaders("Content-Length")
    ['16535']
    >>>
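    The snippet above uses the Python 2 urllib API and opens the URL with a normal GET. As a rough Python 3 sketch of the same idea, a HEAD request asks the server for the headers only, so the size can be read before anything is downloaded. The function name `remote_content_length` and the `timeout` default are assumptions for illustration, not part of the original answer.

```python
import urllib.request


def remote_content_length(url, timeout=10):
    """Return the Content-Length reported for url, or None if the header is absent."""
    # method="HEAD" requests the headers without the response body.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None
```

    Calling `remote_content_length("http://python.org")` would return an integer when the server reports a length. Not every server answers HEAD requests or sends Content-Length (chunked responses, for example, omit it), so callers should be ready for `None` or an error.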
    
  • 2020-12-02 07:25

    I have reproduced what you are seeing:

    import urllib, os
    link = "http://python.org"
    print "opening url:", link
    site = urllib.urlopen(link)
    meta = site.info()
    print "Content-Length:", meta.getheaders("Content-Length")[0]
    
    f = open("out.txt", "r")
    print "File on disk:",len(f.read())
    f.close()
    
    
    f = open("out.txt", "w")
    f.write(site.read())
    site.close()
    f.close()
    
    f = open("out.txt", "r")
    print "File on disk after download:",len(f.read())
    f.close()
    
    print "os.stat().st_size returns:", os.stat("out.txt").st_size
    

    Outputs this:

    opening url: http://python.org
    Content-Length: 16535
    File on disk: 16535
    File on disk after download: 16535
    os.stat().st_size returns: 16861
    

    What am I doing wrong here? Is os.stat().st_size not returning the correct size?


    Edit: OK, I figured out what the problem was:

    import urllib, os
    link = "http://python.org"
    print "opening url:", link
    site = urllib.urlopen(link)
    meta = site.info()
    print "Content-Length:", meta.getheaders("Content-Length")[0]
    
    f = open("out.txt", "rb")
    print "File on disk:",len(f.read())
    f.close()
    
    
    f = open("out.txt", "wb")
    f.write(site.read())
    site.close()
    f.close()
    
    f = open("out.txt", "rb")
    print "File on disk after download:",len(f.read())
    f.close()
    
    print "os.stat().st_size returns:", os.stat("out.txt").st_size
    

    This outputs:

    $ python test.py
    opening url: http://python.org
    Content-Length: 16535
    File on disk: 16535
    File on disk after download: 16535
    os.stat().st_size returns: 16535
    

    Make sure you are opening both files for binary read/write.

    # open for binary write
    open(filename, "wb")
    # open for binary read
    open(filename, "rb")
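    To see why text mode can change the byte count, here is a small Python 3 sketch. It assumes Windows-style newline translation, forced here with `newline="\r\n"` so the effect is reproducible on any platform. The function name and the sample text are made up for illustration.

```python
import os
import tempfile


def text_vs_binary_sizes(text):
    """Write text with "\\n" -> "\\r\\n" translation, then measure the file three ways."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n" mimics what text mode does on Windows when writing.
        with open(path, "w", newline="\r\n", encoding="ascii") as f:
            f.write(text)
        # Text-mode read: "\r\n" is translated back to "\n".
        with open(path, "r", encoding="ascii") as f:
            text_len = len(f.read())
        # Binary read: the bytes exactly as stored on disk.
        with open(path, "rb") as f:
            binary_len = len(f.read())
        return text_len, binary_len, os.path.getsize(path)
    finally:
        os.remove(path)


print(text_vs_binary_sizes("line one\nline two\n"))  # (18, 20, 20)
```

    The text-mode length is smaller than the on-disk size by one byte per line, which is the same kind of mismatch as in the first run above.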
    
  • 2020-12-02 07:26

    For a Python 3 approach (tested on 3.5), I'd recommend:

    from urllib.request import urlopen

    with urlopen(file_url) as in_file, open(local_file_address, 'wb') as out_file:
        print(in_file.getheader('Content-Length'))
        out_file.write(in_file.read())
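    A possible variation, sketched under the assumption that you also want to confirm the download is complete: stream the body to disk in chunks and compare the number of bytes written with the Content-Length header. The name `download_and_verify` is hypothetical.

```python
import os
import shutil
import urllib.request


def download_and_verify(url, dest_path):
    """Stream url to dest_path and return (expected, written) byte counts."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        header = response.headers.get("Content-Length")
        # Copy in chunks instead of reading the whole body into memory.
        shutil.copyfileobj(response, out_file)
    expected = int(header) if header is not None else None
    return expected, os.path.getsize(dest_path)
```

    If the two numbers differ, the transfer was probably cut short. `expected` is `None` when the server sends no Content-Length.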
    
  • 2020-12-02 07:26

    In Python 3:

    >>> import urllib.request
    >>> site = urllib.request.urlopen("http://python.org")
    >>> print("FileSize: ", site.length)
    
  • 2020-12-02 07:40

    Using the info() method of the object returned by urllib.urlopen(), you can get various information about the retrieved document. Example of grabbing the current Google logo:

    >>> import urllib
    >>> d = urllib.urlopen("http://www.google.co.uk/logos/olympics08_opening.gif")
    >>> print d.info()
    
    Content-Type: image/gif
    Last-Modified: Thu, 07 Aug 2008 16:20:19 GMT  
    Expires: Sun, 17 Jan 2038 19:14:07 GMT 
    Cache-Control: public 
    Date: Fri, 08 Aug 2008 13:40:41 GMT 
    Server: gws 
    Content-Length: 20172 
    Connection: Close
    

    It's a dict-like object, so to get the size of the file, you do urllibobject.info()['Content-Length']:

    print d.info()['Content-Length']
    

    And to get the size of the local file (for comparison), you can use the os.stat() function:

    import os
    os.stat("/the/local/file.zip").st_size
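    Putting the two sizes together, a minimal Python 3 sketch of the "was it updated?" check might look like the following. The function name `is_stale` is an assumption for illustration, and `headers` is assumed to be any mapping of header names to values (such as `response.headers`). The Last-Modified comparison is an optional extra that relies on the local file's modification time.

```python
import os
from email.utils import parsedate_to_datetime


def is_stale(headers, local_path):
    """Return True if the local copy is missing or looks different from the server's."""
    if not os.path.exists(local_path):
        return True
    st = os.stat(local_path)
    length = headers.get("Content-Length")
    # A different byte count means the file changed (or the download was cut short).
    if length is not None and int(length) != st.st_size:
        return True
    modified = headers.get("Last-Modified")
    if modified is not None:
        # Same size: fall back to comparing the server timestamp with the local mtime.
        return parsedate_to_datetime(modified).timestamp() > st.st_mtime
    return False
```

    A size match alone does not prove the content is identical, so treat this as a cheap first check, not a guarantee.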
    