Python EOF for multi byte requests of file.read()

天命终不由人 2020-12-16 19:23

The Python docs on file.read() state that "An empty string is returned when EOF is encountered immediately." The documentation further states:

2 answers
  • 2020-12-16 20:03

    You are not thinking with your snake skin on... Python is not C.

    First, a review:

    • st=f.read() reads to EOF, or if opened in binary mode, to the last byte;
    • st=f.read(n) attempts to read n bytes and in no case more than n bytes;
    • st=f.readline() reads one line at a time; the line ends with '\n' or at EOF;
    • st=f.readlines() uses readline() to read all the lines in a file and returns a list of the lines.
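
    The review above can be tried out on an in-memory file. This is a minimal sketch that assumes io.BytesIO as a stand-in for a file opened with 'rb'; the variable names are only illustrative.

```python
import io

# An in-memory binary "file" with two lines and no trailing newline.
f = io.BytesIO(b"first line\nsecond")

head = f.read(5)             # at most 5 bytes
rest_of_line = f.readline()  # up to and including b"\n"
tail = f.read()              # everything that is left, up to EOF
at_eof = f.read(5)           # already at EOF: an empty bytes object

f.seek(0)
lines = f.readlines()        # list of all lines in the file
```

    Only the empty result in at_eof signals EOF; tail is shorter than the file simply because part of it had already been consumed.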

    If a file read method is at EOF, it returns '' (or b'' for a file opened in binary mode). The same type of EOF test is used by other file-like objects such as StringIO, socket.makefile, etc. A return of fewer than n bytes from f.read(n) is most assuredly NOT a dispositive test for EOF! While that code may work 99.99% of the time, the times it does not work would be very frustrating to track down. Plus, it is bad Python form. The only use for n in this case is to put an upper limit on the size of the return.
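
    A short read that is not EOF is easy to provoke. The sketch below assumes an OS pipe opened unbuffered as a stand-in for any stream whose data arrives in pieces; regular disk files will rarely behave this way.

```python
import os

# Assumption: a pipe stands in for a stream that delivers data piecemeal.
read_fd, write_fd = os.pipe()
reader = open(read_fd, "rb", buffering=0)  # unbuffered: one OS read per call

os.write(write_fd, b"abc")
first = reader.read(10)    # short read: only 3 bytes, but the writer is still open

os.write(write_fd, b"defgh")
second = reader.read(10)   # more data arrived after the short read

os.close(write_fd)         # only now is there a real EOF
third = reader.read(10)    # the empty bytes object is the EOF signal
reader.close()
```

    Treating the 3-byte result in first as EOF would have dropped the five bytes that followed.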

    What are some of the reasons a Python file-like method returns fewer than n bytes?

    1. EOF is certainly a common reason;
    2. A network socket may time out on read yet remain open;
    3. Exactly n bytes may cause a break between logical multi-byte characters (such as \r\n in text mode and, I think, a multi-byte character in Unicode) or some underlying data structure not known to you;
    4. The file is in non-blocking mode and another process begins to access the file;
    5. The file is temporarily inaccessible;
    6. An underlying error condition, potentially temporary, on the file, disc, network, etc.;
    7. The program received a signal, but the signal handler ignored it.
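
    When a caller really does need n bytes, the usual approach is to keep reading until either n bytes have been collected or read() returns an empty result. A minimal sketch follows; read_exactly is a hypothetical helper, not a standard library function, and it assumes a blocking binary stream whose read() returns b"" only at EOF.

```python
import io


def read_exactly(stream, n):
    """Collect up to n bytes, stopping early only at EOF.

    Hypothetical helper: assumes a blocking binary stream whose read()
    returns b"" only when the end of the stream has been reached.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:              # empty result: real EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)    # a short read just means "ask again"
    return b"".join(chunks)


# io.BytesIO stands in for a real file or socket file object here.
data = read_exactly(io.BytesIO(b"hello world"), 5)
```

    A result shorter than n from this helper does mean EOF, because every short read before it was retried.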

    I would rewrite your code in this manner:

    # filename and max_size come from your own code
    with open(filename, 'rb') as f:
        while True:
            s = f.read(max_size)
            if not s:  # an empty result means EOF
                break

            # process the data in s...
    

    Or, write a generator:

    def blocks(infile, bufsize=1024):
        """Yield chunks of at most bufsize bytes until EOF or an I/O error."""
        while True:
            try:
                data = infile.read(bufsize)
            except OSError as e:  # IOError is an alias of OSError in Python 3
                print("I/O error({0}): {1}".format(e.errno, e.strerror))
                break
            if not data:  # an empty result means EOF
                break
            yield data

    with open('somefile', 'rb') as f:
        for block in blocks(f, 2**16):
            # process a block that COULD be up to 65,536 bytes long
            pass
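
    The same read-until-empty loop can also be written with the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. This is a sketch of that alternative, not a change to the generator above; io.BytesIO is assumed here as a stand-in for a file opened with 'rb'.

```python
import io
from functools import partial


def blocks(infile, bufsize=1024):
    """Return an iterator over chunks of at most bufsize bytes."""
    # iter(callable, sentinel) calls infile.read(bufsize) repeatedly and
    # stops as soon as the result equals the sentinel b"" (EOF).
    return iter(partial(infile.read, bufsize), b"")


# Stand-in for a real binary file (assumption for this sketch).
source = io.BytesIO(b"x" * 2500)
sizes = [len(block) for block in blocks(source, 1024)]
```

    The sentinel must match the stream type: b"" for binary mode, '' for text mode. Unlike the generator version, this form does not catch I/O errors.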
    
  • 2020-12-16 20:20

    Here's what my C compiler's documentation says for the fread() function:

    size_t fread( 
       void *buffer,
       size_t size,
       size_t count,
       FILE *stream 
    );
    

    fread returns the number of full items actually read, which may be less than count if an error occurs or if the end of the file is encountered before reaching count.

    So it looks like getting fewer items than requested means either an error has occurred or EOF has been reached -- so breaking out of the loop would be the correct thing to do.
