Is this a bug? It demonstrates what happens when you use libtiff to extract an image from an open tiff file handle. It works in python 2.x and does not work in python 3.2.3
import os
# any file will work here, since it's not actually loading the tiff
# assuming it's big enough for the seek
filename = "/home/kostrom/git/wiredfool-pillow/Tests/images/multipage.tiff"
def test():
fp1 = open(filename, "rb")
buf1 = fp1.read(8)
fp1.seek(28)
fp1.read(2)
for x in range(16):
fp1.read(12)
fp1.read(4)
fd = os.dup(fp1.fileno())
os.lseek(fd, 28, os.SEEK_SET)
os.close(fd)
# this magically fixes it: fp1.tell()
fp1.seek(284)
expect_284 = fp1.tell()
print ("expected 284, actual %d" % expect_284)
test()
The output which I feel is in error is: expected 284, actual -504
Uncommenting the fp1.tell() does some ... side effect ... which stabilizes the py3 handle, and I don't know why. I'd also appreciate if someone can test other versions of python3.
os.dup
creates a duplicate file descriptor that refers to the same open file description. Therefore, os.lseek(fd, 28, SEEK_SET)
changes the seek position of the file underlying fp1
.
Python's file objects cache the file position to avoid repeated system calls. The side effect of this is that changing the file position without using the file object methods will desynchronize the cached position and the real position, leading to nonsense like you've observed.
Worse yet, because the files are internally buffered by Python, seeking outside the file methods could actually cause the returned file data to be incorrect, leading to corruption or other nasty stuff.
The documentation in bufferedio.c
notes that tell
can be used to reinitialize the cached value:
* The absolute position of the raw stream is cached, if possible, in the
`abs_pos` member. It must be updated every time an operation is done
on the raw stream. If not sure, it can be reinitialized by calling
_buffered_raw_tell(), which queries the raw stream (_buffered_raw_seek()
also does it). To read it, use RAW_TELL().
No, this is not a bug. The Python 3 io
library, which provides you with the file object from an open()
call, gives you a buffered file object. For binary files, you are given a (subclass of) io.BufferedIOBase
.
The Python 2 file object is far more primitive, although you can use the io
library there too.
By seeking at the OS level you are bypassing the buffer and are mucking up the internal state. Generally speaking, as the doctor said to the patient complaining that pinching his skin hurts: don't do that.
If you have a pressing need to do this anyway, at the very least use the underlying raw file object (a subclass of the io.RawIOBase
class) via the io.BufferedIO.raw
attribute:
fp1 = open(filename, "rb").raw
来源:https://stackoverflow.com/questions/25574575/is-this-a-python-3-file-bug