Is this a python 3 file bug?

半城伤御伤魂 提交于 2019-12-08 18:52:40

os.dup creates a duplicate file descriptor that refers to the same open file description. Therefore, os.lseek(fd, 28, SEEK_SET) changes the seek position of the file underlying fp1.

Python's file objects cache the file position to avoid repeated system calls. The side effect of this is that changing the file position without using the file object methods will desynchronize the cached position and the real position, leading to nonsense like you've observed.

Worse yet, because the files are internally buffered by Python, seeking outside the file methods could actually cause the returned file data to be incorrect, leading to corruption or other nasty stuff.

The documentation in bufferedio.c notes that tell can be used to reinitialize the cached value:

* The absolute position of the raw stream is cached, if possible, in the
  `abs_pos` member. It must be updated every time an operation is done
  on the raw stream. If not sure, it can be reinitialized by calling
  _buffered_raw_tell(), which queries the raw stream (_buffered_raw_seek()
  also does it). To read it, use RAW_TELL().

No, this is not a bug. The Python 3 io library, which provides you with the file object from an open() call, gives you a buffered file object. For binary files, you are given a (subclass of) io.BufferedIOBase.

The Python 2 file object is far more primitive, although you can use the io library there too.

By seeking at the OS level you are bypassing the buffer and are mucking up the internal state. Generally speaking, as the doctor said to the patient complaining that pinching his skin hurts: don't do that.

If you have a pressing need to do this anyway, at the very least use the underlying raw file object (a subclass of the io.RawIOBase class) via the io.BufferedIO.raw attribute:

fp1 = open(filename, "rb").raw
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!