I'm trying to "map" a very large ASCII file. Basically, I read lines until I find a certain tag, and then I want to know the position of that tag so that I can seek to it.
To me, it looks like the buffer size is hard-coded in CPython to be 8192. As far as I can tell, there is no way to get this number from the Python interface other than to read a single line when you open the file, call f.tell() to figure out how much data Python actually read, and then seek back to the start of the file before continuing.
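    from itertools import dropwhile

    with open(datafile) as fin:
        next(fin)              # read one line so the read-ahead buffer gets filled
        bufsize = fin.tell()   # how far Python actually read, i.e. the buffer size
        fin.seek(0)            # rewind to the start before scanning
        # skip lines until the first one that starts with the tag
        ifin = dropwhile(lambda x: not x.startswith('Foo'), fin)
        header = next(ifin)    # the tagged line itself
        position = fin.tell()  # file position reported after reading the header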
Of course, this fails if the first line is longer than 8192 bytes, but that's not of any real consequence for my application.
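For comparison, here is a minimal sketch of tracking the offset by hand instead of inferring it from the buffer size. It assumes the file can be opened in binary mode ('rb') so that len(line) is a byte count; find_tag_offset is a hypothetical helper name, and the in-memory sample only stands in for the real file.

```python
import io


def find_tag_offset(fileobj, tag):
    """Return the byte offset of the first line starting with `tag`, or None.

    `fileobj` must yield bytes lines (a file opened with 'rb'), so that
    the running total of line lengths is a valid argument to seek().
    """
    offset = 0
    for line in fileobj:
        if line.startswith(tag):
            return offset
        offset += len(line)  # advance past this line, newline included
    return None


# Stand-in for open(datafile, 'rb'); the contents are made up for illustration.
sample = io.BytesIO(b"alpha\nbeta\nFoo header\ngamma\n")
pos = find_tag_offset(sample, b"Foo")

# The recorded offset can be used to jump straight back to the tagged line.
sample.seek(pos)
header = sample.readline()
```

Because the offset is accumulated from the line lengths, it does not depend on how much data the file object has read ahead internally.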