Unable to read huge (20GB) file from CPython


I have a CPython issue that I cannot understand. It all boils down to the fact that the same code works for reading a small text file but cannot even read a single line from


2 Answers

攒了一身酷

2021-01-22 12:28
```
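from itertools import takewhile

def my_readline(fh, delim):
    # Read one character at a time; stop at the delimiter or at EOF ("").
    chars = iter(lambda: fh.read(1), "")
    return "".join(takewhile(lambda ch: ch != delim, chars))

f = open(some_file)  # some_file is a placeholder for the path
line = my_readline(f, "\r")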
```
This should work if you can at least get .read(1) to work. If that doesn't work, I don't know that anything will; maybe use shell commands to split the file into smaller chunks somehow. That said, I suspect beroe's answer is the real answer.
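Reading one character at a time is likely to be slow on a 20GB file. A minimal sketch of the same idea that reads fixed-size blocks and yields one record per delimiter follows. The name iter_records and the 1 MiB default chunk size are assumptions for illustration, not part of the original answer. It assumes Python 3 and a file opened in binary mode, so no newline translation takes place.
```python
import io

def iter_records(fh, delim=b"\r", chunk_size=1 << 20):
    """Yield delimiter-separated records from a binary file object."""
    buf = b""
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Everything before the last delimiter is complete; keep the tail.
        *records, buf = buf.split(delim)
        for record in records:
            yield record
    if buf:
        # Trailing record with no closing delimiter.
        yield buf

# Usage with an in-memory stand-in for the real file:
demo = io.BytesIO(b"abcdef\rabcdef\rabc")
print(list(iter_records(demo, chunk_size=4)))
```
Memory use is bounded by the longest record rather than by the file size, so this only helps if the delimiter actually occurs in the file.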
傲寒

2021-01-22 12:52
Although your "test" only prints one line, that does not mean it is only reading one line from the file. With a \r-delimited test file, I also get only one line of output. However, if I read each line using a for loop, it still prints only one line, and if I call readline() a second time on a multi-line file, it doesn't return any more lines.

Try opening the same file in 'rU' (universal newlines) mode:
```
f = open('filename', 'rU')
```
My tests of a file with several lines of \r-delimited text give:
```
f = open('test.txt','r')  # Opening the "wrong" way
for line in f:
    print line
```
Output:
```
abcdef
```
Then with rU:
```
f = open('test.txt','rU')
for line in f:
    print line
```
Output:
```
abcdef

abcdef

abcdef

abcdef

abcdef
```
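The tests above are Python 2. On Python 3 the 'U' flag is deprecated and was removed in 3.11, and text mode already uses universal newlines by default (newline=None). The sketch below assumes Python 3 and uses a throwaway temp file as a stand-in for test.txt. It compares the default mode with newline="\n", which treats only \n as a line ending and so reproduces the single-line result of the "wrong" way.
```python
import os
import tempfile

# Build a small \r-delimited file (a stand-in for the real test.txt).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"abcdef\r" * 5)
    path = tmp.name

# newline=None (the default): "\r", "\n" and "\r\n" all end a line and
# are translated to "\n" on read.
with open(path, "r", newline=None) as f:
    lines = f.readlines()

# newline="\n": only "\n" ends a line, so the whole file comes back as
# a single "line" with the "\r" characters left in place.
with open(path, "r", newline="\n") as f:
    raw_lines = f.readlines()

os.remove(path)
print(len(lines), len(raw_lines))
```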
EDIT: In support of Joran's explanation, this test pretty much shows that the entire file is being read as one line, and that the carriage return characters cause over-printing, which is why you see only one line of output:
```
f = open('test.txt','r')     #  Opening the "wrong" way again
for line in f:
    print "XXX{}YYY".format(line)
```
Output gets overwritten...
```
YYYdefdef
```
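One way to confirm this kind of over-printing without relying on the terminal is to look at the repr() of what was read, so that carriage returns show up as a visible \r escape. This is a sketch assuming Python 3. The in-memory sample is an assumption standing in for the first bytes of test.txt.
```python
import io

# Stand-in for the first bytes of the file; in practice, something like
# open('test.txt', 'rb').read(64) would supply this.
sample = io.BytesIO(b"abcdef\rabcdef\rabcdef\r").read(64)

# repr() shows control characters as escapes instead of letting the
# terminal act on them.
print(repr(sample))

# Count each line-ending style to see which delimiter the file uses.
crlf = sample.count(b"\r\n")
counts = {
    "CRLF": crlf,
    "CR": sample.count(b"\r") - crlf,
    "LF": sample.count(b"\n") - crlf,
}
print(counts)
```
A non-zero CR count with zero LF and CRLF indicates a \r-delimited file like the one in the tests above.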