Question
with open(fn, 'rt') as f:
    lines = f.readlines()
This reads a CRLF text file (WinXP, Py 2.6) but returns the lines with LF line endings, so every line ends with '\n'. How can I get the lines exactly as they are in the file:
- for a CRLF file, lines ending in '\r\n'
- for an LF file, lines ending in '\n'
Answer 1:
Instead of the built-in open() function, use io.open(). Its newline argument gives you more control over how newlines are handled:
import io
with io.open(fn, 'rt', newline='') as f:
    lines = f.readlines()
Setting newline to the empty string leaves universal newline support enabled but returns line endings untranslated. You can still use .readlines() to find lines terminated with any of the legal line terminators, but the data returned is exactly what is found in the file:
On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.
Emphasis mine.
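The difference between the two newline settings quoted above can be seen in a minimal sketch. The helper name read_lines and the temporary sample file are assumptions made for illustration and are not part of the original answer:

```python
import io
import os
import tempfile

def read_lines(path, newline):
    # Hypothetical helper: io.open() takes the newline argument
    # on Python 2.6+ as well as on Python 3.
    with io.open(path, 'r', newline=newline) as f:
        return f.readlines()

# Write the sample in binary mode so the CRLF bytes reach the disk unchanged.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'One\r\nTwo\r\n')

translated = read_lines(sample_path, None)   # endings translated to '\n'
untranslated = read_lines(sample_path, '')   # endings returned as stored
os.remove(sample_path)

print(translated)
print(untranslated)
```

Both calls split the file into the same two lines; only the terminators on the returned strings differ.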
This is different from opening the file in binary mode, where .readlines() will only split the file on \n characters. For a file with \r line endings or mixed line endings, this means that lines are not going to be split correctly.
Demo:
>>> import io
>>> open('test.txt', 'wb').write('One\nTwo\rThree\r\n')
>>> open('test.txt', 'rb').readlines()
['One\n', 'Two\rThree\r\n']
>>> io.open('test.txt', 'r', newline='').readlines()
[u'One\n', u'Two\r', u'Three\r\n']
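Once lines are read with newline='', each one still carries its own terminator, so the ending used on each line can be inspected. A small sketch, assuming a hypothetical helper named line_ending and a hard-coded list standing in for the result of .readlines():

```python
def line_ending(line):
    # Check '\r\n' before '\n', because a CRLF line also ends with '\n'.
    for ending in ('\r\n', '\n', '\r'):
        if line.endswith(ending):
            return ending
    return ''  # the last line of a file may have no terminator at all

# Stand-in for lines returned by io.open(..., newline='').readlines()
lines = ['One\n', 'Two\r', 'Three\r\n', 'Four']
endings = [line_ending(line) for line in lines]
print(endings)
```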
Note that io.open() also decodes file contents to unicode values.
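Because the contents are decoded, it can help to name the encoding explicitly instead of relying on the platform default. A minimal sketch, assuming the file is UTF-8 (the sample text and the temporary file are made up for illustration):

```python
import io
import os
import tempfile

# Create a UTF-8 sample with a non-ASCII character and a CRLF ending.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(u'caf\u00e9\r\n'.encode('utf-8'))

# encoding controls how bytes are decoded; newline='' keeps the terminator.
with io.open(path, 'r', encoding='utf-8', newline='') as f:
    decoded_lines = f.readlines()
os.remove(path)

print(repr(decoded_lines))
```

If the encoding is unknown and only the raw bytes matter, binary mode avoids decoding altogether, at the cost of the \n-only splitting described above.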
Source: https://stackoverflow.com/questions/20350305/python-read-crlf-text-file-as-is-with-crlf