Python, read CRLF text file as is, with CRLF

你说的曾经没有我的故事 submitted on 2019-12-04 00:29:01

Question


with open(fn, 'rt') as f:
    lines = f.readlines()

This reads a CRLF text file (WinXP, Py 2.6) with the line ends translated to LF, so the lines end in '\n'. How can I get the lines as is:

  • for a CRLF file, get lines with '\r\n' ends
  • for an LF file, get lines with '\n' ends

Answer 1:


Instead of the built-in open() function, use io.open(). This gives you more control over how newlines are handled with the newline argument:

import io

with io.open(fn, 'rt', newline='') as f:
    lines = f.readlines()

Setting newline to the empty string leaves universal newline support enabled but returns line endings untranslated. You can still use .readlines() to find lines terminated with any of the legal line terminators, but the data returned is exactly what is found in the file. From the io.open() documentation:

On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.

Emphasis mine.
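
To make the difference between the two settings concrete, here is a minimal sketch that reads the same file once with newline=None and once with newline=''. The helper name read_lines and the temporary sample file are assumptions made for illustration; they are not part of the original answer.

```python
import io
import os
import tempfile

def read_lines(path, newline):
    """Hypothetical helper: read all lines using the given newline mode."""
    with io.open(path, 'rt', newline=newline) as f:
        return f.readlines()

# Write the sample in binary mode so the bytes on disk are exactly these.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'One\nTwo\rThree\r\n')

translated = read_lines(sample_path, None)   # every ending becomes '\n'
untranslated = read_lines(sample_path, '')   # endings kept as in the file

os.remove(sample_path)

print(translated)
print(untranslated)
```

In both cases the file is split into three lines; only the second call keeps the '\r' and '\r\n' terminators.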

This is different from opening the file in binary mode, where .readlines() will only split the file on \n characters. For a file with \r line endings or mixed line endings, this means that lines are not going to be split correctly.

Demo:

>>> import io
>>> open('test.txt', 'wb').write('One\nTwo\rThree\r\n')
>>> open('test.txt', 'rb').readlines()
['One\n', 'Two\rThree\r\n']
>>> io.open('test.txt', 'r', newline='').readlines()
[u'One\n', u'Two\r', u'Three\r\n']

Note that io.open() also decodes file contents to unicode values.
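
Because the contents are decoded, it can help to pass the encoding explicitly rather than rely on the platform default. The sketch below assumes the file is UTF-8 (the sample text and variable names are made up for illustration) and checks that re-encoding the lines reproduces the original bytes, line endings included.

```python
import io
import os
import tempfile

# Sample bytes: UTF-8 text with CRLF endings (made-up content).
raw = u'caf\u00e9\r\nna\u00efve\r\n'.encode('utf-8')

fd, sample_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

# An explicit encoding avoids depending on the platform default;
# newline='' keeps the line endings untranslated.
with io.open(sample_path, 'rt', encoding='utf-8', newline='') as f:
    lines = f.readlines()

os.remove(sample_path)

# Joining and re-encoding the decoded lines gives back the original
# bytes, since no line ending was changed on the way in.
roundtrip = u''.join(lines).encode('utf-8')
print(roundtrip == raw)
```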



Source: https://stackoverflow.com/questions/20350305/python-read-crlf-text-file-as-is-with-crlf
