Python recognizing \r as a line delimiter

元气小坏坏, submitted 2019-12-10 23:53:26

Question


I'm using import fileinput in a Python script running on an Ubuntu box.

I'm running the script on the command line with something along the lines of python myscript.py firstinputfile.txt secondinputfile.txt and inside myscript.py I am using for line in fileinput.input() to iterate over the lines. The problem I'm running into is that firstinputfile.txt and secondinputfile.txt both use Macintosh (\r) line endings, and fileinput.input() does not seem to be recognizing \r as a line delimiter.

Is there any way to force fileinput to recognize \r as a line delimiter?

I've considered preprocessing firstinputfile.txt and secondinputfile.txt to use \n line endings, but am hesitant for two reasons: i) I don't really want to emit additional files to manage and ii) I still want the input to fileinput to come from file arguments (not stdin after piping commands) so I can use fileinput.filename() and fileinput.filelineno().

Any suggestions?


Answer 1:


It turns out fileinput.input() supports an optional openhook parameter:

You can control how files are opened by providing an opening hook via the openhook parameter to fileinput.input() or FileInput(). The hook must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object. Two useful hooks are already provided by this module.

Furthermore, the universal newline support documentation says that a file can be opened with support for Windows/Unix/Macintosh newlines by using the rU mode:

Opening a file with the mode 'U' or 'rU' will open a file for reading in universal newline mode. All three line ending conventions will be translated to a "\n" in the strings returned by the various file methods such as read() and readline().

So, you can write a little function to pass as the openhook argument that will open the file in a manner which supports universal newlines:

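def univ_file_read(name, mode):
    # Open in universal newline mode so \r, \n, and \r\n all end a line.
    # WARNING: ignores mode argument passed to this function
    # NOTE: 'rU' is the Python 2 spelling; the 'U' flag was removed in Python 3.11.
    return open(name, 'rU')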

Then, instead of:

for line in fileinput.input():

Use:

for line in fileinput.input(openhook=univ_file_read):

This seems to have done the trick for me, and \r is being recognized as a line delimiter now.
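
For anyone on Python 3, where the 'U' flag is deprecated and later removed, here is a minimal sketch of the same idea. It assumes text mode with newline=None (the Python 3 default, which translates \r, \n, and \r\n to \n on read). The names univ_newline_hook and read_lines are made up for illustration and are not part of the original answer:

```python
import fileinput
import os
import tempfile


def univ_newline_hook(name, mode):
    # Assumption: Python 3 text mode. newline=None turns on universal
    # newlines, so \r, \n, and \r\n are all translated to \n on read.
    # Like the hook above, this ignores the mode argument.
    return open(name, "r", newline=None)


def read_lines(paths):
    """Collect (filename, line number within that file, line text) tuples."""
    rows = []
    # FileInput is used directly so no module-level state is left behind.
    with fileinput.FileInput(files=paths, openhook=univ_newline_hook) as stream:
        for line in stream:
            rows.append((stream.filename(), stream.filelineno(), line.rstrip("\n")))
    return rows


if __name__ == "__main__":
    # Write a small file with classic Macintosh (\r) line endings.
    folder = tempfile.mkdtemp()
    path = os.path.join(folder, "mac_endings.txt")
    with open(path, "wb") as handle:
        handle.write(b"first\rsecond\rthird\r")
    for row in read_lines([path]):
        print(row)
```

Since the file names are still passed to fileinput itself, filename() and filelineno() keep working, which was the reason for avoiding preprocessing or piping through stdin.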



Source: https://stackoverflow.com/questions/13855414/python-recognizing-r-as-a-line-delimiter
