Question
I'm using import fileinput in a Python script running on an Ubuntu box. I'm running the script on the command line with something along the lines of
python myscript.py firstinputfile.txt secondinputfile.txt
and inside myscript.py I am using for line in fileinput.input() to iterate over the lines. The problem I'm running into is that firstinputfile.txt and secondinputfile.txt both use Macintosh (\r) line endings, and fileinput.input() does not seem to recognize \r as a line delimiter.
Is there any way to force fileinput to recognize \r as a line delimiter?
I've considered preprocessing firstinputfile.txt and secondinputfile.txt to use \n line endings, but am hesitant for two reasons: i) I don't really want to create additional files to manage, and ii) I still want the input to fileinput to come from file arguments (not stdin after piping commands) so I can use fileinput.filename() and fileinput.filelineno().
Any suggestions?
Answer 1:
It turns out fileinput.input() supports an optional openhook parameter:
You can control how files are opened by providing an opening hook via the openhook parameter to fileinput.input() or FileInput(). The hook must be a function that takes two arguments, filename and mode, and returns an accordingly opened file-like object. Two useful hooks are already provided by this module.
Furthermore, the universal newline support documentation says that a file can be opened with support for Windows/Unix/Macintosh newlines by using the rU mode:
Opening a file with the mode 'U' or 'rU' will open a file for reading in universal newline mode. All three line ending conventions will be translated to a "\n" in the strings returned by the various file methods such as read() and readline().
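To make the quoted behavior concrete, here is a minimal sketch of universal newline translation. It assumes Python 3, where the same translation is requested with newline=None rather than the 'U' mode flag; the helper name read_universal is hypothetical and only serves the illustration.

```python
import io


def read_universal(raw):
    # Hypothetical helper: decode bytes as text with universal newlines.
    # newline=None asks the text layer to translate "\r\n" and "\r" to "\n".
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
    return stream.read()
```

Under these assumptions, input that mixes Unix, Windows, and Macintosh line endings should come back with every ending normalized to "\n".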
So, you can write a little function to pass as the openhook argument that opens the file in a manner that supports universal newlines:
def univ_file_read(name, mode):
    # WARNING: ignores mode argument passed to this function
    return open(name, 'rU')
Then, instead of:
for line in fileinput.input():
Use:
for line in fileinput.input(openhook=univ_file_read):
This seems to have done the trick for me, and \r is now being recognized as a line delimiter.
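For readers on Python 3, here is a hedged sketch of the same openhook approach. It rests on two assumptions: the 'U' mode flag is not available (it was removed in Python 3.11), and newline=None provides the equivalent universal newline behavior. The names univ_newline_hook and read_records are hypothetical, and the sketch calls filename() and filelineno() on the object returned by fileinput.input() instead of the module-level functions.

```python
import fileinput


def univ_newline_hook(name, mode):
    # Assumption: Python 3, where newline=None enables universal newlines.
    # Like the hook in the answer, this ignores the mode argument.
    return open(name, "r", newline=None)


def read_records(paths):
    """Collect (filename, line number within that file, line text) tuples."""
    records = []
    with fileinput.input(files=paths, openhook=univ_newline_hook) as stream:
        for line in stream:
            # Strip the translated "\n" so only the line content is stored.
            records.append(
                (stream.filename(), stream.filelineno(), line.rstrip("\n"))
            )
    return records
```

Calling read_records(sys.argv[1:]) from myscript.py would keep the input coming from file arguments, so the per-file names and line numbers the question asks about stay available.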
Source: https://stackoverflow.com/questions/13855414/python-recognizing-r-as-a-line-delimiter