Question
I am attempting to use the fileinput module's inplace filtering feature to rewrite an input file in place. I needed to set the encoding (for both reading and writing) to latin-1, so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but was thwarted by the error

    ValueError: FileInput cannot use an opening hook in inplace mode

Upon closer inspection, I see that the fileinput documentation clearly states this: "You cannot use inplace and openhook together."

How can I get around this?
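
For reference, the failure can be reproduced with a short sketch like the one below. The helper name and the file name are hypothetical; the file does not need to exist, because the check happens in the FileInput constructor before anything is opened.

```python
import fileinput


def try_inplace_with_hook(path):
    """Return the error message raised when inplace and openhook are combined."""
    try:
        # Constructing FileInput is enough to trigger the check.
        fileinput.FileInput(
            files=[path],
            inplace=True,
            openhook=fileinput.hook_encoded("latin-1"),
        )
    except ValueError as exc:
        return str(exc)
    return None


# 'some_file.txt' is a placeholder name.
message = try_inplace_with_hook("some_file.txt")
print(message)
```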
Answer 1:
As far as I know, there is no way around this with the fileinput module. You can accomplish the same task with a combination of the codecs module, os.rename(), and os.remove():

    import os
    import codecs

    input_name = 'some_file.txt'
    tmp_name = 'tmp.txt'

    with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
         codecs.open(tmp_name, 'w', encoding='latin-1') as fo:
        for line in fi:
            new_line = do_processing(line)  # do your line processing here
            fo.write(new_line)

    os.remove(input_name)            # remove original
    os.rename(tmp_name, input_name)  # rename temp to original name

You also have the option of specifying a new encoding when opening the output file if you want to change it, or you can leave it as latin-1 if you don't want it to change.
I know this isn't the in-place modification you were looking for, but it will accomplish the task you were trying to do and is very flexible.
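
As an illustration of the encoding change mentioned above, here is a minimal sketch that reads latin-1 and writes UTF-8. It assumes Python 3 (it uses the built-in open() and os.replace() rather than codecs.open() and os.remove()/os.rename()); the function name, the '.tmp' suffix, and the demo file are hypothetical.

```python
import os
import tempfile


def transcode(path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Rewrite the file at path, decoding with src_encoding and encoding with dst_encoding."""
    tmp_path = path + ".tmp"  # hypothetical temp name next to the original
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=src_encoding, newline="") as fi, \
         open(tmp_path, "w", encoding=dst_encoding, newline="") as fo:
        for line in fi:
            fo.write(line)
    # os.replace (Python 3.3+) overwrites the destination in a single step.
    os.replace(tmp_path, path)


# Demo on a throwaway file containing one latin-1 encoded line.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as fh:
    fh.write(b"caf\xe9\n")  # 0xE9 is e-acute in latin-1

transcode(demo_path)

with open(demo_path, "rb") as fh:
    transcoded_bytes = fh.read()
```

After the call, the same character is stored as the two-byte UTF-8 sequence instead of the single latin-1 byte.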
Answer 2:
This is very similar to the other answer, just done in function form so that it can be called multiple times with ease:

    import codecs
    import os

    def inplace(orig_path, encoding='latin-1'):
        """Modify a file in-place, with a consistent encoding."""
        new_path = orig_path + '.modified'
        with codecs.open(orig_path, encoding=encoding) as orig:
            with codecs.open(new_path, 'w', encoding=encoding) as new:
                for line in orig:
                    yield line, new
        # Runs only once the loop has consumed every line.
        os.rename(new_path, orig_path)

And this is what it looks like in action:

    for line, new in inplace(path):
        line = do_processing(line)  # Use your imagination here.
        new.write(line)

This is valid in both Python 2 and Python 3, and it Does The Right Thing with your data as long as you specify the correct encoding (in my case I actually needed utf-8 everywhere, but your needs will obviously vary).
Answer 3:
I'm not crazy about the existing solutions using rename/remove, because they oversimplify some of the file handling that the inplace flag does, for example preserving the file mode (the permissions you would set with chmod), etc.
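
For readers who still prefer the rename-based approach, the file-mode concern could be handled along these lines. This is only a sketch, assuming Python 3 and that copying the permission bits is all that is needed; the function name and the transform argument are hypothetical, and ownership or extended attributes are not preserved.

```python
import os
import shutil
import tempfile


def rewrite_keeping_mode(path, transform, encoding="latin-1"):
    """Rewrite path line by line through transform, keeping its permission bits."""
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as fo, \
             open(path, "r", encoding=encoding, newline="") as fi:
            for line in fi:
                fo.write(transform(line))
        shutil.copymode(path, tmp_path)  # copy permission bits from the original
        os.replace(tmp_path, path)       # swap the new file into place
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Demo on a throwaway file with a non-default mode.
demo_file = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_file, "w", encoding="latin-1", newline="") as fh:
    fh.write("hello\nworld\n")
os.chmod(demo_file, 0o640)

rewrite_keeping_mode(demo_file, str.upper)

with open(demo_file, encoding="latin-1", newline="") as fh:
    demo_text = fh.read()
demo_mode = os.stat(demo_file).st_mode & 0o777
```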
In my case, because I control the environment that my code is going to run in, I decided the only reasonable solution was to set my locale to a UTF-8 locale:

    export LC_ALL=en_US.UTF-8

The effect is:

    sh-4.2> python3.6 -c "import fileinput;
    for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
    print('done')"
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
        line = self._readline()
      File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
        return self._readline()
      File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)
    sh-4.2> export LC_ALL=en_US.UTF-8
    sh-4.2> python3.6 -c "import fileinput;
    for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
    print('done')"
    done
    sh-4.2#

The potential side-effects are changes to other file input & output, but I'm not worried about that here.
Answer 4:
If you don't mind installing a third-party library with pip, the in_place library supports encoding.

    import in_place

    with in_place.InPlace(filename, encoding="utf-8") as fp:
        for line in fp:
            fp.write(line)

Source: https://stackoverflow.com/questions/25203040/combining-inplace-filtering-and-the-setting-of-encoding-in-the-fileinput-module