Combining inplace filtering and the setting of encoding in the fileinput module

Question

I am attempting to use the fileinput module's inplace filtering feature to rewrite an input file in place.

I needed to set the encoding (for both reading and writing) to latin-1, so I tried passing openhook=fileinput.hook_encoded('latin-1') to fileinput.input, but was thwarted by the error

ValueError: FileInput cannot use an opening hook in inplace mode

Upon closer inspection, I see that the fileinput documentation states this explicitly: you cannot use inplace and openhook together.
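A minimal reproduction of the failure, assuming Python 3. The temporary file and the variable names are only for illustration:

```python
import fileinput
import os
import tempfile

# Create a throwaway latin-1 file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='latin-1') as f:
    f.write('caf\xe9\n')

error_message = None
try:
    # Passing inplace=True together with an openhook is rejected up front,
    # before any file is opened.
    fileinput.input(path, inplace=True,
                    openhook=fileinput.hook_encoded('latin-1'))
except ValueError as exc:
    error_message = str(exc)
finally:
    os.remove(path)

print(error_message)
```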

How can I get around this?

Answer 1:

As far as I know, there is no way around this with the fileinput module. You can accomplish the same task with a combination of the codecs module, os.rename(), and os.remove():

import os
import codecs

input_name = 'some_file.txt'
tmp_name = 'tmp.txt'

with codecs.open(input_name, 'r', encoding='latin-1') as fi, \
     codecs.open(tmp_name, 'w', encoding='latin-1') as fo:

    for line in fi:
        new_line = do_processing(line) # do your line processing here
        fo.write(new_line)

os.remove(input_name) # remove original
os.rename(tmp_name, input_name) # rename temp to original name

You also have the option of specifying a different encoding for the output file if you want to change it, or keeping latin-1 when opening the output file if you don't want it to change.
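As a sketch of the encoding change described above, assuming Python 3 (for os.replace and the built-in open with an encoding argument). The names rewrite_with_encoding, process, and demo_path are hypothetical and exist only for illustration:

```python
import os
import tempfile


def rewrite_with_encoding(path, process, in_encoding='latin-1',
                          out_encoding='utf-8'):
    """Rewrite path line by line, decoding and re-encoding as requested."""
    tmp_path = path + '.tmp'  # same directory, so the final replace is a rename
    # newline='' keeps the original line endings untouched in both directions.
    with open(path, 'r', encoding=in_encoding, newline='') as fi, \
         open(tmp_path, 'w', encoding=out_encoding, newline='') as fo:
        for line in fi:
            fo.write(process(line))
    # os.replace overwrites the destination, so no separate os.remove is needed.
    os.replace(tmp_path, path)


# Demo: a latin-1 file is upper-cased and written back as UTF-8.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='latin-1', newline='') as f:
    f.write('caf\xe9\n')

rewrite_with_encoding(demo_path, str.upper)

with open(demo_path, 'rb') as f:
    result_bytes = f.read()
os.remove(demo_path)
```

Pass the same value for in_encoding and out_encoding to leave the encoding unchanged.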

I know this isn't the in-place modification you were looking for, but it will accomplish the task you were trying to do and is very flexible.

Answer 2:

This is very similar to the other answer, just done in function form so that it can be called multiple times with ease:

import codecs
import os


def inplace(orig_path, encoding='latin-1'):
    """Modify a file in-place, with a consistent encoding."""
    new_path = orig_path + '.modified'
    with codecs.open(orig_path, encoding=encoding) as orig:
        with codecs.open(new_path, 'w', encoding=encoding) as new:
            for line in orig:
                yield line, new
    # Reached only after the caller has consumed every line.
    os.rename(new_path, orig_path)

And this is what it looks like in action:

for line, new in inplace(path):
    line = do_processing(line)  # Use your imagination here.
    new.write(line)

This is valid in both Python 2 and Python 3, and it does the right thing with your data as long as you specify the correct encoding (in my case I actually needed utf-8 everywhere, but your needs may vary).

Answer 3:

I'm not crazy about the existing solutions using rename/remove, because they oversimplify some of the file handling that the inplace flag does - for example handling the file mode, handling a chmod attribute, etc.
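For comparison, a rename-based rewrite can carry the permission bits over explicitly. This is a minimal sketch, assuming Python 3. The names rewrite_keeping_mode and process are hypothetical, and ownership and other metadata are still not handled:

```python
import os
import shutil
import stat
import tempfile


def rewrite_keeping_mode(path, process, encoding='latin-1'):
    """Rewrite path via a temp file, copying the original permission bits."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w', encoding=encoding, newline='') as fo, \
             open(path, 'r', encoding=encoding, newline='') as fi:
            for line in fi:
                fo.write(process(line))
        shutil.copymode(path, tmp_path)  # copy permission bits only
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the replace.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Demo: the mode set before the rewrite is still in place afterwards.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='latin-1') as f:
    f.write('caf\xe9\n')
os.chmod(demo_path, 0o640)
mode_before = stat.S_IMODE(os.stat(demo_path).st_mode)

rewrite_keeping_mode(demo_path, str.upper)

mode_after = stat.S_IMODE(os.stat(demo_path).st_mode)
with open(demo_path, encoding='latin-1') as f:
    demo_text = f.read()
os.remove(demo_path)
```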

In my case, because I control the environment that my code is going to run in, I decided the only reasonable solution was to set my locale to a UTF8-using locale:

export LC_ALL=en_US.UTF-8

The effect is:

sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "/usr/lib64/python3.6/fileinput.py", line 250, in __next__
    line = self._readline()
  File "/usr/lib64/python3.6/fileinput.py", line 364, in _readline
    return self._readline()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 227: ordinal not in range(128)

sh-4.2> export LC_ALL=en_US.UTF-8
sh-4.2> python3.6 -c "import fileinput;
for line in fileinput.FileInput('DESCRIPTION', inplace=True): print(line.rstrip() + 'hi')
print('done')"
done

sh-4.2#

The potential side-effects are changes to other file input & output, but I'm not worried about that here.
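On Python 3.10 or later, the locale change may not be needed: fileinput.input and fileinput.FileInput accept keyword-only encoding and errors parameters from that version on, and these can be combined with inplace=True. A minimal sketch, where the temporary file and the appended text are only illustrative, and the version check simply skips the rewrite on older interpreters:

```python
import fileinput
import os
import sys
import tempfile

# Create a throwaway latin-1 file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='latin-1', newline='') as f:
    f.write('caf\xe9\n')

if sys.version_info >= (3, 10):
    # The encoding keyword applies to both the input and the rewritten file.
    with fileinput.input(path, inplace=True, encoding='latin-1') as lines:
        for line in lines:
            # In inplace mode, print() writes into the file being rewritten.
            print(line.rstrip('\n') + ' hi')

with open(path, 'rb') as f:
    data = f.read()
os.remove(path)
```

This keeps the file-mode handling of the inplace flag and does not affect other file input and output in the process.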

Answer 4:

If you don't mind installing a third-party package with pip, the in_place library supports encoding.

import in_place

with in_place.InPlace(filename, encoding="utf-8") as fp:
  for line in fp:
    fp.write(line)

Source: https://stackoverflow.com/questions/25203040/combining-inplace-filtering-and-the-setting-of-encoding-in-the-fileinput-module

Tags

python

file-io

encoding

python-3.2