Python 2.7 - find and replace from text file, using dictionary, to new text file

别跟我提以往 2021-01-05 17:27

I am a newbie to programming and have been studying Python in my spare time for the past few months. I decided I was going to try and create a little script that converts Ame

3 Answers
  • 2021-01-05 17:53

    Building on the other good answers here, I wrote a new version that I think is more Pythonic. I hope this helps:

    # The imported dictionary contains 1800 english:american spelling key:value pairs.
    mydict = {
        'colour': 'color',  # key is the English spelling, value the American one
    }
    
    
    def replace_all(text, mydict):
        for english, american in mydict.iteritems():
            text = text.replace(american, english)
        return text
    
    try:
        with open('new_output.txt', 'w') as new_file:
            with open('test_file.txt', 'r') as f:
                for line in f:
                    new_line = replace_all(line, mydict)
                    new_file.write(new_line)
    except IOError:  # catch only file errors, not every exception
        print "Can't open file!"
    

    You can also look at the answers to a question I asked earlier, which contain a lot of best-practice advice: Loading large file (25k entries) into dict is slow in Python?

    Here are a few more tips on writing more Pythonic code: http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html

    Good luck:)

  • 2021-01-05 18:01

    The print statement adds a newline of its own, but your lines already have their own newlines. You can either strip the newline from your new_line, or use the lower-level

    output.write(new_line)
    

    instead (which writes exactly what you pass to it).
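
    A minimal sketch of both options, using a hypothetical two-line sample in place of the real file (the names sample_lines and emit are illustrative, not from the question). It is written to run on Python 2.7 as well as Python 3:

```python
import sys

# Hypothetical sample: each line already ends with '\n', as lines read from a file do.
sample_lines = ["colour\n", "tyre\n"]


def emit(lines, out):
    """Write each line to `out` unchanged; write() appends nothing of its own."""
    for line in lines:
        out.write(line)


# Option 1: strip the trailing newline, then let print add its own.
for line in sample_lines:
    print(line.rstrip("\n"))

# Option 2: write the line exactly as it is.
emit(sample_lines, sys.stdout)
```

    Either way each input line produces exactly one output line.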

    For your second question, I think we need an actual example. replace() should indeed replace all occurrences.

    >>> "abc abc abcd ab".replace("abc", "def")
    'def def defd ab'
    

    I'm not sure what your third question is asking. If you want to replace the output file, do

    output = open('output_test_file.txt', 'w')
    

    'w' means you're opening the file for writing; any existing content is discarded.
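
    To see the effect of the mode, here is a small sketch that uses a scratch file created with tempfile (the path is generated, not taken from the question). Opening with 'w' a second time throws away the first write, while 'a' appends to what is already there:

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w") as output:
    output.write("first run\n")

# Re-opening with 'w' truncates the file before writing.
with open(path, "w") as output:
    output.write("second run\n")

with open(path, "r") as check:
    after_w = check.read()

# 'a' keeps the existing content and adds to the end.
with open(path, "a") as output:
    output.write("appended\n")

with open(path, "r") as check:
    after_a = check.read()

os.remove(path)
```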

  • 2021-01-05 18:14

    The extra blank line you are seeing is because you are using print to write out a line that already includes a newline character at the end. Since print writes its own newline too, your output becomes double spaced. An easy fix is to use outfile.write(new_line) instead.

    As for the file modes, the issue is that you're opening the output file over and over. You should just open it once, at the start. It's usually a good idea to use with statements to handle opening files, since they'll take care of closing them for you when you're done with them.
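
    A sketch of that structure, assuming a replace_all helper like the one in the question and a dictionary keyed by British spelling with the American spelling as the value. The name convert_file and its parameters are placeholders of mine:

```python
def replace_all(text, spelling_dict):
    """Replace every value (American) in spelling_dict with its key (British)."""
    for british, american in spelling_dict.items():
        text = text.replace(american, british)
    return text


def convert_file(in_path, out_path, spelling_dict):
    """Open each file once; both are closed automatically when the block ends."""
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        for line in in_file:
            # write() keeps the line's own newline and adds nothing extra.
            out_file.write(replace_all(line, spelling_dict))
```

    Because the output file is opened a single time, 'w' is the right mode here and there is no need to append.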

    I don't understand your other issue, with only some of the replacements happening. Is your dictionary missing the spellings for 'analyze' and 'utilize'?

    One suggestion I'd make is to not do your replacements line by line. You can read the whole file in at once with file.read() and then work on it as a single unit. This will probably be faster, since it won't need to loop as often over the items in your spelling dictionary (just once, rather than once per line):

    with open('test_file.txt', 'r') as in_file:
        text = in_file.read()
    
    with open('output_test_file.txt', 'w') as out_file:
        out_file.write(replace_all(text, spelling_dict))
    

    Edit:

    To make your code correctly handle words that contain other words (like "entire" containing "tire"), you probably need to abandon the simple str.replace approach in favor of regular expressions.
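
    The failure is easy to reproduce. A minimal sketch with a hypothetical one-entry mapping and a made-up sentence:

```python
# Hypothetical one-entry mapping from American to British spellings.
ame_to_bre = {"tire": "tyre"}

text = "The entire tire was flat."

naive = text
for american, british in ame_to_bre.items():
    # str.replace matches the substring anywhere, including inside other words.
    naive = naive.replace(american, british)

print(naive)  # The entyre tyre was flat.
```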

    Here's a quickly thrown together solution that uses re.sub, given a dictionary of spelling changes from American to British English (that is, in the reverse order of your current dictionary):

    import re
    
    #from english_american_dictionary import ame_to_bre_spellings
    ame_to_bre_spellings = {'tire':'tyre', 'color':'colour', 'utilize':'utilise'}
    
    def replacer_factory(spelling_dict):
        def replacer(match):
            word = match.group()
            return spelling_dict.get(word, word)
        return replacer
    
    def ame_to_bre(text):
        pattern = r'\b\w+\b'  # this pattern matches whole words only
        replacer = replacer_factory(ame_to_bre_spellings)
        return re.sub(pattern, replacer, text)
    
    def main():
        #with open('test_file.txt') as in_file:
        #    text = in_file.read()
        text = 'foo color, entire, utilize'
    
        #with open('output_test_file.txt', 'w') as out_file:
        #    out_file.write(ame_to_bre(text))
        print(ame_to_bre(text))
    
    if __name__ == '__main__':
        main()
    

    One nice thing about this code structure is that you can easily convert from British English spellings back to American English ones, if you pass a dictionary in the other order to the replacer_factory function.
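
    As a rough illustration, the reverse dictionary can be built by swapping keys and values. This sketch assumes every British spelling in the dictionary is unique, and the convert helper is a name I made up for the example:

```python
import re

ame_to_bre_spellings = {"tire": "tyre", "color": "colour", "utilize": "utilise"}

# Swap keys and values to get a British-to-American mapping.
# This assumes every value is unique; duplicates would silently collapse.
bre_to_ame_spellings = dict((bre, ame) for ame, bre in ame_to_bre_spellings.items())


def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer


def convert(text, spelling_dict):
    """Replace whole words found in spelling_dict; leave all others alone."""
    return re.sub(r"\b\w+\b", replacer_factory(spelling_dict), text)


british = convert("foo color, entire, utilize", ame_to_bre_spellings)
american = convert(british, bre_to_ame_spellings)
```

    Here british is 'foo colour, entire, utilise' and american is the original text again.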
