Python 3.2 Replace all words in a text document that are a certain length?

Question

I need to replace all words in a text document that are of length 4 with a different word.

For example, if a text document contained the phrase "I like to eat very hot soup", the words "like", "very", and "soup" would be replaced with "something".

Then, instead of overwriting the original text document, it needs to create a new one with the changed phrase.

Here is what I have so far:

def replacement():  
    o = open("file.txt","a") #file.txt will be the file containing the changed phrase
    for line in open("y.txt"):  #y.txt is the original file
        line = line.replace("????","something")  #see below
        o.write(line + "\n")
    o.close()

I've tried changing "????" to something like

(str(len(line) == 4)

but that didn't work.

Answer 1:

First, let's make a function that returns 'something' if it's given a word of length 4, and returns the word unchanged otherwise:

def maybe_replace(word, length=4):
  if len(word) == length:
    return 'something'
  else:
    return word

Now let's walk through your for loop. In each iteration you have one line of the original file. Let's split that into words. Python strings have a split method we can use:

   split_line = line.split()

The default is to split on whitespace, which is exactly what we want. The Python documentation for str.split has more detail if you want it.

Now we want the list that results from calling our maybe_replace function on every word:

  new_split_line = [maybe_replace(word) for word in split_line]

Now we can join these back up together using the join method:

  new_line = ' '.join(new_split_line)
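To see the three steps on a concrete input, here is a small sketch that runs them on the sample sentence from the question. It repeats maybe_replace so that it runs on its own; the sample value of line (including its trailing newline) is an assumption standing in for one line read from the file.

```python
def maybe_replace(word, length=4):
    # Same helper as above: swap in 'something' for words of the given length.
    if len(word) == length:
        return 'something'
    else:
        return word

# Assumed sample input: one line as it would come out of the file.
line = "I like to eat very hot soup\n"

split_line = line.split()  # whitespace split; the trailing newline is dropped too
new_split_line = [maybe_replace(word) for word in split_line]
new_line = ' '.join(new_split_line)

print(split_line)
print(new_line)
```

Because split() discards the trailing newline along with the other whitespace, the newline has to be added back when the line is written out.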

And write it back to our file:

  o.write(new_line + '\n')

So our final function will be:

def replacement():  
  o = open("file.txt","w") #file.txt will be the new file containing the changed phrase; "w" starts it empty on every run
  for line in open("y.txt"):  #y.txt is the original file
    split_line = line.split()
    new_split_line = [maybe_replace(word) for word in split_line]
    new_line = ' '.join(new_split_line)
    o.write(new_line + '\n')
  o.close()
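If you want to try this end to end without touching real files, the following is one possible sketch. The name replace_words_in_file and its path parameters are hypothetical, introduced only for illustration. It opens the output in "w" mode so that each run starts from an empty file, and uses with blocks so both files are closed even if an error occurs.

```python
import os
import tempfile

def maybe_replace(word, length=4, replacement='something'):
    # Return the replacement for words of the given length, else the word itself.
    return replacement if len(word) == length else word

def replace_words_in_file(src_path, dst_path, length=4):
    # Hypothetical path-based variant of replacement().
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            new_words = [maybe_replace(word, length) for word in line.split()]
            dst.write(' '.join(new_words) + '\n')

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'y.txt')
    dst_path = os.path.join(tmp, 'file.txt')
    with open(src_path, 'w') as f:
        f.write('I like to eat very hot soup\nsoup is good\n')
    replace_words_in_file(src_path, dst_path)
    replace_words_in_file(src_path, dst_path)  # a second run overwrites rather than appends
    with open(dst_path) as f:
        result = f.read()

print(result)
```

Note that this approach only looks at whitespace-separated chunks, so a word with punctuation attached (for example "soup.") counts as five characters and is left alone.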

Answer 2:

This preserves any extra spaces in the text, which the solutions using str.split() do not.

import re

exp = re.compile(r'\b(\w{4})\b')
replaceWord = 'stuff'
with open('infile.txt','r') as inF, open('outfile.txt','w') as outF:
    for line in inF:
        outF.write(exp.sub(replaceWord,line))

This uses regular expressions to replace the text. There are three main parts to the regular expression used here. The first matches a word boundary, here the beginning of a word:

\b

The second part matches exactly four word characters (letters, digits, and _):

(\w{4})

The last part is like the first; it matches the word boundary at the end of the word:

\b
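As a quick check of how the pattern behaves, here is a small sketch that applies it to a few strings instead of a file. The sample strings are assumptions chosen to show extra spaces, attached punctuation, and the fact that \w also covers digits and underscores.

```python
import re

exp = re.compile(r'\b(\w{4})\b')

# Assumed sample line with a run of spaces and a trailing period.
line = 'I like   to eat very hot soup.\n'
result = exp.sub('stuff', line)
print(repr(result))

# \w also matches digits and underscores, so these count as four-character words.
mixed = exp.sub('stuff', 'year 2012 has a_b1 in it')
print(mixed)

# For comparison: a whitespace split loses the spacing and keeps the period on 'soup.'.
print(line.split())
```

The spacing and the newline pass through untouched because sub() only rewrites the matched words and copies everything else as is.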

Answer 3:

This seems like homework, so here are some key concepts.

When you read a file, you get lines as strings. You can split a line into a list with the string method .split(), like so: words = line.split(). This creates a list of words.

Now, a list is iterable, meaning you can use a for loop over it and do an operation on one item of the list at a time. You want to check how long each word is, so you have to iterate over words with your loop and do something with each one. You're close to figuring out how to check the length of a word using len(word).

You also need a place to store your final information as you go. Outside of the loop, you need to make a list for results, and .append() the words that you've checked as you go along.

Finally, you need to do this for each line in your file, meaning a second for loop that iterates over the file.
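Put together, the concepts above might look something like the following sketch. It is only one way to arrange the two loops. The io.StringIO object is an assumption that stands in for an open file so the example runs without touching disk, and the names results and output_lines are illustrative.

```python
import io

# Stand-in for an open file (assumption), with two sample lines.
read_file = io.StringIO('I like to eat very hot soup\nthis one is next\n')

output_lines = []
for line in read_file:          # outer loop: one line of the file at a time
    words = line.split()
    results = []                # fresh list for this line's checked words
    for word in words:          # inner loop: one word at a time
        if len(word) == 4:
            results.append('something')
        else:
            results.append(word)
    output_lines.append(' '.join(results))

print('\n'.join(output_lines))
```

With a real file you would replace the StringIO object with open("y.txt") and write each joined line to the output file instead of collecting it in a list.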

Answer 4:

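# 'w' creates file.txt fresh each run; the with blocks close both files automatically
with open('file.txt', 'w') as write_file:
    with open('y.txt') as read_file:
        for line in read_file:
            # Replace the needed words: every whitespace-separated word of length 4
            words = ['something' if len(word) == 4 else word for word in line.split()]
            # split() drops the newline, so add it back
            write_file.write(' '.join(words) + '\n')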

Source: https://stackoverflow.com/questions/13313778/python-3-2-replace-all-words-in-a-text-document-that-are-a-certain-length

Tags

python

string

python-3.2