String replacement with dictionary, complications with punctuation

Question

I'm trying to write a function process(s, d) that replaces abbreviations in a string with their full meaning by using a dictionary, where s is the input string and d is the dictionary. For example:

>>> d = {'ASAP': 'as soon as possible'}
>>> s = "I will do this ASAP.  Regards, X"
>>> process(s, d)
'I will do this as soon as possible.  Regards, X'

I have tried using the split function to separate the string and compare each part with the dictionary.

def process(s):
    return ''.join(d[ch] if ch in d else ch for ch in s)

However, it returns exactly the same string. I suspect that the code doesn't work because of the full stop after ASAP in the original string. If so, how do I ignore the punctuation and get ASAP to be replaced?

Answer 1:

Here is a way to do it with a single regex:

In [23]: import re

In [24]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}

In [25]: s = 'I will do this ASAP, AFAIK.  Regards, X'

In [26]: re.sub(r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b', lambda m: d[m.group(0)], s)
Out[26]: 'I will do this as soon as possible, as far as I know.  Regards, X'

Unlike versions based on str.replace(), this observes word boundaries and therefore won't replace abbreviations that happen to appear in the middle of other words (e.g. "etc" in "fetch").

Also, unlike most (all?) other solutions presented thus far, it iterates over the input string just once, regardless of how many search terms there are in the dictionary.
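A minimal sketch of the same single-pass idea, wrapped in the process(s, d) signature from the question. The (?:...) grouping, the re.escape call, and the longest-first key ordering are assumptions added here: they keep \b attached to every alternative, protect against keys that contain regex metacharacters, and let a longer key win over a shorter key it starts with.

```python
import re

def process(s, d):
    """Replace each whole-word key of d found in s with its value, in one pass."""
    if not d:
        return s
    # Longest keys first, so a longer key is tried before a shorter key it starts with.
    keys = sorted(d, key=len, reverse=True)
    # Group the alternation so that \b applies to every key, not just the first and last.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # The callback looks up the matched text, so s is scanned only once.
    return pattern.sub(lambda m: d[m.group(0)], s)

d = {'ASAP': 'as soon as possible', 'AFAIK': 'as far as I know'}
print(process('I will do this ASAP, AFAIK.  Regards, X', d))
```

Compiling the pattern once also makes it cheap to reuse if the same dictionary is applied to many strings.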

Answer 2:

You can do something like this:

def process(s,d):
    for key in d:
        s = s.replace(key,d[key])
    return s

Answer 3:

Here is a working solution: use re.split(), and split by word boundaries (preserving the interstitial characters):

''.join(d.get(word, word) for word in re.split(r'(\W+)', s))

One significant difference that this code has from Vaughn's or Sheena's answer is that this code takes advantage of the O(1) lookup time of the dictionary, while their solutions look at every key in the dictionary. This means that when s is short and d is very large, their code will take significantly longer to run. Furthermore, parts of words will still be replaced in their solutions: if d = { "lol": "laugh out loud" } and s="lollipop" their solutions will incorrectly produce "laugh out loudlipop".
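The partial-word problem described above can be checked with a short sketch. The function wrapper and the sample sentence are illustrative assumptions, not part of the original answer.

```python
import re

def process(s, d):
    """Split s into words and separators, then look each piece up in d."""
    # The capture group makes re.split keep the separators in the result,
    # so joining the pieces restores the original punctuation and spacing.
    pieces = re.split(r'(\W+)', s)
    # One dictionary lookup per piece, independent of how many keys d has.
    return ''.join(d.get(piece, piece) for piece in pieces)

d = {'lol': 'laugh out loud'}
s = 'lol, a lollipop!'

print(process(s, d))                # whole words only
print(s.replace('lol', d['lol']))   # plain str.replace also hits "lollipop"
```

One trade-off of this approach is that a key containing non-word characters (for example 'e.g.') can never match, because the split breaks it into several pieces before the lookup.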

Answer 4:

Use regular expressions:

re.sub(pattern,replacement,s)

In your application:

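import re

def process(s, d):
    ret = s
    for key in d:
        # re.escape keeps keys with special characters from being read as regex syntax
        ret = re.sub(r'\b' + re.escape(key) + r'\b', d[key], ret)
    return ret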

\b matches the beginning or end of a word. Thanks to Paul for the comment.

Answer 5:

Instead of splitting by spaces, use:

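re.split(r"\W", s)

It will split on any character that cannot be part of a word (anything other than letters, digits, and the underscore). Note that this needs re.split; str.split treats its argument as a literal separator, not as a pattern.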
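A small sketch of what this split produces for the string from the question. It also shows an assumed variation, wrapping the pattern in a capture group, which keeps the separators so the string can be put back together after the replacement.

```python
import re

s = 'I will do this ASAP.  Regards, X'

# Without a capture group the separators are discarded, so 'ASAP' becomes
# its own item but the punctuation and spacing cannot be recovered.
words = re.split(r'\W', s)
print(words)

# With a capture group each separator is kept as its own list item,
# and joining the pieces reproduces the original string.
pieces = re.split(r'(\W)', s)
print(''.join(pieces) == s)
```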

Answer 6:

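    Python 3.2:

    # reassign s so the replacements accumulate; a list comprehension
    # would yield one separate string per dictionary entry instead
    for i, v in d.items():
        s = s.replace(i, v)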

Answer 7:

This is string replacement as well (+1 to @VaughnCato). This uses the reduce function to iterate through your dictionary, replacing any instances of the keys in the string with the values. s in this case is the accumulator, which is reduced (i.e. fed to the replace function) on every iteration, maintaining all past replacements (also, per @PaulMcGuire's point above, this replaces keys starting with the longest and ending with the shortest).

In [1]: from functools import reduce  # a builtin on Python 2, must be imported on Python 3

In [2]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}

In [3]: s = 'I will do this ASAP, AFAIK.  Regards, X'

In [4]: reduce(lambda x, y: x.replace(y, d[y]), sorted(d, key=len, reverse=True), s)
Out[4]: 'I will do this as soon as possible, as far as I know.  Regards, X'

As for why your function didn't return what you expected: when you iterate through s, you are iterating through the characters of the string, not the words. Your version could be tweaked by iterating over s.split() (which is a list of the words), but you then run into the issue that punctuation attached to a word keeps it from matching a key in your dictionary. You can get it to match by importing string and stripping string.punctuation from each word, but that removes the punctuation from the final string (so a regex is likely the best option if plain replacement doesn't work).
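The three stages described above can be seen in a short sketch. The variable names are illustrative, and the last step is shown only to make the loss of punctuation visible, not as a recommended fix.

```python
import string

d = {'ASAP': 'as soon as possible'}
s = 'I will do this ASAP.  Regards, X'

# Iterating over a string yields single characters, none of which is a key of d.
print(list(s)[:6])

# s.split() yields words, but the full stop stays attached: 'ASAP.' is not 'ASAP'.
tokens = s.split()
print(tokens)

# Stripping punctuation makes the lookup succeed, but the punctuation is gone
# from the result, and split()/join() also collapses the double space.
words = [t.strip(string.punctuation) for t in tokens]
stripped = ' '.join(d.get(w, w) for w in words)
print(stripped)
```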

Source: https://stackoverflow.com/questions/13814330/string-replacement-with-dictionary-complications-with-punctuation

Tags

python

dictionary

replace

punctuation