Trying to count words in a string

醉话见心 2021-02-06 03:28

I'm trying to analyze the contents of a string. If a word has punctuation mixed into it, I want to replace the punctuation with spaces.

For example, If Johnny.Appleseed!is:a*good

7 Answers
  • 2021-02-06 03:48
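    def count_words(strings):
        for ltr in ('!', '.'):  # add the rest of the punctuation here
            strings = strings.replace(ltr, ' ')  # reassign so the replacements accumulate
        return len(strings.split())  # split() with no argument ignores repeated spaces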
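To cover the whole punctuation set without listing each character, `str.translate` with `string.punctuation` is one option. This is a minimal sketch, assuming only ASCII punctuation should count as a separator; `count_words_translate` is a hypothetical name used for illustration.

```python
import string

# Map every ASCII punctuation character to a space (assumption: ASCII only).
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def count_words_translate(text):
    """Count whitespace-separated words after replacing punctuation with spaces."""
    return len(text.translate(PUNCT_TO_SPACE).split())

print(count_words_translate("If Johnny.Appleseed!is:a*good&farmer"))
```

Because `split()` is called without an argument, consecutive separators such as `a..b` do not produce empty words.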
    
  • 2021-02-06 03:54

    I know that this is an old question but...How about this?

    string = "If Johnny.Appleseed!is:a*good&farmer"
    
    a = ["*",":",".","!",",","&"," "]
    new_string = ""
    
    for i in string:
       if i not in a:
          new_string += i
       else:
          new_string += " "
    
    print(len(new_string.split()))  # split() with no argument ignores repeated spaces
    
  • 2021-02-06 03:54

    Simple loop based solution:

    strs = "Johnny.Appleseed!is:a*good&farmer"
    lis = []
    for c in strs:
        if c.isalnum() or c.isspace():
            lis.append(c)
        else:
            lis.append(' ')
    
    new_strs = "".join(lis)
    print(new_strs)          # prints: Johnny Appleseed is a good farmer
    print(new_strs.split())  # prints: ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']
    

    Better solution:

    Using regex:

    >>> import re
    >>> from string import punctuation
    >>> strs = "Johnny.Appleseed!is:a*good&farmer"
    >>> r = re.compile('[{}]'.format(re.escape(punctuation)))
    >>> new_strs = r.sub(' ',strs)
    >>> len(new_strs.split())
    6
    >>> # using re.split:
    >>> strs = "Johnny.Appleseed!is:a*good&farmer"
    >>> re.split(r'[^0-9A-Za-z]+',strs)
    ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']
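One caveat with `re.split`: when the string starts or ends with a separator, the result contains empty strings, which inflates the count. The sketch below shows the difference on a made-up input string (not from the question); `re.findall` on the complementary pattern is one way to avoid the problem.

```python
import re

# Hypothetical input with leading and trailing punctuation.
text = "!Johnny.Appleseed is good?"

# Splitting on separators leaves an empty string at each end.
split_words = re.split(r'[^0-9A-Za-z]+', text)

# Matching the words directly never yields empty strings.
found_words = re.findall(r'[0-9A-Za-z]+', text)

print(split_words)
print(found_words)
print(len([w for w in split_words if w]) == len(found_words))
```

Filtering out the empty strings, as in the last line, gives the same count as `re.findall`.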
    
  • 2021-02-06 03:56

    How about using Counter from collections?

    import re
    from collections import Counter
    
    string = "If Johnny.Appleseed!is:a*good&farmer"  # sample input
    words = re.findall(r'\w+', string)
    print(Counter(words))
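A `Counter` gives both the total word count and per-word frequencies. A small sketch, assuming case-insensitive counting is wanted (hence the `lower()` call) and using a made-up sample sentence:

```python
import re
from collections import Counter

text = "The cat and the hat and the bat"  # hypothetical sample text
counts = Counter(re.findall(r'\w+', text.lower()))

total_words = sum(counts.values())  # total number of words
top_two = counts.most_common(2)     # most frequent words first

print(total_words)
print(top_two)
```

Drop the `lower()` call if "The" and "the" should be counted as different words.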
    
  • 2021-02-06 04:01

    Try this: it extracts the words with re, then builds a dictionary mapping each word to its number of appearances.

    import re
    word_list = re.findall(r"[\w']+", string)  # string is the text to analyze
    print({word: word_list.count(word) for word in word_list})
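The apostrophe in `[\w']+` matters for contractions and possessives: plain `\w+` splits them into separate pieces. A short sketch comparing the two patterns on a made-up sentence:

```python
import re

text = "Johnny doesn't like Appleseed's apples"  # hypothetical sample text

# \w+ treats the apostrophe as a separator.
plain = re.findall(r"\w+", text)

# [\w']+ keeps the apostrophe inside the word.
with_apostrophe = re.findall(r"[\w']+", text)

print(len(plain), plain)
print(len(with_apostrophe), with_apostrophe)
```

Which count is correct depends on whether "doesn't" should be treated as one word or two.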
    
  • 2021-02-06 04:05

    Here's a one-line solution that doesn't require importing any libraries.
    It replaces non-alphanumeric characters (like punctuation) with spaces, and then splits the string.

    Inspired by "Python strings split with multiple separators"

    >>> s = 'Johnny.Appleseed!is:a*good&farmer'
    >>> words = ''.join(c if c.isalnum() else ' ' for c in s).split()
    >>> words
    ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']
    >>> len(words)
    6
    