How to find the count of a word in a string?

北恋 2020-12-01 13:07

I have a string "Hello I am going to I with hello am". I want to find how many times a word occurs in the string. For example, "hello" occurs 2 times. I tried this app

9 answers
  • 2020-12-01 13:19
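    def countSub(pat, string):
        # Count every position where pat occurs in string (overlaps included).
        result = 0
        for i in range(len(string) - len(pat) + 1):
            for j in range(len(pat)):
                if string[i + j] != pat[j]:
                    break
            else:
                # Inner loop finished without break: full match at position i.
                result += 1
        return result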
    
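    To see what this function actually counts, here is a minimal sketch. It uses a compact restatement of the loop above (the name count_sub and the sample strings are only for illustration). Note that it counts substring positions, including overlapping ones, rather than whole words.

```python
def count_sub(pat, string):
    """Count positions where pat occurs in string, overlaps included."""
    result = 0
    for i in range(len(string) - len(pat) + 1):
        # Compare the slice starting at i against the pattern.
        if string[i:i + len(pat)] == pat:
            result += 1
    return result


overlapping = count_sub("aa", "aaaa")    # 3: matches at positions 0, 1 and 2
builtin = "aaaa".count("aa")             # 2: str.count skips overlapping matches
inside_word = count_sub("am", "am ham")  # 2: also matches inside "ham"
```

    So this approach fits substring counting; for whole-word counts the text still has to be split into words first.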
  • 2020-12-01 13:21

    If you want to count occurrences of a single word, you can use str.count. Note that it is case-sensitive and counts substrings, so it also matches inside longer words:

    input_string.count("Hello")
    

    Use collections.Counter and split() to tally up all the words:

    from collections import Counter
    
    words = input_string.split()
    wordCount = Counter(words)
    
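    A short sketch of how str.count behaves on the string from the question may help here. The variable names are only illustrative.

```python
input_string = "Hello I am going to I with hello am"

# Case-sensitive: only the capitalised "Hello" is found.
exact = input_string.count("Hello")                      # 1

# Lower-casing first finds both spellings.
case_insensitive = input_string.lower().count("hello")   # 2

# str.count works on substrings, so "am" is also found inside "ham".
substring_hits = "am ham".count("am")                    # 2
```

    If whole-word, case-insensitive counts are needed, splitting into words (as in the Counter snippet) is usually the safer route.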
  • 2020-12-01 13:21

    You can split the string into words and count them. Note that this gives the total number of words, not the number of occurrences of a particular word:

    count = len(my_string.split())

  • 2020-12-01 13:24

    Counter from collections is your friend:

    >>> from collections import Counter
    >>> counts = Counter(sentence.lower().split())
    
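    As a rough illustration of how the resulting Counter can be queried, assuming sentence holds the string from the question:

```python
from collections import Counter

sentence = "Hello I am going to I with hello am"
counts = Counter(sentence.lower().split())

hello_count = counts["hello"]   # 2: "Hello" and "hello" are merged by lower()
missing = counts["goodbye"]     # 0: a Counter returns 0 for unseen keys

# The most frequent words, as (word, count) pairs.
top_words = counts.most_common(3)
```

    Because split() only separates on whitespace, punctuation stays attached to words ("hello," and "hello" would be counted separately).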
  • 2020-12-01 13:25

    Here is an alternative, case-insensitive approach:

    sum(1 for w in s.lower().split() if w == 'Hello'.lower())
    2
    

    It matches by converting the string and target into lower-case.

    P.S. This also avoids the "am ham".count("am") == 2 problem with str.count() pointed out by @DSM.

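    A related option, not used in the answer above, is a regular expression with word boundaries. The sketch below is one possible way to do it; the helper name count_word is hypothetical. Unlike split(), it also copes with punctuation attached to a word.

```python
import re


def count_word(word, text):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # re.escape guards against regex metacharacters in the word itself.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


in_question = count_word("hello", "Hello I am going to I with hello am")  # 2
whole_words_only = count_word("am", "am ham")                             # 1
with_punctuation = count_word("hello", "Hello, hello!")                   # 2
```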
  • 2020-12-01 13:28

    The vector of occurrence counts of words is called a bag of words.

    Scikit-learn provides a nice module to compute it, sklearn.feature_extraction.text.CountVectorizer. Example:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    
    # The default token pattern keeps only tokens of two or more characters,
    # so the single-letter word "I" is dropped from the vocabulary.
    vectorizer = CountVectorizer(analyzer="word",
                                 tokenizer=None,
                                 preprocessor=None,
                                 stop_words=None,
                                 min_df=1,
                                 max_features=50)
    
    text = ["Hello I am going to I with hello am"]
    
    # Count
    train_data_features = vectorizer.fit_transform(text)
    # get_feature_names_out() requires scikit-learn 1.0+;
    # older releases used get_feature_names().
    vocab = vectorizer.get_feature_names_out()
    
    # Sum up the counts of each vocabulary word
    dist = np.sum(train_data_features.toarray(), axis=0)
    
    # For each, print the vocabulary word and the number of times it
    # appears in the training set
    for tag, count in zip(vocab, dist):
        print(count, tag)
    

    Output:

    2 am
    1 going
    2 hello
    1 to
    1 with
    

    Part of the code was taken from this Kaggle tutorial on bag-of-words.
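    The output above contains no entry for "I". With its default settings, CountVectorizer lower-cases the text and only keeps tokens of two or more word characters. The following standard-library sketch mimics that tokenization to show where the numbers come from. It is an approximation for illustration, not the library's implementation.

```python
import re
from collections import Counter

text = "Hello I am going to I with hello am"

# Lower-case, then keep tokens of at least two word characters,
# similar to CountVectorizer's default token pattern.
tokens = re.findall(r"\b\w\w+\b", text.lower())
bag = Counter(tokens)

# Print in alphabetical order, like the sorted vocabulary.
for tag in sorted(bag):
    print(bag[tag], tag)
```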

    FYI: How to use sklearn's CountVectorizerand() to get ngrams that include any punctuation as separate tokens?
