How to find the count of a word in a string?

后端未结


I have a string "Hello I am going to I with hello am". I want to find how many times a word occurs in the string. For example, hello occurs 2 times. I tried this app


9 answers

无人共我

2020-12-01 13:19

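def countSub(pat, string):
    # Count every position where pat matches (overlapping matches included)
    result = 0
    for i in range(len(string) - len(pat) + 1):
        for j in range(len(pat)):
            if string[i + j] != pat[j]:
                break
        else:
            # The inner loop finished without a mismatch: full match at i
            result += 1
    return result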
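A quick check of how this kind of substring counter behaves on the question's sentence. This is only a sketch: `count_sub` is a hypothetical restatement of the function above that uses a slice comparison, so the snippet runs on its own.

```python
def count_sub(pat, string):
    """Count occurrences of pat in string, overlapping matches included."""
    result = 0
    for i in range(len(string) - len(pat) + 1):
        # Compare the window starting at i against the pattern
        if string[i:i + len(pat)] == pat:
            result += 1
    return result

sentence = "Hello I am going to I with hello am"

hello_exact = count_sub("hello", sentence)             # case-sensitive match
hello_any_case = count_sub("hello", sentence.lower())  # lower-case the text first
overlapping = count_sub("aa", "aaaa")                  # matches at indices 0, 1, 2

print(hello_exact, hello_any_case, overlapping)
```

Note that the match is on substrings, not whole words, and that it is case-sensitive unless the text is lower-cased first.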


不知归路

2020-12-01 13:21
If you want to find the count of an individual word, just use count (note that it is case-sensitive and matches substrings, not whole words):
```
input_string.count("Hello")
```
Use collections.Counter and split() to tally up all the words:
```
from collections import Counter

words = input_string.split()
wordCount = Counter(words)
```
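To see how the two approaches differ, here is a small sketch using the sentence from the question as `input_string` (the variable names other than `input_string` are illustrative).

```python
from collections import Counter

input_string = "Hello I am going to I with hello am"

# str.count matches substrings and is case-sensitive.
exact_hello = input_string.count("Hello")
substring_am = "am ham".count("am")  # "ham" also contains "am"

# Counter over split() tallies whole whitespace-separated tokens.
word_count = Counter(input_string.split())

print(exact_hello, substring_am)
print(word_count["I"], word_count["am"], word_count["Hello"], word_count["hello"])
```

Because nothing is lower-cased here, "Hello" and "hello" are tallied as two different words.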
礼貌的吻别

2020-12-01 13:21

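You can split the string into words and count them. Note that this gives the total number of words in the string, not the number of occurrences of one particular word: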

count = len(my_string.split())

爱一瞬间的悲伤

2020-12-01 13:24
Counter from collections is your friend:
```
>>> from collections import Counter
>>> counts = Counter(sentence.lower().split())
```
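A short usage sketch, assuming `sentence` holds the string from the question:

```python
from collections import Counter

sentence = "Hello I am going to I with hello am"

# Lower-casing first makes "Hello" and "hello" count as the same word.
counts = Counter(sentence.lower().split())

print(counts["hello"])
print(counts["missing"])  # a Counter returns 0 for keys it has never seen
print(counts.most_common(3))
```

Looking up a word that never occurs does not raise a KeyError, which makes a Counter convenient for this kind of query.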
猫巷女王i

2020-12-01 13:25
Here is an alternative, case-insensitive approach:
```
sum(1 for w in s.lower().split() if w == 'Hello'.lower())
2
```
It matches by converting both the string and the target word to lower case.

P.S. Because it compares whole words rather than substrings, this also takes care of the "am ham".count("am") == 2 problem with str.count() pointed out by @DSM below :)
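One limitation of split() is that punctuation stays attached to the words, so "hello," would not equal "hello". A possible variant, which is not part of the answer above, swaps in a regular expression with word boundaries. `count_word` is a hypothetical helper and assumes the target consists of ordinary word characters.

```python
import re

def count_word(word, text):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # \b anchors the match at word boundaries; re.escape guards special characters
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

s = "Hello, I am going to I with hello am."

regex_hello = count_word("hello", s)
split_hello = sum(1 for w in s.lower().split() if w == "hello")
regex_am = count_word("am", "am ham")

print(regex_hello, split_hello, regex_am)
```

On this punctuated sentence the split-based count misses "Hello," while the regex version still finds both occurrences.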

夕颜

2020-12-01 13:28

The vector of occurrence counts of words is called bag-of-words.

Scikit-learn provides a nice module to compute it, sklearn.feature_extraction.text.CountVectorizer. Example:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# min_df=1 is the default; an integer min_df of 0 may be rejected by
# newer scikit-learn releases
vectorizer = CountVectorizer(analyzer="word",
                             tokenizer=None,
                             preprocessor=None,
                             stop_words=None,
                             min_df=1,
                             max_features=50)

text = ["Hello I am going to I with hello am"]

# Count
train_data_features = vectorizer.fit_transform(text)
# get_feature_names_out() replaces get_feature_names(), which was
# removed in scikit-learn 1.2
vocab = vectorizer.get_feature_names_out()

# Sum up the counts of each vocabulary word
dist = np.sum(train_data_features.toarray(), axis=0)

# For each, print the vocabulary word and the number of times it
# appears in the training set
for tag, count in zip(vocab, dist):
    print(count, tag)

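Output (the text is lower-cased by default, and the default token pattern ignores single-character tokens, so "I" is not counted):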

2 am
1 going
2 hello
1 to
1 with

Part of the code was taken from this Kaggle tutorial on bag-of-words.

FYI: How to use sklearn's CountVectorizerand() to get ngrams that include any punctuation as separate tokens?
