How to preserve punctuation marks in Scikit-Learn text CountVectorizer or TfidfVectorizer?

Is there any way to preserve the punctuation marks !, ?, " and ' from my text documents using the CountVectorizer or TfidfVectorizer param

1 answer

南旧

2020-12-19 10:04
You should customize the token_pattern parameter when you instantiate the vectorizer. For example:
```
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(token_pattern=r"(?u)\b\w\w+\b|!|\?|\"|\'")
```
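To check what this pattern actually keeps, here is a minimal sketch that runs the vectorizer's analyzer on a sample string. The sample documents and the names `PATTERN`, `analyze`, and `docs` are made up for illustration. The pattern is the default `(?u)\b\w\w+\b` with the four punctuation marks added as alternatives, and `TfidfVectorizer` accepts the same `token_pattern` argument.
```python
from sklearn.feature_extraction.text import CountVectorizer

# Default word pattern (2+ word characters) plus ! ? " ' as standalone tokens
PATTERN = r"(?u)\b\w\w+\b|!|\?|\"|\'"

vect = CountVectorizer(token_pattern=PATTERN)

# build_analyzer() returns the preprocessing + tokenizing callable,
# so tokens can be inspected without fitting the vectorizer.
analyze = vect.build_analyzer()
tokens = analyze('Wow! Is it "good"?')
print(tokens)

# Hypothetical sample documents, only for illustration
docs = ['Wow! Is it "good"?', "No way!"]
counts = vect.fit_transform(docs)

# The punctuation marks now appear as vocabulary entries of their own
print(sorted(vect.vocabulary_))
```
Note that text is lowercased by default, and single-character words are still dropped by the `\w\w+` part, so in "don't" the tokens are `don` and `'` while the trailing `t` is discarded.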