How to preserve punctuation marks in Scikit-Learn text CountVectorizer or TfidfVectorizer?

有刺的猬 2020-12-19 09:32

Is there any way to preserve the punctuation marks !, ?, " and ' from my text documents using CountVectorizer or TfidfVectorizer parameters?

1 answer
  • 2020-12-19 10:04

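    You should customize the token_pattern parameter when you instantiate the vectorizer. The default pattern, (?u)\b\w\w+\b, only matches words of two or more characters, so punctuation is dropped; add the marks you want to keep as extra alternatives. For example: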

    from sklearn.feature_extraction.text import CountVectorizer
    vent = CountVectorizer(token_pattern=r"(?u)\b\w\w+\b|!|\?|\"|\'")
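
To see which tokens this pattern produces before fitting anything, here is a minimal sketch that uses only Python's re module. It assumes the vectorizer's default analysis amounts to lowercasing the text and then calling findall with token_pattern; the helper name preview_tokens and the sample documents are made up for illustration.

```python
import re

# Same pattern as in the answer: words of 2+ characters, or one of ! ? " '
TOKEN_PATTERN = r"(?u)\b\w\w+\b|!|\?|\"|\'"


def preview_tokens(text):
    """Approximate the default analysis: lowercase, then regex findall.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    return re.findall(TOKEN_PATTERN, text.lower())


# Made-up sample documents
docs = ["Hello world!", 'Is it "good"?', "Don't stop!"]
for doc in docs:
    print(doc, "->", preview_tokens(doc))
```

Single-character words (such as the "t" left over from "Don't") are still dropped, because \w\w+ requires at least two characters. TfidfVectorizer accepts the same token_pattern parameter, so the pattern should carry over unchanged.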
    