Is there any way to preserve the punctuation marks !, ?, " and ' from my text documents using the parameters of CountVectorizer or TfidfVectorizer?
You should customize the token_pattern parameter when you instantiate the vectorizer. For example:
from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(token_pattern=r"(?u)\b\w\w+\b|!|\?|\"|\'")
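To check which tokens this pattern actually produces, it can be tried on its own with `re.findall`, which, as far as I know, is how the default tokenizer applies `token_pattern`. This is only a sketch: the `tokenize` helper is a hypothetical name for illustration, and the call to `lower()` mimics the vectorizers' default `lowercase=True` setting.

```python
import re

# Same pattern as above: words of two or more characters, or one of ! ? " '
TOKEN_PATTERN = r"(?u)\b\w\w+\b|!|\?|\"|\'"


def tokenize(text):
    """Return the tokens the pattern extracts from text.

    Hypothetical helper: lowercases first, as the vectorizers do by default.
    """
    return re.findall(TOKEN_PATTERN, text.lower())


print(tokenize('Wow! Is this "real"?'))
# ['wow', '!', 'is', 'this', '"', 'real', '"', '?']
```

Note that single-character words are still dropped by the `\w\w+` part, so "Don't" becomes the tokens `don` and `'`, and the trailing `t` is lost.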