Split Strings into words with multiple word boundary delimiters

既然无缘 2020-11-21 05:09

I think what I want to do is a fairly common task, but I've found no reference on the web. I have text with punctuation, and I want a list of the words.

\"H         


        
30 Answers
  •  臣服心动
    2020-11-21 05:57

    Another way to achieve this is to use the Natural Language Toolkit (nltk).

    import nltk

    data = "Hey, you - what are you doing here!?"
    # Keep every run of word characters; punctuation and whitespace are dropped.
    word_tokens = nltk.tokenize.regexp_tokenize(data, r'\w+')
    print(word_tokens)
    

    This prints: ['Hey', 'you', 'what', 'are', 'you', 'doing', 'here']

    The biggest drawback of this method is that nltk is a third-party package, so you need to install it first.

    The benefit is that once you have your tokens, you can do a lot of fun stuff with the rest of the nltk package, such as stemming or part-of-speech tagging.
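
    If installing nltk is not an option, the same \w+ pattern can also be applied with the standard library's re module. This is a minimal sketch rather than part of the original answer; it reuses the sample string from above and assumes that "word" means a run of letters, digits, and underscores.

```python
import re

data = "Hey, you - what are you doing here!?"

# \w+ matches runs of letters, digits, and underscores, so punctuation
# and whitespace act as delimiters and never show up in the result.
word_tokens = re.findall(r"\w+", data)
print(word_tokens)
```

    Note that \w+ splits contractions at the apostrophe ("don't" becomes "don" and "t"); a character class such as [\w']+ keeps them together if that matters for your text.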
