Best way to change words into numbers using a specific word list

广开言路 2021-01-14 20:58

I have a text file containing one tweet per line, and the tweets need to be converted into a format suitable for machine learning. I'm using Python and basic Unix text manipulation (regex) to achieve

3 answers

失恋的感觉 (original poster)

2021-01-14 21:38

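from string import punctuation as pnc

# Words (and emoticons) that are encoded as 1; everything else becomes 0.
tokens = {':)', 'cool', 'happy', 'fun'}
tweets = ['this has been a fun day :)', 'i find python cool! it makes me happy']
for tweet in tweets:
    # Check the raw word first (keeps ':)' intact), then the punctuation-stripped word.
    s = [(word in tokens or word.strip(pnc) in tokens) for word in tweet.split()]
    print(' '.join('1' if t else '0' for t in s))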

Output:

0 0 0 0 1 0 1
0 0 0 1 0 0 0 1

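The "or" in the list comprehension is there to handle :), as suggested by @EOL. Stripping punctuation from :) leaves an empty string, so the unstripped word has to be checked first.

There are still cases that will not be handled correctly, such as "cool :), I like it". Here the word is ":)," (with a trailing comma), which is not in the token list, and stripping punctuation removes the smiley entirely. The problem is inherent to the requirements.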
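
One possible way to reduce that kind of failure, offered as a rough sketch and not as part of the answer above, is to peel trailing punctuation off one character at a time and test each intermediate form against the word list. The helper names matches and encode are made up for this illustration, and the approach is still a heuristic: it only looks at trailing punctuation, so a word such as "cool" wrapped in quotes would still be missed.

```python
from string import punctuation

# Same word list as in the answer above.
TOKENS = {':)', 'cool', 'happy', 'fun'}

def matches(word, tokens=TOKENS):
    """Hypothetical helper: True if the word, or the word with some
    trailing punctuation removed, is in the token set."""
    candidate = word
    while candidate:
        if candidate in tokens:
            return True
        if candidate[-1] not in punctuation:
            break  # nothing left to peel off
        candidate = candidate[:-1]  # drop one trailing punctuation character
    return False

def encode(tweet, tokens=TOKENS):
    """Hypothetical helper: map each whitespace-separated word to '1' or '0'."""
    return ' '.join('1' if matches(w, tokens) else '0' for w in tweet.split())

print(encode('cool :), I like it'))  # 1 1 0 0 0
```

With this variant the two tweets from the answer give the same output as before, and ":)," is reduced to ":)" before the lookup. Whether that is the desired behaviour depends on how the word list is meant to be matched.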
