I think what I want to do is a fairly common task, but I've found no reference on the web. I have text with punctuation, and I want a list of the words.
\"H
In Python 3, you can use the method from PY4E - Python for Everybody.
We can solve both these problems by using the string methods lower, punctuation, and translate. The translate is the most subtle of the methods. Here is the documentation for translate:
your_string.translate(your_string.maketrans(fromstr, tostr, deletestr))
Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.
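To make the three arguments concrete, here is a small sketch that uses all of them at once. The input string and the mappings are made up purely for illustration; only str.maketrans and str.translate come from Python itself.

```python
# Illustrative values only: map 'a' -> '1' and 'b' -> '2', and delete every '!'.
table = str.maketrans("ab", "12", "!")

# Characters not mentioned in the table (here 'c' and the space) pass through unchanged.
result = "abc! cab!".translate(table)
print(result)  # 12c c12
```

Passing empty strings for the first two arguments, as in the example further down, leaves only the deletion step.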
You can see the contents of string.punctuation:
In [10]: import string
In [11]: string.punctuation
Out[11]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
For your example:
In [12]: your_str = "Hey, you - what are you doing here!?"
In [13]: line = your_str.translate(your_str.maketrans('', '', string.punctuation))
In [14]: line = line.lower()
In [15]: words = line.split()
In [16]: print(words)
['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
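If you need this in more than one place, the same steps can be wrapped in a small helper. This is only a sketch: the name extract_words and the module-level table are my own choices, not part of PY4E or the standard library.

```python
import string

# Build the deletion table once; it removes every character in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def extract_words(text):
    """Strip ASCII punctuation, lowercase the text, and split on whitespace."""
    return text.translate(_PUNCT_TABLE).lower().split()


words = extract_words("Hey, you - what are you doing here!?")
print(words)  # ['hey', 'you', 'what', 'are', 'you', 'doing', 'here']
```

Note that apostrophes and hyphens are in string.punctuation too, so "don't" becomes "dont" with this approach. Non-ASCII punctuation such as curly quotes is not covered by string.punctuation and will be left in place.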
For more information, you can refer to: