It seems like there should be a simpler way than:
import string
s = "string. With. Punctuation?"  # Sample string
# Python 2 idiom; on Python 3 the equivalent is s.translate(str.maketrans("", "", string.punctuation))
out = s.translate(string.maketrans("", ""), string.punctuation)
string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?
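import regex  # third-party module, not the standard-library re
s = u"string. With. Some・Really Weird、Non?ASCII。 「(Punctuation)」?"
# Match runs of control (C), mark (M), punctuation (P), symbol (S) and separator (Z) characters.
# A | inside [...] is a literal pipe, not alternation, so it is left out (| is already covered by \p{S}).
# The ur'' prefix is Python 2 only; on Python 3 write r'' instead.
remove = regex.compile(ur'[\p{C}\p{M}\p{P}\p{S}\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()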
Personally, I believe this is the best way to remove punctuation from a string in Python because:
- It is easy to modify: remove \p{S} from the character class if you want to remove punctuation, but keep symbols like $.
- You can be specific about what gets removed: \p{Pd} on its own will only remove dashes.
This uses Unicode character properties, which you can read more about on Wikipedia.
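If you can't install the regex module, the same idea can be approximated with the standard library. This is only a rough sketch, assuming Python 3: it uses unicodedata.category() to look up each character's general category (the same categories that \p{P}, \p{S}, \p{Pd} and friends refer to) rather than a regular expression. The function name strip_categories and its prefixes parameter are made up for illustration and are not part of any library.

```python
import unicodedata

def strip_categories(text, prefixes=("C", "M", "P", "S", "Z")):
    """Replace each run of characters whose Unicode general category starts
    with one of `prefixes` by a single space, then trim both ends.
    (Hypothetical helper, roughly mirroring the regex approach above.)"""
    wanted = tuple(prefixes)
    out = []
    in_run = False  # True while we are inside a run of removed characters
    for ch in text:
        if unicodedata.category(ch).startswith(wanted):
            if not in_run:
                out.append(" ")  # one space per run, like sub(" ", ...)
                in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out).strip()

# Default: drop control, mark, punctuation, symbol and separator characters.
print(strip_categories("string. With. Punctuation?"))  # string With Punctuation

# Punctuation only, so symbols like $ survive.
print(strip_categories("price:$5", prefixes=("P",)))  # price $5

# Dashes only (category Pd).
print(strip_categories("well-known $5", prefixes=("Pd",)))  # well known $5
```

Passing a one-letter prefix such as "P" matches every subcategory (Po, Pd, Ps, ...), while a two-letter prefix such as "Pd" narrows it to a single one. A character-by-character loop like this will likely be slower than the compiled regex on long strings.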