It seems like there should be a simpler way than:
import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",""), string.punctuation)
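Python 3's str.translate() accepts only a single table argument, so the Python 2 form with a separate string of characters to delete no longer works. Here is a minimal sketch of the same idea for Python 3. It builds the deletion table with the three-argument str.maketrans(), whose third argument lists the characters to map to None:

```python
import string

s = "string. With. Punctuation?"  # Sample string

# The third argument of str.maketrans lists characters to delete.
table = str.maketrans("", "", string.punctuation)
out = s.translate(table)
print(out)  # string With Punctuation
```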
# FIRST METHOD
# Store all punctuation characters in a variable
punctuation = '!?,.:;"\')(_-'
newstring = ''  # Create an empty string
word = raw_input("Enter string: ")
for i in word:
    if i not in punctuation:
        newstring += i
print "The string without punctuation is", newstring
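# SECOND METHOD
word = raw_input("Enter string: ")
punctuation = '!?,.:;"\')(_-'
# Python 2 str.translate: a None table means "no mapping", second argument lists characters to delete
newstring = word.translate(None, punctuation)
print "The string without punctuation is", newstring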
#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is hello welcome topythonprogramminglanguage
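Both methods above are Python 2 code: they use raw_input, the print statement, and the two-argument str.translate. A rough Python 3 port might look like the following. It wraps the logic in hypothetical helper functions strip_loop and strip_translate and passes the sample input directly instead of reading it from the keyboard:

```python
# Same punctuation set as in the two methods above.
PUNCTUATION = '!?,.:;"\')(_-'

def strip_loop(word):
    """First method: keep every character that is not in PUNCTUATION."""
    newstring = ''
    for ch in word:
        if ch not in PUNCTUATION:
            newstring += ch
    return newstring

def strip_translate(word):
    """Second method: Python 3 str.translate needs a table from str.maketrans."""
    return word.translate(str.maketrans('', '', PUNCTUATION))

sample = 'hello! welcome -to_python(programming.language)??,'
print("The string without punctuation is", strip_loop(sample))
print("The string without punctuation is", strip_translate(sample))
```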
Why does nobody use this?
''.join(filter(str.isalnum, s))
Too slow?
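Note that spaces are not alphanumeric, so filter(str.isalnum, s) removes them along with the punctuation. The following sketch assumes you want to keep whitespace and adds an explicit isspace() check:

```python
s = "string. With. Punctuation?"

# str.isalnum is False for spaces, so the words run together.
squashed = ''.join(filter(str.isalnum, s))

# Keeping whitespace requires an extra check.
kept = ''.join(ch for ch in s if ch.isalnum() or ch.isspace())

print(squashed)  # stringWithPunctuation
print(kept)      # string With Punctuation
```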
string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:
# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with - «punctuation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s
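The snippet above is Python 2 code, with a print statement and a u'' literal. Here is a Python 3 sketch of the same unicodedata.category check. The helper name strip_unicode_punct is only for illustration, and the non-ASCII characters are written as escapes so the source stays ASCII:

```python
from unicodedata import category

def strip_unicode_punct(s):
    # Unicode general categories for punctuation all start with 'P'
    # (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return ''.join(ch for ch in s if not category(ch).startswith('P'))

# Em dash, hyphen-minus, guillemets and full stops are all punctuation.
s = 'String \u2014 with - \u00abpunctuation \u00bb...'
print('stripped', strip_unicode_punct(s))
```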
You can generalize and strip other types of characters as well:
''.join(ch for ch in s if category(ch)[0] not in 'SP')
It will also strip symbol characters such as ~+$, which may or may not be "punctuation" depending on one's point of view.
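To see how the two filters differ, the following sketch compares the 'P'-only test with the 'SP' test on a made-up ASCII string:

```python
from unicodedata import category

s = 'a~b+c$d!e'

# Only punctuation (category P*) is removed; '~', '+' and '$' are symbols (S*).
p_only = ''.join(ch for ch in s if category(ch)[0] != 'P')

# Symbols and punctuation are both removed.
s_and_p = ''.join(ch for ch in s if category(ch)[0] not in 'SP')

print(p_only)   # a~b+c$de
print(s_and_p)  # abcde
```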
For Python 3 str or Python 2 unicode values, str.translate() takes a single mapping argument (usually a dict) instead of a string of characters to delete. Each code point (an integer) is looked up in that mapping, and anything mapped to None is removed.
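The following Python 3 sketch shows those lookup rules. Keys must be integer code points, as returned by ord(). A value of None deletes the character, and a string value replaces it. A one-character string used as a key never matches:

```python
# Keys are looked up by code point, so ord() is required.
table = {ord('!'): None, ord('a'): 'A'}
print('a!b'.translate(table))  # Ab

# A str key never matches, so nothing is removed here.
print('a!b'.translate({'!': None}))  # a!b
```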
To remove ASCII punctuation, then, use:
import string
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)
The dict.fromkeys() class method makes it trivial to create the mapping: it takes a sequence of keys and sets every value to None.
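Because string.punctuation contains only ASCII characters, this map leaves non-ASCII punctuation untouched. The following short sketch, using a made-up sample string, shows that behavior:

```python
import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))

s = '\u00abHi!\u00bb'  # the guillemets are not in string.punctuation
out = s.translate(remove_punct_map)
print(out)  # the '!' is gone, the guillemets remain
```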
To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):
import unicodedata
import sys
# sys.maxunicode is the highest code point, so add 1 to include it in the range
remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode + 1)
                                 if unicodedata.category(chr(i)).startswith('P'))
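Building that table loops over every code point and takes a noticeable moment, so it makes sense to build it once and reuse it. The following sketch assumes Python 3 and caches the table with functools.lru_cache. The function names are hypothetical:

```python
import sys
import unicodedata
from functools import lru_cache

@lru_cache(maxsize=None)
def unicode_punct_map():
    # Build the table once; later calls return the cached dict.
    return dict.fromkeys(i for i in range(sys.maxunicode + 1)
                         if unicodedata.category(chr(i)).startswith('P'))

def strip_all_punct(s):
    return s.translate(unicode_punct_map())

print(strip_all_punct('String \u2014 with - \u00abpunctuation \u00bb...'))
```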
Here's another easy way to do it, using a regular expression:
import re
word_pattern = re.compile(r'(\w+)')  # matches runs of word characters
sentence = 'This ! is : a # sample $ sentence.' # Text with punctuation
tokenized = [m.group() for m in word_pattern.finditer(sentence)]
sentence = ' '.join(tokenized)
print(sentence)
This is a sample sentence
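\w matches letters, digits and the underscore, so underscores survive. Joining with a single space also collapses any run of whitespace. The following sketch takes the same approach more concisely with re.findall. The second sample string is made up:

```python
import re

def keep_words(sentence):
    # \w+ matches runs of letters, digits and underscores.
    return ' '.join(re.findall(r'\w+', sentence))

print(keep_words('This ! is : a # sample $ sentence.'))  # This is a sample sentence
print(keep_words('snake_case, ok!'))                     # snake_case ok
```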
Not necessarily simpler, but a different way, if you are more familiar with the re family.
import re, string
s = "string. With. Punctuation?" # Sample string
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
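If the substitution runs many times, a common pattern is to compile the character class once and reuse it. re.escape is what prevents characters from string.punctuation, such as ], \ and ^, from being interpreted as special inside the class. The following sketch assumes Python 3, and the names PUNCT_RE and strip_punct are illustrative:

```python
import re
import string

# Compile the character class once and reuse it.
PUNCT_RE = re.compile('[%s]' % re.escape(string.punctuation))

def strip_punct(s):
    return PUNCT_RE.sub('', s)

print(strip_punct("string. With. Punctuation?"))  # string With Punctuation
```

Unlike the \w approach, this keeps the original whitespace as it is.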