Best way to strip punctuation from a string

日久生厌 2020-11-21 05:39

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("", ...


        
26 Answers
  • 2020-11-21 05:58
    # Python 2 code (raw_input, print statement, two-argument str.translate)

    # FIRST METHOD
    # Store the punctuation characters to remove
    punctuation = '!?,.:;"\')(_-'
    newstring = ''  # Start with an empty string
    word = raw_input("Enter string: ")
    for ch in word:
        if ch not in punctuation:
            newstring += ch
    print "The string without punctuation is:", newstring

    # SECOND METHOD
    word = raw_input("Enter string: ")
    punctuation = '!?,.:;"\')(_-'
    # With None as the table, translate() only deletes the listed characters
    newstring = word.translate(None, punctuation)
    print "The string without punctuation is:", newstring


    # Output for both methods
    Enter string: hello! welcome -to_python(programming.language)??,
    The string without punctuation is: hello welcome topythonprogramminglanguage
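
For readers on Python 3, where raw_input, the print statement and the two-argument translate() are gone, here is a minimal sketch of the same two methods. The function names and the PUNCTUATION constant are made up for illustration; only the character set comes from the answer above.

```python
# Python 3 sketch of the two methods above, using the same character set.
PUNCTUATION = '!?,.:;"\')(_-'


def strip_with_loop(text, chars=PUNCTUATION):
    """Keep only the characters that are not in `chars`."""
    return ''.join(ch for ch in text if ch not in chars)


def strip_with_translate(text, chars=PUNCTUATION):
    """Delete `chars` using a table built by three-argument str.maketrans."""
    return text.translate(str.maketrans('', '', chars))


sample = 'hello! welcome -to_python(programming.language)??,'
print(strip_with_loop(sample))
print(strip_with_translate(sample))
```

Both calls should print the same line as the output shown above.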
    
  • Why does nobody use this?

     ''.join(filter(str.isalnum, s)) 
    

    Too slow?
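
One thing to watch with this one-liner: str.isalnum is also False for whitespace, so the spaces between words disappear as well. A small sketch of the difference follows; the variable names are only for illustration.

```python
s = "string. With. Punctuation?"

# str.isalnum() is False for spaces, so they are dropped along with punctuation.
only_alnum = ''.join(filter(str.isalnum, s))

# Keeping whitespace as well preserves the word boundaries.
alnum_or_space = ''.join(ch for ch in s if ch.isalnum() or ch.isspace())

print(only_alnum)
print(alnum_or_space)
```

The first line printed has the words run together; the second keeps them separated.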

  • 2020-11-21 06:01

    string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

    # -*- coding: utf-8 -*-
    from unicodedata import category
    s = u'String — with -  «punctuation »...'
    s = ''.join(ch for ch in s if category(ch)[0] != 'P')
    print 'stripped', s
    

    You can generalize and strip other types of characters as well:

    ''.join(ch for ch in s if category(ch)[0] not in 'SP')
    

    This second form also strips characters such as ~*+§$, which may or may not count as "punctuation" depending on your point of view.
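
The snippets above use the Python 2 print statement. A possible Python 3 version that covers both cases is sketched below; strip_categories and its majors parameter are hypothetical names, not part of unicodedata.

```python
from unicodedata import category


def strip_categories(text, majors='P'):
    """Drop characters whose Unicode major category (first letter) is in `majors`."""
    return ''.join(ch for ch in text if category(ch)[0] not in majors)


s = 'String — with -  «punctuation »...'
print(strip_categories(s))                             # punctuation only
print(strip_categories('a ~ b + c $ d', majors='SP'))  # symbols and punctuation
```

Only the characters are removed; the whitespace around them stays, so you may want to normalize spaces afterwards.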

  • 2020-11-21 06:01

    For Python 3 str or Python 2 unicode values, str.translate() takes a single mapping (typically a dictionary); code points (integers) are looked up in that mapping, and anything mapped to None is removed.

    To remove (some?) punctuation then, use:

    import string
    
    remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
    s.translate(remove_punct_map)
    

    The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.
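
To make the shape of that mapping concrete, here is a short Python 3 sketch with a sample string borrowed from the question. It also checks that the three-argument form of str.maketrans builds the same table.

```python
import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))

# Keys are code points (integers); every value is None, which means "delete".
print(remove_punct_map[ord('!')])

s = "string. With. Punctuation?"
stripped = s.translate(remove_punct_map)
print(stripped)

# str.maketrans('', '', chars) produces an equivalent {code point: None} table.
same_table = str.maketrans('', '', string.punctuation) == remove_punct_map
print(same_table)
```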

    To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian's answer (Python 3 version):

    import unicodedata
    import sys
    
    remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                     if unicodedata.category(chr(i)).startswith('P'))
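
A usage sketch for the larger table, assuming Python 3. Building it scans every code point, so it is probably best built once and reused; the constant and function names here are made up for illustration.

```python
import sys
import unicodedata

# Build the table once; range() excludes its end point, hence the + 1.
REMOVE_PUNCT_MAP = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith('P')
)


def strip_unicode_punctuation(text):
    """Remove every character in a Unicode punctuation category (P*)."""
    return text.translate(REMOVE_PUNCT_MAP)


print(strip_unicode_punctuation('«Hello», world… ¿qué?'))
```

Symbols such as $ or + belong to the S categories, so this table leaves them alone.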
    
  • 2020-11-21 06:01

    Here's one other easy way to do it using RegEx

    import re
    
    words = re.compile(r'\w+')  # Matches runs of word characters, not punctuation
    
    sentence = 'This ! is : a # sample $ sentence.' # Text with punctuation
    tokenized = [m.group() for m in words.finditer(sentence)]
    sentence = ' '.join(tokenized)
    print(sentence)
    # This is a sample sentence
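
A few side effects of the \w+ approach are worth knowing before relying on it: \w includes the underscore, apostrophes split words, and any run of whitespace collapses to a single space. A small sketch follows; keep_words is a hypothetical helper name.

```python
import re

words = re.compile(r'\w+')


def keep_words(text):
    """Join the runs of word characters found in `text` with single spaces."""
    return ' '.join(words.findall(text))


print(keep_words("don't stop"))       # the apostrophe splits the word
print(keep_words('snake_case, ok?'))  # the underscore counts as a word character
print(keep_words('a   b'))            # repeated spaces collapse to one
```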
    
    
  • 2020-11-21 06:03

    Not necessarily simpler, but a different approach if you are more familiar with the re module.

    import re, string
    s = "string. With. Punctuation?" # Sample string 
    out = re.sub('[%s]' % re.escape(string.punctuation), '', s)
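
If the same substitution runs on many strings, one option is to compile the pattern once and wrap it in a function. This is only a sketch; PUNCT_RE and strip_punctuation are illustrative names.

```python
import re
import string

# Character class containing every ASCII punctuation character, escaped once.
PUNCT_RE = re.compile('[%s]' % re.escape(string.punctuation))


def strip_punctuation(text):
    """Remove ASCII punctuation from `text` using the precompiled pattern."""
    return PUNCT_RE.sub('', text)


print(strip_punctuation("string. With. Punctuation?"))
```

Like string.punctuation itself, this covers ASCII punctuation only.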
    