Best way to strip punctuation from a string

日久生厌 2020-11-21 05:39

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string
out = s.translate(string.maketrans("",


        
26 answers
  •  北恋 (OP)
    2020-11-21 06:01

    string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

    # -*- coding: utf-8 -*-
    from unicodedata import category
    s = u'String — with -  «punctation »...'
    s = ''.join(ch for ch in s if category(ch)[0] != 'P')
    print 'stripped', s
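
For readers on Python 3, here is a minimal sketch of the same idea wrapped in a function, next to the ASCII-only approach for comparison. The name `strip_punctuation` is made up for this illustration and is not part of the answer above; the sample string is the one from the answer, with the dash written as an escape.

```python
import string
from unicodedata import category

def strip_punctuation(text):
    """Drop every character whose Unicode general category starts with 'P'."""
    return ''.join(ch for ch in text if not category(ch).startswith('P'))

# Same sample as above; \u2014 is the long dash.
sample = 'String \u2014 with -  «punctation »...'

# Unicode-aware: removes the dash, the hyphen, the guillemets and the dots.
stripped = strip_punctuation(sample)

# ASCII-only: string.punctuation has no entry for the dash or the guillemets,
# so those characters survive.
ascii_only = sample.translate(str.maketrans('', '', string.punctuation))

print('stripped  :', stripped)
print('ascii only:', ascii_only)
```

The second result still contains the long dash and the guillemets, which is the limitation the answer points out.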
    

    You can generalize and strip other types of characters as well:

    ''.join(ch for ch in s if category(ch)[0] not in 'SP')
    

    This version also strips characters like ~*+§$, which may or may not count as "punctuation" depending on one's point of view.
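
To see why those characters are borderline, it can help to look at their categories directly. The sketch below is written for Python 3; `strip_categories` and its `prefixes` parameter are hypothetical names introduced here, not something from the answer.

```python
from unicodedata import category

def strip_categories(text, prefixes='SP'):
    """Remove characters whose general category starts with a letter in prefixes."""
    return ''.join(ch for ch in text if category(ch)[0] not in prefixes)

# Look up the general category of each character mentioned above.
classes = {ch: category(ch) for ch in '~*+§$'}
for ch, cat in classes.items():
    print(repr(ch), cat)

# With the default prefixes, both symbols (S*) and punctuation (P*) go away.
cleaned = strip_categories('a~b*c+d§e$f')

# Restricting to 'P' keeps the symbols such as ~, + and $.
punct_only = strip_categories('a~b*c+d$e', prefixes='P')
```

Some of these characters fall under symbol categories and others under punctuation, so the choice of prefixes decides which ones get removed.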
