Remove whitespace in Python using string.whitespace

夕颜 2020-11-29 19:51

Python's string.whitespace is great:

>>> string.whitespace
'\t\n\x0b\x0c\r '

How do I use this with a string without resor

5 answers
  • 2020-11-29 20:22

    Let's make some reasonable assumptions:

    (1) You really want to replace any run of whitespace characters with a single space (a run is of length 1 or greater).

    (2) You would like the same code to work with minimal changes under Python 2.X with unicode objects.

    (3) You don't want your code to assume things that are not guaranteed in the docs.

    (4) You would like the same code to work with minimal changes with Python 3.X str objects.

    The currently selected answer has these problems:

    (a) It changes " " * 3 to " " * 2, i.e. it collapses a pair of spaces but leaves runs of three, four, or more spaces only partly reduced. [fails requirement 1]

    (b) It changes "foo\tbar\tzot" to "foobarzot". [fails requirement 1]

    (c) When fed a unicode object, it raises TypeError: translate() takes exactly one argument (2 given). [fails requirement 2]

    (d) It uses string.whitespace[:-1]. [fails requirement 3; the order of characters in string.whitespace is not guaranteed]

    (e) It uses string.whitespace[:-1]. [fails requirement 4; in Python 2.X, string.whitespace is '\t\n\x0b\x0c\r '; in Python 3.X, it is ' \t\n\r\x0b\x0c']

    The " ".join(s.split()) answer and the re.sub(r"\s+", " ", s) answer don't have these problems.
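
    As a minimal sketch (written for Python 3; the helper names collapse_split and collapse_regex are made up for illustration), here are those two approaches run on inputs like the ones in (a) and (b). One difference worth knowing: the split/join form also drops leading and trailing whitespace, while the regex form reduces it to a single space.

```python
import re

def collapse_split(s):
    # split() with no argument splits on runs of whitespace and
    # produces no empty strings at either end
    return " ".join(s.split())

def collapse_regex(s):
    # \s+ matches a run of one or more whitespace characters
    return re.sub(r"\s+", " ", s)

for sample in ["a" + " " * 3 + "b", "foo\tbar\tzot", "  padded  "]:
    print(repr(collapse_split(sample)), repr(collapse_regex(sample)))
```

    If surrounding whitespace should go away with the regex form as well, calling .strip() on its result is one option.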

  • 2020-11-29 20:23

    There is a special-case shortcut for exactly this use case!

    If you call str.split without an argument, it splits on runs of whitespace instead of single characters. So:

    >>> ' '.join("Please \n don't \t hurt \x0b me.".split())
    "Please don't hurt me."
    
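
    To see what the special case changes, here is a small sketch comparing split with an explicit separator against split with no argument. The sample string and variable names are just for illustration.

```python
s = "a  b \t c "

# Explicit separator: every single space is a boundary, so runs of
# spaces produce empty strings and the tab is left untouched
by_space = s.split(" ")
print(by_space)   # ['a', '', 'b', '\t', 'c', '']

# No argument: a run of any whitespace counts as one boundary, and
# leading/trailing whitespace yields no empty strings
by_run = s.split()
print(by_run)     # ['a', 'b', 'c']
```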
  • 2020-11-29 20:24

    You could use the translate method:

    import string
    
    s = "Please \n don't \t hurt \x0b me."
    s = s.translate(None, string.whitespace[:-1]) # python 2.6 and up
    s = s.translate(string.maketrans('',''), string.whitespace[:-1]) # python 2.5, dunno further down
    >>> s
    "Please  don't  hurt  me."
    

    And then remove duplicate whitespace:

    s = s.replace('  ', ' ')
    >>> s
    "Please don't hurt me."
    
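
    The two-argument translate call is Python 2 only. A rough Python 3 counterpart, as a sketch rather than a drop-in replacement, might look like the following. It assumes str objects, filters the space out of string.whitespace instead of slicing (so it does not depend on character order), and repeats the replace so longer runs are handled too. The variable names are illustrative.

```python
import string

s = "Please \n don't \t hurt \x0b me."

# Every whitespace character except the plain space, without relying
# on the order of string.whitespace
to_delete = string.whitespace.replace(" ", "")

# The third argument of str.maketrans lists characters to delete
deleted = s.translate(str.maketrans("", "", to_delete))
print(repr(deleted))

# One replace pass only roughly halves each run of spaces,
# so repeat until no double space is left
collapsed = deleted
while "  " in collapsed:
    collapsed = collapsed.replace("  ", " ")
print(repr(collapsed))
```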
  • 2020-11-29 20:35

    Here is a starting point (although it's not shorter than manually assembling the whitespace circus):

    >>> from string import whitespace as ws
    >>> import re
    
    >>> p = re.compile('(%s)' % ('|'.join([c for c in ws])))
    >>> s = "Please \n don't \t hurt \x0b me."
    
    >>> p.sub('', s)
    "Pleasedon'thurtme."
    

    Or if you want to reduce whitespace to a maximum of one:

    >>> p1 = re.compile('(%s)' % ('|'.join([c for c in ws if not c == ' '])))
    >>> p2 = re.compile(' +')
    >>> s = "Please \n don't \t hurt \x0b me."
    
    >>> p2.sub(' ', p1.sub('', s))
    "Please don't hurt me."
    

    Third way, more compact (Python 2 only, since it uses the two-argument form of str.translate):

    >>> import string
    
    >>> s = "Please \n don't \t hurt \x0b me."
    >>> s.translate(None, string.whitespace)
    "Pleasedon'thurtme."
    
    >>> s.translate(None, string.whitespace[:5])
    "Please  don't  hurt  me."
    
    >>> ' '.join(s.translate(None, string.whitespace[:5]).split())
    "Please don't hurt me."
    
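
    A possible shorter variant of the regex idea, sketched for Python 3: build a character class straight from string.whitespace and let + match whole runs, so one pattern covers both removing and collapsing. The name ws_run is made up here; re.escape is used so every character is safe inside the brackets.

```python
import re
import string

# A character class containing exactly the characters in
# string.whitespace; + makes it match a whole run at once
ws_run = re.compile("[%s]+" % re.escape(string.whitespace))

s = "Please \n don't \t hurt \x0b me."

removed = ws_run.sub("", s)     # delete all whitespace
collapsed = ws_run.sub(" ", s)  # reduce each run to one space

print(removed)    # Pleasedon'thurtme.
print(collapsed)  # Please don't hurt me.
```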
  • 2020-11-29 20:40

    What's wrong with the \s character class?

    >>> import re
    
    >>> pattern = re.compile(r'\s+')
    >>> pattern.sub(' ', "Please \n don't \t hurt \x0b me.")
    "Please don't hurt me."
    
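
    One caveat, sketched below for Python 3 str patterns: \s is not always the same set as string.whitespace. By default it also matches Unicode whitespace such as the no-break space U+00A0, while the re.ASCII flag restricts it to [ \t\n\r\f\v]. The sample string and variable names are illustrative.

```python
import re

s = "non\u00a0breaking \t space"

# Default for str patterns: \s also matches Unicode whitespace,
# so the no-break space is replaced as well
unicode_result = re.sub(r"\s+", " ", s)

# With re.ASCII, \s only matches [ \t\n\r\f\v], so U+00A0 survives
ascii_result = re.sub(r"\s+", " ", s, flags=re.ASCII)

print(repr(unicode_result))
print(repr(ascii_result))
```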