How to strip all whitespace from string

暖寄归人 2020-11-28 18:25

How do I strip all the spaces in a Python string? For example, I want a string like "strip my spaces" to be turned into "stripmyspaces", but I cannot s

11 answers
  • 2020-11-28 18:50

    As mentioned by Roger Pate, the following code worked for me:

    >>> s = " \t foo \n bar "
    >>> "".join(s.split())
    'foobar'
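
    To see why this works, here is a minimal sketch that splits the one-liner into its two steps. The variable names `parts` and `pieces_by_space` are only illustrative.

```python
s = " \t foo \n bar "

# split() with no arguments splits on runs of whitespace and drops empty pieces
parts = s.split()
print(parts)           # ['foo', 'bar']

# joining the pieces with an empty separator glues them back together
print("".join(parts))  # foobar

# by contrast, split(" ") splits on single spaces only, so tabs and newlines survive
pieces_by_space = s.split(" ")
print(pieces_by_space)
```

    Note that the argument-less form is what matters here: passing an explicit separator changes the behaviour.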
    

    I am using Jupyter Notebook to run the following code:

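    # new_list is assumed to be an existing list of strings,
    # e.g. new_list[i] == ' Plain   Utthapam  '
    i = 0
    ProductList = []
    while i < len(new_list):
        # temp = new_list[i].strip()         # alternative: trims only the ends, keeps inner spaces
        temp = "".join(new_list[i].split())  # removes all whitespace: 'PlainUtthapam'
        temp = temp.upper()                  # 'PLAINUTTHAPAM'
        ProductList.append(temp)
        i = i + 2                            # step of 2: only every other element is processed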
    
  • 2020-11-28 18:57

    The standard techniques to filter a list apply, although they are not as efficient as the split/join or translate methods.

    We need a set of whitespace characters:

    >>> import string
    >>> ws = set(string.whitespace)
    

    The filter builtin:

    >>> "".join(filter(lambda c: c not in ws, "strip my spaces"))
    'stripmyspaces'
    

    A list comprehension (yes, use the brackets: see benchmark below):

    >>> import string
    >>> "".join([c for c in "strip my spaces" if c not in ws])
    'stripmyspaces'
    

    A fold:

    >>> import functools
    >>> "".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces", ""))
    'stripmyspaces'
    

    Benchmark:

    >>> from timeit import timeit
    >>> timeit('"".join("strip my spaces".split())')
    0.17734256500003198
    >>> timeit('"strip my spaces".translate(ws_dict)', 'import string; ws_dict = {ord(ws):None for ws in string.whitespace}')
    0.457635745999994
    >>> timeit(r're.sub(r"\s+", "", "strip my spaces")', 'import re')
    1.017787621000025
    
    >>> SETUP = 'import string, operator, functools, itertools; ws = set(string.whitespace)'
    >>> timeit('"".join([c for c in "strip my spaces" if c not in ws])', SETUP)
    0.6484303600000203
    >>> timeit('"".join(c for c in "strip my spaces" if c not in ws)', SETUP)
    0.950212219999969
    >>> timeit('"".join(filter(lambda c: c not in ws, "strip my spaces"))', SETUP)
    1.3164566040000523
    >>> timeit('"".join(functools.reduce(lambda acc, c: acc if c in ws else acc+c, "strip my spaces"))', SETUP)
    1.6947649049999995
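
    The translate variant appears only inside the benchmark, so here it is written out as a minimal sketch. The names `ws_table` and `remove_ascii_whitespace` are hypothetical, and the table covers only the six characters in string.whitespace, not Unicode whitespace.

```python
import string

# Map every ASCII whitespace code point to None, which tells translate() to delete it
ws_table = {ord(ch): None for ch in string.whitespace}

def remove_ascii_whitespace(text):
    """Return text with every character in string.whitespace removed."""
    return text.translate(ws_table)

print(remove_ascii_whitespace(" strip my\tspaces\n"))  # stripmyspaces
```

    An equivalent table can be built with str.maketrans("", "", string.whitespace).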
    
  • 2020-11-28 19:01

    For Python 3:

    >>> import re
    >>> re.sub(r'\s+', '', 'strip my \n\t\r ASCII and \u00A0 \u2003 Unicode spaces')
    'stripmyASCIIandUnicodespaces'
    >>> # Or, depending on the situation:
    >>> re.sub(r'(\s|\u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', \
    ... '\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n')
    'stripallkindsofwhitespace'
    

    ...handles any whitespace characters that you're not thinking of - and believe us, there are plenty.

    \s on its own always covers the ASCII whitespace:

    • (regular) space
    • tab
    • new line (\n)
    • carriage return (\r)
    • form feed
    • vertical tab

    Additionally:

    • for Python 2 with re.UNICODE enabled,
    • for Python 3 without any extra actions,

    ...\s also covers the Unicode whitespace characters, for example:

    • non-breaking space,
    • em space,
    • ideographic space,

    ...etc. See the full list here, under "Unicode characters with White_Space property".
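
    A small sketch of the difference in Python 3, using the re.ASCII flag (not mentioned above) to switch \s back to ASCII-only matching. The sample string is made up for illustration.

```python
import re

# space, no-break space, em space, ideographic space
text = "a \u00a0b\u2003c\u3000d"

# Python 3, str pattern: \s matches Unicode whitespace by default
unicode_result = re.sub(r"\s+", "", text)

# re.ASCII restricts \s to [ \t\n\r\f\v], so the Unicode spaces survive
ascii_result = re.sub(r"\s+", "", text, flags=re.ASCII)

print(unicode_result)       # abcd
print(ascii(ascii_result))  # 'a\xa0b\u2003c\u3000d'
```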

    However, \s DOES NOT cover characters that are not classified as whitespace but are de facto whitespace, such as, among others:

    • zero-width joiner,
    • Mongolian vowel separator,
    • zero-width non-breaking space (a.k.a. byte order mark),

    ...etc. See the full list here, under "Related Unicode characters without White_Space property".

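    So the second regex adds six such characters explicitly: \u180B|\u200B|\u200C|\u200D|\u2060|\uFEFF. Note that the Mongolian vowel separator itself is U+180E (U+180B is Mongolian free variation selector one), so add \u180E to the alternation if you need to strip it.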
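
    One possible way to package this, as a sketch only: a precompiled character class that combines \s with a hand-picked set of invisible characters. The names `EXTRA_BLANKS` and `strip_all_blanks` are hypothetical, and the set below is an assumption (it uses U+180E for the Mongolian vowel separator), so adjust it to your data.

```python
import re

# Characters without the White_Space property that still behave like blanks:
# Mongolian vowel separator, zero-width space, zero-width non-joiner,
# zero-width joiner, word joiner, zero-width no-break space (BOM)
EXTRA_BLANKS = "\u180e\u200b\u200c\u200d\u2060\ufeff"

# \s is allowed inside a character class, so one class covers both groups
_BLANKS = re.compile(r"[\s" + EXTRA_BLANKS + r"]+")

def strip_all_blanks(text):
    """Remove Unicode whitespace plus the extra invisible characters above."""
    return _BLANKS.sub("", text)

print(strip_all_blanks("\ufeff\t strip \u200b all \u180e kinds\n"))  # stripallkinds
```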

    Sources:

    • https://docs.python.org/2/library/re.html
    • https://docs.python.org/3/library/re.html
    • https://en.wikipedia.org/wiki/Unicode_character_property
  • 2020-11-28 19:03

    Try a regex with re.sub. You can search for all whitespace and replace with an empty string.

    \s in your pattern will match whitespace characters - and not just a space (tabs, newlines, etc). You can read more about it in the manual.
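
    A minimal sketch of that suggestion (the sample string is only illustrative):

```python
import re

text = "strip my\tspaces\n"

# \s+ matches each run of whitespace; replacing it with "" removes it
result = re.sub(r"\s+", "", text)

print(result)  # stripmyspaces
```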

  • 2020-11-28 19:04

    The simplest is to use replace:

    "foo bar\t".replace(" ", "").replace("\t", "")
    

    Alternatively, use a regular expression:

    import re
    re.sub(r"\s", "", "foo bar\t")
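
    The two approaches are not equivalent: chained replace calls remove only the characters you list, whereas \s matches every whitespace character. A short sketch of the difference, with a made-up sample string:

```python
import re

text = "foo bar\tbaz\nqux"

# Chained replace only removes the characters that are listed explicitly
only_listed = text.replace(" ", "").replace("\t", "")

# \s matches every whitespace character, including the newline
all_ws = re.sub(r"\s", "", text)

print(repr(only_listed))  # 'foobarbaz\nqux'
print(repr(all_ws))       # 'foobarbazqux'
```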
    
  • 2020-11-28 19:04

    Remove the leading spaces in Python

    string1 = "    This is Test String to strip leading space"
    print(string1)
    print(string1.lstrip())
    

    Remove the trailing spaces in Python

    string2 = "This is Test String to strip trailing space     "
    print(string2)
    print(string2.rstrip())
    

    Remove the whitespace from the beginning and end of the string in Python

    string3 = "    This is Test String to strip leading and trailing space      "
    print(string3)
    print(string3.strip())
    

    Remove all the spaces in Python

    string4 = "   This is Test String to test all the spaces        "
    print(string4)
    print(string4.replace(" ", ""))
    