Python Regular expression must strip whitespace except between quotes

一个人的身影 2020-12-11 18:43

I need a way to remove all whitespace from a string, except when that whitespace is between quotes.

result = re.sub('".*?"', "", content)
5 Answers
  • 2020-12-11 18:58

    Here is a one-liner based on @kindall's idea that does not use regex at all. Split on ", call split() on every other item (the parts outside the quotes) and join the pieces back together, which removes the whitespace:

    stripWS = lambda txt:'"'.join( it if i%2 else ''.join(it.split())
        for i,it in enumerate(txt.split('"'))  )
    

    Usage example:

    >>> stripWS('This is a string with some "text in quotes."')
    'Thisisastringwithsome"text in quotes."'
    
  • 2020-12-11 19:03

    I don't think you're going to be able to do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.

    import re
    
    def stripwhite(text):
        lst = text.split('"')
        for i, item in enumerate(lst):
            if not i % 2:
                lst[i] = re.sub(r"\s+", "", item)
        return '"'.join(lst)
    
    print(stripwhite('This is a string with some "text in quotes."'))
    
  • 2020-12-11 19:10

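    You can use shlex.split for a quotation-aware split, and join the result using " ".join. Note that this drops the quote characters and leaves a single space between tokens rather than removing the whitespace. E.g.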

    import shlex
    print(" ".join(shlex.split('Hello "world     this    is" a    test')))
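
If the goal is to delete the unquoted whitespace and keep the quotes, one possible variation is shlex's non-POSIX mode, which leaves the quote characters on each quoted token. This is a sketch rather than part of the answer above: the names `text`, `tokens`, and `result` are illustrative, and it assumes every quoted span starts at a word boundary, because in non-POSIX mode shlex does not recognize quotes in the middle of a word.

```python
import shlex

text = 'Hello "world     this    is" a    test'

# posix=False keeps the surrounding quote characters on quoted tokens
tokens = shlex.split(text, posix=False)

# Joining with an empty string removes the whitespace between tokens
result = "".join(tokens)
print(result)
```

Input with an unbalanced quote makes shlex.split raise ValueError, so callers handling untrusted text may want to catch that.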
    
  • 2020-12-11 19:12

    Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

    Here's the small regex:

    "[^"]*"|(\s+)
    

    The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.

    Here is working code:

    import re
    subject = 'Remove Spaces Here "But Not Here" Thank You'
    regex = re.compile(r'"[^"]*"|(\s+)')
    def myreplacement(m):
        if m.group(1):
            return ""
        else:
            return m.group(0)
    replaced = regex.sub(myreplacement, subject)
    print(replaced)
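
The same alternation trick could be extended to protect more than one kind of quoted span. The sketch below is a hypothetical variation, not the answer's own code: the names `PATTERN` and `strip_unquoted_whitespace` are made up for illustration, and it assumes single quotes are always used in pairs (an apostrophe as in "don't" would be treated as an opening quote).

```python
import re

# Branches 1 and 2 match complete double- or single-quoted spans (kept as-is).
# Branch 3 captures whitespace that lies outside any quoted span in Group 1.
PATTERN = re.compile(r'''"[^"]*"|'[^']*'|(\s+)''')

def strip_unquoted_whitespace(text):
    # Group 1 is set only when the whitespace branch matched
    return PATTERN.sub(lambda m: "" if m.group(1) else m.group(0), text)

print(strip_unquoted_whitespace('Remove Spaces Here "But Not Here" Thank You'))
print(strip_unquoted_whitespace("a b 'c d' e"))
```

Order matters in the alternation: the quoted branches come first so that a quoted span is consumed whole before the whitespace branch gets a chance to match inside it.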
    

  • 2020-12-11 19:15

    Here is a slightly longer version that checks for a quote without a matching pair. It only handles one style of start and end delimiter (adaptable, for example, with start, end = '()').

    start, end = '"', '"'
    
    for test in ('Hello "world this is" atest',
                 'This is a string with some " text inside in quotes."',
                 'This is without quote.',
                 'This is sentence with bad "quote'):
        result = ''
    
        while start in test :
            clean, _, test = test.partition(start)
            clean = clean.replace(' ','') + start
            inside, tag, test = test.partition(end)
            if not tag:
                raise SyntaxError('Missing end quote %s' % end)
            else:
                clean += inside + tag # inside not removing of white space
            result += clean
        result += test.replace(' ','')
        print(result)
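
The loop above could also be wrapped in a reusable function with configurable delimiters. This is a minimal sketch under stated assumptions: the name `strip_outside` is hypothetical, it raises ValueError instead of SyntaxError, and it removes every whitespace character outside the delimiters (not only spaces) by using split() and join.

```python
def strip_outside(text, start='"', end='"'):
    """Remove whitespace outside start...end spans; raise on an unclosed span."""
    parts = []
    while start in text:
        before, _, text = text.partition(start)
        inside, tag, text = text.partition(end)
        if not tag:
            raise ValueError('Missing end delimiter %s' % end)
        # ''.join(s.split()) drops all whitespace, including tabs and newlines
        parts.append(''.join(before.split()) + start + inside + tag)
    # Whatever remains after the last closed span is outside any delimiters
    parts.append(''.join(text.split()))
    return ''.join(parts)

print(strip_outside('Hello "world this is" atest'))
print(strip_outside('keep (a b) c d', '(', ')'))
```

Passing '(' and ')' shows the adaptation to a different pair of delimiters; nested delimiters are not handled.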
    