Collapsing whitespace in a string

隐瞒了意图╮ 2021-01-05 03:59

I have a string that kind of looks like this:

"stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD"

and I want to strip off all p

6 answers
  • 2021-01-05 04:38
    result = rex.sub(' ', string) # this produces a string with tons of whitespace padding
    result = rex.sub('', result) # this reduces all those spaces
    

    That happens because of a typo: the second call uses rex again instead of rex_s. You also need to substitute a single space rather than the empty string; otherwise every multiple-space gap becomes no gap at all instead of a single-space gap.

    result = rex.sub(' ', string) # this produces a string with tons of whitespace padding
    result = rex_s.sub(' ', result) # this reduces all those spaces
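
    For reference, the whole two-step version might look like the sketch below. The question does not show how rex and rex_s were compiled, so both patterns here are assumptions: rex matches a single punctuation character (underscore included) and rex_s matches a run of whitespace. The name text is also just a stand-in for the input.

```python
import re

# Assumed patterns; the originals are not shown in the question.
rex = re.compile(r'[^\w\s]|_')   # one punctuation character (underscore included)
rex_s = re.compile(r'\s+')       # one or more whitespace characters

text = "stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD"

padded = rex.sub(' ', text)        # punctuation becomes spaces, leaving long gaps
result = rex_s.sub(' ', padded)    # each gap shrinks to a single space
print(result)
```

    Replacing with ' ' in both calls is what keeps neighbouring words apart.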
    
  • 2021-01-05 04:40
    import re
    s = "$$$aa1bb2 cc-dd ee_ff ggg."
    re.sub(r'\W+', ' ', s).upper()
    # ' AA1BB2 CC DD EE_FF GGG '
    

    Is _ punctuation?

    re.sub(r'[_\W]+', ' ', s).upper()
    # ' AA1BB2 CC DD EE FF GGG '
    

    Don't want the leading and trailing space?

    re.sub(r'[_\W]+', ' ', s).strip().upper()
    # 'AA1BB2 CC DD EE FF GGG'
    
  • 2021-01-05 04:46

    You can use a regular expression to replace runs of whitespace. \s matches a single whitespace character, and \s+ means one or more of them.

    import re
    rex = re.compile(r'\s+')
    test = "     x  y z           z"
    res = rex.sub(' ', test)
    print(f">{res}<")
    

    > x y z z<

    Note that \s also matches carriage returns, newlines, and tabs, so those are collapsed as well.
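
    To make that concrete, here is a small sketch comparing \s+ with a pattern that only touches spaces and tabs. The names any_ws and blanks_only and the sample text are made up for this example.

```python
import re

any_ws = re.compile(r'\s+')           # spaces, tabs, \r, \n, and other whitespace
blanks_only = re.compile(r'[ \t]+')   # spaces and tabs only; line breaks survive

text = "a  b\r\n\tc   d"

collapsed_all = any_ws.sub(' ', text)
collapsed_blanks = blanks_only.sub(' ', text)

print(repr(collapsed_all))
print(repr(collapsed_blanks))
```

    If line breaks need to be preserved, a narrower character class like the second pattern is probably the safer choice.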

  • 2021-01-05 04:48

    Here's a single-step approach (but the uppercasing actually uses a string method -- much simpler!):

    import re
    rex = re.compile(r'\W+')
    result = rex.sub(' ', strarg).upper()
    

    where strarg is the string argument (don't use names that shadow builtins or standard library modules, please).

  • 2021-01-05 04:48

    Do you have to use regular expressions? Do you feel you must do it in one line?

    >>> import string
    >>> s = "stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD"
    >>> s2 = ''.join(c for c in s if c in string.ascii_letters + ' ')
    >>> ' '.join(s2.split())
    'stuff morestuff stuff DD'
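
    Note that deleting punctuation glues "more-stuff" into "morestuff". If that is not wanted, one possible variant (a sketch, not the only way) swaps each non-alphanumeric character for a space before splitting, and uppercases the result as the other answers do. The names spaced and result are just for illustration.

```python
s = "stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD"

# Swap every non-alphanumeric character for a space instead of deleting it,
# so "more-stuff" stays two words.
spaced = ''.join(c if c.isalnum() else ' ' for c in s)

# split() with no arguments splits on runs of whitespace and drops empty pieces,
# so joining with a single space collapses every gap.
result = ' '.join(spaced.split()).upper()
print(result)
```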
    
  • 2021-01-05 04:56

    This works in Python 3. The first function only collapses runs of the same whitespace character, so a tab and a space next to each other will not collapse into a single character.

    def collapse_whitespace_characters(raw_text):
        ret = ''
        if len(raw_text) > 1:
            prev_char = raw_text[0]
            ret += prev_char
            for cur_char in raw_text[1:]:
                if not cur_char.isspace() or cur_char != prev_char:
                    ret += cur_char
                prev_char = cur_char
        else:
            ret = raw_text
        return ret
    

    This one collapses each run of mixed whitespace into the first whitespace character of the run:

    def collapse_whitespace(raw_text):
        ret = ''
        if len(raw_text) > 1:
            prev_char = raw_text[0]
            ret += prev_char
            for cur_char in raw_text[1:]:
                if not cur_char.isspace() or \
                        (cur_char.isspace() and not prev_char.isspace()):
                    ret += cur_char
                prev_char = cur_char
        else:
            ret = raw_text
        return ret
    

    >>> collapse_whitespace_characters('we like    spaces  and\t\t TABS   AND WHATEVER\xa0\xa0IS')
    'we like spaces and\t TABS AND WHATEVER\xa0IS'

    >>> collapse_whitespace('we like    spaces  and\t\t TABS   AND WHATEVER\xa0\xa0IS')
    'we like spaces and\tTABS AND WHATEVER\xa0IS'
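
    The same idea as collapse_whitespace (keep the first character of each whitespace run) can probably be written more compactly with itertools.groupby. This is only a sketch; the function name collapse_whitespace_groupby is made up here.

```python
from itertools import groupby

def collapse_whitespace_groupby(raw_text):
    pieces = []
    # groupby splits the text into alternating runs of whitespace and non-whitespace
    for is_space, run in groupby(raw_text, key=str.isspace):
        chars = list(run)
        # keep only the first character of a whitespace run; keep other runs whole
        pieces.append(chars[0] if is_space else ''.join(chars))
    return ''.join(pieces)

sample = 'we like    spaces  and\t\t TABS   AND WHATEVER\xa0\xa0IS'
collapsed = collapse_whitespace_groupby(sample)
print(repr(collapsed))
```

    Collecting the pieces in a list and joining once also avoids repeated string concatenation inside the loop.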

    For punctuation:

    def collapse_punctuation(raw_text):
        ret = ''
        if len(raw_text) > 1:
            prev_char = raw_text[0]
            ret += prev_char
            for cur_char in raw_text[1:]:
                if cur_char.isalnum() or cur_char != prev_char:
                    ret += cur_char
                prev_char = cur_char
        else:
            ret = raw_text
        return ret
    

    To actually answer the question:

    orig = 'stuff   .  // : /// more-stuff .. .. ...$%$% stuff -> DD'
    collapse_whitespace(''.join([(c.upper() if c.isalnum() else ' ') for c in orig]))
    

    As others have said, the regex version would be something like:

    re.sub(r'\W+', ' ', orig).upper()
    