Is there a simple way to remove multiple spaces in a string?

Suppose this string:

The   fox jumped   over    the log.

Turning into:

The fox jumped over the log.

29 answers

醉酒成梦

2020-11-22 08:53
Because @pythonlarry asked, here are the missing generator-based versions.

The groupby join is easy. groupby groups consecutive elements that share the same key and returns pairs of a key and an iterator over the elements of each group. So when the key is a space, a single space is returned; otherwise the entire group is returned.
```
from itertools import groupby
def group_join(string):
  # `char` avoids shadowing the builtin chr(); `run` is the group of equal chars
  return ''.join(' ' if char==' ' else ''.join(run) for char,run in groupby(string))
```
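To see why this works, it can help to look at what `groupby` yields for a short input. The snippet below is only an illustrative sketch; the sample string is made up for the demonstration.

```python
from itertools import groupby

sample = 'a  b   c'  # made-up input for illustration

# groupby yields (key, iterator) pairs, one per run of identical characters.
runs = [(key, ''.join(group)) for key, group in groupby(sample)]
print(runs)
# [('a', 'a'), (' ', '  '), ('b', 'b'), (' ', '   '), ('c', 'c')]

# Collapsing each run of spaces to one space gives the cleaned string.
collapsed = ''.join(' ' if key == ' ' else run for key, run in runs)
print(collapsed)  # a b c
```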
The groupby variant is simple but very slow. So now for the generator variant. Here we consume an iterator (the string) and yield every char except spaces that follow a space.
```
def generator_join_generator(string):
  last=False  # True while the previous char was a space
  for c in string:
    if c==' ':
      if not last:
        last=True
        yield ' '  # emit only the first space of a run
    else:
      last=False
      yield c

def generator_join(string):
  return ''.join(generator_join_generator(string))
```
So I measured the timings with some other lorem ipsum text:
- while_replace 0.015868543065153062
- re_replace 0.22579886706080288
- proper_join 0.40058281796518713
- group_join 5.53206754301209
- generator_join 1.6673167790286243
With Hello and World separated by 64KB of spaces:
- while_replace 2.991308711003512
- re_replace 0.08232860406860709
- proper_join 6.294375243945979
- group_join 2.4320066600339487
- generator_join 6.329648651066236
Not to forget the original sentence:
- while_replace 0.002160938922315836
- re_replace 0.008620491018518806
- proper_join 0.005650000995956361
- group_join 0.028368217987008393
- generator_join 0.009435956948436797
Interestingly, for strings that consist almost entirely of spaces, group_join is not that much worse. Each timing shown is the median of seven runs of a thousand calls each.
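The measuring code itself is not shown above. A minimal sketch of how such numbers could be collected with `timeit.repeat` follows. The `median_time` helper and the body of `while_replace` are assumptions for illustration; the real `while_replace`, `re_replace`, and `proper_join` come from other answers and are not reproduced here.

```python
import statistics
import timeit

def while_replace(string):
    # Assumed implementation: keep replacing double spaces until none remain.
    while '  ' in string:
        string = string.replace('  ', ' ')
    return string

def median_time(func, text, runs=7, number=1000):
    # Median over `runs` repetitions, each one timing `number` calls.
    timings = timeit.repeat(lambda: func(text), repeat=runs, number=number)
    return statistics.median(timings)

sentence = 'The   fox jumped   over    the log.'
print('while_replace', median_time(while_replace, sentence))
```

Any of the other variants can be passed to `median_time` in the same way, so that all of them are measured on identical input.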
庸人自扰

2020-11-22 08:54
Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:
```
>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'
```
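Note that `\s` matches any whitespace character, not only the space character. The sketch below, using a made-up sample string, contrasts `\s{2,}` with a pattern limited to literal spaces.

```python
import re

sample = 'a  b\n\nc\td'  # made-up input: two spaces, two newlines, one tab

# \s{2,} collapses any run of two or more whitespace characters,
# so the pair of newlines also becomes a single space.
any_whitespace = re.sub(r'\s{2,}', ' ', sample)
print(repr(any_whitespace))  # 'a b c\td'

# ' {2,}' only touches runs of literal spaces; newlines and tabs stay.
spaces_only = re.sub(r' {2,}', ' ', sample)
print(repr(spaces_only))     # 'a b\n\nc\td'
```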
执笔经年

2020-11-22 08:56

A simple solution:

```
>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.
```

Happy的楠姐

2020-11-22 08:56
```
import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n             \t        fox')
```
This replaces every run of tabs, newlines, and spaces with a single space.
不思量自难忘°

2020-11-22 08:56
To remove white space, considering leading, trailing and extra white space in between words, use:
```
(?<=\s) +|^ +(?=\s)| (?= +[\n\0])
```
The first alternative deals with leading white space, the second with leading white space at the start of the string, and the last with trailing white space.

To try it out, this link provides a test:

https://regex101.com/r/meBYli/4

The pattern is meant to be used with the re.split function.
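The answer does not show how the pattern is combined with `re.split`. One plausible reading, offered here as an assumption rather than the author's exact code, is to split on the pattern and join the pieces back together. The name `squeeze_spaces` is hypothetical.

```python
import re

# Pattern copied from the answer above.
EXTRA_SPACES = re.compile(r'(?<=\s) +|^ +(?=\s)| (?= +[\n\0])')

def squeeze_spaces(string):
    # Hypothetical helper: the pattern matches only the surplus spaces,
    # so joining the split pieces leaves one space between words.
    return ''.join(EXTRA_SPACES.split(string))

print(squeeze_spaces('The   fox jumped   over    the log.'))
# The fox jumped over the log.
```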
再見小時候

2020-11-22 08:57
In some cases it's desirable to replace consecutive occurrences of every whitespace character with a single instance of that character. You'd use a regular expression with backreferences to do that.

(\s)\1{1,} matches any whitespace character, followed by one or more occurrences of that character. Now, all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:
```
import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)
```
```
>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'
```
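For comparison, a plain `\s+` substitution turns every run into a single space regardless of which characters it contained. The sketch below repeats `normalize_whitespace` so that it runs on its own; `collapse_to_space` is a hypothetical name for the simpler variant.

```python
import re

def normalize_whitespace(string):
    # Collapse repeats of the same whitespace character, keeping its kind.
    return re.sub(r'(\s)\1{1,}', r'\1', string)

def collapse_to_space(string):
    # Hypothetical contrast: any whitespace run becomes one space.
    return re.sub(r'\s+', ' ', string)

text = 'First    line\t\t\t \n\n\nSecond    line'
print(repr(normalize_whitespace(text)))  # 'First line\t \nSecond line'
print(repr(collapse_to_space(text)))     # 'First line Second line'
```

The backreference version is the one to pick when tabs and line breaks carry meaning and only their repetition should go.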