Deleting specific control characters (\n, \r, \t) from a string


I have a fairly large amount of text that includes control characters such as \n, \t and \r. I need to replace each of them with a single space (" "). What is the fastest way to do this?

6 Answers

感动是毒

2021-02-20 11:50
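Using a regex (this collapses every run of whitespace, ordinary spaces included, into a single space):
```
import re
re.sub(r'\s+', ' ', '1\n2\r3\t4')
```
Without a regex:
```
>>> ' '.join('1\n\n2\r3\t4'.split())
'1 2 3 4'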
```
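The two snippets above do slightly more than the question asks: both collapse runs of any whitespace, and split() also drops leading and trailing whitespace. A small sketch of how they differ from a one-for-one replacement of only \n, \r and \t (the sample string and variable names are made up for illustration):

```python
import re

sample = " a  b\n\nc\t"

# Collapses every whitespace run (spaces included) into one space.
collapsed = re.sub(r"\s+", " ", sample)

# Same collapsing, but leading and trailing whitespace is dropped as well.
joined = " ".join(sample.split())

# Replaces only \n, \r and \t, one space per character; existing spaces stay.
one_for_one = re.sub(r"[\n\r\t]", " ", sample)

print(repr(collapsed))    # ' a b c '
print(repr(joined))       # 'a b c'
print(repr(one_for_one))  # ' a  b  c '
```

Which behaviour is wanted depends on whether the original spacing and string length need to be preserved.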
执念已碎

2021-02-20 11:51
You may also try regular expressions:
```
import re
my_str = '1\n2\r3\t4'  # sample input
regex = re.compile(r'[\n\r\t]')
result = regex.sub(' ', my_str)  # sub() returns a new string: '1 2 3 4'
```
灰色年华

2021-02-20 11:55
```
>>> import re
>>> re.sub(r'[\t\n\r]', ' ', '1\n2\r3\t4')
'1 2 3 4'
```
暖寄归人

2021-02-20 11:57
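my_string is the string from which you want to delete specific control characters. Because strings are immutable in Python, the substitution returns a new string, so you need to assign the result to another name or reassign it. Note that this deletes the characters; pass ' ' as the replacement to substitute a space instead:
```
import re
my_string = re.sub(r'[\n\r\t]+', '', my_string)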
```
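To make the difference between deleting and replacing concrete, here is a minimal sketch (the sample text and the names deleted and spaced are illustrative only). It also shows that the original string is left untouched, which is why the result has to be assigned:

```python
import re

my_string = "one\ntwo\r\nthree\tfour"

# Deleting the control characters glues neighbouring words together.
deleted = re.sub(r"[\n\r\t]", "", my_string)

# Replacing each one with a space keeps the words apart
# (a \r\n pair becomes two spaces).
spaced = re.sub(r"[\n\r\t]", " ", my_string)

print(repr(deleted))    # 'onetwothreefour'
print(repr(spaced))     # 'one two  three four'
print(repr(my_string))  # unchanged: 'one\ntwo\r\nthree\tfour'
```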
旧巷少年郎

2021-02-20 12:03
If you want to normalise whitespace (replace runs of one or more whitespace characters by a single space, and strip leading and trailing whitespace) this can be accomplished by using string methods:
```
>>> text = '   foo\tbar\r\nFred  Nurke\t Joe Smith\n\n'
>>> ' '.join(text.split())
'foo bar Fred Nurke Joe Smith'
```
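One thing to keep in mind with this approach: split() with no arguments splits on every whitespace character, not only \n, \r and \t. A short sketch, assuming Python 3 (where str is Unicode) and using a hypothetical helper name normalise_whitespace:

```python
def normalise_whitespace(text):
    """Collapse runs of any whitespace to one space and strip both ends."""
    return " ".join(text.split())


# Vertical tab (\x0b), form feed (\x0c) and the non-breaking space (\u00a0)
# are all treated as whitespace by str.split() in Python 3.
cleaned = normalise_whitespace("  a\x0bb\x0cc\u00a0d \n")
print(repr(cleaned))  # 'a b c d'
```

If only the three control characters should be touched, a character-class regex or str.translate() is the more targeted choice.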

你的背包

2021-02-20 12:07

I think the fastest way is to use str.translate() (Python 2 syntax):
```
import string
s = "a\nb\rc\td"
print s.translate(string.maketrans("\n\t\r", "   "))
```
prints
```
a b c d
```
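string.maketrans does not exist in Python 3; str.maketrans is the closest equivalent there. A minimal sketch of the same idea, assuming Python 3 (the names table and result are only illustrative):

```python
# Build a translation table mapping each control character to a space.
table = str.maketrans("\n\t\r", "   ")

s = "a\nb\rc\td"
result = s.translate(table)  # returns a new string; s itself is unchanged
print(result)  # a b c d
```

The table can be built once and reused for every string that needs cleaning.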

EDIT: As this once again turned into a discussion about performance, here are some numbers. For long strings, translate() is much faster than using regular expressions:
```
import re
import string

s = "a\nb\rc\td " * 1250000

regex = re.compile(r'[\n\r\t]')
%timeit t = regex.sub(" ", s)
# 1 loops, best of 3: 1.19 s per loop

table = string.maketrans("\n\t\r", "   ")
%timeit s.translate(table)
# 10 loops, best of 3: 29.3 ms per loop
```
That's about a factor of 40.
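To repeat the comparison on your own machine under Python 3, something like the following sketch could be used. It relies on the standard timeit module rather than IPython's %timeit, uses a shorter input so it finishes quickly, and swaps in str.maketrans; the absolute numbers and the ratio will vary with the Python version and hardware, so none are shown here.

```python
import re
import timeit

# Shorter than the input in the original measurement, to keep the run brief.
s = "a\nb\rc\td " * 100_000

regex = re.compile(r"[\n\r\t]")
table = str.maketrans("\n\t\r", "   ")

# Sanity check: both approaches must produce the same result.
assert regex.sub(" ", s) == s.translate(table)

# Best of three single runs for each approach.
regex_time = min(timeit.repeat(lambda: regex.sub(" ", s), number=1, repeat=3))
translate_time = min(timeit.repeat(lambda: s.translate(table), number=1, repeat=3))

print(f"regex.sub:     {regex_time * 1000:.1f} ms")
print(f"str.translate: {translate_time * 1000:.1f} ms")
```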
