Question:
We have:
>>> str
'exit\r\ndrwxr-xr-x 2 root root 0 Jan 1 2000 \x1b[1;34mbin\x1b[0m\r\ndrwxr-xr-x 3 root root 0 Jan 1 2000 \x1b[1;34mlib\x1b[0m\r\ndrwxr-xr-x 10 root root 0 Jan 1 1970 \x1b[1;34mlocal\x1b[0m\r\ndrwxr-xr-x 2 root root 0 Jan 1 2000 \x1b[1;34msbin\x1b[0m\r\ndrwxr-xr-x 5 root root 0 Jan 1 2000 \x1b[1;34mshare\x1b[0m\r\n# exit\r\n'
>>> print str
exit
drwxr-xr-x 2 root root 0 Jan 1 2000 bin
drwxr-xr-x 3 root root 0 Jan 1 2000 lib
drwxr-xr-x 10 root root 0 Jan 1 1970 local
drwxr-xr-x 2 root root 0 Jan 1 2000 sbin
drwxr-xr-x 5 root root 0 Jan 1 2000 share
# exit
I want to get rid of all the '\xblah[0m' nonsense using regexp. I've tried
re.sub(str, r'(\x.*m)', '')
But that hasn't done the trick. Any ideas?
Answer 1:
You have a few issues:
- You're passing the arguments to re.sub in the wrong order. It should be:
  re.sub(regexp_pattern, replacement, source_string)
- The string doesn't contain "\x". That "\x1b" is the escape character, and it's a single character.
- As interjay pointed out, you want ".*?" rather than ".*", because otherwise it will match everything from the first escape through the last "m".
The correct call to re.sub (with s holding your string) is:
print(re.sub('\x1b.*?m', '', s))
Alternatively, you could use:
print(re.sub('\x1b[^m]*m', '', s))
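A minimal, self-contained sketch of Answer 1's two patterns, run against a shortened copy of the question's string. The names raw, strip_color and strip_color_alt are illustrative only.

```python
import re

# Shortened copy of the question's output; '\x1b' is the single ESC character.
raw = 'exit\r\ndrwxr-xr-x 2 root root 0 Jan 1 2000 \x1b[1;34mbin\x1b[0m\r\n# exit\r\n'

def strip_color(text):
    """Remove colour sequences such as ESC[1;34m and ESC[0m (non-greedy)."""
    # '.*?' stops at the first 'm' after each ESC instead of the last one.
    return re.sub('\x1b.*?m', '', text)

def strip_color_alt(text):
    """Same idea: ESC, then any run of non-'m' characters, then 'm'."""
    return re.sub('\x1b[^m]*m', '', text)

print(len('\x1b'))  # 1: the escape is one character, not a backslash plus 'x'
print(strip_color(raw))
```

Both helpers give the same result here; the second form simply avoids relying on non-greedy matching.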
Answer 2:
You need the following changes:
- Escape the backslash
- Switch to non-greedy matching. Otherwise, everything between the first \x and the last m will be removed, which will be a problem when there is more than one occurrence.
- The order of arguments is incorrect.
Result:
re.sub(r'(\\x.*?m)', '', str)
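The greedy versus non-greedy point can be seen on a line holding two coloured words. This sketch writes the pattern with the ESC character itself ('\x1b' in a normal, non-raw string), as Answer 1 does, so that it runs against text like the question's; the variable names are illustrative.

```python
import re

# Two coloured words; '\x1b' is the ESC character that starts each sequence.
line = '\x1b[1;34mbin\x1b[0m \x1b[1;34mlib\x1b[0m'

greedy = re.sub('\x1b.*m', '', line)   # '.*' runs on to the LAST 'm' in the line
lazy = re.sub('\x1b.*?m', '', line)    # '.*?' stops at the FIRST 'm' after each ESC

print(repr(greedy))  # ''
print(repr(lazy))    # 'bin lib'
```

The greedy version swallows both file names because the match stretches from the first escape to the final "m".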
Answer 3:
These are ANSI terminal codes. They're signalled by an ESC (byte 27, seen in Python as \x1B) followed by [, then some ;-separated parameters and finally a letter to specify which command it is. (m is a colour change.)
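To make that structure concrete, here is a small sketch that splits each sequence into its parameters and its command letter. It assumes purely numeric parameters; ANSI_PARTS and sample are illustrative names.

```python
import re

# ESC, '[', optional ';'-separated numeric parameters, then one command letter.
ANSI_PARTS = re.compile(r'\x1b\[([0-9;]*)([A-Za-z])')

sample = 'drwxr-xr-x 2 root root 0 Jan 1 2000 \x1b[1;34mbin\x1b[0m'

for match in ANSI_PARTS.finditer(sample):
    params, command = match.groups()
    # 'm' is the colour/attribute command; the numbers are its parameters.
    print(command, params.split(';') if params else [])
```

For this sample it reports the command m with parameters 1 and 34 (bold, blue), followed by m with parameter 0 (reset).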
The parameters are usually numbers, so for this simple case you could get rid of them with:
import re
ansisequence = re.compile(r'\x1B\[[^A-Za-z]*[A-Za-z]')
cleaned = ansisequence.sub('', string)  # string holds the text to clean
Technically for some (non-colour-related) control codes they could be general strings, which makes the parsing annoying. It's rare you'd meet these, but if you did I guess you'd have to use something complicated like:
\x1B\[((\d+|"[^"]*")(;(\d+|"[^"]*"))*)?[A-Za-z]
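A quick sketch comparing the general pattern above with the simple one on a made-up sequence that carries a quoted parameter. The second escape sequence in text is invented purely to exercise the quoted-string branch; it is not a real terminal command.

```python
import re

# Simple form: ESC '[' then anything up to the first letter.
ansisequence = re.compile(r'\x1B\[[^A-Za-z]*[A-Za-z]')

# General form: parameters are numbers or double-quoted strings.
ANSI_GENERAL = re.compile(r'\x1B\[((\d+|"[^"]*")(;(\d+|"[^"]*"))*)?[A-Za-z]')

# Hypothetical input: a colour sequence, a reset, and an invented quoted one.
text = '\x1b[1;34mbin\x1b[0m|\x1b[0;"a;b"p|done'

print(ANSI_GENERAL.sub('', text))   # bin||done
print(ansisequence.sub('', text))   # bin|;b"p|done
```

The simple pattern stops at the first letter inside the quoted string, which is why the general form is needed when such parameters can occur.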
The best approach would be to persuade whatever's generating the string that you're not an ANSI terminal, so it shouldn't include colour codes in its output.
Answer 4:
Try running ls --color=never -l instead, and you won't get the ANSI escape codes in the first place.
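A minimal Python sketch of this suggestion, assuming GNU ls is run locally through subprocess (other ls implementations may spell the option differently). The helper name list_without_color is hypothetical.

```python
import subprocess

def list_without_color(path="."):
    """Run `ls -l` with colour disabled and return its output as text."""
    # --color=never is the GNU coreutils spelling of the option.
    result = subprocess.run(
        ["ls", "--color=never", "-l", path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout

print(list_without_color("."))
```

Since no escape sequences are produced, no regular expression is needed afterwards.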
Answer 5:
Here is a pyparsing solution to your problem, with a general parsing expression for those pesky escape sequences. By transforming the initial string with a suppressed expression, this returns a string stripped of all matches of the expression.
from pyparsing import (Literal, Word, nums, Combine,
                       delimitedList, oneOf, alphas, Suppress)
s = ('exit\r\ndrwxr-xr-x 2 root root 0 Jan 1 2000 '
     '\x1b[1;34mbin\x1b[0m\r\ndrwxr-xr-x 3 root root '
     '0 Jan 1 2000 \x1b[1;34mlib\x1b[0m\r\ndrwxr-xr-x 10 root '
     'root 0 Jan 1 1970 \x1b[1;34mlocal\x1b[0m\r\ndrwxr-xr-x '
     '2 root root 0 Jan 1 2000 \x1b[1;34msbin\x1b[0m\r\ndrwxr-xr-x '
     '5 root root 0 Jan 1 2000 \x1b[1;34mshare\x1b[0m\r\n# exit\r\n')
# An escape sequence: ESC, '[', ';'-separated integers, then a single letter
ESC = Literal('\x1b')
integer = Word(nums)
escapeSeq = Combine(ESC + '[' + delimitedList(integer, ';') + oneOf(list(alphas)))
# Wrapping the expression in Suppress makes transformString drop every match
s_prime = Suppress(escapeSeq).transformString(s)
print(s_prime)
This prints your desired output, as stored in s_prime.
Source: https://stackoverflow.com/questions/1833873/python-regex-escape-characters