I have a Python script that's trying to interpret a trace of data written to stdout and read from stdin. The problem is that this data is riddled with ANSI escapes I don't care about. These escapes are JSON encoded, so they look like "\033[A" and "\033]0;". I don't actually need to interpret the codes, but I do need to know how many characters each one occupies (note that the first sequence is 6 characters while the second is 7). Is there a straightforward way to filter these codes out of the strings I have?
Answer 1:
The complete regexp for Control Sequences (aka ANSI Escape Sequences) is
/(\x9B|\x1B\[)[0-?]*[ -\/]*[@-~]/
For details, see ECMA-48, Section 5.4, and the reference material on ANSI escape codes.
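For what it's worth, here is a minimal sketch of how that pattern might be used from Python. The names `CSI_RE` and `strip_csi` are made up for illustration, and it assumes the string contains real ESC characters rather than the literal text `\033`. Note that it only covers control sequences (the `ESC [` family), so something like `ESC ]0;` would be left in place.

```python
import re

# Python version of the pattern above; CSI_RE is an illustrative name.
# Parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, final byte 0x40-0x7E.
CSI_RE = re.compile(r'(?:\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')


def strip_csi(s):
    """Remove control sequences (CSI ... final byte) from s."""
    return CSI_RE.sub('', s)


sample = 'plain \x1b[1;31mred\x1b[0m \x1b[Aup'
print(strip_csi(sample))  # plain red up
```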
Answer 2:
Another variant:
    import re

    def strip_ansi_codes(s):
        """
        >>> import blessings
        >>> term = blessings.Terminal()
        >>> foo = 'hidden'+term.clear_bol+'foo'+term.color(5)+'bar'+term.color(255)+'baz'
        >>> repr(strip_ansi_codes(foo))
        u'hiddenfoobarbaz'
        """
        return re.sub(r'\x1b\[([0-9,A-Z]{1,2}(;[0-9]{1,2})?(;[0-9]{3})?)?[m|K]?', '', s)
Answer 3:
    #!/usr/bin/env python
    import os
    import re
    import sys

    # ESC [ followed by digits/semicolons and a single final letter
    ansi_pattern = r'\033\[((?:\d|;)*)([a-zA-Z])'
    ansi_eng = re.compile(ansi_pattern)


    def strip_escape(string=''):
        # Collect all matches, then cut them out back to front
        # so that earlier offsets stay valid.
        matches = list(ansi_eng.finditer(string))
        matches.reverse()
        for match in matches:
            string = string[:match.start()] + string[match.end():]
        return string


    if __name__ == '__main__':
        lname = sys.argv[-1]
        fname = os.path.basename(__file__)
        if lname != fname:
            # Strip escapes from every line of the given file
            with open(lname, 'r') as fd:
                for line in fd:
                    print(strip_escape(line).rstrip())
        else:
            USAGE = '%s ' % fname
            print(USAGE)
Answer 4:
This worked for me:
re.sub(r'\x1b\[[\d;]+m', '', s)
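A quick way to see what this one does and doesn't catch (the helper name `strip_sgr` is just for illustration): it only removes sequences that end in `m` and carry at least one digit or semicolon, so color codes go away but something like erase-to-end-of-line (`ESC [K`) is left in place.

```python
import re


def strip_sgr(s):
    # Same pattern as above: ESC [ digits/semicolons m
    return re.sub(r'\x1b\[[\d;]+m', '', s)


s = '\x1b[1;32mok\x1b[0m\x1b[K'
print(repr(strip_sgr(s)))  # 'ok\x1b[K'
```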
Answer 5:
It's far from perfect, but this regex may get you somewhere:

    import re

    text = r'begin \033[A middle \033]0; end'
    print(re.sub(r'\\[0-9]+(\[|\])[0-9]*;?[A-Z]?', '', text))
It already removes your two examples correctly.
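Since the question also asks how many characters each sequence takes up, here is a rough sketch that records the lengths while stripping. It assumes the escapes show up as the literal text `\033` (backslash, 0, 3, 3) exactly as in the question, and it treats `\033]0;` as a complete unit without touching whatever follows it. The names `LITERAL_ESC_RE` and `strip_and_measure` are made up for this example.

```python
import re

# Assumption: escapes appear as the literal text \033, not as a real ESC byte.
LITERAL_ESC_RE = re.compile(
    r'\\033(?:'
    r'\[[0-?]*[ -/]*[@-~]'  # CSI form: parameters, intermediates, final byte
    r'|'
    r'\][0-9]*;'            # OSC prefix such as \033]0; (payload is left alone)
    r')'
)


def strip_and_measure(text):
    """Return (cleaned text, list of (sequence, length) pairs)."""
    found = [(m.group(0), len(m.group(0)))
             for m in LITERAL_ESC_RE.finditer(text)]
    return LITERAL_ESC_RE.sub('', text), found


cleaned, found = strip_and_measure(r'begin \033[A middle \033]0; end')
print(cleaned)
for seq, length in found:
    print(seq, length)  # \033[A 6, then \033]0; 7
```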
Answer 6:
FWIW, this Python regex seemed to work for me. I don't actually know if it's accurate, but empirically it seems to work:
r'\\033[\[\]]([0-9]{1,2}([;@][0-9]{0,2})*)*[mKP]?'