How to split line at non-printing ascii character in Python

三世轮回 提交于 2019-12-03 21:41:13
miku

You can use re.split.

>>> import re
>>> re.split('\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

Adjust the pattern to only include the characters you want to keep.

See also: stripping-non-printable-characters-from-a-string-in-python


Example (w/ the long minus):

>>> # \xe2\x80\x93 represents a long dash (or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']

Or, the same with unicode:

>>> # \u2013 represents a long dash, long minus or so called en-dash
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']
_, _, your_result= your_input_string.partition('\x97')

or

your_result= your_input_string.partition('\x97')[2]

If your_input_string does not contain a '\x97', then your_result will be empty. If your_input_string contains multiple '\x97' characters, your_result will contain everything after the first '\x97' character, including other '\x97' characters.

Just use the string/unicode split method (They don't really care about the string you split upon (other than it is a constant. If you want to use a Regex then use re.split)

To get the split string either escape it like the other people have shown "\x97"

or

use chr(0x97) for strings (0-255) or unichr(0x97) for unicode

so an example would be

'will not be split'.split(chr(0x97))

'will be split here:\x97 and this is the second string'.split(chr(0x97))
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!