Split a string and keep the delimiters as part of the split string chunks, not as separate list elements

夙愿已清 submitted on 2020-06-28 09:21:22

Question


This is a spin-off from "In Python, how do I split a string and keep the separators?"

rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'

How can I split this rawByteString into parts using "\\!" as the delimiter without dropping the delimiters, so that I get:

[b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']

I do not want to use [b'\\!' + x for x in rawByteString.split(b'\\!')][1:], since that relies on the split() method and is just a workaround; that is why this question is tagged with the "re" module.


Answer 1:


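You may use either of the following patterns (splitting on empty matches with re.split() requires Python 3.7 or later):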

re.split(rb'(?!\A)(?=\\!)', rawByteString)
re.split(rb'(?!^)(?=\\!)', rawByteString)

See a sample regex demo (the input string was changed there, since null bytes cannot be part of the demo's test string).

Regex details

  • (?!^) / (?!\A) / (?<!^) - a position other than start of string
  • (?=\\!) - a position immediately followed by a backslash + !
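
To see what each of the two assertions contributes, here is a minimal sketch that runs the split with and without the start-of-string guard. The short input `b'\\!ab\\!cd'` and the variable names are made up for illustration, and the behavior shown assumes Python 3.7 or later.

```python
import re

# Hypothetical sample input: two chunks, each starting with a backslash + "!".
data = b'\\!ab\\!cd'

# Lookahead only: it also matches (with zero width) at position 0,
# so the result starts with an empty chunk.
without_guard = re.split(rb'(?=\\!)', data)

# (?!\A) rejects the zero-width match at the very start of the input.
with_guard = re.split(rb'(?!\A)(?=\\!)', data)

print(without_guard)  # [b'', b'\\!ab', b'\\!cd']
print(with_guard)     # [b'\\!ab', b'\\!cd']
```

Because both parts of the pattern are zero-width assertions, no bytes are consumed by the split, which is why every delimiter stays attached to the chunk that follows it.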

NOTES

  • Since the input is a byte string, the pattern must be a bytes literal as well, so the b prefix is required when defining the pattern.
  • r makes the string literal a raw string literal so that we do not have to double escape backslashes and can use \\ to match a single \ in the string.
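
The effect of the two prefixes can be checked directly. The following is a small sketch with a made-up four-byte input; it compares the raw bytes pattern with a non-raw one and shows what happens when a str pattern is applied to bytes.

```python
import re

# Hypothetical sample: 4 bytes (a, backslash, !, b).
data = b'a\\!b'

# rb'\\!' keeps both backslashes, so the regex engine sees \\ (a literal
# backslash) followed by "!".
raw_pattern = rb'\\!'

# b'\\!' is only backslash + "!", so the regex engine sees \! (an escaped "!").
plain_pattern = b'\\!'

print(len(raw_pattern), len(plain_pattern))  # 3 2
print(re.findall(raw_pattern, data))         # [b'\\!']
print(re.findall(plain_pattern, data))       # [b'!']

# A str pattern cannot be applied to bytes input.
try:
    re.findall(r'\\!', data)
except TypeError as exc:
    print('TypeError:', exc)
```

Without the r prefix the pattern still compiles, but it silently matches only the "!" and not the backslash before it.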

See Python demo:

import re
rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'
print(re.split(rb'(?!\A)(?=\\!)', rawByteString))

Output:

[b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']
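
If splitting on an empty match is not available or not wanted, one possible alternative that still uses the re module is to match each chunk directly with re.findall(). This is only a sketch and not part of the original answer: the helper name `chunks_from_delimiter` is hypothetical, and unlike re.split() it drops any bytes that appear before the first delimiter.

```python
import re

def chunks_from_delimiter(data: bytes, delimiter: bytes) -> list:
    """Return the chunks of `data` that start with `delimiter` (hypothetical helper)."""
    delim = re.escape(delimiter)
    # The delimiter, then any bytes up to (but not including) the next delimiter.
    pattern = delim + rb'(?:(?!' + delim + rb').)*'
    # re.DOTALL lets "." match newline bytes as well.
    return re.findall(pattern, data, flags=re.DOTALL)

rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'
print(chunks_from_delimiter(rawByteString, b'\\!'))
```

For the rawByteString above, this yields the same two chunks as the re.split() call.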


Source: https://stackoverflow.com/questions/62591863/split-a-string-and-keep-the-delimiters-as-part-of-the-split-string-chunks-not-a
