Can't use ^ to say “all but”

Submitted by 给你一囗甜甜゛ on 2021-02-02 09:45:44

Question


I have a text from which I want to get only the hexadecimal codes, like: "thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"

It's possible to match the hex codes with \x.. but it doesn't seem I can do something like (^\x..) to select everything but the hex codes.

Any workarounds?


Answer 1:


You may use the regex (?s)((?:\\x[a-fA-F0-9]{2})+)|. which either matches and captures into Group 1 a run of one or more hex values, or just matches any other single char (including a line break char). Replace with the conditional replacement pattern (?{1}$1\n:), which reinserts the hex value chain followed by a newline when Group 1 matched, and otherwise replaces the match with an empty string:

Find What:      (?s)((?:\\x[a-fA-F0-9]{2})+)|.
Replace With: (?{1}$1\n:)

Regex Details:

  • (?s) - same as turning the ". matches newline" option ON
  • ((?:\\x[a-fA-F0-9]{2})+) - Group 1 capturing one or more sequences of
    • \\x - a literal \x (an escaped backslash followed by x)
    • [a-fA-F0-9]{2} - exactly 2 hex chars (letters a to f in either case, or digits)
  • | - or
  • . - any single char.

Replacement pattern:

  • (?{1} - if Group 1 matched:
    • $1\n - replace with its contents + a newline
    • : - else replace with an empty string (nothing follows the colon)
  • ) - end of the replacement pattern.
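
For readers working outside an editor, the same idea can be sketched in Python. This is only an approximation under stated assumptions: Python's re module has no (?{1}...:...) conditional replacement syntax, so a callback function stands in for it, and the name keep_hex_runs is made up for this illustration.

```python
import re

# Same alternation as the "Find What" pattern: Group 1 captures a run of
# \xhh codes; the bare "." alternative matches any other single character.
# re.DOTALL plays the role of the inline (?s) flag.
PATTERN = re.compile(r"((?:\\x[a-fA-F0-9]{2})+)|.", re.DOTALL)


def keep_hex_runs(text: str) -> str:
    """Keep each run of \\xhh codes (plus a newline) and drop everything else."""
    # The callback emulates (?{1}$1\n:): if Group 1 took part in the match,
    # reinsert it followed by a newline, otherwise return an empty string.
    return PATTERN.sub(lambda m: m.group(1) + "\n" if m.group(1) else "", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(keep_hex_runs(sample))
```

Each hex run ends up on its own line, which is why the replacement appends a newline after Group 1.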



Answer 2:


Try ^.*?((\\x[a-f0-9]{2})+).*$ and replace with $1

After the replace, it should leave just the hex codes. Note that this keeps only the first run of hex codes on each line.
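
A quick way to see what this pattern does is to try it in Python. This is a rough sketch, not the answerer's own code: re.MULTILINE is assumed so that ^ and $ work per line as they would in an editor, \1 is Python's spelling of $1, and first_hex_run_per_line is a hypothetical helper name.

```python
import re

# The pattern from the answer; MULTILINE makes ^ and $ anchor at each line.
PATTERN = re.compile(r"^.*?((\\x[a-f0-9]{2})+).*$", re.MULTILINE)


def first_hex_run_per_line(text: str) -> str:
    """Replace every line containing hex codes with its first run of codes."""
    # Lines without any \xhh code do not match and are left unchanged.
    return PATTERN.sub(r"\1", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(first_hex_run_per_line(sample))

# A second run on the same line is swallowed by the trailing .*$
print(first_hex_run_per_line(r"ab\x64cd\x65ef"))  # prints \x64
```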




Answer 3:


If you are already able to find the hex codes with your regex, couldn't you just use it to delete all of the hex codes from the string (or from a copy of the string, if you need to preserve the original)? You would be left with all the text except the hex codes.
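
In a scripting language this suggestion is a one-line substitution. The snippet below is a minimal sketch in Python, assuming the codes always have the form \x followed by two hex digits; strip_hex_codes is an illustrative name, not something from the question.

```python
import re

# One \xhh code: a literal backslash, "x", then exactly two hex digits.
HEX_CODE = re.compile(r"\\x[0-9a-fA-F]{2}")


def strip_hex_codes(text: str) -> str:
    """Return a copy of text with every \\xhh code removed."""
    return HEX_CODE.sub("", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
remaining = strip_hex_codes(sample)
print(remaining)  # thisissometextthisistextsomemoretextoverhere
```

Python strings are immutable, so sample itself is untouched and no explicit copy is needed.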




Answer 4:


^ acts as a negation token only inside a character class (and only at its beginning). You can't use it to negate substrings of several characters.
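
The two roles of ^ can be checked with a short experiment. The snippet is an illustrative sketch in Python with a shortened sample string; the variable names are made up for the example.

```python
import re

sample = r"text\x64\x6fmore"

# Outside a character class, ^ is an anchor for the start of the string.
# The sample starts with "text", so nothing matches.
anchored = re.findall(r"^\\x..", sample)
print(anchored)  # []

# Inside a class, a leading ^ negates single characters: [^\\x] means
# "any one char that is neither a backslash nor x". It knows nothing
# about the substring \xhh, so it also splits "text" at its own "x".
negated = re.findall(r"[^\\x]+", sample)
print(negated)  # ['te', 't', '64', '6fmore']
```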

To select all that isn't \xhh you can use this pattern:

\G(?:\\x[a-f0-9]{2})*+\K(?=.|\n)[^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*

It matches the \xhh codes first and drops them from the match with the \K feature (which discards everything matched to its left from the reported match). The other part of the pattern, [^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*, matches everything that isn't a \xhh. Since this subpattern can match the empty string at the end of the string, I added the lookahead (?=.|\n) to ensure there's at least one character. \G forces all matches to be contiguous: it matches the position at the end of the previous match.
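
Python's built-in re module supports neither \G nor \K, so the pattern above cannot be pasted there as is. As a rough sketch of one possible workaround (not the pattern from this answer), the \xhh runs can be consumed by a first alternative while the "everything else" subpattern is captured in a group. The function name non_hex_segments is hypothetical.

```python
import re

# Alternative 1 consumes runs of \xhh codes without capturing them.
# Alternative 2 is the "all that isn't \xhh" subpattern from the answer,
# captured in Group 1. Hex letters are lowercase only, as in the answer.
PATTERN = re.compile(
    r"(?:\\x[a-f0-9]{2})+"
    r"|([^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*)"
)


def non_hex_segments(text: str) -> list[str]:
    """Return the pieces of text that lie between runs of \\xhh codes."""
    # Group 1 is None for hex runs and may be "" at the end of the string
    # (the subpattern can match empty), so both cases are filtered out.
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1)]


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(non_hex_segments(sample))
# ['thisissometextthisistext', 'somemoretextoverhere']
```

Because every character is consumed by one alternative or the other, the matches stay contiguous without \G. A backslash that does not start a \xhh code (for example \y) stays inside the captured segment.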



Source: https://stackoverflow.com/questions/45001953/cant-use-to-say-all-but
