Question
I have a text in which I want to get only the hexadecimal codes. Like: "thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
It's possible to get the hex codes with \x.., but it doesn't seem I can do something like (^\x..) to select everything but the hex codes.
Any workarounds?
Answer 1:
You may use the regex (?s)((?:\\x[a-fA-F0-9]{2})+)|. (which matches and captures into Group 1 any sequence of one or more hex values, OR just matches any other char, including a line break char) and replace with the conditional replacement pattern (?{1}$1\n:) (which reinserts the hex value chain followed by a newline, or replaces the match with an empty string):
Find What: (?s)((?:\\x[a-fA-F0-9]{2})+)|.
Replace With: (?{1}$1\n:)
Regex Details:
(?s) - same as the ". matches newline" option ON
((?:\\x[a-fA-F0-9]{2})+) - Group 1, capturing one or more sequences of:
\\x - a literal \x
[a-fA-F0-9]{2} - 2 letters from a to f (either case) or digits
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matched:
$1\n - replace with its contents + a newline
: - else replace with an empty string
) - end of the replacement pattern.
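For readers who want to try the same idea outside the editor, here is a minimal Python sketch. Python's re module has no conditional replacement syntax like (?{1}$1\n:), so the sketch assumes a replacement callback instead; the names PATTERN, keep_hex_runs and sample are made up for this example.

```python
import re

# Group 1 captures a run of \xhh codes; the lone "." eats any other char.
# re.DOTALL plays the role of (?s), so "." also matches line breaks.
PATTERN = re.compile(r"((?:\\x[a-fA-F0-9]{2})+)|.", re.DOTALL)


def keep_hex_runs(text: str) -> str:
    """Keep each run of hex codes (plus a newline) and drop everything else."""
    def repl(match: re.Match) -> str:
        # Stand-in for the conditional replacement: Group 1 matched -> keep it,
        # otherwise replace the single matched char with an empty string.
        if match.group(1) is not None:
            return match.group(1) + "\n"
        return ""

    return PATTERN.sub(repl, text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
result = keep_hex_runs(sample)
print(result)
```

Each run of hex codes ends up on its own line, which mirrors the $1\n part of the replacement pattern.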
Answer 2:
Try ^.*?((\\x[a-f0-9]{2})+).*$ and replace with $1. After the replacement, it should leave just the hex code.
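A quick way to check how this pattern behaves is a small Python sketch. It assumes re.MULTILINE so that ^ and $ work per line, and the names HEX_LINE and first_hex_run are hypothetical. One thing the sketch makes visible: because ^.*? is lazy and .*$ consumes the rest of the line, only the first run of hex codes on each line survives.

```python
import re

# Same pattern as in the answer; MULTILINE makes ^ and $ anchor at each line.
HEX_LINE = re.compile(r"^.*?((\\x[a-f0-9]{2})+).*$", re.MULTILINE)


def first_hex_run(text: str) -> str:
    """Replace every line that contains hex codes with its first run of them."""
    # \1 in the template is Group 1, the equivalent of $1 in the editor.
    return HEX_LINE.sub(r"\1", text)


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
print(first_hex_run(sample))

# A line with two separate runs keeps only the first one.
print(first_hex_run(r"ab\x64cd\x65ef"))
```

Lines without any hex code do not match at all, so they are left unchanged.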
Answer 3:
If you are already able to find the hex codes with your regex, couldn't you just use that information to delete all of the hex codes from the string (or from a copy of the string, if you need to preserve the original)? You would be left with all of the text except the hex codes.
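A minimal Python sketch of that idea, assuming the hex codes look like \x followed by two hex digits; HEX_CODE, codes and rest are names invented for the example. The same pattern can either collect the codes or delete them, depending on which half you want.

```python
import re

# One pattern for a single \xhh code.
HEX_CODE = re.compile(r"\\x[a-fA-F0-9]{2}")

sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"

# Collect only the hex codes...
codes = HEX_CODE.findall(sample)

# ...or delete them and keep everything else. Strings are immutable in
# Python, so sub() returns a new string and the original stays intact.
rest = HEX_CODE.sub("", sample)

print("".join(codes))
print(rest)
```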
Answer 4:
^ acts as a negation token only inside (and at the beginning of) a character class; you can't use it to negate substrings of several characters.
To select all that isn't \xhh, you can use this pattern:
\G(?:\\x[a-f0-9]{2})*+\K(?=.|\n)[^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*
It matches the \xhh sequences first and removes them from the match using the \K feature (which discards everything matched to its left). The other part of the pattern, [^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*, matches all that isn't a \xhh. Since this subpattern can match the empty string at the end of the string, I added the lookahead (?=.|\n) to ensure there's at least one character.
\G forces all matches to be contiguous. In other words, it matches the position at the end of the previous match.
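Python's built-in re module supports neither \G nor \K, so the pattern above cannot be pasted into it as is. The sketch below is an approximation under that assumption, not the answer's exact pattern: it swaps \K for a capturing group, replaces the possessive *+ with a plain *, and filters out the empty match at the end of the string instead of using the lookahead. NON_HEX and non_hex_segments are hypothetical names.

```python
import re

# Leading (?:\\x..)* swallows any hex codes at the current position;
# Group 1 then takes everything up to the next backslash that starts a \xhh.
NON_HEX = re.compile(r"(?:\\x[a-f0-9]{2})*([^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*)")


def non_hex_segments(text: str) -> list[str]:
    """Return the stretches of text that are not \\xhh codes."""
    # The pattern can match at every position, so successive matches are
    # contiguous, which is the effect \G has in the original pattern.
    # Empty captures (e.g. at the very end of the string) are dropped.
    return [m.group(1) for m in NON_HEX.finditer(text) if m.group(1)]


sample = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
segments = non_hex_segments(sample)
print(segments)
```

A backslash that does not start a valid \xhh code (for example \n written literally, or \x6g) stays part of the surrounding text, because the negative lookahead only rejects a backslash followed by x and two hex digits.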
Source: https://stackoverflow.com/questions/45001953/cant-use-to-say-all-but