backreference

Backreferences Syntax in Replacement Strings (Why Dollar Sign?)

允我心安 submitted on 2019-11-27 20:51:00
Question: In Java, and it seems in a few other languages, backreferences in the pattern are preceded by a backslash (e.g. \1, \2, \3, etc.), but in a replacement string they are preceded by a dollar sign (e.g. $1, $2, $3, and also $0). Here's a snippet to illustrate:

    System.out.println(
        "left-right".replaceAll("(.*)-(.*)", "\\2-\\1") // WRONG!!!
    ); // prints "2-1"
    System.out.println(
        "left-right".replaceAll("(.*)-(.*)", "$2-$1")   // CORRECT!
    ); // prints "right-left"
    System.out.println(
        "You want …
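For contrast, a small Python sketch (all added examples here use Python): Python's re module uses backslash references in the replacement string as well, so the Java line marked WRONG would behave as intended there. Note that in a Python template \0 is an octal escape rather than the whole match, so the whole match is written \g<0>.

```python
import re

# Python templates use \N (or the unambiguous \g<N>) for group N.
swapped = re.sub(r"(.*)-(.*)", r"\2-\1", "left-right")
swapped_g = re.sub(r"(.*)-(.*)", r"\g<2>-\g<1>", "left-right")

# \g<0> is the whole match, the counterpart of Java's $0.
bracketed = re.sub(r"\w+", r"[\g<0>]", "left-right")

print(swapped, swapped_g, bracketed)  # right-left right-left [left]-[right]
```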

How do backreferences in regexes make backtracking required?

别来无恙 submitted on 2019-11-27 11:39:20
Question: I read http://swtch.com/~rsc/regexp/regexp1.html, and in it the author says that in order to support backreferences in regexes, one needs backtracking when matching, and that makes the worst-case complexity exponential. But I don't see exactly why backreferences introduce the need for backtracking. Can someone explain why, and perhaps provide an example (regex and input)? Answer 1: To get directly at your question, you should make a short study of the Chomsky Hierarchy. This is an old and beautiful way …
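A short Python illustration, not taken from the linked article or the answer. The pattern (a*)b\1 accepts exactly the strings aⁿbaⁿ (the same number of a's on both sides of the b). That language is not regular, so no finite automaton can recognize it. The second pattern shows the engine backtracking: the greedy group has to give characters back before \1 can succeed.

```python
import re

# (a*)b\1 requires the text after "b" to repeat exactly what the group captured,
# so it recognizes a^n b a^n, which is not a regular language.
SAME_COUNT = re.compile(r"(a*)b\1")


def accepts(s):
    return SAME_COUNT.fullmatch(s) is not None


# On "aaaa", (a+) first grabs all four a's; \1 then fails, so the engine
# backtracks, shrinking the group to "aaa" (fails) and then "aa" (succeeds).
m = re.fullmatch(r"(a+)\1", "aaaa")
print(accepts("aaabaaa"), accepts("aaabaa"), m.group(1))  # True False aa
```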

How to match a regex with backreference in Go?

和自甴很熟 submitted on 2019-11-27 09:07:26
I need to match a regex that uses backreferences (e.g. \1) in my Go code. That's not so easy, because in Go the official regexp package uses the RE2 engine, which has chosen not to support backreferences (and some other lesser-known features) so that it can guarantee linear-time execution, thereby avoiding regex denial-of-service attacks. Enabling backreference support is not an option with RE2. In my code there is no risk of malicious exploitation by attackers, and I need backreferences. What should I do? Answer: Regular expressions are great for working with regular grammars, …
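One common workaround on RE2-style engines is to match with a backreference-free pattern and perform the equality check that \1 would have done in ordinary code. A sketch of that idea, written in Python to keep the added examples in one language; doubled_words is a hypothetical helper that emulates \b(\w+)\s+\1\b for whitespace-separated words.

```python
import re

# A single-word pattern with no backreference, i.e. within the RE2-compatible subset.
WORD = re.compile(r"\w+")


def doubled_words(text):
    r"""Hypothetical helper emulating \b(\w+)\s+\1\b without a backreference:
    the comparison that \1 would perform is done in host code instead."""
    found = []
    prev = None
    for m in WORD.finditer(text):
        if prev is not None:
            gap = text[prev.end():m.start()]
            if m.group() == prev.group() and gap.isspace():
                found.append(m.group())
                prev = None  # consume the pair, like a non-overlapping regex scan
                continue
        prev = m
    return found


print(doubled_words("Paris in the the spring"))  # ['the']
```

In Go the same loop could presumably be built on regexp.FindAllStringIndex, which returns the start and end offsets needed to slice the words and the gap between them.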

Backreferences in lookbehind

久未见 submitted on 2019-11-27 08:43:43
Can you use backreferences in a lookbehind? Let's say I want to split wherever the two characters behind me are the same character repeated.

    String REGEX1 = "(?<=(.)\\1)";        // DOESN'T WORK!
    String REGEX2 = "(?<=(?=(.)\\1)..)";  // WORKS!
    System.out.println(java.util.Arrays.toString(
        "Bazooka killed the poor aardvark (yummy!)"
        .split(REGEX2)
    )); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"

Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 gives this error at run-time: Look-behind group does not have an obvious maximum length near index 8 (?<=(.) …
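A Python sketch that sidesteps the lookbehind entirely: a zero-width lookahead finds each index where a character is immediately repeated, and the string is cut two characters later. This is a swapped-in technique, not the question's REGEX2, but it selects the same cut points on this input; split_after_doubles is a hypothetical helper name.

```python
import re

# Zero-width lookahead: matches at every index i where s[i] == s[i + 1].
# Because (?=...) consumes nothing, overlapping pairs (e.g. "aaa") are all found,
# and the backreference sits in a lookahead rather than a lookbehind.
DOUBLED = re.compile(r"(?=(.)\1)")


def split_after_doubles(s):
    """Split s at every position preceded by two equal characters."""
    cuts = [m.start() + 2 for m in DOUBLED.finditer(s)]
    cuts = [c for c in cuts if c < len(s)]  # mimic Java's split dropping trailing empties
    pieces, start = [], 0
    for c in cuts:
        pieces.append(s[start:c])
        start = c
    pieces.append(s[start:])
    return pieces


print(split_after_doubles("Bazooka killed the poor aardvark (yummy!)"))
# ['Bazoo', 'ka kill', 'ed the poo', 'r aa', 'rdvark (yumm', 'y!)']
```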

python re.sub - alternative replacement patterns

六眼飞鱼酱① submitted on 2019-11-27 08:20:28
Question: I want to provide alternative replacement patterns to re.sub. Let's say I've got two search patterns as alternatives, like this: re.sub(r"[A-Z]+|[a-z]+", replacementpattern, string). Instead of providing one replacement pattern, I would like to somehow detect which search pattern alternative was matched and provide a different replacement pattern for each. Is this possible? Thanks. PS: the code specifics here are irrelevant; it's a general question. Answer 1: You can pass a function to re.sub(). In the …
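A minimal sketch of the function approach, assuming each alternative is wrapped in its own capture group so the callback can tell which branch matched from Match.lastindex. The bracket and parenthesis replacements are hypothetical placeholders for the two replacement patterns.

```python
import re


def replace_by_alternative(m):
    # m.lastindex is the number of the last group that took part in the match:
    # 1 for the [A-Z]+ branch, 2 for the [a-z]+ branch.
    if m.lastindex == 1:
        return "[" + m.group(1) + "]"  # hypothetical replacement for uppercase runs
    return "(" + m.group(2) + ")"      # hypothetical replacement for lowercase runs


result = re.sub(r"([A-Z]+)|([a-z]+)", replace_by_alternative, "HELLOworld 42")
print(result)  # [HELLO](world) 42
```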

Extract capture group matches from regular expressions? (or: where is gregexec?)

泄露秘密 submitted on 2019-11-27 02:42:23
Question: Given a regular expression containing capture groups (parentheses) and a string, how can I obtain all the substrings matching the capture groups, i.e., the substrings usually referenced by "\1", "\2"? Example: consider a regex capturing digits preceded by "xy":

    s <- "xy1234wz98xy567"
    r <- "xy(\\d+)"

Desired result:

    [1] "1234" "567"

First attempt, gregexpr:

    regmatches(s, gregexpr(r, s))
    #[[1]]
    #[1] "xy1234" "xy567"

Not what I want, because it returns the substrings matching the entire pattern.
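For comparison only (this is Python, not R), the same extraction: when a pattern has exactly one capture group, re.findall returns that group's text for each match, and re.finditer gives full match objects when group positions are needed too.

```python
import re

s = "xy1234wz98xy567"
r = r"xy(\d+)"

# One capture group, so findall returns group 1 for every match.
digits = re.findall(r, s)

# finditer yields Match objects: here, group 1 and the index where it starts.
spans = [(m.group(1), m.start(1)) for m in re.finditer(r, s)]

print(digits)  # ['1234', '567']
print(spans)   # [('1234', 2), ('567', 12)]
```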

JavaScript - string regex backreferences

假装没事ソ submitted on 2019-11-27 00:33:26
You can use backreferences like this in JavaScript: var str = "123 $test 123"; str = str.replace(/(\$)([a-z]+)/gi, "$2"); This would (rather pointlessly) replace "$test" with "test". But imagine I'd like to pass the resulting string of $2 into a function, which returns another value. I tried doing this, but instead of getting the string "test", I get "$2". Is there a way to achieve this?

    // Instead of getting "$2" passed into somefunc, I want "test"
    // (i.e. the result of the regex)
    str = str.replace(/(\$)([a-z]+)/gi, somefunc("$2"));

Answer: Like this: str.replace(regex, function(match, $1, $2, offset, original) …
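The point of the answer is that the replacement must be a function that runs once per match, not the result of calling somefunc once up front. A Python analogue of both versions, where somefunc is a hypothetical stand-in that upper-cases its argument:

```python
import re


def somefunc(word):
    # Hypothetical stand-in for the question's somefunc.
    return word.upper()


s = "123 $test 123"
pattern = re.compile(r"(\$)([a-z]+)", re.IGNORECASE)

# Wrong: somefunc(r"\2") runs once, before any matching, on the literal text \2.
# Its return value is just the template \2, so the upper-casing never happens.
wrong = pattern.sub(somefunc(r"\2"), s)

# Right: pass a callable; re.sub calls it for each match with the Match object.
right = pattern.sub(lambda m: somefunc(m.group(2)), s)

print(wrong)  # 123 test 123
print(right)  # 123 TEST 123
```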

Circumvent the sed backreference limit \1 through \9

谁说我不能喝 submitted on 2019-11-26 20:37:23
Question: The sed manual clearly states that the backreferences available for the replacement string in a substitute command are numbered \1 through \9. I'm trying to parse a log file that has 10 fields. I have the regex formed for it, but the tenth match (and anything after it) isn't accessible. Does anyone have an elegant way to circumvent this limitation in KSH (or in any language that I could port to shell scripting)? Answer 1: Can you use perl -pe 's/(match)(str)/$2$1/g;' in place of sed? The way to …
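Not every engine shares sed's nine-reference limit. A Python sketch, assuming a hypothetical line of 10 whitespace-separated fields: each field gets its own group, and \g<10> refers to the tenth group unambiguously in the replacement template.

```python
import re

line = "f1 f2 f3 f4 f5 f6 f7 f8 f9 f10"  # hypothetical 10-field log line

# Ten whitespace-separated fields, each in its own capture group.
TEN_FIELDS = re.compile(r"^" + r"\s+".join([r"(\S+)"] * 10) + r"$")

# \g<10> names group 10 explicitly, so there is no doubt about which group is meant.
swapped = TEN_FIELDS.sub(r"\g<10> \g<1>", line)
print(swapped)  # f10 f1
```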

preg_replace: add number after backreference

别来无恙 submitted on 2019-11-26 19:08:51
Situation: I want to use preg_replace() to add a digit '8' after each of [aeiou]. Example: from abcdefghij to a8bcde8fghi8j. Question: How should I write the replacement string?

    // input string
    $in = 'abcdefghij';

    // this obviously won't work ----------↓
    $out = preg_replace( '/([aeiou])/', '\18', $in);

This is just an example, so suggesting str_replace() is not a valid answer. I want to know how to put a number after a backreference in the replacement string. Answer: The solution is to wrap the backreference in ${}.

    $out = preg_replace( '/([aeiou])/', '${1}8', $in);

which will output a8bcde8fghi8j. See the …
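Python's re has the same ambiguity and resolves it with \g<1> where PHP uses ${1}. A sketch; rejects is a hypothetical helper that reports whether a template is refused, since the bare form \18 is read as a reference to group 18, which this pattern does not have.

```python
import re

vowel = r"([aeiou])"

# \g<1> ends the group reference explicitly, so the following 8 is literal.
out = re.sub(vowel, r"\g<1>8", "abcdefghij")
print(out)  # a8bcde8fghi8j


def rejects(template):
    """Hypothetical helper: True if re.sub refuses the replacement template."""
    try:
        re.sub(vowel, template, "abcdefghij")
    except re.error:
        return True
    return False


print(rejects(r"\18"), rejects(r"\g<1>8"))  # True False
```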
