backreference | 易学教程

Strange behavior of parentheses in Python regex

Question: I'm writing a Python regex that looks through a text document for quoted strings (quotes of airline pilots recorded from black boxes). I started by trying to write a regex with the following rules: Return what is between quotes. If it opens with a single quote, only return it if it closes with a single quote. If it opens with a double quote, only return it if it closes with a double quote. For instance, I don't want to match "hi there' or 'hi there", but I do want to match "hi there" and 'hi there'. I use a testing page which contains things like:
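The rule stated in the question (a quote only counts if it closes with the same character that opened it) is the kind of constraint a backreference can express. The following is a minimal sketch, not the asker's or any answerer's actual code: the names `QUOTED`, `find_quoted`, and `sample` are made up for illustration, and the sample text is an assumption standing in for the testing page, which the excerpt does not show.

```python
import re

# Group 1 captures the opening quote character.
# \1 then requires the very same character as the closing quote.
QUOTED = re.compile(r"""(["'])(.*?)\1""")

def find_quoted(text):
    """Return the contents of each string whose quotes match on both ends."""
    return [m.group(2) for m in QUOTED.finditer(text)]

# Hypothetical sample input, not taken from the original question.
sample = '''He said "hi there" and then 'hi again', but not "broken'.'''
print(find_quoted(sample))
```

One caveat with this approach: an unmatched opening quote will pair with the next quote of the same kind wherever it appears later in the text, so mismatched input can still produce surprising matches.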

How can I find, increment and replace in php?

I have strings in the form \d+_\d+ and I want to add 1 to the second number. Since my explanation is so very clear, let me give you a few examples: 1234567_2 should become 1234567_3, and 1234_10 should become 1234_11. Here is my first attempt: $new = preg_replace("/(\d+)_(\d+)/", "$1_".((int)$2)+1, $old); This results in a syntax error: Parse error: syntax error, unexpected T_LNUMBER, expecting T_VARIABLE or '$' in [...] on line 201. Here is my second attempt: $new = preg_replace("/(\d+)_(\d+)/", "$1_".("$2"+1), $old); This transforms $old = 1234567_2 into $new = 1234567_1, which is not the desired

Do backreferences need to come after the group they reference?

While running some tests for this answer, I noticed the following unexpected behavior. This will remove all occurrences of <tag> after the first: var input = "<text><text>extra<words><text><words><something>"; Regex.Replace(input, @"(<[^>]+>)(?<=\1.*\1)", ""); // <text>extra<words><something> But this will not: Regex.Replace(input, @"(?<=\1.*)(<[^>]+>)", ""); // <text><text>extra<words><text><words><something> Similarly, this will remove all occurrences of <tag> before the last: Regex.Replace(input, @"(<[^>]+>)(?=.*\1)", ""); // extra<text><words><something> But this will not: Regex.Replace

How to use back reference with stringi package?

In R I can use \\1 to refer to a capturing group. However, with the stringi package this doesn't work as expected: library(stringi) fileName <- "hello-you.lst" (fileName <- stri_replace_first_regex(fileName, "(.*)\\.lst$", "\\1")) [1] "1" Expected output: hello-you. I couldn't find anything about this problem in the documentation. Answer 1: You need to use $1 instead of \\1 in the replacement string: library(stringi) fileName <- "hello-you.lst" fileName <- stri_replace_first_regex(fileName, "(.*)\\.lst$", "$1") [1] "hello-you" From the documentation, stri_*_regex uses ICU's regular expressions

Python regex substitution: separate backreference from digit

In a regex replacement pattern, a backreference looks like \1. If you want to include a digit after that backreference, this will fail because the digit is considered to be part of the backreference number: # replace all twin digits by zeroes, but retain white space in between re.sub(r"\d(\s*)\d", r"0\10", "0 1") >>> sre_constants.error: invalid group reference Substitution pattern r"0\1 0" would work fine, but in the failing example \10 is read as a reference to group 10 rather than as \1 followed by the digit 0. How can the digit '0' be separated from the backreference \1 that precedes it? You can use \g<1>, as mentioned in the
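A short sketch of the \g<1> form applied to the failing example from the question may help. The wrapper function name `zero_twin_digits` is hypothetical and only there to make the snippet easy to try.

```python
import re

def zero_twin_digits(text):
    """Replace each pair of digits with zeroes, keeping the whitespace between them."""
    # \g<1> names group 1 explicitly, so the 0 that follows is a literal digit
    # and is no longer parsed as part of the group number.
    return re.sub(r"\d(\s*)\d", r"0\g<1>0", text)

print(zero_twin_digits("0 1"))  # prints "0 0"
```

The same \g<...> syntax also accepts a group name, which can be handy when the pattern uses named groups.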

How to apply a function on a backreference?

Say I have strings like the following: old_string = "I love the number 3 so much" I would like to spot the integer numbers (in the example above, there is only one number, 3), and replace them with a value larger by 1, i.e., the desired result should be new_string = "I love the number 4 so much" In Python, I can use: r = re.compile(r'([0-9]+)') new_string = r.sub(r'\g<1>9', old_string) to append a 9 at the end of the integer numbers matched. However, I would like to apply something more general on \1. If I define a function: def f(i): return i + 1 How do I apply f() on \1, so that I can replace the
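One way to do what the question asks is to pass a callable as the replacement argument of re.sub: the callable receives each match object and returns the replacement string. The sketch below assumes the whole number should be captured with ([0-9]+); the names `increment` and `bump_numbers` are hypothetical.

```python
import re

def increment(match):
    """Take a match object and return its first group plus one, as a string."""
    # Captured groups are strings, so convert to int, add 1, and convert back.
    return str(int(match.group(1)) + 1)

def bump_numbers(text):
    """Replace every run of digits in text with that number plus one."""
    return re.sub(r"([0-9]+)", increment, text)

old_string = "I love the number 3 so much"
print(bump_numbers(old_string))  # prints "I love the number 4 so much"
```

Because the callable sees the full match object, it can combine several groups or apply any Python logic, which a plain replacement template cannot do.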

General approach for (equivalent of) “backreferences within character class”?

In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not so when the \1, \2, etc. appear within a character class. In the latter case, the \ is treated as an escape character (and therefore \1 is just 1, etc.). Therefore, if (for example) one wanted to match a string (of length greater than 1) whose first character matches its last character, but does not appear anywhere else in the string, the following regex will not do: /\A # match beginning of string; (.) # match and capture first character (referred to
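A common workaround for the example described above is to replace the character class with a negative lookahead that is checked before each character. This is a sketch in Python rather than Perl, and it is not necessarily the approach given in the original answers, which the excerpt does not include; the names `FIRST_EQUALS_LAST` and `first_equals_last_only` are hypothetical.

```python
import re

# (.)            capture the first character
# (?:(?!\1).)*   any run of characters, each checked not to equal group 1
# \1             the last character must equal the first
# re.DOTALL lets "." match newlines too, so any character may appear in the middle.
FIRST_EQUALS_LAST = re.compile(r"(.)(?:(?!\1).)*\1", re.DOTALL)

def first_equals_last_only(s):
    """True if s is longer than 1, starts and ends with the same character,
    and that character appears nowhere else in s."""
    return FIRST_EQUALS_LAST.fullmatch(s) is not None

print(first_equals_last_only("abca"))   # True
print(first_equals_last_only("abaca"))  # False
```

The lookahead plays the role that [^\1] would have played if backreferences were allowed inside a character class.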