non-greedy

remove text between delimiters, multiple times on each line

情到浓时终转凉″ submitted on 2019-12-01 22:53:40
I need to remove text between the delimiters "<" and ">", but there are multiple instances of these on each line of my text file. For example, I want to turn this:
    person 1, person 2<email2@mail.com>, person 3<email3@mail.com>, person 4<email4@mail.com>
Into this:
    person 1, person 2, person 3, person 4
I've tried to use a few things, including sed:
    sed -e 's/<.*>//' filename.csv
but this removes everything between the first < and the last >, giving the result person 1, person 2.
You can use a negated character class in your regex:
    sed 's/<[^>]*>//g' filename.csv
If you want to join the dark
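The difference between the two sed patterns above can be reproduced in a few lines. This is a minimal sketch in Python rather than sed (an assumption made for easy experimentation); the variable names are illustrative, and the sample line is the one from the question.

```python
import re

line = ("person 1, person 2<email2@mail.com>, "
        "person 3<email3@mail.com>, person 4<email4@mail.com>")

# Greedy: .* runs from the first "<" to the last ">" on the line,
# so everything in between is removed in one go.
greedy = re.sub(r"<.*>", "", line)

# Negated class: [^>]* cannot cross a ">", so each <...> pair is
# matched and removed on its own.
negated = re.sub(r"<[^>]*>", "", line)

print(greedy)   # person 1, person 2
print(negated)  # person 1, person 2, person 3, person 4
```

Note that re.sub replaces every occurrence by default, which corresponds to the g flag in the sed command.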

non-greedy matching in Scala RegexParsers

China☆狼群 submitted on 2019-12-01 03:09:02
Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:
    class Arith extends RegexParsers {
      def selectstatement: Parser[Any] = selectclause ~ fromclause
      def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
      def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
      def tokens: Parser[Any] = rep(token) // how to make this non-greedy?
      def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
    }
When trying to match selectstatement against SELECT foo FROM bar, how do I prevent the selectclause from gobbling up the entire phrase due to the rep(token) in ~ tokens? In other words, how do I

Which would be better non-greedy regex or negated character class?

限于喜欢 submitted on 2019-11-29 11:43:00
I need to match @anything_here@ from the string @anything_here@dhhhd@shdjhjs@, so I used one of the following regexes:
    ^@.*?@
    ^@[^@]*@
Both work, but I would like to know which one is the better solution: a regex with non-greedy repetition or a regex with a negated character class?
Sebastian Proske: Negated character classes should usually be preferred over lazy matching, if possible. If the regex is successful, ^@[^@]*@ can match the content between @s in a single step, while ^@.*?@ needs to expand for each character between @s. When failing (for the case of no ending @) most regex engines will
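To check that the two patterns above agree on the question's input, here is a small sketch in Python (an assumed language choice; the variable names are illustrative). It only demonstrates that the results are identical, not the performance difference described in the answer.

```python
import re

text = "@anything_here@dhhhd@shdjhjs@"

# Lazy repetition: .*? grows one character at a time until "@" follows.
lazy_match = re.match(r"^@.*?@", text).group(0)

# Negated class: [^@]* consumes everything up to the next "@" directly.
negated_match = re.match(r"^@[^@]*@", text).group(0)

print(lazy_match)     # @anything_here@
print(negated_match)  # @anything_here@
```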

Greedy, Non-Greedy, All-Greedy Matching in C# Regex

前提是你 submitted on 2019-11-29 09:08:17
How can I get all the matches in the following example:
    // Only "abcd" is matched
    MatchCollection greedyMatches = Regex.Matches("abcd", @"ab.*");
    // Only "ab" is matched
    MatchCollection lazyMatches = Regex.Matches("abcd", @"ab.*?");
    // How can I get all matches: "ab", "abc", "abcd"
P.S.: I want to have all the matches in a generic manner. The example above is just an example.
Tseng: You could use something like:
    MatchCollection nonGreedyMatches = Regex.Matches("abcd", @"(((ab)c)d)");
Then you should have three backreferences with ab, abc and abcd. But, to be honest, this kind of regex doesn't
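The nested-group idea from the answer, plus one possible generic alternative, can be sketched as follows. This is in Python rather than C# (an assumption for brevity), and the helper all_matches_from is hypothetical: it is not part of the answer, it simply tries every end position and keeps the ones where the pattern matches the whole slice.

```python
import re

# Nested groups, as in the answer: one capture group per desired prefix.
nested = re.search(r"(((ab)c)d)", "abcd")
print(nested.groups())  # ('abcd', 'abc', 'ab')

def all_matches_from(pattern, text, start=0):
    """Hypothetical helper: every substring text[start:end] that the
    pattern matches in full, shortest first."""
    compiled = re.compile(pattern)
    return [
        text[start:end]
        for end in range(start, len(text) + 1)
        if compiled.fullmatch(text, start, end)
    ]

every_match = all_matches_from(r"ab.*", "abcd")
print(every_match)  # ['ab', 'abc', 'abcd']
```

The helper costs one match attempt per end position, so it is only a sketch for short inputs.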

How to non-greedy multiple lookbehind matches

孤人 submitted on 2019-11-29 05:22:00
Source: <prefix><content1><suffix1><prefix><content2><suffix2>
Engine: PCRE
RegEx1: (?<=<prefix>)(.*)(?=<suffix1>)
RegEx2: (?<=<prefix>)(.*)(?=<suffix2>)
Result1: <content1>
Result2: <content1><suffix1><prefix><content2>
The desired result for RegEx2 is just <content2> but it is obviously greedy. How do I make RegEx2 non-greedy and use only the last matching lookbehind? [I hope I have translated this correctly from the NoteTab syntax. I don't do much RegEx coding. The <prefix>, <content> & <suffix> terms are just meant to represent arbitrary strings. Only the "<" in the "?<=" lookbehind
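The excerpt above ends before any answer, so the following is only one possible approach, not the accepted one: forbid the match from running across another <prefix> by testing a negative lookahead at every character. The sketch uses Python's re module instead of PCRE (an assumption; the syntax used here is shared by both), with the placeholder strings from the question taken literally.

```python
import re

source = "<prefix><content1><suffix1><prefix><content2><suffix2>"

# RegEx2 as posted: .* starts after the first <prefix> and runs on.
greedy = re.search(r"(?<=<prefix>)(.*)(?=<suffix2>)", source).group(1)

# Assumed fix: (?:(?!<prefix>).)* matches any character that does not
# start another <prefix>, so only the last <prefix> can anchor the match.
tempered = re.search(
    r"(?<=<prefix>)((?:(?!<prefix>).)*)(?=<suffix2>)", source
).group(1)

print(greedy)    # <content1><suffix1><prefix><content2>
print(tempered)  # <content2>
```

Making .* lazy would not help here, because the match would still begin after the first <prefix>.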

Regex Non-Greedy (Lazy)

我们两清 submitted on 2019-11-28 10:50:24
I'm attempting to non-greedily parse out TD tags. I'm starting with something like this:
    <TD>stuff<TD align="right">More stuff<TD align="right>Other stuff<TD>things<TD>more things
I'm using the below as my regex:
    Regex.Split(tempS, @"\<TD[.\s]*?\>");
The records return as below:
    ""
    "stuff<TD align="right">More stuff<TD align="right>Other stuff"
    "things"
    "more things"
Why is it not splitting that first full result (the one starting with "stuff")? How can I adjust the regex to split on all instances of the TD tag with or without parameters?
The regex you want is <TD[^>]*> : < # Match opening tag
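The behaviour described above can be reproduced with a short sketch, here in Python rather than C# (an assumption for brevity). Inside a character class the dot is a literal dot, so [.\s] only matches dots and whitespace; that is why tags with attributes are skipped by the original pattern.

```python
import re

row = ('<TD>stuff<TD align="right">More stuff'
       '<TD align="right>Other stuff<TD>things<TD>more things')

# Original pattern: [.\s] matches only a literal "." or whitespace,
# so only the bare <TD> tags are treated as separators.
original = re.split(r"<TD[.\s]*?>", row)

# Suggested pattern: [^>]* accepts any attribute text up to the ">".
fixed = re.split(r"<TD[^>]*>", row)

print(original)
print(fixed)  # ['', 'stuff', 'More stuff', 'Other stuff', 'things', 'more things']
```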

Greedy, Non-Greedy, All-Greedy Matching in C# Regex

核能气质少年 submitted on 2019-11-28 02:32:53
Question: How can I get all the matches in the following example:
    // Only "abcd" is matched
    MatchCollection greedyMatches = Regex.Matches("abcd", @"ab.*");
    // Only "ab" is matched
    MatchCollection lazyMatches = Regex.Matches("abcd", @"ab.*?");
    // How can I get all matches: "ab", "abc", "abcd"
P.S.: I want to have all the matches in a generic manner. The example above is just an example.
Answer 1: You could use something like:
    MatchCollection nonGreedyMatches = Regex.Matches("abcd", @"(((ab)c)d)");
Then you

Why is this simple .*? non-greedy regex being greedy?

落爺英雄遲暮 submitted on 2019-11-27 15:48:32
I have a very simple regex similar to this:
    HOHO.*?_HO_
With this test string...
    fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev
I expect it to match just _HOHO___HO_ (shortest match, non-greedy). Instead it matches _HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_ (longest match, looks greedy). Why? How can I make it match the shortest match? Adding and removing the ? gives the same result.
Edit - better test string that shows why [^HOHO] doesn't work:
    fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO_H_O_H_O_HO_fbguye
All I can think of is that maybe it is matching multiple times - but there's only one match for
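The excerpt above stops before an answer. The usual explanation is that a lazy quantifier only shortens the end of a match: the engine still commits to the leftmost HOHO where a match is possible. The sketch below, in Python (an assumed language choice), shows this with the pattern exactly as written in the question, and then one possible workaround using a negative lookahead so the match cannot run across another HOHO. The workaround is an assumption, not the thread's answer.

```python
import re

text = "fiwgu_HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_fbguyev"

# Lazy but leftmost: the match starts at the first HOHO and expands
# until _HO_ finally appears.
lazy = re.search(r"HOHO.*?_HO_", text).group(0)

# Assumed workaround: each consumed character must not begin another HOHO,
# so only the last HOHO before _HO_ can start a successful match.
tempered = re.search(r"HOHO(?:(?!HOHO).)*?_HO_", text).group(0)

print(lazy)      # HOHO_HOHO_HOHOrgh_HOHO_feh_HOHO___HO_
print(tempered)  # HOHO___HO_
```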

Posix regular expression non-greedy

随声附和 submitted on 2019-11-27 06:12:19
Question: Is there a way to use a non-greedy regular expression in C like one can use in Perl? I tried several things, but it's actually not working. I'm currently using this regex that matches an IP address and the corresponding HTTP request, but it's greedy although I'm using the *?:
    ([0-9]{1,3}(\\.[0-9]{1,3}){3})(.*?)HTTP/1.1
In this example, it always matches the whole string:
    #include <regex.h>
    #include <stdio.h>
    int main() {
        int a, i;
        regex_t re;
        regmatch_t pm;
        char *mpages = "TEST 127.0.0.1 GET

Regex Non-Greedy (Lazy)

戏子无情 submitted on 2019-11-27 03:56:37
Question: I'm attempting to non-greedily parse out TD tags. I'm starting with something like this:
    <TD>stuff<TD align="right">More stuff<TD align="right>Other stuff<TD>things<TD>more things
I'm using the below as my regex:
    Regex.Split(tempS, @"\<TD[.\s]*?\>");
The records return as below:
    ""
    "stuff<TD align="right">More stuff<TD align="right>Other stuff"
    "things"
    "more things"
Why is it not splitting that first full result (the one starting with "stuff")? How can I adjust the regex to split on all