lookahead

Nested regex lookahead and lookbehind

僤鯓⒐⒋嵵緔 submitted on 2019-12-04 02:53:44
I am having problems with nested positive/negative lookahead/lookbehind in regex. Let's say that I want to replace each '*' in a string with '%', and let's say that '\' escapes the next character (turning a regex into an SQL LIKE pattern ^^). So the string '*test*' should be changed to '%test%', and '\\*test\\*' -> '\\%test\\%', but '\*test\*' and '\\\*test\\\*' should stay the same. I tried: (?<!\\)(?=\\\\)*\* but this doesn't work (?<!\\)((?=\\\\)*\*) ... (?<!\\(?=\\\\)*)\* ... (?=(?<!\\)(?=\\\\)*)\* ... What is the correct regex that will match the '*'s in the examples given above? What is the difference
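The rule described above (a '*' counts only when an even number of backslashes precedes it) can be sketched in Python as follows. This is one possible pattern, not necessarily the answer given in the original thread; the names `UNESCAPED_STAR` and `star_to_percent` are made up for illustration. The lookbehind pins the start of the backslash run, and the pairs are captured so they can be written back.

```python
import re

# An unescaped '*' is preceded by an even-length (possibly empty) run of
# backslashes. (?<!\\) makes sure the run starts here; the pairs are
# captured because they are consumed and must be put back.
UNESCAPED_STAR = re.compile(r'(?<!\\)((?:\\\\)*)\*')


def star_to_percent(text):
    """Replace every unescaped '*' with '%', keeping the backslash pairs."""
    return UNESCAPED_STAR.sub(r'\1%', text)


if __name__ == "__main__":
    for sample in ['*test*', r'\\*test\\*', r'\*test\*', r'\\\*test\\\*']:
        print(sample, '->', star_to_percent(sample))
```

A lookahead cannot be repeated usefully with `*` because it consumes nothing, which is why the attempts quoted above do not behave as intended.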

How to implement LOOP in a FORTH-like language interpreter written in C

痞子三分冷 submitted on 2019-12-02 22:59:51
I'm writing a simple stack-based language in C and was wondering how I should go about implementing a loop structure of some kind, and/or lookahead symbols. Since the code is a bit long for this page (over 200 lines) I've put it in a GitHub repository. EDIT: The main program is in file stack.c. EDIT: The code just takes input in words, kind of like FORTH. It uses scanf and works left to right. Then it uses a series of ifs and strcmps to decide what to do. That's really it. The Forth approach is to add a separate loop stack alongside the data stack. You then define operations that work
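To make the "separate loop stack" idea concrete, here is a minimal sketch in Python rather than C, so it is not the asker's stack.c. It assumes the whole program is tokenized up front (so the interpreter can jump backwards) and supports only a simplified `limit start DO ... LOOP` with `I` for the current index. The function name `run` and the frame layout are assumptions made for illustration.

```python
def run(source):
    """Interpret a tiny FORTH-like program and return the values printed by '.'."""
    tokens = source.split()
    data = []    # data stack
    loops = []   # loop stack: frames of [body_start_ip, index, limit]
    out = []
    ip = 0
    while ip < len(tokens):
        tok = tokens[ip]
        ip += 1
        if tok == "DO":
            start = data.pop()
            limit = data.pop()
            loops.append([ip, start, limit])  # ip already points at the body
        elif tok == "LOOP":
            frame = loops[-1]
            frame[1] += 1
            if frame[1] < frame[2]:
                ip = frame[0]                 # jump back to the loop body
            else:
                loops.pop()                   # loop finished
        elif tok == "I":
            data.append(loops[-1][1])         # push the current loop index
        elif tok == "+":
            b, a = data.pop(), data.pop()
            data.append(a + b)
        elif tok == ".":
            out.append(data.pop())
        else:
            data.append(int(tok))
    return out


if __name__ == "__main__":
    print(run("3 0 DO I . LOOP"))
    print(run("0 4 1 DO I + LOOP ."))
```

Because loop state lives on its own stack, the loop body is free to push and pop the data stack without disturbing the index and limit. Nested loops work the same way, since each `DO` pushes a new frame.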

Regex to match only innermost delimited sequence

大憨熊 submitted on 2019-12-01 10:31:53
I have a string that contains sequences delimited by multiple characters: << and >>. I need a regular expression to only give me the innermost sequences. I have tried lookaheads but they don't seem to work in the way I expect them to. Here is a test string: 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>' It should return: 'but match this', 'this too', 'and <also> this'. As you can see with the third result, I can't just use /<<[^>]+>>/ because the string may have one character of the delimiters, but not two in a row. I'm fresh out of
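One common way to express "innermost" is to let the body advance one character at a time, checking with a negative lookahead that neither delimiter starts at that position. The sketch below uses Python's `re` module; the names `INNERMOST` and `innermost` are illustrative, and this is not necessarily the answer from the original thread.

```python
import re

# Body: any character, as long as neither '<<' nor '>>' starts at it.
# A single '<' or '>' is therefore allowed inside the body.
INNERMOST = re.compile(r'<<((?:(?!<<|>>).)*)>>', re.DOTALL)


def innermost(text):
    """Return the contents of every <<...>> pair that contains no nested pair."""
    return INNERMOST.findall(text)


if __name__ == "__main__":
    sample = ('do not match this <<but match this>> not this '
              '<<BUT NOT THIS <<this too>> IT HAS CHILDREN>> '
              '<<and <also> this>>')
    print(innermost(sample))
```

An outer `<<` fails to match because its body stops at the nested `<<`, where the pattern requires `>>`.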

Regex to match only innermost delimited sequence

为君一笑 submitted on 2019-12-01 06:19:08
Question: I have a string that contains sequences delimited by multiple characters: << and >>. I need a regular expression to only give me the innermost sequences. I have tried lookaheads but they don't seem to work in the way I expect them to. Here is a test string: 'do not match this <<but match this>> not this <<BUT NOT THIS <<this too>> IT HAS CHILDREN>> <<and <also> this>>' It should return: 'but match this', 'this too', 'and <also> this'. As you can see with the third result, I can't just use /<<[^>]+>>/

Regular expression: matching words between white space

一笑奈何 submitted on 2019-12-01 05:10:01
I'm trying to do something fairly simple with regular expressions in Python... that's what I thought, at least. What I want to do is match words in a string if they are preceded and followed by whitespace. If a word is at the beginning of the string, no whitespace is required before it; if it's at the end, don't look for whitespace after it either. Example: "WordA WordB WordC-WordD WordE" I want to match WordA WordB WordE. I only came up with an overcomplicated way of doing this... (?<=(?<=^)|(?<=\s))\w+(?=(?=\s)|(?=$)) It seems to me there has to be a simple way for such a simple problem.... I figured i
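"Whitespace or the string boundary" on either side can be restated as "no non-whitespace character" on either side, which needs only two negative lookarounds. A short sketch, with `WORD` and `standalone_words` as illustrative names:

```python
import re

# (?<!\S): no non-whitespace character before (start of string also qualifies).
# (?!\S):  no non-whitespace character after (end of string also qualifies).
WORD = re.compile(r'(?<!\S)\w+(?!\S)')


def standalone_words(text):
    """Return words that are delimited by whitespace or the string boundaries."""
    return WORD.findall(text)


if __name__ == "__main__":
    print(standalone_words("WordA WordB WordC-WordD WordE"))
```

The double negative handles the start and end of the string without a separate alternative for `^` or `$`.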

REGEX - Matching any character which repeats n times

我与影子孤独终老i submitted on 2019-11-30 11:28:25
How to match any character which repeats n times? Example:

for input: abcdbcdcdd
for n=1:   ..........
for n=2:    .........
for n=3:     .. .....
for n=4:      .  . ..
for n=5:   no matches

After several hours my best is this expression, which uses lookahead: (\w)(?=(?:.*\1){n-1,}) (where n is a variable). However, the problem with this expression is this:

for input: abcdbcdcdd
for n=1:   ..........
for n=2:    ... .. .
for n=3:     ..  .
for n=4:      .
for n=5:   no matches

As you can see, when the lookahead matches for a character (look at the n=4 line), d's lookahead assertion is satisfied and the first d is matched by the regex. But remaining d
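The gap between the wanted result and the lookahead-only result can be reproduced with a small Python sketch. `mask_by_lookahead` applies the expression from the question. `mask_by_count` is a different technique (plain counting with `collections.Counter`, no regex), shown only to produce the wanted masks. Both function names are made up for illustration.

```python
import re
from collections import Counter


def mask_by_count(text, n):
    """Mark every character that occurs at least n times anywhere in text."""
    counts = Counter(text)
    return ''.join('.' if counts[ch] >= n else ' ' for ch in text)


def mask_by_lookahead(text, n):
    """Mark characters matched by the question's lookahead-only expression."""
    pattern = re.compile(r'(\w)(?=(?:.*\1){%d,})' % (n - 1))
    hits = {m.start() for m in pattern.finditer(text)}
    return ''.join('.' if i in hits else ' ' for i in range(len(text)))


if __name__ == "__main__":
    text = "abcdbcdcdd"
    for n in range(1, 6):
        print(n, repr(mask_by_count(text, n)), repr(mask_by_lookahead(text, n)))
```

The lookahead only sees occurrences to the right of the current position, so later occurrences of the same character have too few copies ahead of them and are missed.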

How to use regex lookahead to limit the total length of input string

纵然是瞬间 submitted on 2019-11-30 05:05:26
Question: I have this regular expression and want to add a rule that limits the total length to no more than 15 chars. I saw some lookahead examples but they're not quite clear. Can you help me modify this expression to support the new rule? ^([A-Z]+( )*[A-Z]+)+$ Answer 1: Actually, all this can be simplified a lot: ^[A-Z][A-Z ]{0,13}[A-Z]$ does exactly what you want. Or at least what your current regex does (plus the length restriction). This especially avoids problems with catastrophic backtracking
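Both styles can be compared side by side in Python. `SIMPLE` is the pattern from the answer. `WITH_LOOKAHEAD` is an assumed variant that keeps the length check in a leading lookahead and leaves the body unbounded, which is the shape the question asked about. The constant and function names are illustrative.

```python
import re

# Pattern from the answer: the length limit is built into the quantifier.
SIMPLE = re.compile(r'^[A-Z][A-Z ]{0,13}[A-Z]$')

# Assumed lookahead variant: (?=.{1,15}$) checks the total length first,
# then the body is matched without any bound of its own.
WITH_LOOKAHEAD = re.compile(r'^(?=.{1,15}$)[A-Z][A-Z ]*[A-Z]$')


def is_valid_simple(text):
    return SIMPLE.match(text) is not None


def is_valid_lookahead(text):
    return WITH_LOOKAHEAD.match(text) is not None


if __name__ == "__main__":
    for sample in ["HELLO WORLD", "A", "ABCDEFGHIJKLMNO", "ABCDEFGHIJKLMNOP"]:
        print(repr(sample), is_valid_simple(sample), is_valid_lookahead(sample))
```

The lookahead form is handy when the body cannot easily carry its own length bound. Here the bounded quantifier is shorter and does the same job.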

Regex with negative lookahead across multiple lines

喜夏-厌秋 submitted on 2019-11-29 15:44:48
For the past few hours I've been trying to match address(es) from the following sample data and I can't get it to work: medicalHistory None address 24 Lewin Street, KUBURA, NSW, Australia email MaryBeor@spambob.com address 16 Yarra Street, LAWRENCE, VIC, Australia name Mary Beor medicalHistory None phone 00000000000000000000353336907 birthday 26-11-1972 My plan was to find anything that starts with "address", is followed by any space followed by characters, numbers commas and newlines and ends with newline followed by a character. I came up with the following (and many variations of it):
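The plan described above (start at "address", continue across newlines, stop at a newline that is followed by a letter) could be sketched in Python like this. The line breaks of the original sample were lost in this listing, so the layout in `SAMPLE` is an assumption: each field starts a line with its key, and an address may continue on an indented line. The pattern is not the asker's (which is cut off above), and `ADDRESS` and `find_addresses` are hypothetical names.

```python
import re

# Hypothetical layout: the original line breaks are not preserved in the listing.
SAMPLE = (
    "medicalHistory None\n"
    "address 24 Lewin Street,\n"
    " KUBURA, NSW, Australia\n"
    "email MaryBeor@spambob.com\n"
    "address 16 Yarra Street, LAWRENCE, VIC, Australia\n"
    "name Mary Beor"
)

# After 'address', take any non-newline character, or a newline that is
# NOT followed by a letter (a letter is assumed to start the next field).
ADDRESS = re.compile(r'^address[ \t]+((?:[^\n]|\n(?![A-Za-z]))*)', re.MULTILINE)


def find_addresses(text):
    """Return each address with internal line breaks collapsed to single spaces."""
    return [' '.join(m.split()) for m in ADDRESS.findall(text)]


if __name__ == "__main__":
    print(find_addresses(SAMPLE))
```

The negative lookahead sits right after the newline, so a single character of context decides whether the address continues or the next field begins.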

StackOverflowError when matching large input using RegEx

喜欢而已 submitted on 2019-11-29 11:45:53
I got a StackOverflowError when matching input using a regex pattern. The pattern is (\d\*?(;(?=\d))?)+. This regex is used to validate the input: 12345;4342;234*;123*;344324 The input is a string that consists of values (only digits) separated by ;. Each value could include one * at the end (used as a wildcard for other matching). There is no ; at the end of the string. The problem is that this regex works fine with a small number of values. But when the number of values is too large (over 300), it causes a StackOverflowError. final String TEST_REGEX = "(\\d\\*?(;(?=\\d))?)+"; // Generate
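One way to sidestep a single large repeated group is to validate value by value. The sketch below does this in Python, not Java, so it does not reproduce the StackOverflowError; it only illustrates the validation rule as stated in the question (digits, optional trailing `*`, `;` as separator, no trailing `;`). `VALUE` and `is_valid` are illustrative names, and this is a swapped-in technique rather than a fix of the original pattern.

```python
import re

# One value: one or more digits, optionally followed by a single '*'.
VALUE = re.compile(r'\d+\*?')


def is_valid(text):
    """Check that text is a ';'-separated list of values with no empty parts."""
    return all(VALUE.fullmatch(part) for part in text.split(';'))


if __name__ == "__main__":
    print(is_valid("12345;4342;234*;123*;344324"))
    print(is_valid("12345;"))
    print(is_valid(";".join(["123*"] * 1000)))
```

Splitting first means each regex match only ever sees one short value, so the cost per match stays small regardless of how many values the input holds. A trailing or doubled `;` yields an empty part, which fails the check.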

Why is lookahead (sometimes) faster than capturing?

大城市里の小女人 submitted on 2019-11-29 10:47:40
This question is inspired by this other one. Comparing s/,(\d)/$1/ to s/,(?=\d)//: the former uses a capture group to put the digit back while dropping the comma, the latter uses a lookahead to determine whether the comma is followed by a digit. Why is the latter sometimes faster, as discussed in this answer? The two approaches do different things and have different kinds of overhead costs. When you capture, perl has to make a copy of the captured text. Look-ahead matches without consuming; it has to mark the location where it starts. You can see what's happening by using the re 'debug'
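The two substitutions can be mirrored in Python to check that they produce the same text and to time them locally. This is only an analogue: Python's `re` engine is not Perl's, so the timings say nothing about Perl itself, and no result is claimed here. The function names are illustrative.

```python
import re
import timeit

CAPTURE = re.compile(r',(\d)')       # consumes the digit, then writes it back
LOOKAHEAD = re.compile(r',(?=\d)')   # only asserts the digit, consumes the comma


def strip_with_capture(text):
    return CAPTURE.sub(r'\1', text)


def strip_with_lookahead(text):
    return LOOKAHEAD.sub('', text)


if __name__ == "__main__":
    sample = "1,234,567 apples, 8,900 pears " * 100
    assert strip_with_capture(sample) == strip_with_lookahead(sample)
    for fn in (strip_with_capture, strip_with_lookahead):
        seconds = timeit.timeit(lambda: fn(sample), number=1000)
        print(fn.__name__, round(seconds, 4))
```

Both versions leave a comma alone when no digit follows it, for example the one after "apples".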