Regex with negative lookahead across multiple lines

攒了一身酷 2020-12-19 11:38

For the past few hours I've been trying to match the address(es) in the following sample data, and I can't get it to work:

medicalHistory      None
address            


        
2 Answers
  • 2020-12-19 11:51

    The problem with your regex is that + is greedy and keeps going until it hits a character outside the character class: the @ in the first case and the - in the second.

    Another approach is to use a non-greedy quantifier and a positive lookahead for a newline followed by a word character (Python version):

    re.findall(r'address\s+.*?(?=\n\w)', s, re.DOTALL)
    

    It yields:

    ['address             24 Lewin Street, KUBURA, \n                NSW, Australia',
     'address             16 Yarra Street, \n                                     LAWRENCE, VIC, Australia']
    
  • 2020-12-19 12:10

    I would do it this way:

    address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+
    

    See it here on Regexr.

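    The important part is ((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+. It says: match the next character from [0-9a-zA-Z, \r\n\t], but only if the text at that position is not one or more line breaks followed by a word character, which is what the negative lookahead (?![\r\n]+\w) checks. This matches what you expect.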
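The behaviour can be checked with a short Python sketch. The sample text is an assumption, rebuilt from the output quoted in the other answer because the original data is not shown in full; the `email` line and its value are hypothetical. Since the capturing group is repeated, it only retains the last character it matched, so the sketch reads the whole match with `group(0)` instead of the group.

```python
import re

# Hypothetical sample: the "email" field and its value are made up for illustration.
sample = (
    "medicalHistory      None\n"
    "address             24 Lewin Street, KUBURA, \n"
    "                NSW, Australia\n"
    "email               someone@example.com\n"
)

# Pattern from the answer: consume one allowed character at a time, but only
# while the current position is not "line break(s) followed by a word character".
pattern = re.compile(r"address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+")

# The repeated group keeps only its last character, so take the full match.
matches = [m.group(0) for m in pattern.finditer(sample)]

# Collapse the line break and padding inside each match for display.
addresses = [" ".join(m.split()) for m in matches]
print(addresses)
```

The match runs across the continuation line because that line starts with spaces, and it stops before the line break that precedes `email` because a word character follows it.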

    In both of your cases the regex stopped matching because it hit a character that is not included in your character class. If you want to go that way, then you would need to combine a lazy quantifier with a positive lookahead:

    address\s+([0-9a-zA-Z, \n\r\t]+?)(?=\r\w)
    

    [0-9a-zA-Z, \n\r\t]+? matches as few characters as possible until the condition (?=\r\w) is true.

    See it here at Regexr
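A sketch of the lazy variant in Python, under the same caveats: the sample record and its `email` field are hypothetical. One thing worth flagging is that `(?=\r\w)` can only succeed when a carriage return is directly followed by a word character, so it appears to assume lines ending in a bare `\r`. The sketch compares the pattern as posted with an adapted lookahead, `(?=[\r\n]+\w)`, which is an assumption added here and not part of the answer.

```python
import re


def build_sample(eol):
    # Hypothetical record; the "email" field is invented for illustration.
    lines = [
        "medicalHistory      None",
        "address             24 Lewin Street, KUBURA, ",
        "                NSW, Australia",
        "email               someone@example.com",
    ]
    return eol.join(lines) + eol


# Pattern as posted: the lookahead expects a bare carriage return.
posted = re.compile(r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=\r\w)")
# Adapted lookahead (an assumption, not from the answer): any line-break style.
adapted = re.compile(r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=[\r\n]+\w)")

for eol in ("\n", "\r\n", "\r"):
    sample = build_sample(eol)
    # Print the line ending and how many addresses each pattern finds.
    print(repr(eol), len(posted.findall(sample)), len(adapted.findall(sample)))
```

With a single capturing group, `re.findall` returns only the group, so the results here are the address text without the `address` label.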
