Regex includes two matches in first match

a 夏天 submitted on 2021-02-07 23:52:51

Question


I have this regex that tries to find individual STEP lines and split each one into three groups (reference number, class and properties):

#14=IFCEXTRUDEDAREASOLID(#28326,#17,#9,3657.6);

becomes

[['14'], ['IFCEXTRUDEDAREASOLID'], ['#28326,#17,#9,3657.6']]

Sometimes these lines have arbitrary line breaks, especially among the properties, so I added some \s to the regex. This, however, causes an interesting bug: the pattern now matches TWO rows in every match.

How can I adjust the regex to catch only one row even if it has line breaks? And, just out of curiosity, why does it stop after the second line instead of continuing until the last line?


Answer 1:


The reason you now match two lines every time is that \s matches any whitespace, including line breaks, so if a line break follows a matched line, \s* grabs it together with all the whitespace after it.
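
A quick way to see this in Python (an illustration, not the asker's code; the two-entry sample string is made up) is to compare \s* with [ \t]*, which matches only spaces and tabs:

```python
import re

# Made-up input: two STEP-style entries separated by a blank line.
text = "#14=A(1);\n\n#15=B(2);"

# \s matches \n as well as spaces and tabs, so \s* after ");" runs on
# through the line break and the blank line up to the next "#".
greedy = re.search(r"\);\s*", text).group()

# [ \t]* matches only horizontal whitespace and stops at the line break.
horizontal = re.search(r"\);[ \t]*", text).group()

print(repr(greedy))      # ');\n\n'
print(repr(horizontal))  # ');'
```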

Use

/^#(\d+)\s*=\s*([a-zA-Z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);/gm


Details:

  • ^ - start of a line
  • # - a hash symbol
  • (\d+) - Group 1: one or more digits
  • \s*=\s* - an = surrounded by optional whitespace
  • ([a-zA-Z0-9]+) - Group 2: one or more ASCII letters or digits
  • \s*\( - zero or more whitespace characters and a (
  • ((?:'[^']*'|[^;'])+) - Group 3: one or more repetitions of either a '...' substring ('[^']*', with no ' allowed inside) or (|) a single char other than ; and ' ([^;'])
  • \); - a ); sequence
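
Here is a minimal Python sketch of the same pattern, assuming Python's re module is acceptable: the /gm flags translate to re.MULTILINE plus iterating with re.finditer. The sample data and the entity IFCPROPERTYSINGLEVALUE('a;b', ...) are hypothetical; that entry wraps its properties over two lines and has a quoted string containing a semicolon.

```python
import re

# Answer 1's pattern; re.MULTILINE makes ^ match at the start of every line
# (the m flag), and finditer walks all matches (the g flag).
STEP_LINE = re.compile(
    r"^#(\d+)\s*=\s*([a-zA-Z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);",
    re.MULTILINE,
)

# Hypothetical sample: entry 15 is wrapped and has a ';' inside quotes.
data = """#14=IFCEXTRUDEDAREASOLID(#28326,#17,#9,3657.6);
#15=IFCPROPERTYSINGLEVALUE('a;b',
  #16);
#17=IFCDIRECTION((0.,0.,1.));
"""

rows = []
for m in STEP_LINE.finditer(data):
    num, cls, props = m.groups()
    rows.append([[num], [cls], [props]])

for row in rows:
    print(row)
```

Each match stays on its own entry, and the first row prints as [['14'], ['IFCEXTRUDEDAREASOLID'], ['#28326,#17,#9,3657.6']], the same shape as in the question.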

The negated character class solution suggested by Maverick_Mrt works in specific cases, but as soon as the text you were capturing with ([\s\S]*?) contains the negated character (here ;), the match fails.
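
The failure case can be reproduced in a small Python sketch (assumptions: Python's re module, and a made-up entity whose quoted string contains a ;). The first pattern matches the properties with the negated class [^;]*, the second with Answer 1's quoted-string alternation:

```python
import re

# Made-up entry: the quoted property value contains a semicolon.
line = "#15=IFCPROPERTYSINGLEVALUE('a;b',#16);"

# [^;]* cannot step over the ';' inside 'a;b', so no match is found.
negated = re.search(r"#(\d+)\s*=\s*([A-Za-z0-9]+)\s*\(([^;]*)\);", line)

# '[^']*' consumes the whole quoted string as one unit, ';' included.
quoted = re.search(r"#(\d+)\s*=\s*([A-Za-z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);", line)

print(negated)          # None
print(quoted.group(3))  # 'a;b',#16
```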




Answer 2:


You can try this:

/#(\d+)\s*=\s*([a-z0-9]+)\s*\([^;]*\);/gi
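
A Python sketch of this pattern, assuming Python's re module and a made-up two-entry sample. Because [a-z0-9] lists only lowercase letters, the uppercase IFC class names need the case-insensitive flag (re.IGNORECASE here, i in a regex literal); [^;]* also matches line breaks, so a wrapped property list does not push the match onto the next entry:

```python
import re

# Answer 2's pattern; re.IGNORECASE lets [a-z0-9] match uppercase names.
PATTERN = re.compile(r"#(\d+)\s*=\s*([a-z0-9]+)\s*\([^;]*\);", re.IGNORECASE)

# Made-up sample: the first entry's properties wrap onto a second line.
data = "#14=IFCEXTRUDEDAREASOLID(#28326,#17,\n  #9,3657.6);\n#15=IFCDIRECTION((0.,0.,1.));\n"

matches = [(m.group(1), m.group(2)) for m in PATTERN.finditer(data)]
print(matches)  # [('14', 'IFCEXTRUDEDAREASOLID'), ('15', 'IFCDIRECTION')]
```

This pattern captures only the reference number and the class; wrapping [^;]* in parentheses would add the properties as a third group.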




Source: https://stackoverflow.com/questions/41715407/regex-includes-two-matches-in-first-match
