Difference between regex in Python and Google Apps Script (backend engine related?)

…衆ロ難τιáo~ submitted on 2019-12-01 12:00:28

Question


I tried the same regular expression in both Python (3.6, Jupyter notebook) and Google Apps Script, but it seems that the non-capturing group is not working in the Apps Script case.

# python script:
import re
text='<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">'
regex=r'(?:<a class=""email"" href=""mailto:)(.+?@hello\.edu)(?:"">)'
match=re.search(regex,text)
print(match.group(1))
# result is 'SOisAwesome@hello.edu'

// Google app script
function myFunction() {
  string='<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">'
  regex=new RegExp('(?:<a class=""email"" href=""mailto:)(.+?@hello\.edu)(?:"">)')
  Match=regex.exec(string)
  Logger.log(Match[1])
  // result is 'a class=""email"" href=""mailto:SOisAwesome@hello.edu'
}

If I am not mistaken, the regular expression engine in Google Apps Script should support non-capturing groups (according to https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines, where I suppose I should be looking at "JavaScript (ECMAScript)" and "Shy groups"). Can anyone explain what I'm missing here?

Thanks in advance!


Answer 1:


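First of all, you need to write \\. rather than \. in the GAS regex declaration. The pattern is passed to RegExp as a string literal, where a single backslash before . is simply dropped, so the regex engine receives an unescaped . that matches any character. A doubled backslash in the string yields the literal backslash that forms the regex escape sequence.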
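
The escaping difference can be checked in any JavaScript engine. The sketch below is illustrative only: the patterns and test strings are made up for the demonstration, and console.log stands in for Logger.log so that the snippet also runs outside GAS.

```javascript
// In a string literal, "\." is not a recognised escape, so the backslash is dropped.
var unescaped = new RegExp('hello\.edu');   // pattern seen by the engine: hello.edu
var escaped = new RegExp('hello\\.edu');    // pattern seen by the engine: hello\.edu
var literal = /hello\.edu/;                 // a regex literal needs only one backslash

console.log(unescaped.source); // hello.edu
console.log(escaped.source);   // hello\.edu

// The unescaped dot matches any character, so "helloXedu" is accepted as well.
console.log(unescaped.test('helloXedu')); // true
console.log(escaped.test('helloXedu'));   // false
console.log(literal.test('hello.edu'));   // true
```

Writing the pattern as a regex literal avoids the double escaping altogether, which is one reason to prefer it when the pattern is fixed.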

That said, the non-capturing group implementation in GAS seems to be buggy.

If you run your regex in GAS and print the Match object, you will see:

[18-01-26 08:49:07:198 CET] [<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">, 
a class=""email"" href=""mailto:SOisAwesome@hello.edu, "">]

That means the first non-capturing group got "merged" with the first capturing group, minus its first character. The trailing (?:"">) group also shows up in the result as if it were a capturing group.

Here are some more experiments:

Logger.log(new RegExp("(?:;\\w+):(\\d+)").exec(";er:34")); // = null, expected [;er:34, 34]
Logger.log(new RegExp("(?:e\\w+):(\\d+)").exec(";er:34")); // = null, expected [er:34, 34]
Logger.log(new RegExp("(?:\\w+):(\\d+)").exec(";er:34"));  // =  [er:34, 34], as expected
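
To check whether a given runtime still shows this behaviour, the experiments can be wrapped in a small self-check. This is only a sketch: runExperiments is a hypothetical helper, not part of GAS, the expected values are the ones listed in the comments above, and console.log can be swapped for Logger.log.

```javascript
// Hypothetical helper: runs the three experiments and reports whether each result
// matches what a correct non-capturing-group implementation should return.
function runExperiments() {
  var cases = [
    { pattern: "(?:;\\w+):(\\d+)", expected: [";er:34", "34"] },
    { pattern: "(?:e\\w+):(\\d+)", expected: ["er:34", "34"] },
    { pattern: "(?:\\w+):(\\d+)", expected: ["er:34", "34"] }
  ];
  var report = [];
  for (var i = 0; i < cases.length; i++) {
    var match = new RegExp(cases[i].pattern).exec(";er:34");
    // exec returns null on failure; otherwise compare the full match and group 1.
    var ok = match !== null &&
        match.length === 2 &&
        match[0] === cases[i].expected[0] &&
        match[1] === cases[i].expected[1];
    report.push({ pattern: cases[i].pattern, ok: ok });
  }
  return report;
}

var results = runExperiments();
for (var j = 0; j < results.length; j++) {
  console.log(results[j].pattern + " -> " + (results[j].ok ? "as expected" : "unexpected result"));
}
```

An engine that implements non-capturing groups correctly should report all three patterns as expected. Any "unexpected result" line points to the kind of bug described here.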

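To fix the issue, you may remove the non-capturing parentheses: a non-capturing group with no quantifier or alternation matches exactly what its contents match, just as \d = (?:\d).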
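
Applied to the regex from the question, the workaround might look like the sketch below. extractEmail is a hypothetical name. The pattern drops both non-capturing groups and is written as a regex literal, so the dot needs only a single backslash. Whether this sidesteps the bug in every GAS version is an assumption, not something verified here.

```javascript
// Hypothetical helper: returns the address inside the mailto link, or null if absent.
function extractEmail(html) {
  // No (?:...) groups: the surrounding markup is matched literally and only the
  // address is captured. \. matches a literal dot.
  var regex = /<a class=""email"" href=""mailto:(.+?@hello\.edu)"">/;
  var match = regex.exec(html);
  return match ? match[1] : null;
}

var sample = '<a class=""email"" href=""mailto:SOisAwesome@hello.edu"">';
console.log(extractEmail(sample));
```

In a standards-conforming JavaScript engine this logs SOisAwesome@hello.edu, the same value the Python version prints.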



Source: https://stackoverflow.com/questions/48452959/difference-of-regex-in-python-and-google-app-script-backend-engine-related
