This regular expression fails to parse all valid floating-point numbers

Submitted by 风格不统一 on 2021-02-11 11:32:38

Question


I am trying to find all floating-point numbers, which may be in exponential form and may or may not have a -/+ prefix. For example, the following are all valid formats: -1.2 +1.2 .2 -3 3E4 -3e5 e-5

The source text contains several numbers separated by spaces or commas. I need to use a regular expression to

  1. tell whether there is any invalid number (e.g. in 1.2 3.2 s3, s3 is not a valid one)
  2. list every single valid number

I have no idea how to get (1) done, but for (2) I am using boost::regex and the following code:

wstring strre(L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b");
wstring src(L"1.2 -3.4 3.2 3 2 1e-3 3e3");
boost::wregex regexp(strre);
boost::match_results<std::wstring::const_iterator> what; 
regex_search(src, what, regexp, boost::match_continuous);
wcout << "RE: " << strre << endl << endl;
wcout << "SOURCE: [" << src << "]" << endl;

for (int i=0; i<what.size(); i++)
  wcout << "OUTPUT: [" << wstring(what[i].first, what[i].second) << "]"<< endl;

But this code only shows me the first number (1.2). I also tried boost::match_all and boost::match_default, with the same result.

ADDITIONAL INFO: Hi all, let's not worry about the double-backslash issue; it is expressed correctly in my code (in my test code I read the string from text, not from an explicit string literal). Anyway, I modified the code as follows:

wstring strre(L"[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b");
boost::wregex regexp(strre);
boost::match_results<std::wstring::const_iterator> what; 
wcout << "RE: " << strre << endl << endl;
while (src.length()>0)
{
  wcout << "SOURCE: [" << src << "]" << endl;
  // stop when no number is left; what[0] is not usable after a failed search
  if (!regex_search(src, what, regexp, boost::match_default))
    break;
  wcout << "OUTPUT: [" << wstring(what[0].first, what[0].second) << "]" << endl;
  // continue with the text that follows the match
  src = wstring(what[0].second, src.end());
}

Now it correctly shows every single number, but I have to run regex_search several times because it only gives one number at a time. I just don't understand why regex_search won't give me all the results at once. Is there any way to run the search once and get all the results back?


Answer 1:


You normally have to double every backslash in a C++ string literal. As written, your "\." turns into just . (which matches any character); you need "\\." instead. Similarly, your "\b" becomes not a word boundary but a literal backspace! Fix it the same way: "\\b".

Also, where’s the doc for that strre class? Are you sure it understands the language you are using?

C++11 added raw string literals, written R"(...)", in which backslashes are not escape characters. These work like `backticked` strings in Go, or like 'single-quoted' strings or /patterns/ in Perl. See this answer for details.
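
To make the escaping point concrete, here is a minimal sketch that spells the question's pattern both ways and checks a couple of inputs. It uses std::regex from the standard library rather than boost::regex so that it builds without Boost, which is a substitution on my part; kEscaped, kRaw and is_number are names made up for this illustration.

```cpp
#include <regex>
#include <string>

// The question's pattern with doubled backslashes (ordinary string literal).
const std::string kEscaped = "[-+]?\\b[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?\\b";

// The same pattern as a C++11 raw string literal: backslashes are written once.
const std::string kRaw = R"([-+]?\b[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?\b)";

// Hypothetical helper: true when the whole of `text` matches the pattern.
// std::regex defaults to the ECMAScript grammar, which understands \b.
bool is_number(const std::string& text) {
    static const std::regex number(kRaw);
    return std::regex_match(text, number);
}
```

Both literals hold exactly the same characters, so either can be handed to the regex constructor. By contrast, "\b" in an ordinary literal is a single backspace character, which is why the unescaped version never behaves like a word boundary.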

EDIT

Here’s a somewhat fancier pattern for detecting floating-point literals, but which uses no backslashes:

 [+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?

Note that it does require lookaheads, which EREs don’t support. You should probably use the PCRE library, which does. Broken down, that’s

[+-]?                   # optional leading sign
(?=[.]?[0-9])           # lookahead for a digit, maybe with an intervening dot
[0-9]*                  # maybe some digits
(?:[.][0-9]*)?          # maybe a (dot plus maybe some digits)
(?:[Ee][+-]?[0-9]+)?    # maybe an exponent, which may have a sign and must have digits

Pattern courtesy of Perl’s Regexp::Common library.
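
As a rough sketch of how this pattern could cover both parts of the question, the code below lists every match in one pass with a regex iterator and, separately, flags tokens that are not numbers. It again assumes std::regex (ECMAScript grammar, which accepts the lookahead) instead of boost::regex; Boost.Regex has an analogous boost::regex_iterator, with boost::wsregex_iterator for std::wstring. The names kFloat, find_numbers and invalid_tokens are invented for this example.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// The lookahead pattern from above, written as a raw string literal.
static const std::regex kFloat(
    R"([+-]?(?=[.]?[0-9])[0-9]*(?:[.][0-9]*)?(?:[Ee][+-]?[0-9]+)?)");

// Part (2): walk over all matches; each iterator step resumes the search
// right after the previous match, so no manual loop over regex_search is needed.
std::vector<std::string> find_numbers(const std::string& text) {
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), kFloat), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}

// Part (1): split on spaces and commas, then require every token to match
// the pattern in full. Tokens that do not are returned as invalid.
std::vector<std::string> invalid_tokens(std::string text) {
    for (char& c : text) {
        if (c == ',') c = ' ';  // treat commas like spaces
    }
    std::istringstream in(text);
    std::vector<std::string> bad;
    for (std::string token; in >> token;) {
        if (!std::regex_match(token, kFloat)) bad.push_back(token);
    }
    return bad;
}
```

A single regex_search call reports only one match (plus its capture groups), which is why the iterator is used for the listing. Note that this pattern requires at least one mantissa digit, so a bare exponent such as e-5, listed as valid in the question, would be reported as invalid here; the pattern would need adjusting if that form must be accepted.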



Source: https://stackoverflow.com/questions/9853950/this-regular-expression-fail-to-parse-all-valid-floating-numbers
