Inconsistent behavior of std::regex

末鹿安然 提交于 2019-12-10 18:23:40

问题


I have the following issue where:

  • std::regex behaves differently if I pass the result of boost::filesystem::path::string() vs storing the result in a intermediate string variable. The first will return a match that is truncated and which is later not accepted by std::stoull (throws invalid_argument exception) while the second works perfectly.

See the following commands that explain more the issue:

[nix-shell:~]$ ls -l foo
total 0
-rw-r--r-- 1 amine users 0 Aug 10 16:55 008
-rw-r--r-- 1 amine users 0 Aug 10 15:47 2530047398992289207

[nix-shell:~]$ cat test-1.cpp
#include <iostream>
#include <regex>
#include <string>
#include <boost/filesystem.hpp>

int main() {
  std::regex expression{R"(([0-9]+))"};
  boost::filesystem::path cacheDir("/home/amine/foo");
  for (const auto& entry : boost::filesystem::directory_iterator{cacheDir})
  {
      std::smatch match;
      auto result = std::regex_match(entry.path().filename().string(), match, expression);
      std::cout << "Result: " << result << std::endl
        << "Length: " << match[1].length() << std::endl
        << "Match: " << match[1] << std::endl
        << "Filename: " << entry.path().filename().string() << std::endl
        << std::endl;

      std::stoull(match[1], 0);
  }
  return 0;
}
[nix-shell:~]$ g++ -o test1 test-1.cpp -lboost_filesystem -O0 -g

[nix-shell:~]$ ./test1
Result: 1
Length: 19
Match: 98992289207
Filename: 2530047398992289207

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoull
Aborted

[nix-shell:~]$ cat test-2.cpp
#include <iostream>
#include <regex>
#include <string>
#include <boost/filesystem.hpp>

int main() {
  std::regex expression{R"(([0-9]+))"};
  boost::filesystem::path cacheDir("/home/amine/foo");
  for (const auto& entry : boost::filesystem::directory_iterator{cacheDir})
  {
      std::smatch match;
      auto what = entry.path().filename().string();
      auto result = std::regex_match(what, match, expression);
      std::cout << "Result: " << result << std::endl
        << "Length: " << match[1].length() << std::endl
        << "Match: " << match[1] << std::endl
        << "Filename: " << entry.path().filename().string() << std::endl
        << std::endl;

      std::stoull(match[1], 0);
  }
  return 0;
}
[nix-shell:~]$ g++ -o test2 test-2.cpp -lboost_filesystem -O0 -g

[nix-shell:~]$ ./test2
Result: 1
Length: 19
Match: 2530047398992289207
Filename: 2530047398992289207

Result: 1
Length: 3
Match: 008
Filename: 008

So my questions are:

  • Why is the result of std::regex truncated when directly using boost::filesystem::path::string().
  • And let's assume it's fine if the result in the match variable is truncated, why would std::stoull throw an exception with it?

回答1:


You have unfortunately have fallen into a trap. In C++11 the overload of std::regex_match you are calling is

template< class STraits, class SAlloc, 
          class Alloc, class CharT, class Traits >
bool regex_match( const std::basic_string<CharT,STraits,SAlloc>& s,
                  std::match_results<
                      typename std::basic_string<CharT,STraits,SAlloc>::const_iterator,
                      Alloc
                  >& m,
                  const std::basic_regex<CharT,Traits>& e,
                  std::regex_constants::match_flag_type flags = 
                      std::regex_constants::match_default );

and since it takes a const& to a std::string you can pass it a temporary string. Unfortunately for you std::regex_match is not designed to work with a temporary string. This is why you get unexpected behavior. You try to reference data that has gone out of scope.

C++14 fixed this by adding

template< class STraits, class SAlloc, 
          class Alloc, class CharT, class Traits >
bool regex_match( const std::basic_string<CharT,STraits,SAlloc>&&,
                  std::match_results<
                      typename std::basic_string<CharT,STraits,SAlloc>::const_iterator,
                      Alloc
                  >&,
                  const std::basic_regex<CharT,Traits>&,
                  std::regex_constants::match_flag_type flags = 
                      std::regex_constants::match_default ) = delete;

so you could no longer pass a temporary string.

If you cannot use C++14 then you will need to make sure you do not pass a temporary string to std::regex_match



来源:https://stackoverflow.com/questions/51792370/inconsistent-behavior-of-stdregex

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!