Question
I have the following issue: std::regex behaves differently if I pass the result of boost::filesystem::path::string() directly instead of storing it in an intermediate string variable first. The first version returns a truncated match, which std::stoull later rejects (it throws std::invalid_argument), while the second works perfectly.
The following commands illustrate the issue:
[nix-shell:~]$ ls -l foo
total 0
-rw-r--r-- 1 amine users 0 Aug 10 16:55 008
-rw-r--r-- 1 amine users 0 Aug 10 15:47 2530047398992289207
[nix-shell:~]$ cat test-1.cpp
#include <iostream>
#include <regex>
#include <string>
#include <boost/filesystem.hpp>
int main() {
    std::regex expression{R"(([0-9]+))"};
    boost::filesystem::path cacheDir("/home/amine/foo");
    for (const auto& entry : boost::filesystem::directory_iterator{cacheDir})
    {
        std::smatch match;
        auto result = std::regex_match(entry.path().filename().string(), match, expression);
        std::cout << "Result: " << result << std::endl
                  << "Length: " << match[1].length() << std::endl
                  << "Match: " << match[1] << std::endl
                  << "Filename: " << entry.path().filename().string() << std::endl
                  << std::endl;
        std::stoull(match[1], 0);
    }
    return 0;
}
[nix-shell:~]$ g++ -o test1 test-1.cpp -lboost_filesystem -O0 -g
[nix-shell:~]$ ./test1
Result: 1
Length: 19
Match: 98992289207
Filename: 2530047398992289207
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoull
Aborted
[nix-shell:~]$ cat test-2.cpp
#include <iostream>
#include <regex>
#include <string>
#include <boost/filesystem.hpp>
int main() {
    std::regex expression{R"(([0-9]+))"};
    boost::filesystem::path cacheDir("/home/amine/foo");
    for (const auto& entry : boost::filesystem::directory_iterator{cacheDir})
    {
        std::smatch match;
        auto what = entry.path().filename().string();
        auto result = std::regex_match(what, match, expression);
        std::cout << "Result: " << result << std::endl
                  << "Length: " << match[1].length() << std::endl
                  << "Match: " << match[1] << std::endl
                  << "Filename: " << entry.path().filename().string() << std::endl
                  << std::endl;
        std::stoull(match[1], 0);
    }
    return 0;
}
[nix-shell:~]$ g++ -o test2 test-2.cpp -lboost_filesystem -O0 -g
[nix-shell:~]$ ./test2
Result: 1
Length: 19
Match: 2530047398992289207
Filename: 2530047398992289207
Result: 1
Length: 3
Match: 008
Filename: 008
So my questions are:
- Why is the result of std::regex truncated when I use boost::filesystem::path::string() directly?
- And, assuming it's fine for the match to be truncated, why would std::stoull throw an exception on it?
Answer 1:
Unfortunately, you have fallen into a trap. In C++11, the overload of std::regex_match you are calling is
template< class STraits, class SAlloc,
class Alloc, class CharT, class Traits >
bool regex_match( const std::basic_string<CharT,STraits,SAlloc>& s,
std::match_results<
typename std::basic_string<CharT,STraits,SAlloc>::const_iterator,
Alloc
>& m,
const std::basic_regex<CharT,Traits>& e,
std::regex_constants::match_flag_type flags =
std::regex_constants::match_default );
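Since it takes a const& to a std::string, you can pass it a temporary string. Unfortunately, std::regex_match is not designed to work with a temporary: the std::smatch it fills in does not copy the text, it stores iterators into the string you passed. That temporary is destroyed at the end of the full expression, so by the time you print match[1] or pass it to std::stoull, those iterators refer to memory that has already been freed. Reading through them is undefined behavior, which explains both symptoms: the reported length is still 19, but some of the bytes no longer hold the original digits, so the printed match looks truncated and std::stoull finds no leading digits and throws std::invalid_argument.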
C++14 fixed this by adding a deleted overload:
template< class STraits, class SAlloc,
class Alloc, class CharT, class Traits >
bool regex_match( const std::basic_string<CharT,STraits,SAlloc>&&,
std::match_results<
typename std::basic_string<CharT,STraits,SAlloc>::const_iterator,
Alloc
>&,
const std::basic_regex<CharT,Traits>&,
std::regex_constants::match_flag_type flags =
std::regex_constants::match_default ) = delete;
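so you can no longer pass a temporary std::string. Note that this only catches rvalue strings: on POSIX, boost::filesystem::path::string() returns a const std::string& that refers into the path object, so when that path is the temporary returned by filename(), the call still compiles in C++14 even though the reference dangles just the same.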
If you cannot use C++14, you will need to make sure yourself that you do not pass a temporary string to std::regex_match, for example by storing it in a named variable first, as test-2.cpp does.
Source: https://stackoverflow.com/questions/51792370/inconsistent-behavior-of-stdregex