I am writing an application that reads some data from a simple text file. The data files I am interested in have lines of the following form:
Mem(100) = 1
First of all, remember to #include <regex>.
C++ std::regex_match works like the regular expressions in other languages.
Let's start with a simple example:
std::string str = "Mem(100)=120";
std::regex regex("^Mem\\([0-9]+\\)=[0-9]+$");
std::cout << std::regex_match(str, regex) << std::endl;
In this case, our regex is ^Mem\([0-9]+\)=[0-9]+$.
Let's take a look at what it does:
^ at the beginning tells C++ this is where the line starts, so AMem(1)=2 should not match.
$ at the end tells C++ this is where the line ends, so Mem(1)=2x should not match. (std::regex_match only succeeds when the whole string matches, so both anchors are redundant here, but they matter if you switch to std::regex_search.)
\\( is a literal ( character. ( has a special meaning in regular expressions, so we escape it as \(. However, the \ character has a special meaning in C++ strings, so we write \\( to tell C++ to pass \( to the regular expression engine.
[0-9] matches a digit. \\d should also work, because the default ECMAScript grammar of std::regex supports \d.
[0-9]+ means at least one digit. If Mem() is acceptable, then use [0-9]* instead.
As you can see, this is just like the regular expressions you'd find in other languages (such as Java or C#).
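As a minimal sketch of the checks described above, the pattern can be wrapped in a small helper. The function name matchesBasicMemLine is made up for this illustration and is not part of any library:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true only if the whole string
// has the form Mem(<digits>)=<digits>, with no whitespace allowed.
bool matchesBasicMemLine(const std::string& line) {
    // static: the pattern is compiled once, on the first call
    static const std::regex pattern("^Mem\\([0-9]+\\)=[0-9]+$");
    return std::regex_match(line, pattern);
}
```

With this helper, "Mem(100)=120" is accepted, while "AMem(1)=2", "Mem(1)=2x" and "Mem()=1" are rejected.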
Now, to allow for whitespace, use:
std::regex regex("^\\s*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+\\s*$");
Note that \s includes \t, so there is no need to specify both. If it didn't, you'd use (\s|\t) or [\s\t], not (\s,\t).
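To see the effect of \s* in practice, here is a short sketch using the whitespace-tolerant pattern. The helper name matchesMemLineWithSpaces is an assumption chosen for this example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: like the basic check, but tolerates whitespace
// (spaces, tabs, ...) at the start, around '=', and at the end.
bool matchesMemLineWithSpaces(const std::string& line) {
    static const std::regex pattern(
        "^\\s*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+\\s*$");
    return std::regex_match(line, pattern);
}
```

Both "Mem(100) = 1" and a tab-separated "\tMem(100)\t=\t1" are accepted. "Mem (100)=1" is still rejected, because the pattern has no \s* between Mem and the opening parenthesis.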
Finally, to include floating-point numbers, we first need to decide whether Mem(1) = 1. (that is, a dot without a number after it) is acceptable. If it isn't, then the .23 in 1.23 is optional as a whole. In regexes, we use ? to indicate that.
std::regex regex("^[\\s]*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+(\\.[0-9]+)?\\s*$");
Note that we use \. instead of just . because . has a special meaning in regular expressions - it matches any character - so we need to escape it.
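The optional fractional part can be checked with a sketch like the following. The helper name matchesMemLineWithDecimal is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts an integer or a decimal number after '='.
// The fractional part (\.[0-9]+) is optional, but if a dot is present
// it must be followed by at least one digit.
bool matchesMemLineWithDecimal(const std::string& line) {
    static const std::regex pattern(
        "^[\\s]*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+(\\.[0-9]+)?\\s*$");
    return std::regex_match(line, pattern);
}
```

"Mem(1) = 1" and "Mem(1) = 1.23" are accepted, "Mem(1) = 1." is rejected, and so is "Mem(1) = 1x23", which shows that the escaped dot matches only a literal dot.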
If you have a compiler that supports raw strings (e.g. Visual Studio 2013, GCC 4.5, Clang 3.0), you can simplify the regex string:
std::regex regex(R"(^[\s]*Mem\([0-9]+\)\s*=\s*[0-9]+(\.[0-9]+)?\s*$)");
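If you want to convince yourself that the raw string is only a different spelling of the same pattern, a quick sketch is to compare the two literals as plain strings. The variable names below are made up for the illustration:

```cpp
#include <string>

// Escaped spelling: every backslash meant for the regex engine is doubled.
const std::string escapedPattern =
    "^[\\s]*Mem\\([0-9]+\\)\\s*=\\s*[0-9]+(\\.[0-9]+)?\\s*$";

// Raw spelling: everything between R"( and )" is taken literally.
const std::string rawPattern =
    R"(^[\s]*Mem\([0-9]+\)\s*=\s*[0-9]+(\.[0-9]+)?\s*$)";
```

The two strings compare equal. If a pattern ever contains the sequence )" itself, a raw string with a custom delimiter, such as R"re(...)re", avoids ending the literal too early.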
To extract information about the matched string, you can use std::smatch and groups.
Let's start with a small change:
std::string str = " Mem(100)=120";
std::regex regex("^[\\s]*Mem\\(([0-9]+)\\)\\s*=\\s*([0-9]+(\\.[0-9]+)?)\\s*$");
std::smatch m;
std::cout << std::regex_match(str, m, regex) << std::endl;
Note three things:
We pass a std::smatch to std::regex_match. This class stores extra result info about the match.
We added parentheses around [0-9]+ (and around the whole final number). This defines a group. Groups tell the regex engine to keep track of whatever is within them.
Very importantly, the parentheses that define groups are NOT escaped, since we don't want them to match actual parenthesis characters. We actually want the special regex meaning.
Now that we have the groups, we can use them:
for (auto result : m) {
std::cout << result << std::endl;
}
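This will first print the whole string, then the number in Mem(), then the final number, and finally an empty line: the parentheses around the optional fractional part (\\.[0-9]+) also form a group, and that group did not take part in the match for 120.
In other words, m[0] gives us the whole match, m[1] gives us the first group, m[2] gives us the second group and m[3] gives us the third group, the fractional part, which is empty here. If you don't want that extra group, write it as a non-capturing group: (?:\\.[0-9]+)?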
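Putting the pieces together, here is one possible sketch of turning a line into numbers. The MemEntry struct and the parseMemLine function are hypothetical names introduced for this example. The fractional part uses a non-capturing group (?:...), so only m[1] and m[2] are captured:

```cpp
#include <regex>
#include <string>

// Hypothetical result type for one parsed line.
struct MemEntry {
    int address;   // the number inside Mem(...)
    double value;  // the number after '='
};

// Hypothetical helper: fills 'out' and returns true if 'line' has the
// form Mem(<digits>) = <number>; returns false and leaves 'out'
// untouched otherwise.
bool parseMemLine(const std::string& line, MemEntry& out) {
    static const std::regex pattern(
        R"(^\s*Mem\(([0-9]+)\)\s*=\s*([0-9]+(?:\.[0-9]+)?)\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return false;
    }
    // Note: std::stoi throws std::out_of_range if the address
    // does not fit in an int; real code may want to handle that.
    out.address = std::stoi(m[1].str());
    out.value = std::stod(m[2].str());
    return true;
}
```

For a line such as " Mem(100) = 1.5", this sets address to 100 and value to 1.5. To process a whole file, you could call it on each line read with std::getline.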