This is an oddball issue I've encountered (and probably have seen before but never paid attention to).
Here's the gist of the code:
my $url = 'htt
The gist of it is that matches done with /g save the position of the last match, so the next time that string is matched, the regex starts from there. In scalar context, this is generally used to get multiple successive matches in a while loop; in list context, /g returns all the matched (but non-overlapping) results. You can read more about this in perlretut, under Global Matching, and in perlop, under Regexp Quote-Like Operators.
You can see the current position with the pos function. You can also set the position by using pos as an lvalue:
pos($string) = 0;
will reset the position to the beginning of the string.
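To make that concrete, here is a minimal sketch of reading and resetting the position. The sample string and variable names are made up for illustration; only pos and /g come from the discussion above.

```perl
use strict;
use warnings;

my $string = 'aXbXc';                  # hypothetical sample string

# Before any /g match has run, pos() is undefined.
my $before = pos($string);

# A successful scalar-context /g match leaves pos() just past the match.
my $matched = $string =~ /X/g;
my $after_first = pos($string);        # 2

$matched = $string =~ /X/g;
my $after_second = pos($string);       # 4

# Assigning to pos() moves the starting point of the next /g match.
pos($string) = 0;
$matched = $string =~ /X/g;
my $after_reset = pos($string);        # 2 again

print "first=$after_first second=$after_second reset=$after_reset\n";
```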
There isn't much reason to use /g in scalar context outside of a loop, as you can get the exact same functionality using the \G assertion.
Of course, then nobody remembers how \G works and you're back at square one, but that's another topic.
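For anyone who has forgotten \G: it anchors the pattern at the string's current pos, and it does so even without /g. A small sketch, assuming a made-up string and setting pos by hand so the starting point is explicit:

```perl
use strict;
use warnings;

my $text = 'foo=bar';                  # hypothetical sample string

# Put the search position just after "foo".
pos($text) = 3;

# \G anchors the pattern at pos($text), so "=" must sit exactly at offset 3.
my $eq_at_pos = ($text =~ /\G=/) ? 1 : 0;

# "bar" is in the string, but it starts at offset 4, so the anchored match fails.
pos($text) = 3;
my $bar_at_pos = ($text =~ /\Gbar/) ? 1 : 0;

print "eq_at_pos=$eq_at_pos bar_at_pos=$bar_at_pos\n";
```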
The /g modifier, in scalar context, doesn't do what you think it does. Get rid of it.
As perlretut explains, /g in scalar context cycles over each match in turn. It's designed for use in a loop, like so:
while ($str =~ /pattern/g) {
    # match on each occurrence of 'pattern' in $str in turn
}
The other way to use /g is in list context:
my @results = $str =~ /pattern/g; # collect each occurrence of 'pattern' within $str into @results
If you're using /g in scalar context and you're not iterating over it, you're almost certainly not using it right.
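Here is a small sketch of what goes wrong when /g is used in scalar context outside a loop. The URL is a hypothetical stand-in, since the original code is cut off; the point is only that the same test alternates between matching and failing.

```perl
use strict;
use warnings;

my $url = 'http://example.com/page';   # hypothetical value, not the original URL

# Each test runs the same pattern against the same string.
my $first  = ($url =~ /http/g) ? 'match' : 'no match';   # matches at 0, pos becomes 4
my $second = ($url =~ /http/g) ? 'match' : 'no match';   # searches from 4, fails, pos is reset
my $third  = ($url =~ /http/g) ? 'match' : 'no match';   # starts from the beginning again

# Without /g, every test starts from the beginning of the string.
my $plain_a = ($url =~ /http/) ? 'match' : 'no match';
my $plain_b = ($url =~ /http/) ? 'match' : 'no match';

print "with /g:    $first, $second, $third\n";
print "without /g: $plain_a, $plain_b\n";
```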
m//g does not reset the position. You need to do that manually. See this for reference: http://perldoc.perl.org/functions/pos.html
I believe you just set pos to 0 or undef and it will work.
To quote perlop on Regexp Quote Like Operators:
In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see pos. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (e.g. m//gc). Modifying the target string also resets the search position.
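A rough sketch of the three behaviours in that quote, using a made-up string and a small hypothetical helper (show) for printing undef:

```perl
use strict;
use warnings;

# Hypothetical helper: render undef as the word "undef" for printing.
sub show { my ($value) = @_; return defined $value ? $value : 'undef' }

my $str = 'aaa-bbb';                   # hypothetical sample string

# Plain /g: a failed match throws the saved position away.
my $matched = $str =~ /a+/g;           # succeeds, pos($str) is 3
$matched    = $str =~ /zzz/g;          # fails
my $pos_after_g_fail = pos($str);      # undef

# /gc: a failed match keeps the saved position.
$matched = $str =~ /a+/g;              # succeeds again from the start, pos($str) is 3
$matched = $str =~ /zzz/gc;            # fails
my $pos_after_gc_fail = pos($str);     # still 3

# Assigning to the string resets the position as well.
$str = 'aaa-bbb';
my $pos_after_assign = pos($str);      # undef

print 'after failed /g:  ', show($pos_after_g_fail),  "\n";
print 'after failed /gc: ', show($pos_after_gc_fail), "\n";
print 'after assignment: ', show($pos_after_assign),  "\n";
```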
So in scalar context (which you're using), /g does not mean "search from the beginning", it means "search starting from the string's pos". "Search from the beginning" is the default (without /g).
/g is normally used when you want to find all matches for a regex in a string, instead of just the first match. In list context, it does that by returning a list of all the matches. In scalar context it does that by starting the search from where the previous search left off (usually done in a loop).