I am trying to write a regex that gets all the possible consecutive 4-digit numbers from a 10-digit number. For example:
num = "2345678901";
Do you absolutely need to use Regex? The same operation can be achieved much more quickly using a simple loop.
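private static IEnumerable<string> getnums(string num)
{
    // Slide a 4-character window across the string, one position at a time.
    for (int i = 0; i < num.Length - 3; i++)
    {
        yield return num.Substring(i, 4);
    }
}

private static IEnumerable<string> DoIt(string num)
{
    // The lookahead matches at every position that is followed by 4 digits;
    // capture group 1 holds those digits.
    return Regex.Matches(num, @"(?=(\d{4}))")
        .Cast<Match>()
        .Select(p => p.Groups[1].Value)
        .ToList();
}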
On average, the simple loop takes about half the time of the Regex version.
static void Main(string[] args)
{
    var num = "2345678901";
    Stopwatch timer = new Stopwatch();

    timer.Start();
    foreach (var number in getnums(num))
    {
        // Yum yum numbers
    }
    timer.Stop();
    Console.WriteLine(timer.Elapsed.Ticks);

    timer.Reset();
    timer.Start();
    foreach (var number in DoIt(num))
    {
        // Yum yum numbers
    }
    timer.Stop();
    Console.WriteLine(timer.Elapsed.Ticks);
}
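A single timed pass over a 10-character string can be dominated by one-off costs such as JIT compilation and the first construction of the regex, so the tick counts may vary a lot between runs. A rough sketch of a steadier comparison follows. It assumes a warm-up call and an arbitrary iteration count; the names WindowBenchmark, LoopWindows, RegexWindows and Time are made up for illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class WindowBenchmark
{
    // Same loop-based approach as getnums.
    public static IEnumerable<string> LoopWindows(string num)
    {
        for (int i = 0; i < num.Length - 3; i++)
            yield return num.Substring(i, 4);
    }

    // Same lookahead-based approach as DoIt.
    public static IEnumerable<string> RegexWindows(string num)
    {
        return Regex.Matches(num, @"(?=(\d{4}))")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    // Hypothetical helper: runs the extractor `iterations` times
    // and returns the total elapsed ticks.
    public static long Time(Func<string, IEnumerable<string>> extract, string num, int iterations)
    {
        var timer = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            foreach (var chunk in extract(num)) { /* consume the results */ }
        }
        timer.Stop();
        return timer.Elapsed.Ticks;
    }

    public static void Main()
    {
        const string num = "2345678901";
        const int iterations = 100000; // arbitrary, chosen only for illustration

        // Warm-up calls so that one-off start-up costs are not measured.
        Time(LoopWindows, num, 1);
        Time(RegexWindows, num, 1);

        Console.WriteLine("loop:  " + Time(LoopWindows, num, iterations));
        Console.WriteLine("regex: " + Time(RegexWindows, num, iterations));
    }
}
```

The absolute numbers depend on the machine and runtime, so only the ratio between the two lines is worth comparing.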
You need to use the (?=(\d{4})) regex to match overlapping matches.
The regexes you are using all consume the 4-digit chunks of text, so overlapping values are not matched. With a (?=...) positive lookahead, you can test each position inside the input string and capture the 4-digit chunk that starts at that position without consuming any characters (i.e. without moving the regex engine's pointer to the location after those 4 digits).
C# demo:
var data = "2345678901";
// Each match is zero-width; the 4 digits are in capture group 1.
var res = Regex.Matches(data, @"(?=(\d{4}))")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
Console.WriteLine(string.Join("\n", res));
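To make the difference between consuming and non-consuming matches concrete, here is a small side-by-side sketch. The class and method names (OverlapDemo, Consuming, Overlapping) are made up for illustration, and the plain \d{4} pattern is only an assumed stand-in for the kind of consuming regex described above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Consuming pattern: each match moves the engine past its 4 digits,
    // so the next match can only start after them.
    public static string[] Consuming(string data) =>
        Regex.Matches(data, @"\d{4}")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();

    // Lookahead pattern: each match is zero-width, so the engine advances
    // one character at a time and group 1 captures every 4-digit window.
    public static string[] Overlapping(string data) =>
        Regex.Matches(data, @"(?=(\d{4}))")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();

    public static void Main()
    {
        var data = "2345678901";
        Console.WriteLine(string.Join(",", Consuming(data)));   // 2345,6789
        Console.WriteLine(string.Join(",", Overlapping(data))); // 2345,3456,4567,5678,6789,7890,8901
    }
}
```

The consuming version returns only two chunks because the trailing "01" is too short to match, while the lookahead version returns all seven windows.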