For example, if a user entered "I love this post!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
the consecutive duplicate exclamation marks "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" should be detected.
The following regular expression detects repeated characters. You could raise the threshold or limit the pattern to specific characters to make it more robust.
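// requires: using System; using System.Text.RegularExpressions;
int threshold = 3;
string stringToMatch = "thisstringrepeatsss";
// (.) captures any character; \1{n,} requires that same character at least n more times
string pattern = @"(.)\1{" + (threshold - 1) + ",}";
Regex r = new Regex(pattern);
Match m = r.Match(stringToMatch);
while (m.Success)
{
    Console.WriteLine("character passes threshold " + m.Value);
    m = m.NextMatch();
}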
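As a rough sketch of the "limit this to specific characters" idea, the pattern below only looks at runs of punctuation. The class name `RepeatFinder`, the method `FindPunctuationRuns`, and the choice of `\p{P}` (the Unicode punctuation category) are my own assumptions for illustration, not something from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatFinder
{
    // Hypothetical helper: returns every run of one punctuation character
    // that is repeated at least `threshold` times in a row.
    public static List<string> FindPunctuationRuns(string input, int threshold)
    {
        var runs = new List<string>();
        if (string.IsNullOrEmpty(input) || threshold < 2)
            return runs;

        // (\p{P}) captures one punctuation character; \1{n,} requires the
        // same character to follow at least n more times.
        string pattern = @"(\p{P})\1{" + (threshold - 1) + ",}";
        foreach (Match m in Regex.Matches(input, pattern))
            runs.Add(m.Value);
        return runs;
    }
}
```

Calling `RepeatFinder.FindPunctuationRuns("I loove this post!!!!", 3)` should pick up the exclamation marks but skip the "oo", since letters are not punctuation.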
Here's an example of a function that searches for a sequence of consecutive chars of a specified length and also ignores whitespace characters:
public static bool HasConsecutiveChars(string source, int sequenceLength)
{
    if (string.IsNullOrEmpty(source))
        return false;
    if (source.Length == 1)
        return false;
    int charCount = 1;
    for (int i = 0; i < source.Length - 1; i++)
    {
        char c = source[i];
        if (Char.IsWhiteSpace(c))
            continue;
        if (c == source[i + 1])
        {
            charCount++;
            if (charCount >= sequenceLength)
                return true;
        }
        else
            charCount = 1;
    }
    return false;
}
Edit: fixed range bug :/
Use LINQ! (For everything, not just this)
// requires: using System.Linq;
string test = "aabb";
return test.Where((item, index) => index > 0 && item.Equals(test.ElementAt(index - 1)));
// returns 'a' and 'b': each of these items has the same letter directly before it
OR
string test = "aabb";
return test.Where((item, index) => index > 0 && item.Equals(test.ElementAt(index - 1))).Any();
// returns true
The better way, in my opinion, is to create an array in which each element is responsible for one pair of identical adjacent characters: the first element for aa, the next for bb, then cc, dd, and so on. Every element of the array starts at 0.
To solve the problem, loop over the string once and update the array values. You can then analyze the array for whatever you need.
Example: for the string bbaaaccccdab, the result array would start with { 2, 1, 3 }, because 'aa' is found 2 times, 'bb' is found once (at the start of the string), and 'cc' is found three times.
Why 'cc' three times? Because the pairs overlap: 'cc'cc & c'cc'c & cc'cc'.
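The pair-counting idea above could be sketched roughly like this. The names `PairCounter` and `CountAdjacentPairs` are made up for the example, and I am assuming only lowercase 'a' to 'z' need to be tracked, so anything else is simply skipped.

```csharp
using System;

public static class PairCounter
{
    // Counts overlapping pairs of identical adjacent letters.
    // Assumption: only lowercase 'a'..'z' are tracked; other characters are ignored.
    public static int[] CountAdjacentPairs(string source)
    {
        var counts = new int[26]; // every element starts at 0
        if (string.IsNullOrEmpty(source))
            return counts;

        for (int i = 1; i < source.Length; i++)
        {
            char c = source[i];
            if (c >= 'a' && c <= 'z' && c == source[i - 1])
                counts[c - 'a']++; // index 0 is "aa", index 1 is "bb", ...
        }
        return counts;
    }
}
```

For bbaaaccccdab the first three elements come out as 2, 1 and 3, and every other element stays 0.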
Here is a quick solution I crafted with some extra duplicates thrown in for good measure. As others pointed out in the comments, some duplicates are going to be completely legitimate, so you may want to narrow your criteria to punctuation instead of mere characters.
string input = "I loove this post!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!aa";
int index = -1;
int count = 1;
List<string> dupes = new List<string>();
for (int i = 0; i < input.Length - 1; i++)
{
    if (input[i] == input[i + 1])
    {
        if (index == -1)
            index = i;
        count++;
    }
    else if (index > -1)
    {
        dupes.Add(input.Substring(index, count));
        index = -1;
        count = 1;
    }
}
if (index > -1)
{
    dupes.Add(input.Substring(index, count));
}
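If you do want to narrow the criteria to punctuation, one possible variation is below. This is only a sketch: `DupeFinder` and `FindPunctuationDupes` are names I made up, and I am assuming `char.IsPunctuation` is a good enough definition of "punctuation" for your input.

```csharp
using System;
using System.Collections.Generic;

public static class DupeFinder
{
    // Collects runs of two or more identical punctuation characters.
    public static List<string> FindPunctuationDupes(string input)
    {
        var dupes = new List<string>();
        if (string.IsNullOrEmpty(input))
            return dupes;

        int start = 0; // start index of the current run
        for (int i = 1; i <= input.Length; i++)
        {
            // A run ends at the end of the string or when the character changes.
            if (i == input.Length || input[i] != input[start])
            {
                int length = i - start;
                if (length > 1 && char.IsPunctuation(input[start]))
                    dupes.Add(input.Substring(start, length));
                start = i;
            }
        }
        return dupes;
    }
}
```

With this version the "oo" and "aa" in the sample input are skipped, and only the run of exclamation marks ends up in the list.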
This can be done in O(n) easily: for each character, if the previous character is the same as the current one, increment a temporary count. If it's different, reset your temporary count to 1. At each step, update your global maximum if needed.
For abbccc you get:
a => temp = 1, global = 1
b => temp = 1, global = 1
b => temp = 2, global = 2
c => temp = 1, global = 2
c => temp = 2, global = 2
c => temp = 3, global = 3
=> c appears three times. Extend it to get the position, then you should be able to print the "ccc" substring.
You can extend this to give you the starting position fairly easily; I'll leave that to you.
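For anyone who wants to see it, here is one way the extension with the starting position might look. Treat it as a sketch: `RunFinder` and `LongestRun` are hypothetical names, and I chose to return the first run when several runs tie for the longest.

```csharp
using System;

public static class RunFinder
{
    // Returns the longest run of one repeated character in a single O(n) pass.
    // If several runs tie, the first one wins; null or empty input gives "".
    public static string LongestRun(string source)
    {
        if (string.IsNullOrEmpty(source))
            return string.Empty;

        int tempStart = 0, bestStart = 0, bestLength = 1;
        for (int i = 1; i < source.Length; i++)
        {
            if (source[i] != source[i - 1])
                tempStart = i;            // run broken: the temporary count restarts here

            int tempLength = i - tempStart + 1;
            if (tempLength > bestLength)  // update the global best if needed
            {
                bestLength = tempLength;
                bestStart = tempStart;
            }
        }
        return source.Substring(bestStart, bestLength);
    }
}
```

For abbccc the best run starts at index 3 with length 3, so `RunFinder.LongestRun("abbccc")` gives back "ccc".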