Consider the requirement to strip invalid characters from a string. The characters just need to be removed, i.e. replaced with blank or string.Empty.
If you still want to do it in a LINQy way:
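// Needs System.Linq and System.Collections.Generic; extension methods must live in a static class.
public static string CleanUp(this string orig)
{
    var badchars = new HashSet<char>() { '!', '@', '#', '$', '%', '_' };
    // Keep only the characters that are not in the bad set.
    return new string(orig.Where(c => !badchars.Contains(c)).ToArray());
}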
This is pretty clean. It restricts the string to valid characters instead of removing invalid ones. You should probably split the literals out into constants:
string clean = new string(@"Sour!ce Str&*(@ing".Where(c =>
    @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.".Contains(c)).ToArray());
I don't know about the readability of it, but a regular expression could do what you need it to:
someString = Regex.Replace(someString, @"[!@#$%_]", "");
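If the replacement runs often, one option is to build the Regex once and reuse it. The following is only a sketch under that assumption; the class name RegexCleaner and the method name Clean are made up for illustration and are not from the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCleaner
{
    // Same character class as above, constructed once and reused for every call.
    private static readonly Regex BadChars =
        new Regex(@"[!@#$%_]", RegexOptions.Compiled);

    // Hypothetical helper: removes every match of the character class.
    public static string Clean(string input)
    {
        return BadChars.Replace(input, "");
    }
}
```

For example, RegexCleaner.Clean("Sour!ce_Str@ing") returns "SourceString".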
The string class is immutable (although a reference type), hence all its methods are designed to return a new string rather than modify the existing one. Calling someString.Replace without assigning the result to anything will not have any effect in your program. (It seems you have since fixed this problem.)
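A small sketch of the immutability point, in case it helps. The class and method names here are invented for the example.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Calls Replace but throws the result away, so the input comes back unchanged.
    public static string WithoutAssignment(string s)
    {
        s.Replace("!", "");
        return s;
    }

    // Assigns the result of Replace, so the returned string has no '!'.
    public static string WithAssignment(string s)
    {
        s = s.Replace("!", "");
        return s;
    }

    public static void Main()
    {
        Console.WriteLine(WithoutAssignment("Sour!ce")); // prints "Sour!ce"
        Console.WriteLine(WithAssignment("Sour!ce"));    // prints "Source"
    }
}
```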
The main issue with your suggested algorithm is that it repeatedly allocates many new strings, potentially causing a big performance hit. LINQ doesn't really help things here. (It doesn't make the code significantly shorter and certainly not any more readable, in my opinion.)
Try the following extension method. The key is the use of StringBuilder, which means only one block of memory is allocated for the result while it is being built.
private static readonly HashSet<char> badChars =
    new HashSet<char> { '!', '@', '#', '$', '%', '_' };
public static string CleanString(this string str)
{
    var result = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        if (!badChars.Contains(str[i]))
            result.Append(str[i]);
    }
    return result.ToString();
}
This algorithm also makes use of the .NET 3.5 HashSet<T> class to give O(1) lookup time for detecting a bad char. This makes the overall algorithm O(n) rather than the O(nm) of your posted one (m being the number of bad chars); it is also a lot better with memory usage, as explained above.
Something to consider: if this is for passwords (say), you want to scan for and keep good characters, and assume everything else is bad. It's easier to correctly filter for good things than to try to guess all the bad things.
For each character:
    if the character is good -> keep it (copy to out buffer, whatever)
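The pseudocode above could be sketched in C# roughly as follows. The whitelist used here (ASCII letters, digits and the space) is only an assumption for the example, and WhitelistCleaner, IsGood and KeepGood are hypothetical names; pick whatever set of good characters your situation calls for.

```csharp
using System;
using System.Text;

public static class WhitelistCleaner
{
    // Assumed whitelist: ASCII letters, digits and the space character.
    private static bool IsGood(char c)
    {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == ' ';
    }

    // Copies only the good characters into the output buffer.
    public static string KeepGood(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (IsGood(c))
                result.Append(c);
        }
        return result.ToString();
    }
}
```

With this whitelist, WhitelistCleaner.KeepGood("Sour!ce Str&*(@ing") gives "Source String". Anything not explicitly listed is dropped, including characters nobody thought to put on a blacklist.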
jeff
This one is faster than HashSet<T>. Also, if you have to perform this action often, please consider the foundations for this question I asked here.
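// Lookup table indexed by character code: true means the character is bad.
private static readonly bool[] BadCharValues;
// A static constructor must carry the name of the enclosing class;
// StaticConstructor stands in for that class name here.
static StaticConstructor()
{
    BadCharValues = new bool[char.MaxValue + 1];
    char[] badChars = { '!', '@', '#', '$', '%', '_' };
    foreach (char c in badChars)
        BadCharValues[c] = true;
}
public static string CleanString(string str)
{
    var result = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        // A single array read replaces the hash lookup.
        if (!BadCharValues[str[i]])
            result.Append(str[i]);
    }
    return result.ToString();
}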