C#: Removing common invalid characters from a string: improve this algorithm

Consider the requirement to strip invalid characters from a string. The characters just need to be removed, that is, replaced with an empty string (string.Empty).

9 answers

离开以前

2020-12-29 07:30

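If you still want to do it in a LINQ-y way:
```
// Needs System.Linq and System.Collections.Generic; must be declared in a static class.
public static string CleanUp(this string orig)
{
    var badchars = new HashSet<char>() { '!', '@', '#', '$', '%', '_' };

    // Keep every character that is not in the bad set.
    return new string(orig.Where(c => !badchars.Contains(c)).ToArray());
}
```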


借酒劲吻你

2020-12-29 07:32
This is pretty clean: it restricts the string to valid characters instead of removing invalid ones. You should probably pull the literals out into constants:
```
string clean = new string(@"Sour!ce Str&*(@ing".Where(c =>
    @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.".Contains(c)).ToArray());
```
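The suggestion to split the literals into constants could look roughly like the sketch below. The names `WhitelistCleaner`, `AllowedChars`, and `KeepAllowed` are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Linq;

public static class WhitelistCleaner
{
    // Hypothetical constant name; the allowed set is copied from the snippet above.
    private const string AllowedChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,.";

    public static string KeepAllowed(string input)
    {
        // Keep only the characters that appear in the allowed set.
        return new string(input.Where(c => AllowedChars.Contains(c)).ToArray());
    }
}
```

With this in place, `WhitelistCleaner.KeepAllowed("Sour!ce Str&*(@ing")` returns `"Source String"`.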
忘了有多久

2020-12-29 07:33
I don't know about the readability of it, but a regular expression could do what you need it to:
```
someString = Regex.Replace(someString, @"[!@#$%_]", "");
```
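If the replacement runs often, one option is to keep a single `Regex` instance around rather than passing the pattern on every call. This is only a sketch; the class and member names (`RegexCleaner`, `BadChars`, `Clean`) are assumptions, and whether `RegexOptions.Compiled` pays off depends on how often it is called.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCleaner
{
    // Same character class as above; inside [...] the '$' is a literal character.
    private static readonly Regex BadChars =
        new Regex("[!@#$%_]", RegexOptions.Compiled);

    public static string Clean(string input)
    {
        // Replace every match with the empty string.
        return BadChars.Replace(input, string.Empty);
    }
}
```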
说谎

2020-12-29 07:34
The string class is immutable (although it is a reference type), so its methods, such as Replace, return a new string rather than modifying the original. Calling someString.Replace without assigning the result has no effect in your program. It seems you have already fixed this problem.

The main issue with your suggested algorithm is that it repeatedly allocates many new strings, potentially causing a big performance hit. LINQ doesn't really help things here. (It doesn't make the code significantly shorter, and certainly not any more readable, in my opinion.)

Try the following extension method. The key is the use of StringBuilder, pre-sized to the input length, so the result is built up in a single buffer instead of allocating a new string at every step.
```
private static readonly HashSet<char> badChars = 
    new HashSet<char> { '!', '@', '#', '$', '%', '_' };

public static string CleanString(this string str)
{
    var result = new StringBuilder(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        if (!badChars.Contains(str[i]))
            result.Append(str[i]);
    }
    return result.ToString();
}
```
This algorithm also makes use of the .NET 3.5 HashSet class to give O(1) lookup time for detecting a bad char. This makes the overall algorithm O(n) rather than the O(nm) of your posted one (m being the number of bad chars); it is also a lot better with memory usage, as explained above.
暗喜

2020-12-29 07:34

Something to consider: if this is for passwords (say), you want to scan for and keep good characters, and assume everything else is bad. It's easier to correctly filter for good things than to try to guess all the bad things.

For each character: if the character is good, keep it (copy it to an output buffer, or whatever).
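As a rough C# sketch of that loop: here "good" is assumed to mean letters and digits via `char.IsLetterOrDigit`, which is my assumption and not something the answer specifies (note that it also accepts non-ASCII letters). The names `GoodCharFilter` and `KeepGood` are hypothetical.

```csharp
using System;
using System.Text;

public static class GoodCharFilter
{
    public static string KeepGood(string input)
    {
        // Output buffer, sized for the worst case where every character is kept.
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Assumed definition of "good"; swap in whatever rule applies.
            if (char.IsLetterOrDigit(c))
                result.Append(c);
        }
        return result.ToString();
    }
}
```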

jeff


长情又很酷

2020-12-29 07:35

This one is faster than HashSet<T>. Also, if you have to perform this action often, please consider the foundations for this question I asked here.

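```
// StringCleaner is a placeholder class name.
public static class StringCleaner
{
    // Lookup table indexed by char value: true marks a character to strip.
    private static readonly bool[] BadCharValues;

    // A static constructor must carry the name of its enclosing class.
    static StringCleaner()
    {
        BadCharValues = new bool[char.MaxValue + 1];
        char[] badChars = { '!', '@', '#', '$', '%', '_' };
        foreach (char c in badChars)
            BadCharValues[c] = true;
    }

    public static string CleanString(string str)
    {
        var result = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            if (!BadCharValues[str[i]])
                result.Append(str[i]);
        }
        return result.ToString();
    }
}
```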
