I need a robust and simple way to remove illegal path and file characters from a simple string. I\'ve used the below code but it doesn\'t seem to do anything, what am I miss
The best way to remove illegal character from user input is to replace illegal character using Regex class, create method in code behind or also it validate at client side using RegularExpression control.
public string RemoveSpecialCharacters(string str)
{
return Regex.Replace(str, "[^a-zA-Z0-9_]+", "_", RegexOptions.Compiled);
}
OR
<asp:RegularExpressionValidator ID="regxFolderName"
runat="server"
ErrorMessage="Enter folder name with a-z A-Z0-9_"
ControlToValidate="txtFolderName"
Display="Dynamic"
ValidationExpression="^[a-zA-Z0-9_]*$"
ForeColor="Red">
Most solutions above combine illegal chars for both path and filename which is wrong (even when both calls currently return the same set of chars). I would first split the path+filename in path and filename, then apply the appropriate set to either if them and then combine the two again.
wvd_vegt
Throw an exception.
if ( fileName.IndexOfAny(Path.GetInvalidFileNameChars()) > -1 )
{
throw new ArgumentException();
}
I've rolled my own method, which seems to be a lot faster of other posted here (especially the regex which is so sloooooow) but I didn't tested all methods posted.
https://dotnetfiddle.net/haIXiY
The first method (mine) and second (also mine, but old one) also do an added check on backslashes, so the benchmark are not perfect, but anyways it's just to give you an idea.
Result on my laptop (for 100 000 iterations):
StringHelper.RemoveInvalidCharacters 1: 451 ms
StringHelper.RemoveInvalidCharacters 2: 7139 ms
StringHelper.RemoveInvalidCharacters 3: 2447 ms
StringHelper.RemoveInvalidCharacters 4: 3733 ms
StringHelper.RemoveInvalidCharacters 5: 11689 ms (==> Regex!)
The fastest method:
public static string RemoveInvalidCharacters(string content, char replace = '_', bool doNotReplaceBackslashes = false)
{
if (string.IsNullOrEmpty(content))
return content;
var idx = content.IndexOfAny(InvalidCharacters);
if (idx >= 0)
{
var sb = new StringBuilder(content);
while (idx >= 0)
{
if (sb[idx] != '\\' || !doNotReplaceBackslashes)
sb[idx] = replace;
idx = content.IndexOfAny(InvalidCharacters, idx+1);
}
return sb.ToString();
}
return content;
}
Method doesn't compile "as is" dur to InvalidCharacters
property, check the fiddle for full code
The original question asked to "remove illegal characters":
public string RemoveInvalidChars(string filename)
{
return string.Concat(filename.Split(Path.GetInvalidFileNameChars()));
}
You may instead want to replace them:
public string ReplaceInvalidChars(string filename)
{
return string.Join("_", filename.Split(Path.GetInvalidFileNameChars()));
}
This answer was on another thread by Ceres, I really like it neat and simple.
These are all great solutions, but they all rely on Path.GetInvalidFileNameChars
, which may not be as reliable as you'd think. Notice the following remark in the MSDN documentation on Path.GetInvalidFileNameChars:
The array returned from this method is not guaranteed to contain the complete set of characters that are invalid in file and directory names. The full set of invalid characters can vary by file system. For example, on Windows-based desktop platforms, invalid path characters might include ASCII/Unicode characters 1 through 31, as well as quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0) and tab (\t).
It's not any better with Path.GetInvalidPathChars method. It contains the exact same remark.