I need a robust and simple way to remove illegal path and file characters from a simple string. I\'ve used the below code but it doesn\'t seem to do anything, what am I miss
For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should evaluate if you really want to remove the offensive characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:
StackOverflow question showing how to check if a given string is a valid file name. Note you can use the regex from this question to remove characters with a regular expression replacement (if you really need to do this).
I think it is much easier to validate using a regex and specifiing which characters are allowed, instead of trying to check for all bad characters. See these links: http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html
Also, do a search for "regular expression editor"s, they help a lot. There are some around which even output the code in c# for you.
This will do want you want, and avoid collisions
static string SanitiseFilename(string key)
{
var invalidChars = Path.GetInvalidFileNameChars();
var sb = new StringBuilder();
foreach (var c in key)
{
var invalidCharIndex = -1;
for (var i = 0; i < invalidChars.Length; i++)
{
if (c == invalidChars[i])
{
invalidCharIndex = i;
}
}
if (invalidCharIndex > -1)
{
sb.Append("_").Append(invalidCharIndex);
continue;
}
if (c == '_')
{
sb.Append("__");
continue;
}
sb.Append(c);
}
return sb.ToString();
}
You can remove illegal chars using Linq like this:
var invalidChars = Path.GetInvalidFileNameChars();
var invalidCharsRemoved = stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray();
EDIT
This is how it looks with the required edit mentioned in the comments:
var invalidChars = Path.GetInvalidFileNameChars();
string invalidCharsRemoved = new string(stringWithInvalidChars
.Where(x => !invalidChars.Contains(x))
.ToArray());
I use regular expressions to achieve this. First, I dynamically build the regex.
string regex = string.Format(
"[{0}]",
Regex.Escape(new string(Path.GetInvalidFileNameChars())));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.
If you remove or replace with a single character the invalid characters, you can have collisions:
<abc -> abc
>abc -> abc
Here is a simple method to avoid this:
public static string ReplaceInvalidFileNameChars(string s)
{
char[] invalidFileNameChars = System.IO.Path.GetInvalidFileNameChars();
foreach (char c in invalidFileNameChars)
s = s.Replace(c.ToString(), "[" + Array.IndexOf(invalidFileNameChars, c) + "]");
return s;
}
The result:
<abc -> [1]abc
>abc -> [2]abc