I need a robust and simple way to remove illegal path and file characters from a simple string. I\'ve used the below code but it doesn\'t seem to do anything, what am I miss
I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.
private static string CleanFileName(string fileName)
{
return Path.GetInvalidFileNameChars().Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}
Some comments indicate this method is not working for them so I've included a link to a DotNetFiddle snippet so you may validate the method.
https://dotnetfiddle.net/nw1SWY
Or you can just do
[YOUR STRING].Replace('\\', ' ').Replace('/', ' ').Replace('"', ' ').Replace('*', ' ').Replace(':', ' ').Replace('?', ' ').Replace('<', ' ').Replace('>', ' ').Replace('|', ' ').Trim();
Try something like this instead;
string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string invalid = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
foreach (char c in invalid)
{
illegal = illegal.Replace(c.ToString(), "");
}
But I have to agree with the comments, I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.
Edit: Or a potentially 'better' solution, using Regex's.
string illegal = "\"M\"\\a/ry/ h**ad:>> a\\/:*?\"| li*tt|le|| la\"mb.?";
string regexSearch = new string(Path.GetInvalidFileNameChars()) + new string(Path.GetInvalidPathChars());
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(regexSearch)));
illegal = r.Replace(illegal, "");
Still, the question begs to be asked, why you're doing this in the first place.
I wrote this monster for fun, it lets you roundtrip:
public static class FileUtility
{
private const char PrefixChar = '%';
private static readonly int MaxLength;
private static readonly Dictionary<char,char[]> Illegals;
static FileUtility()
{
List<char> illegal = new List<char> { PrefixChar };
illegal.AddRange(Path.GetInvalidFileNameChars());
MaxLength = illegal.Select(x => ((int)x).ToString().Length).Max();
Illegals = illegal.ToDictionary(x => x, x => ((int)x).ToString("D" + MaxLength).ToCharArray());
}
public static string FilenameEncode(string s)
{
var builder = new StringBuilder();
char[] replacement;
using (var reader = new StringReader(s))
{
while (true)
{
int read = reader.Read();
if (read == -1)
break;
char c = (char)read;
if(Illegals.TryGetValue(c,out replacement))
{
builder.Append(PrefixChar);
builder.Append(replacement);
}
else
{
builder.Append(c);
}
}
}
return builder.ToString();
}
public static string FilenameDecode(string s)
{
var builder = new StringBuilder();
char[] buffer = new char[MaxLength];
using (var reader = new StringReader(s))
{
while (true)
{
int read = reader.Read();
if (read == -1)
break;
char c = (char)read;
if (c == PrefixChar)
{
reader.Read(buffer, 0, MaxLength);
var encoded =(char) ParseCharArray(buffer);
builder.Append(encoded);
}
else
{
builder.Append(c);
}
}
}
return builder.ToString();
}
public static int ParseCharArray(char[] buffer)
{
int result = 0;
foreach (char t in buffer)
{
int digit = t - '0';
if ((digit < 0) || (digit > 9))
{
throw new ArgumentException("Input string was not in the correct format");
}
result *= 10;
result += digit;
}
return result;
}
}
Here is my small contribution. A method to replace within the same string without creating new strings or stringbuilders. It's fast, easy to understand and a good alternative to all mentions in this post.
private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}
private static string ReplaceInvalidChars(string fileName, string newValue)
{
char newChar = newValue[0];
char[] chars = fileName.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
char c = chars[i];
if (InvalidCharsHash.Contains(c))
chars[i] = newChar;
}
return new string(chars);
}
You can call it like this:
string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
string legal = ReplaceInvalidChars(illegal);
and returns:
_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
It's worth to note that this method will always replace invalid chars with a given value, but will not remove them. If you want to remove invalid chars, this alternative will do the trick:
private static string RemoveInvalidChars(string fileName, string newValue)
{
char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
bool remove = newChar == char.MinValue;
char[] chars = fileName.ToCharArray();
char[] newChars = new char[chars.Length];
int i2 = 0;
for (int i = 0; i < chars.Length; i++)
{
char c = chars[i];
if (InvalidCharsHash.Contains(c))
{
if (!remove)
newChars[i2++] = newChar;
}
else
newChars[i2++] = c;
}
return new string(newChars, 0, i2);
}
BENCHMARK
I executed timed test runs with most methods found in this post, if performance is what you are after. Some of these methods don't replace with a given char, since OP was asking to clean the string. I added tests replacing with a given char, and some others replacing with an empty char if your intended scenario only needs to remove the unwanted chars. Code used for this benchmark is at the end, so you can run your own tests.
Note: Methods Test1
and Test2
are both proposed in this post.
First Run
replacing with '_', 1000000 iterations
Results:
============Test1===============
Elapsed=00:00:01.6665595
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test2===============
Elapsed=00:00:01.7526835
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test3===============
Elapsed=00:00:05.2306227
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test4===============
Elapsed=00:00:14.8203696
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test5===============
Elapsed=00:00:01.8273760
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test6===============
Elapsed=00:00:05.4249985
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test7===============
Elapsed=00:00:07.5653833
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test8===============
Elapsed=00:12:23.1410106
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test9===============
Elapsed=00:00:02.1016708
Result=_M ____a_ry_ h__ad___ a_________ li_tt_le__ la_mb._
============Test10===============
Elapsed=00:00:05.0987225
Result=M ary had a little lamb.
============Test11===============
Elapsed=00:00:06.8004289
Result=M ary had a little lamb.
Second Run
removing invalid chars, 1000000 iterations
Note: Test1 will not remove, only replace.
Results:
============Test1===============
Elapsed=00:00:01.6945352
Result= M a ry h ad a li tt le la mb.
============Test2===============
Elapsed=00:00:01.4798049
Result=M ary had a little lamb.
============Test3===============
Elapsed=00:00:04.0415688
Result=M ary had a little lamb.
============Test4===============
Elapsed=00:00:14.3397960
Result=M ary had a little lamb.
============Test5===============
Elapsed=00:00:01.6782505
Result=M ary had a little lamb.
============Test6===============
Elapsed=00:00:04.9251707
Result=M ary had a little lamb.
============Test7===============
Elapsed=00:00:07.9562379
Result=M ary had a little lamb.
============Test8===============
Elapsed=00:12:16.2918943
Result=M ary had a little lamb.
============Test9===============
Elapsed=00:00:02.0770277
Result=M ary had a little lamb.
============Test10===============
Elapsed=00:00:05.2721232
Result=M ary had a little lamb.
============Test11===============
Elapsed=00:00:05.2802903
Result=M ary had a little lamb.
BENCHMARK RESULTS
Methods Test1
, Test2
and Test5
are the fastest. Method Test8
is the slowest.
CODE
Here's the complete code of the benchmark:
private static HashSet<char> _invalidCharsHash;
private static HashSet<char> InvalidCharsHash
{
get { return _invalidCharsHash ?? (_invalidCharsHash = new HashSet<char>(Path.GetInvalidFileNameChars())); }
}
private static string _invalidCharsValue;
private static string InvalidCharsValue
{
get { return _invalidCharsValue ?? (_invalidCharsValue = new string(Path.GetInvalidFileNameChars())); }
}
private static char[] _invalidChars;
private static char[] InvalidChars
{
get { return _invalidChars ?? (_invalidChars = Path.GetInvalidFileNameChars()); }
}
static void Main(string[] args)
{
string testPath = "\"M <>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
int max = 1000000;
string newValue = "";
TimeBenchmark(max, Test1, testPath, newValue);
TimeBenchmark(max, Test2, testPath, newValue);
TimeBenchmark(max, Test3, testPath, newValue);
TimeBenchmark(max, Test4, testPath, newValue);
TimeBenchmark(max, Test5, testPath, newValue);
TimeBenchmark(max, Test6, testPath, newValue);
TimeBenchmark(max, Test7, testPath, newValue);
TimeBenchmark(max, Test8, testPath, newValue);
TimeBenchmark(max, Test9, testPath, newValue);
TimeBenchmark(max, Test10, testPath, newValue);
TimeBenchmark(max, Test11, testPath, newValue);
Console.Read();
}
private static void TimeBenchmark(int maxLoop, Func<string, string, string> func, string testString, string newValue)
{
var sw = new Stopwatch();
sw.Start();
string result = string.Empty;
for (int i = 0; i < maxLoop; i++)
result = func?.Invoke(testString, newValue);
sw.Stop();
Console.WriteLine($"============{func.Method.Name}===============");
Console.WriteLine("Elapsed={0}", sw.Elapsed);
Console.WriteLine("Result={0}", result);
Console.WriteLine("");
}
private static string Test1(string fileName, string newValue)
{
char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
char[] chars = fileName.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
if (InvalidCharsHash.Contains(chars[i]))
chars[i] = newChar;
}
return new string(chars);
}
private static string Test2(string fileName, string newValue)
{
char newChar = string.IsNullOrEmpty(newValue) ? char.MinValue : newValue[0];
bool remove = newChar == char.MinValue;
char[] chars = fileName.ToCharArray();
char[] newChars = new char[chars.Length];
int i2 = 0;
for (int i = 0; i < chars.Length; i++)
{
char c = chars[i];
if (InvalidCharsHash.Contains(c))
{
if (!remove)
newChars[i2++] = newChar;
}
else
newChars[i2++] = c;
}
return new string(newChars, 0, i2);
}
private static string Test3(string filename, string newValue)
{
foreach (char c in InvalidCharsValue)
{
filename = filename.Replace(c.ToString(), newValue);
}
return filename;
}
private static string Test4(string filename, string newValue)
{
Regex r = new Regex(string.Format("[{0}]", Regex.Escape(InvalidCharsValue)));
filename = r.Replace(filename, newValue);
return filename;
}
private static string Test5(string filename, string newValue)
{
return string.Join(newValue, filename.Split(InvalidChars));
}
private static string Test6(string fileName, string newValue)
{
return InvalidChars.Aggregate(fileName, (current, c) => current.Replace(c.ToString(), newValue));
}
private static string Test7(string fileName, string newValue)
{
string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
return Regex.Replace(fileName, regex, newValue, RegexOptions.Compiled);
}
private static string Test8(string fileName, string newValue)
{
string regex = string.Format("[{0}]", Regex.Escape(InvalidCharsValue));
Regex removeInvalidChars = new Regex(regex, RegexOptions.Singleline | RegexOptions.Compiled | RegexOptions.CultureInvariant);
return removeInvalidChars.Replace(fileName, newValue);
}
private static string Test9(string fileName, string newValue)
{
StringBuilder sb = new StringBuilder(fileName.Length);
bool changed = false;
for (int i = 0; i < fileName.Length; i++)
{
char c = fileName[i];
if (InvalidCharsHash.Contains(c))
{
changed = true;
sb.Append(newValue);
}
else
sb.Append(c);
}
if (sb.Length == 0)
return newValue;
return changed ? sb.ToString() : fileName;
}
private static string Test10(string fileName, string newValue)
{
if (!fileName.Any(c => InvalidChars.Contains(c)))
{
return fileName;
}
return new string(fileName.Where(c => !InvalidChars.Contains(c)).ToArray());
}
private static string Test11(string fileName, string newValue)
{
string invalidCharsRemoved = new string(fileName
.Where(x => !InvalidChars.Contains(x))
.ToArray());
return invalidCharsRemoved;
}
For file names:
var cleanFileName = string.Join("", fileName.Split(Path.GetInvalidFileNameChars()));
For full paths:
var cleanPath = string.Join("", path.Split(Path.GetInvalidPathChars()));
Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user supplied path is indeed a child of a directory the user should have access to.