I want to include a batch file rename functionality in my application. A user can type a destination filename pattern and (after replacing some wildcards in the pattern) I n
For .NET Framework versions prior to 3.5, this should work:
Regular expression matching should get you some of the way. Here's a snippet using the System.IO.Path.InvalidPathChars field (a char[], obsolete in later versions):
using System.Text.RegularExpressions;

bool IsValidFilename(string testName)
{
    // InvalidPathChars is a char[], so wrap it in a string for Regex.Escape
    Regex containsABadCharacter = new Regex("["
        + Regex.Escape(new string(System.IO.Path.InvalidPathChars)) + "]");
    if (containsABadCharacter.IsMatch(testName)) { return false; }
    // other checks for UNC, drive-path format, etc.
    return true;
}
For .NET Framework 3.5 and later, this should work:
http://msdn.microsoft.com/en-us/library/system.io.path.getinvalidpathchars(v=vs.90).aspx
Regular expression matching should get you some of the way. Here's a snippet using the System.IO.Path.GetInvalidPathChars() method:
using System.Text.RegularExpressions;

bool IsValidFilename(string testName)
{
    // GetInvalidPathChars() returns a char[]; wrap it in a string for Regex.Escape
    Regex containsABadCharacter = new Regex("["
        + Regex.Escape(new string(System.IO.Path.GetInvalidPathChars())) + "]");
    if (containsABadCharacter.IsMatch(testName)) { return false; }
    // other checks for UNC, drive-path format, etc.
    return true;
}
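The snippets above test against the *path* character set, which still allows characters such as `\` that cannot appear in a bare file name. Since the question is about file names, a minimal sketch using Path.GetInvalidFileNameChars() with String.IndexOfAny (no regex needed) might look like this. The class and method names are hypothetical, and the result is platform-specific because Windows rejects many more characters than Linux does.

```csharp
using System;
using System.IO;

// Hypothetical helper: rejects a bare file name (no directory part) that
// contains any character the current platform reports as invalid in file names.
public static class FileNameCharCheck
{
    // Cached once so the array is not rebuilt on every call.
    private static readonly char[] InvalidFileNameChars = Path.GetInvalidFileNameChars();

    public static bool HasOnlyValidFileNameChars(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return false;
        }

        // IndexOfAny returns -1 when none of the characters occur.
        return fileName.IndexOfAny(InvalidFileNameChars) < 0;
    }
}
```

This only checks characters; reserved names and length limits still need separate handling.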
Once you know the characters are valid, you should also check for the different path formats, e.g. c:\my\drive and \\server\share\dir\file.ext.
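A rough sketch of that format check, assuming you only care about telling drive-absolute paths, UNC paths and relative paths apart by their shape. The enum and class names are hypothetical, and the code never touches the file system, so it says nothing about whether the drive or share exists.

```csharp
using System;

// Hypothetical classification of a Windows-style path string by shape only.
public enum PathShape
{
    Relative,      // e.g. dir\file.ext
    DriveAbsolute, // e.g. c:\my\drive
    Unc,           // e.g. \\server\share\dir\file.ext
    Other          // e.g. c:file.ext, \dir\file.ext, \\server
}

public static class PathShapeClassifier
{
    private static bool IsAsciiLetter(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    public static PathShape Classify(string path)
    {
        if (string.IsNullOrEmpty(path))
        {
            return PathShape.Other;
        }

        // UNC: two leading backslashes, then a non-empty server and share name.
        if (path.StartsWith(@"\\", StringComparison.Ordinal))
        {
            string[] parts = path.Substring(2).Split('\\');
            bool hasServerAndShare = parts.Length >= 2
                && parts[0].Length > 0
                && parts[1].Length > 0;
            return hasServerAndShare ? PathShape.Unc : PathShape.Other;
        }

        // Drive-absolute: ASCII letter, colon, backslash.
        if (path.Length >= 3 && IsAsciiLetter(path[0]) && path[1] == ':' && path[2] == '\\')
        {
            return PathShape.DriveAbsolute;
        }

        // Rooted on the current drive (\dir) or drive-relative (c:file).
        if (path[0] == '\\' || (path.Length >= 2 && path[1] == ':'))
        {
            return PathShape.Other;
        }

        return PathShape.Relative;
    }
}
```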
I got this idea from someone (I don't remember who): let the OS do the heavy lifting.
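using System;
using System.IO;
using System.Windows.Forms; // MessageBox

public bool IsPathFileNameGood(string fname)
{
    try
    {
        // Opening the file lets the OS reject a bad name or path.
        // Side effects: the file is created if it doesn't exist, and the
        // writer stays open in _stream (a field of the enclosing class)
        // until the caller disposes it.
        this._stream = new StreamWriter(fname, true);
        return true;
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message, "Problem opening file");
        return false;
    }
}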
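Opening the real destination with a StreamWriter creates the file (and keeps it open) as a side effect. A variant of the same "let the OS decide" idea, sketched under the assumption that probing inside a throwaway temporary directory is acceptable, could look like this. The class and method names are hypothetical, and the answer reflects the file system that holds the temp folder, which may not be the user's destination drive.

```csharp
using System;
using System.IO;

// Hypothetical probe: tries to create the bare name in a fresh, uniquely named
// scratch directory, then deletes the whole directory again.
public static class FileNameProbe
{
    public static bool CanCreateFileNamed(string fileName)
    {
        // Only bare names are probed; separators would escape the scratch directory.
        if (string.IsNullOrEmpty(fileName)
            || fileName.IndexOf(Path.DirectorySeparatorChar) >= 0
            || fileName.IndexOf(Path.AltDirectorySeparatorChar) >= 0)
        {
            return false;
        }

        string scratchDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(scratchDir);
        try
        {
            // A rooted name (e.g. "C:x" on Windows) would also escape the scratch directory.
            if (Path.IsPathRooted(fileName))
            {
                return false;
            }

            string candidate = Path.Combine(scratchDir, fileName);
            // CreateNew never overwrites; the using block closes the handle immediately.
            using (new FileStream(candidate, FileMode.CreateNew, FileAccess.Write))
            {
            }
            return true;
        }
        catch (Exception ex) when (ex is IOException
                                   || ex is UnauthorizedAccessException
                                   || ex is ArgumentException
                                   || ex is NotSupportedException)
        {
            return false;
        }
        finally
        {
            Directory.Delete(scratchDir, recursive: true);
        }
    }
}
```

Catching only the I/O-related exception types keeps genuine bugs (such as a NullReferenceException) from being reported as "bad file name".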
This is what I use:
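using System.Text.RegularExpressions;

// Extension methods must live in a static class; the class name is arbitrary.
public static class FileNameValidation
{
    public static bool IsValidFileName(this string expression, bool platformIndependent)
    {
        // Windows-only: rejects reserved device names (matched case-sensitively) and names
        // starting with '.', plus control chars and \ ? * : " ; | / < >
        string sPattern = @"^(?!^(PRN|AUX|CLOCK\$|NUL|CON|COM\d|LPT\d|\..*)(\..+)?$)[^\x00-\x1f\\?*:\"";|/<>]+$";
        if (platformIndependent)
        {
            sPattern = @"^(([a-zA-Z]:|\\)\\)?(((\.)|(\.\.)|([^\\/:\*\?""\|<>\. ](([^\\/:\*\?""\|<>\. ])|([^\\/:\*\?""\|<>]*[^\\/:\*\?""\|<>\. ]))?))\\)*[^\\/:\*\?""\|<>\. ](([^\\/:\*\?""\|<>\. ])|([^\\/:\*\?""\|<>]*[^\\/:\*\?""\|<>\. ]))?$";
        }
        return Regex.IsMatch(expression, sPattern, RegexOptions.CultureInvariant);
    }
}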
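The first pattern rejects the file names and characters that are invalid on Windows. The second does the same but aims to ensure the name is legal on any platform; note that it also accepts an optional C:\ or \\ prefix and backslash-separated directory segments, so it effectively validates a path rather than a bare file name.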
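The negative lookahead in the first pattern is what rejects the Windows reserved device names. If you prefer a check without regex, here is a sketch, assuming the commonly documented list (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9). Unlike the pattern, it compares case-insensitively and does not cover CLOCK$, COM0 or LPT0; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: true when the part of the name before the first dot
// is a Windows reserved device name ("nul.txt" is treated like "nul").
public static class ReservedNameCheck
{
    private static readonly HashSet<string> Reserved = CreateReservedSet();

    private static HashSet<string> CreateReservedSet()
    {
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "CON", "PRN", "AUX", "NUL" };
        for (int i = 1; i <= 9; i++)
        {
            set.Add("COM" + i);
            set.Add("LPT" + i);
        }
        return set;
    }

    public static bool IsReservedDeviceName(string fileName)
    {
        if (string.IsNullOrEmpty(fileName))
        {
            return false;
        }

        // Compare only the stem before the first dot.
        int dot = fileName.IndexOf('.');
        string stem = dot >= 0 ? fileName.Substring(0, dot) : fileName;
        return Reserved.Contains(stem);
    }
}
```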
Windows file names are fairly unrestrictive, so this might not even be much of an issue. Apart from the control characters 0-31, the characters Windows disallows are:
\ / : * ? " < > |
You could easily write an expression to check whether those characters are present. A better solution, though, would be to try to name the files as the user wants and alert them when a file name doesn't stick.
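Such an expression might look like the sketch below: one character class with the nine printable characters listed, plus the ASCII control characters 0x00-0x1F, which Windows also disallows. Because the list is hard-coded, it behaves the same on every OS. The class and method names are hypothetical, and reserved names such as CON are not covered.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: matches any character Windows rejects in a file name.
public static class WindowsFileNameChars
{
    // In a C# verbatim string, "" is a literal quote; inside the class only \ needs escaping.
    private static readonly Regex ForbiddenChar =
        new Regex(@"[\x00-\x1f\\/:*?""<>|]", RegexOptions.CultureInvariant);

    public static bool ContainsForbiddenChar(string fileName)
    {
        return ForbiddenChar.IsMatch(fileName);
    }
}
```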
The destination file system also matters.
Under NTFS, some names cannot be created in specific directories, e.g. $Boot in the volume root.
If you're only trying to check whether a string holding your file name/path has any invalid characters, the fastest method I've found is to use Split() to break the name into an array of parts wherever there's an invalid character. If the result is an array of only one element, there are no invalid characters. :-)
var nameToTest = "Best file name \"ever\".txt";
bool isInvalidName = nameToTest.Split(System.IO.Path.GetInvalidFileNameChars()).Length > 1;
var pathToTest = "C:\\My Folder <secrets>\\";
bool isInvalidPath = pathToTest.Split(System.IO.Path.GetInvalidPathChars()).Length > 1;
I tried running this and the other methods mentioned above on a file/path name 1,000,000 times in LINQPad.
Using Split() takes only ~850 ms.
Using Regex("[" + Regex.Escape(new string(System.IO.Path.GetInvalidPathChars())) + "]") takes around 6 seconds.
The more complicated regular expressions fare MUCH worse, as do some of the other options, such as calling the various Path class methods on the file name and letting their internal validation do the job (most likely due to the overhead of exception handling).
Granted, it's not very often you need to validate 1 million file names, so a single call is fine for most of these methods anyway. But Split() is still efficient and effective if you're only looking for invalid characters.
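For anyone who wants to reproduce a comparison like the Split() vs Regex timings above, here is a rough Stopwatch harness. It is not the original LINQPad script: it assumes the regex is built once and reused, and the class and method names are hypothetical. Timings depend on machine, runtime and input, so no numbers are implied.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical timing harness for the two invalid-character checks.
public static class InvalidCharBenchmark
{
    private static readonly char[] InvalidPathChars = Path.GetInvalidPathChars();

    // Built once, so the timing measures matching rather than regex construction.
    private static readonly Regex InvalidPathCharRegex =
        new Regex("[" + Regex.Escape(new string(InvalidPathChars)) + "]");

    public static bool HasInvalidCharsBySplit(string path) =>
        path.Split(InvalidPathChars).Length > 1;

    public static bool HasInvalidCharsByRegex(string path) =>
        InvalidPathCharRegex.IsMatch(path);

    // Runs the check the given number of times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<string, bool> check, string input, int iterations)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, `InvalidCharBenchmark.Time(InvalidCharBenchmark.HasInvalidCharsBySplit, nameToTest, 1000000)` times the Split() approach; swap in `HasInvalidCharsByRegex` for the other.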