Is there a built-in mechanism in .NET to match patterns other than Regular Expressions? I\'d like to match using UNIX style (glob) wildcards (* = any number of any characte
Based on previous posts, I threw together a C# class:
using System;
using System.Text.RegularExpressions;
public class FileWildcard
{
Regex mRegex;
public FileWildcard(string wildcard)
{
string pattern = string.Format("^{0}$", Regex.Escape(wildcard)
.Replace(@"\*", ".*").Replace(@"\?", "."));
mRegex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
}
public bool IsMatch(string filenameToCompare)
{
return mRegex.IsMatch(filenameToCompare);
}
}
Using it would go something like this:
FileWildcard w = new FileWildcard("*.txt");
if (w.IsMatch("Doug.Txt"))
Console.WriteLine("We have a match");
The matching is NOT the same as the System.IO.Directory.GetFiles() method, so don't use them together.
I wrote a solution that does it. It does not depend on any library and it does not support "!" or "[]" operators. It supports the following search patterns:
C:\Logs\*.txt
C:\Logs\**\*P1?\**\asd*.pdf
/// <summary>
/// Finds files for the given glob path. It supports ** * and ? operators. It does not support !, [] or ![] operators
/// </summary>
/// <param name="path">the path</param>
/// <returns>The files that match de glob</returns>
private ICollection<FileInfo> FindFiles(string path)
{
List<FileInfo> result = new List<FileInfo>();
//The name of the file can be any but the following chars '<','>',':','/','\','|','?','*','"'
const string folderNameCharRegExp = @"[^\<\>:/\\\|\?\*" + "\"]";
const string folderNameRegExp = folderNameCharRegExp + "+";
//We obtain the file pattern
string filePattern = Path.GetFileName(path);
List<string> pathTokens = new List<string>(Path.GetDirectoryName(path).Split('\\', '/'));
//We obtain the root path from where the rest of files will obtained
string rootPath = null;
bool containsWildcardsInDirectories = false;
for (int i = 0; i < pathTokens.Count; i++)
{
if (!pathTokens[i].Contains("*")
&& !pathTokens[i].Contains("?"))
{
if (rootPath != null)
rootPath += "\\" + pathTokens[i];
else
rootPath = pathTokens[i];
pathTokens.RemoveAt(0);
i--;
}
else
{
containsWildcardsInDirectories = true;
break;
}
}
if (Directory.Exists(rootPath))
{
//We build the regular expression that the folders should match
string regularExpression = rootPath.Replace("\\", "\\\\").Replace(":", "\\:").Replace(" ", "\\s");
foreach (string pathToken in pathTokens)
{
if (pathToken == "**")
{
regularExpression += string.Format(CultureInfo.InvariantCulture, @"(\\{0})*", folderNameRegExp);
}
else
{
regularExpression += @"\\" + pathToken.Replace("*", folderNameCharRegExp + "*").Replace(" ", "\\s").Replace("?", folderNameCharRegExp);
}
}
Regex globRegEx = new Regex(regularExpression, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
string[] directories = Directory.GetDirectories(rootPath, "*", containsWildcardsInDirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
foreach (string directory in directories)
{
if (globRegEx.Matches(directory).Count > 0)
{
DirectoryInfo directoryInfo = new DirectoryInfo(directory);
result.AddRange(directoryInfo.GetFiles(filePattern));
}
}
}
return result;
}
Just for the sake of completeness. Since 2016 in dotnet core
there is a new nuget package called Microsoft.Extensions.FileSystemGlobbing
that supports advanced globing paths. (Nuget Package)
some examples might be, searching for wildcard nested folder structures and files which is very common in web development scenarios.
wwwroot/app/**/*.module.js
wwwroot/app/**/*.js
This works somewhat similar with what .gitignore
files use to determine which files to exclude from source control.
https://www.nuget.org/packages/Glob.cs
https://github.com/mganss/Glob.cs
A GNU Glob for .NET.
You can get rid of the package reference after installing and just compile the single Glob.cs source file.
And as it's an implementation of GNU Glob it's cross platform and cross language once you find another similar implementation enjoy!
I don't know if the .NET framework has glob matching, but couldn't you replace the * with .*? and use regexes?
If you use VB.Net, you can use the Like statement, which has Glob like syntax.
http://www.getdotnetcode.com/gdncstore/free/Articles/Intoduction%20to%20the%20VB%20NET%20Like%20Operator.htm