A way to use RegEx to find a set of filenames and paths in a string

长情又很酷 2021-02-09 18:12

Good morning guys

Is there a good way to use a regular expression in C# to find all filenames and their paths within a string variable?

For e

3 answers
  • 2021-02-09 18:41

    Here's something I came up with:

    using System;
    using System.Text.RegularExpressions;
    
    public class Test
    {
    
        public static void Main()
        {
            string s = @"Hello John these are the files you have to send us today: 
                C:\projects\orders20101130.docx also we would like you to send 
                C:\some\file.txt, C:\someother.file and d:\some file\with spaces.ext  
    
                Thank you";
    
            Extract(s);
    
        }
    
        private static readonly Regex rx = new Regex
            (@"[a-z]:\\(?:[^\\:]+\\)*((?:[^:\\]+)\.\w+)", RegexOptions.IgnoreCase);
    
        static void Extract(string text)
        {
            MatchCollection matches = rx.Matches(text);
    
            foreach (Match match in matches)
            {
                // Group 1 holds the file name without its directory part.
                Console.WriteLine("'{0}', file: '{1}'", match.Value, match.Groups[1].Value);
            }
        }
    
    }
    

    Produces: (see on ideone)

    'C:\projects\orders20101130.docx', file: 'orders20101130.docx'
    'C:\some\file.txt', file: 'file.txt'
    'C:\someother.file', file: 'someother.file'
    'd:\some file\with spaces.ext', file: 'with spaces.ext'
    

    The regex is not extremely robust (it makes a few assumptions: each path starts with a drive letter and each file has an extension), but it works for your examples.
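
    If you need the directory and the file name separately, here is a minimal sketch that uses the same pattern with named groups. The group names `dir` and `file` and the `PathParts` class are my own additions for illustration, not part of the program above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PathParts
{
    // Same pattern as in the answer; only the named groups "dir" and "file" are new.
    private static readonly Regex Rx = new Regex(
        @"(?<dir>[a-z]:\\(?:[^\\:]+\\)*)(?<file>[^:\\]+\.\w+)",
        RegexOptions.IgnoreCase);

    // Returns one (directory, file name) pair per path found in the text.
    public static List<KeyValuePair<string, string>> Split(string text)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Rx.Matches(text))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["dir"].Value, m.Groups["file"].Value));
        }
        return result;
    }
}
```

    Keep in mind that the file-name part is greedy: it runs up to the last `.extension` it can find before the next colon or backslash, so free text containing dots right after a path can end up inside the match.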


    Here is a version of the program if you use <file> tags. Change the regex and Extract to:

    private static readonly Regex rx = new Regex
        (@"<file>(.+?)</file>", RegexOptions.IgnoreCase);
    
    static void Extract(string text)
    {
        MatchCollection matches = rx.Matches(text);
    
        foreach (Match match in matches)
        {
            Console.WriteLine("'{0}'", match.Groups[1]);
        }
    }
    

    Also available on ideone.

  • 2021-02-09 18:46

    If you use <file> tags and the final text is well-formed XML once it is wrapped in a root element (that is, it is valid as inner XML, text without root tags), you can probably do:

    var doc = new XmlDocument();
    doc.LoadXml(String.Concat("<root>", input, "</root>"));
    
    var files = doc.SelectNodes("//file");
    

    or

    var doc = new XmlDocument();
    
    doc.AppendChild(doc.CreateElement("root"));
    doc.DocumentElement.InnerXml = input;
    
    var nodes = doc.SelectNodes("//file");
    

    Both methods work and are highly object-oriented, especially the second one.

    They may also perform better than a regex, although that is worth measuring on your own input.

    See also - Don't parse (X)HTML using RegEx
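
    To actually read the paths out of the selected nodes, something like the sketch below should do. The `FileTagReader` class and `ReadFiles` method are hypothetical names of mine; the code assumes the input is well-formed once wrapped in a root element, so a stray `&` or `<` in the text will make `LoadXml` throw an `XmlException`.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FileTagReader
{
    // Wraps the input in a root element and returns the text of every <file> element.
    public static List<string> ReadFiles(string input)
    {
        var doc = new XmlDocument();
        doc.LoadXml(string.Concat("<root>", input, "</root>"));

        var files = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//file"))
        {
            // InnerText is the path as written between the tags.
            files.Add(node.InnerText.Trim());
        }
        return files;
    }
}
```

    Backslashes in the paths need no escaping in XML, so Windows paths can go between the tags as they are.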

  • 2021-02-09 18:52

    If you put some constraints on your filename requirements, you can use code similar to this:

    string s = @"Hello John
    
    these are the files you have to send us today: C:\Development\Projects 2010\Accounting\file20101130.csv, C:\Development\Projects 2010\Accounting\orders20101130.docx
    
    also we would like you to send C:\Development\Projects 2010\Accounting\customersupdated.xls
    
    thank you";
    
    Regex regexObj = new Regex(@"\b[a-z]:\\(?:[^<>:""/\\|?*\n\r\0-\37]+\\)*[^<>:""/\\|?*\n\r\0-\37]+\.[a-z0-9\.]{1,5}", RegexOptions.IgnorePatternWhitespace|RegexOptions.IgnoreCase);
    MatchCollection fileNameMatchCollection = regexObj.Matches(s);
    foreach (Match fileNameMatch in fileNameMatchCollection)
    {
        MessageBox.Show(fileNameMatch.Value);
    }
    

    In this case, I limited extensions to a length of 1-5 characters. You can obviously use another value or restrict the characters allowed in filename extensions further. The list of characters excluded from file names is taken from the MSDN article Naming Files, Paths, and Namespaces.
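
    As an example of restricting extensions further, here is a rough sketch that only accepts extensions from a list you pass in. The `WhitelistFinder` class and `Find` method are names I made up for illustration, and I wrote the control-character range as `\x00-\x1F`, which is meant to cover the same characters as `\0-\37` above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitelistFinder
{
    // Finds drive-letter paths whose extension is one of the given values (without the dot).
    public static List<string> Find(string text, params string[] extensions)
    {
        // Escape each extension so it is matched literally, then join into an alternation.
        string alternation = string.Join("|", extensions.Select(e => Regex.Escape(e)));

        var rx = new Regex(
            @"\b[a-z]:\\(?:[^<>:""/\\|?*\x00-\x1F]+\\)*[^<>:""/\\|?*\x00-\x1F]+\.(?:"
            + alternation + @")\b",
            RegexOptions.IgnoreCase);

        var found = new List<string>();
        foreach (Match m in rx.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

    Calling `WhitelistFinder.Find(s, "csv", "docx")` on the string above should return the two paths ending in .csv and .docx and skip the .xls one.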
