Best way to parse string of email addresses

悲哀的现实 2021-02-14 04:10

So I am working with some email header data, and for the to:, from:, cc:, and bcc: fields the email address(es) can be expressed in a number of different ways:

F         


        
13 answers
  •  一整个雨季
    2021-02-14 05:02

    At the risk of creating two problems, you could create a regular expression that matches any of your email formats. Use "|" to separate the formats within this one regex. Then you can run it over your input string and pull out all of the matches.

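    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using NUnit.Framework;

    public class Address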
    {
        private string _first;
        private string _last;
        private string _name;
        private string _domain;
    
        public Address(string first, string last, string name, string domain)
        {
            _first = first;
            _last = last;
            _name = name;
            _domain = domain;
        }
    
        public string First
        {
            get { return _first; }
        }
    
        public string Last
        {
            get { return _last; }
        }
    
        public string Name
        {
            get { return _name; }
        }
    
        public string Domain
        {
            get { return _domain; }
        }
    }
    
    [TestFixture]
    public class RegexEmailTest
    {
        [Test]
        public void TestThreeEmailAddresses()
        {
            // One alternative per accepted format; .NET allows the same
            // group name to be reused across alternatives.
            Regex emailAddress = new Regex(
                @"((?<last>\w*), (?<first>\w*) <(?<name>\w*)@(?<domain>\w*\.\w*)>)|" +
                @"((?<first>\w*) (?<last>\w*) <(?<name>\w*)@(?<domain>\w*\.\w*)>)|" +
                @"((?<name>\w*)@(?<domain>\w*\.\w*))");
            string input = "First, Last <name@domain.com>, name@domain.com, First Last <name@domain.com>";
    
            MatchCollection matches = emailAddress.Matches(input);
            List<Address> addresses =
                (from Match match in matches
                 select new Address(
                     match.Groups["first"].Value,
                     match.Groups["last"].Value,
                     match.Groups["name"].Value,
                     match.Groups["domain"].Value)).ToList();
    
            Assert.AreEqual(3, addresses.Count);
    
            // "First, Last <...>" is read as "last, first", so the literal
            // words end up in the opposite groups.
            Assert.AreEqual("Last", addresses[0].First);
            Assert.AreEqual("First", addresses[0].Last);
            Assert.AreEqual("name", addresses[0].Name);
            Assert.AreEqual("domain.com", addresses[0].Domain);
    
            // A bare address leaves the first and last groups empty.
            Assert.AreEqual("", addresses[1].First);
            Assert.AreEqual("", addresses[1].Last);
            Assert.AreEqual("name", addresses[1].Name);
            Assert.AreEqual("domain.com", addresses[1].Domain);
    
            Assert.AreEqual("First", addresses[2].First);
            Assert.AreEqual("Last", addresses[2].Last);
            Assert.AreEqual("name", addresses[2].Name);
            Assert.AreEqual("domain.com", addresses[2].Domain);
        }
    }

    There are several downsides to this approach. One is that it doesn't validate the string: any characters that don't fit one of your chosen formats are silently ignored. Another is that the accepted formats are all expressed in one place, so you cannot add a new format without changing the monolithic regex.
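
    One possible way around both downsides, as a rough sketch rather than a tested solution: keep each format as its own pattern and consume the input from left to right, so that anything that fits no format raises an error instead of being skipped. The names AddressListParser, Formats, Sep and Parse are made up for this illustration, and the sketch assumes addresses are separated by exactly ", " as in the test input above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AddressListParser
{
    // Assumed separator: ", " followed by more input, or the end of the string.
    private const string Sep = @"(?:, (?!\z)|\z)";

    // One pattern per accepted format (hypothetical table). \G pins each
    // attempt to the current position, so no characters can be skipped.
    private static readonly Regex[] Formats =
    {
        new Regex(@"\G(?<last>\w+), (?<first>\w+) <(?<name>\w+)@(?<domain>\w+\.\w+)>" + Sep),
        new Regex(@"\G(?<first>\w+) (?<last>\w+) <(?<name>\w+)@(?<domain>\w+\.\w+)>" + Sep),
        new Regex(@"\G(?<name>\w+)@(?<domain>\w+\.\w+)" + Sep),
    };

    public static List<(string First, string Last, string Name, string Domain)> Parse(string input)
    {
        var result = new List<(string First, string Last, string Name, string Domain)>();
        int pos = 0;
        while (pos < input.Length)
        {
            // Try each format at the current position; the first one that fits wins.
            Match found = null;
            foreach (Regex format in Formats)
            {
                Match m = format.Match(input, pos);
                if (m.Success) { found = m; break; }
            }
            if (found == null)
                throw new FormatException("Unrecognized address at offset " + pos);

            // Groups that a format does not define come back with an empty Value.
            result.Add((found.Groups["first"].Value, found.Groups["last"].Value,
                        found.Groups["name"].Value, found.Groups["domain"].Value));
            pos = found.Index + found.Length;
        }
        return result;
    }
}
```

    With this layout, supporting another format means adding one entry to Formats, and an input such as "name@domain.com; junk" throws a FormatException instead of quietly yielding one address. It is still nowhere near a full RFC 5322 parser; quoted display names and comments, for example, are not handled.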
