Best way to parse string of email addresses

悲哀的现实 2021-02-14 04:10

So I am working with some email header data, and in the To:, From:, Cc:, and Bcc: fields the email address(es) can be expressed in a number of different ways:

F         


        
13 answers
  •  情书的邮戳
    2021-02-14 04:57

    I decided to draw a line in the sand with two restrictions:

    1. The To and Cc headers have to be CSV-parseable strings.
    2. Anything MailAddress can't parse, I'm just not going to worry about.

    I also decided I'm only interested in email addresses, not display names: display names are problematic and hard to define, whereas an email address I can validate. So I used MailAddress (System.Net.Mail) to validate my parsing.
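A quick way to see what `MailAddress` gives you is a small wrapper like the one below. It is a sketch, not the answer's code: `AddressExtractor.ExtractAddress` is a hypothetical name, and it assumes (as the answer does) that anything `MailAddress` rejects can simply be ignored. The `Address` property drops the display name and angle brackets, and a `FormatException` marks a fragment that isn't an address.

```csharp
using System;
using System.Net.Mail;

public static class AddressExtractor
{
    // Hypothetical helper: returns the bare address for anything MailAddress
    // can parse, or null when the input isn't an address at all.
    public static string ExtractAddress(string candidate)
    {
        // MailAddress throws ArgumentException on an empty string,
        // so treat blank fragments as "not an address" up front.
        if (string.IsNullOrWhiteSpace(candidate))
            return null;

        try
        {
            // Address drops the display name and the <...> brackets
            return new MailAddress(candidate).Address;
        }
        catch (FormatException)
        {
            // e.g. a stray "First Last" fragment with no @ in it
            return null;
        }
    }
}
```

Passing `"Lastname, Firstname" <name@domain.com>` returns `name@domain.com`, while the fragment `First Last` returns `null`.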

    I treated the To and Cc headers as CSV strings, and again, anything that can't be parsed that way I don't worry about.

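    using System;
    using System.Collections.Generic;
    using System.Net.Mail;

    private string GetProperlyFormattedEmailString(string emailString)
    {
        // Split the header on commas (see CSVProcessor.GetFieldsFromString below)
        var emailStringParts = CSVProcessor.GetFieldsFromString(emailString);
        var addresses = new List<string>();

        foreach (var part in emailStringParts)
        {
            // Blank fields (e.g. from ",,") would make MailAddress throw ArgumentException
            if (string.IsNullOrWhiteSpace(part))
                continue;

            try
            {
                // MailAddress accepts "name@domain.com" as well as "Display Name <name@domain.com>";
                // keep only the bare address
                addresses.Add(new MailAddress(part).Address);
            }
            catch (FormatException)
            {
                // Wasn't an email address: skip it instead of rethrowing
            }
        }

        return string.Join(",", addresses);
    }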
    

    EDIT

    Further research has shown me that my assumptions are good. Reading through the spec, RFC 2822, pretty much shows that the To, Cc, and Bcc fields are comma-separated lists of addresses, i.e. CSV-parseable fields. So yes, it's hard and there are a lot of gotchas, as with any CSV parsing, but if you have a reliable way to parse CSV fields (TextFieldParser in the Microsoft.VisualBasic.FileIO namespace is one, and it's what I used for this), then you are golden.
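As a sketch of treating one header value as a single CSV record with `TextFieldParser` (assuming .NET Framework, or .NET Core 3.0+ where `Microsoft.VisualBasic.FileIO` ships with the runtime; `HeaderCsv.SplitRecipients` is a hypothetical name, not part of the answer):

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class HeaderCsv
{
    // Hypothetical helper: parses one header value as one CSV record.
    public static string[] SplitRecipients(string headerValue)
    {
        using (var reader = new StringReader(headerValue))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // "Doe, Jane" stays one field
            parser.TrimWhiteSpace = true;            // drop the space after each comma

            // ReadFields returns null when there is no data at all
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

This keeps `"Doe, Jane",jane@x.com` as two fields. A real display-name form such as `"Doe, Jane" <jane@x.com>`, however, has text after the closing quote, which is not valid CSV; that is the problem the second edit runs into.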

    Edit 2

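    Apparently they don't need to be valid CSV strings; the quotes really mess things up. A recipient like "Lastname, Firstname" <name@domain.com> isn't valid CSV, because a quoted CSV field has to end at its closing quote. So your CSV parser has to be fault-tolerant. I made it try to parse the string and, if that fails, strip all the quotes and try again: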

    using System.IO;
    using System.Linq;
    using Microsoft.VisualBasic.FileIO;

    public static string[] GetFieldsFromString(string csvString)
    {
        using (var stringAsReader = new StringReader(csvString))
        using (var textFieldParser = new TextFieldParser(stringAsReader))
        {
            SetUpTextFieldParser(textFieldParser, FieldType.Delimited, new[] {","}, false, true);

            try
            {
                // ReadFields returns null when there is no data (empty header)
                return textFieldParser.ReadFields() ?? new string[0];
            }
            catch (MalformedLineException)
            {
                // Assume it's not parseable due to double quotes, so strip them all out and take what we have
                var sanitizedString = csvString.Replace("\"", "");

                using (var sanitizedStringAsReader = new StringReader(sanitizedString))
                using (var textFieldParser2 = new TextFieldParser(sanitizedStringAsReader))
                {
                    SetUpTextFieldParser(textFieldParser2, FieldType.Delimited, new[] {","}, false, true);

                    try
                    {
                        var fields = textFieldParser2.ReadFields() ?? new string[0];
                        return fields.Select(part => part.Trim()).ToArray();
                    }
                    catch (MalformedLineException)
                    {
                        // Still unparseable: hand back the whole header as a single field
                        return new[] {csvString};
                    }
                }
            }
        }
    }
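`SetUpTextFieldParser` is called above but never shown. A plausible reconstruction, assuming the two `bool` arguments are `trimWhiteSpace` and `hasFieldsEnclosedInQuotes` in that order (the answer doesn't say, and the `CsvParserSetup` class name is hypothetical), just copies its arguments onto the matching `TextFieldParser` properties:

```csharp
using Microsoft.VisualBasic.FileIO;

public static class CsvParserSetup
{
    // Hypothetical reconstruction of the SetUpTextFieldParser helper.
    // ASSUMPTION: the bools are (trimWhiteSpace, hasFieldsEnclosedInQuotes).
    public static void SetUpTextFieldParser(
        TextFieldParser parser,
        FieldType fieldType,
        string[] delimiters,
        bool trimWhiteSpace,
        bool hasFieldsEnclosedInQuotes)
    {
        parser.TextFieldType = fieldType;                          // Delimited vs FixedWidth
        parser.SetDelimiters(delimiters);                          // e.g. { "," }
        parser.TrimWhiteSpace = trimWhiteSpace;                    // trim each field?
        parser.HasFieldsEnclosedInQuotes = hasFieldsEnclosedInQuotes; // honour "..." fields?
    }
}
```

Under this reading the first `ReadFields()` call honours quotes, which is what lets a stray quote raise `MalformedLineException` and trigger the quote-stripping fallback; with quote handling off, the fallback would never run.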
    

    The one thing GetFieldsFromString won't handle is a quoted local part (the account name before the @), e.g. "Monkey Header"@stupidemailaddresses.com.
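If you do need quoted local parts (and quoted display names containing commas) to survive, one alternative to the CSV route is a small hand-written splitter that only breaks on commas outside a quoted string, an `<...>` angle address, or a `[...]` domain literal. This is a sketch of a different technique, not what the answer uses; `AddressListSplitter` is a hypothetical name, and it assumes the header contains no RFC 2822 comments `(...)` and no group syntax (`Team: a@x.com, b@x.com;`).

```csharp
using System.Collections.Generic;
using System.Text;

public static class AddressListSplitter
{
    // Hypothetical splitter: breaks an address list on commas that sit outside
    // "..." quoted strings, <...> angle addresses and [...] domain literals.
    // ASSUMPTION: no comments "(...)" and no group syntax in the header.
    public static List<string> Split(string header)
    {
        var parts = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false, inAngle = false, inBrackets = false;

        for (int i = 0; i < header.Length; i++)
        {
            char c = header[i];

            // quoted-pair inside a quoted string: keep "\x" together so \" doesn't close the quote
            if (inQuotes && c == '\\' && i + 1 < header.Length)
            {
                current.Append(c).Append(header[++i]);
                continue;
            }

            if (c == '"' && !inBrackets)
            {
                inQuotes = !inQuotes;
            }
            else if (!inQuotes)
            {
                if (c == '<') inAngle = true;
                else if (c == '>') inAngle = false;
                else if (c == '[') inBrackets = true;
                else if (c == ']') inBrackets = false;
                else if (c == ',' && !inAngle && !inBrackets)
                {
                    // top-level comma: the current address is complete
                    FlushPart(parts, current);
                    continue;
                }
            }

            current.Append(c);
        }

        FlushPart(parts, current);
        return parts;
    }

    // Adds the trimmed buffer as one address unless it is blank (e.g. from ",,").
    private static void FlushPart(List<string> parts, StringBuilder current)
    {
        var part = current.ToString().Trim();
        if (part.Length > 0)
            parts.Add(part);
        current.Clear();
    }
}
```

Each piece it returns, such as `"Lastname, Firstname" <a@b.com>`, can then be handed to `MailAddress` as in the first method; blank pieces are already dropped.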

    And here's the test (a Machine.Specifications spec):

    [Subject(typeof(CSVProcessor))]
    public class when_processing_an_email_recipient_header
    {
        static string recipientHeaderToParse1 = @"""Lastname, Firstname"" " + "," +
                                               @", testto1@domain.com, testto2@domain.com" + "," +
                                               @", test3@domain.com" + "," +
                                               @"""""Yes, this is valid""""@[emails are hard to parse!]" + "," +
                                               @"First, Last , name@domain.com, First Last "
                                               ;
    
        static string[] results1;
        static string[] expectedResults1;
    
        Establish context = () =>
        {
            expectedResults1 = new string[]
            {
                @"Lastname",
                @"Firstname",
                @"",
                @"testto1@domain.com",
                @"testto2@domain.com",
                @"",
                @"test3@domain.com",
                @"Yes",
                @"this is valid@[emails are hard to parse!]",
                @"First",
                @"Last",
                @"name@domain.com",
                @"First Last"
            };
        };
    
        Because of = () =>
        {
            results1 = CSVProcessor.GetFieldsFromString(recipientHeaderToParse1);
        };
    
        It should_parse_the_email_parts_properly = () => results1.ShouldBeLike(expectedResults1);
    }
    
