So I am working with some email header data, and for the To:, From:, Cc:, and Bcc: fields the email address(es) can be expressed in a number of different ways.
I decided that I was going to draw a line in the sand at two restrictions:
I also decided I'm only interested in the email addresses and not the display names, since a display name is problematic and hard to define, whereas an email address is something I can validate. So I used MailAddress to validate the results of my parsing.
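To illustrate the validation step, here is a minimal sketch of how System.Net.Mail.MailAddress can be used to keep only the address and discard the display name. The class name MailAddressSketch and the method AddressOrNull are made up for this illustration and are not part of my actual code.

```csharp
using System;
using System.Net.Mail;

public static class MailAddressSketch
{
    // Hypothetical helper: returns the bare address, or null when the
    // input cannot be parsed as an email address.
    public static string AddressOrNull(string part)
    {
        try
        {
            // MailAddress separates the display name from the address.
            return new MailAddress(part).Address;
        }
        catch (FormatException)
        {
            // Not in a form MailAddress recognizes as an address.
            return null;
        }
        catch (ArgumentException)
        {
            // Null or empty input.
            return null;
        }
    }
}
```

With this sketch, AddressOrNull("First Last <name@domain.com>") gives name@domain.com, while AddressOrNull("First Last") gives null because there is no address in it.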
I treated the To and Cc headers like a CSV string, and again, anything that can't be parsed that way I don't worry about.
    // Requires: using System; using System.Collections.Generic; using System.Net.Mail;
    private string GetProperlyFormattedEmailString(string emailString)
    {
        var emailStringParts = CSVProcessor.GetFieldsFromString(emailString);
        var addresses = new List<string>();
        foreach (var part in emailStringParts)
        {
            try
            {
                // MailAddress throws if the part is not a valid address.
                var address = new MailAddress(part);
                addresses.Add(address.Address);
            }
            catch (FormatException)
            {
                // Wasn't an email address, so skip it.
            }
            catch (ArgumentException)
            {
                // Null or empty part, so skip it.
            }
        }
        return string.Join(",", addresses);
    }
Edit
Further research has shown me that my assumptions are good. Reading through the spec, RFC 2822 pretty much shows that the To, Cc, and Bcc fields are CSV-parseable fields. So yes, it's hard and there are a lot of gotchas, as with any CSV parsing, but if you have a reliable way to parse CSV fields (which TextFieldParser in the Microsoft.VisualBasic.FileIO namespace is, and is what I used for this), then you are golden.
Edit 2
Apparently the headers don't need to be valid CSV strings, and the quotes really mess things up. So your CSV parser has to be fault tolerant. I made mine try to parse the string and, if that fails, strip all the quotes and try again:
    // Requires: using System.IO; using System.Linq; using Microsoft.VisualBasic.FileIO;
    public static string[] GetFieldsFromString(string csvString)
    {
        using (var stringAsReader = new StringReader(csvString))
        {
            using (var textFieldParser = new TextFieldParser(stringAsReader))
            {
                SetUpTextFieldParser(textFieldParser, FieldType.Delimited, new[] {","}, false, true);
                try
                {
                    return textFieldParser.ReadFields();
                }
                catch (MalformedLineException)
                {
                    // Assume it's not parseable due to double quotes, so strip them all out and take what we have.
                    var sanitizedString = csvString.Replace("\"", "");
                    using (var sanitizedStringAsReader = new StringReader(sanitizedString))
                    {
                        using (var textFieldParser2 = new TextFieldParser(sanitizedStringAsReader))
                        {
                            SetUpTextFieldParser(textFieldParser2, FieldType.Delimited, new[] {","}, false, true);
                            try
                            {
                                return textFieldParser2.ReadFields().Select(part => part.Trim()).ToArray();
                            }
                            catch (MalformedLineException)
                            {
                                // Still not parseable: return the whole string as a single field.
                                return new string[] {csvString};
                            }
                        }
                    }
                }
            }
        }
    }
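SetUpTextFieldParser is not shown above. A rough sketch of what such a helper could look like follows. The parameter names, and in particular which of the two booleans maps to TrimWhiteSpace and which to HasFieldsEnclosedInQuotes, are assumptions made for this illustration. The guess fits the code above in that quote handling must be on for MalformedLineException to be thrown on bad quotes, and the fallback path trims the fields itself. The properties used are the standard ones on TextFieldParser.

```csharp
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserSetupSketch
{
    // Hypothetical helper: the signature is guessed from the call sites,
    // not taken from the real implementation.
    public static void SetUpTextFieldParser(
        TextFieldParser parser,
        FieldType fieldType,
        string[] delimiters,
        bool trimWhiteSpace,            // assumption: first boolean
        bool hasFieldsEnclosedInQuotes) // assumption: second boolean
    {
        parser.TextFieldType = fieldType;
        parser.Delimiters = delimiters;
        parser.TrimWhiteSpace = trimWhiteSpace;
        // When true, a comma inside double quotes does not split the field,
        // and badly quoted input raises MalformedLineException.
        parser.HasFieldsEnclosedInQuotes = hasFieldsEnclosedInQuotes;
    }
}
```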
The one thing it won't handle is a quoted local part in an email address, e.g. "Monkey Header"@stupidemailaddresses.com.
And here's the test:
    [Subject(typeof(CSVProcessor))]
    public class when_processing_an_email_recipient_header
    {
        static string recipientHeaderToParse1 = @"""Lastname, Firstname"" " + "," +
                                                @", testto1@domain.com, testto2@domain.com" + "," +
                                                @", test3@domain.com" + "," +
                                                @"""""Yes, this is valid""""@[emails are hard to parse!]" + "," +
                                                @"First, Last , name@domain.com, First Last ";
        static string[] results1;
        static string[] expectedResults1;

        Establish context = () =>
        {
            expectedResults1 = new string[]
            {
                @"Lastname",
                @"Firstname ",
                @"",
                @"testto1@domain.com",
                @"testto2@domain.com",
                @"",
                @"test3@domain.com",
                @"Yes",
                @"this is valid@[emails are hard to parse!]",
                @"First",
                @"Last ",
                @"name@domain.com",
                @"First Last "
            };
        };

        Because of = () =>
        {
            results1 = CSVProcessor.GetFieldsFromString(recipientHeaderToParse1);
        };

        It should_parse_the_email_parts_properly = () => results1.ShouldBeLike(expectedResults1);
    }