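So I am working with some email header data, and for the To:, From:, Cc:, and Bcc: fields the email address(es) can be expressed in a number of different ways, for example (these are the forms used in the test string below):
Last, First <name@domain.com>
name@domain.com
First Last <name@domain.com>
"First Last" <name@domain.com>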
Here is the solution I came up with to split them into individual addresses:
using System.Collections.Generic;
string str = "Last, First <name@domain.com>, name@domain.com, First Last <name@domain.com>, \"First Last\" <name@domain.com>";
List<string> addresses = new List<string>();
int atIdx = -1;     // position of the last '@' seen in the current segment (-1 = none yet)
int commaIdx = -1;  // position of the last ','
int lastComma = 0;  // start index of the current, not yet emitted, segment
for (int c = 0; c < str.Length; c++)
{
    if (str[c] == '@')
        atIdx = c;
    if (str[c] == ',')
        commaIdx = c;
    // A comma that comes after an '@' ends an address; a comma before the '@'
    // (as in "Last, First <...>") belongs to the display name.
    if (atIdx >= 0 && commaIdx > atIdx)
    {
        addresses.Add(str.Substring(lastComma, commaIdx - lastComma).Trim());
        lastComma = commaIdx + 1;  // skip the separator itself
        atIdx = -1;                // the next address needs its own '@'
    }
    // The rest of the string is the final address; this also covers input with
    // no comma at all, or only the comma inside "Last, First".
    if (c == str.Length - 1 && lastComma < str.Length)
        addresses.Add(str.Substring(lastComma).Trim());
}
Here is how I would do it:
I decided that I was going to draw a line in the sand at two restrictions:
I also decided I'm only interested in email addresses and not display names, since display names are so problematic and hard to define, whereas an email address I can validate. So I used MailAddress to validate my parsing.
I treated the To and Cc headers like a CSV string, and again, anything not parseable that way I don't worry about.
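using System;
using System.Collections.Generic;
using System.Net.Mail;
private string GetProperlyFormattedEmailString(string emailString)
{
    // Split the header value into comma-separated fields (GetFieldsFromString is in Edit 2 below).
    var emailStringParts = CSVProcessor.GetFieldsFromString(emailString);
    var addresses = new List<string>();
    foreach (var part in emailStringParts)
    {
        if (string.IsNullOrWhiteSpace(part))
            continue;
        try
        {
            // MailAddress validates the field and drops any display name.
            var address = new MailAddress(part);
            addresses.Add(address.Address);
        }
        catch (FormatException)
        {
            // wasn't an email address (e.g. the "Last" half of "Last, First <...>"), so skip it
        }
    }
    return string.Join(",", addresses);
}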
EDIT
Further research has shown me that my assumptions are good. Reading through RFC 2822 pretty much shows that the To, Cc, and Bcc fields are CSV-parseable. So yes, it's hard and there are a lot of gotchas, as with any CSV parsing, but if you have a reliable way to parse CSV fields (which TextFieldParser in the Microsoft.VisualBasic.FileIO namespace is, and which is what I used for this), then you are golden.
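For reference, here is a minimal C# sketch of configuring TextFieldParser directly for a single header value. The answer's own SetUpTextFieldParser helper isn't shown, so the settings below (comma delimiter, quoted fields allowed, whitespace trimmed) are my assumption about what it does, and the HeaderCsv class and SplitRecipients method are hypothetical names.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class HeaderCsv
{
    // Hypothetical stand-in for SetUpTextFieldParser: comma-delimited fields,
    // fields may be enclosed in double quotes, surrounding whitespace is trimmed.
    public static string[] SplitRecipients(string headerValue)
    {
        using var reader = new StringReader(headerValue);
        using var parser = new TextFieldParser(reader)
        {
            TextFieldType = FieldType.Delimited,
            HasFieldsEnclosedInQuotes = true,
            TrimWhiteSpace = true
        };
        parser.SetDelimiters(",");

        // ReadFields returns null when there is no data to read (e.g. an empty header).
        return parser.ReadFields() ?? Array.Empty<string>();
    }
}
```

A fully quoted field such as "Doe, Jane <jane@example.com>" keeps its comma. A header like "Doe, Jane" <jane@example.com>, with text after the closing quote, is not valid CSV, and ReadFields should throw MalformedLineException for it; that is the case Edit 2 deals with.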
Edit 2
Apparently they don't need to be valid CSV strings; the quotes really mess things up. So your CSV parser has to be fault tolerant. I made it try to parse the string and, if that fails, strip all the quotes and try again:
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;
public static string[] GetFieldsFromString(string csvString)
{
    using (var stringAsReader = new StringReader(csvString))
    using (var textFieldParser = new TextFieldParser(stringAsReader))
    {
        // SetUpTextFieldParser is a helper of mine (not shown) that configures the parser as comma-delimited.
        SetUpTextFieldParser(textFieldParser, FieldType.Delimited, new[] {","}, false, true);
        try
        {
            return textFieldParser.ReadFields();
        }
        catch (MalformedLineException)
        {
            // assume it's not parseable due to double quotes, so we strip them all out and take what we have
            var sanitizedString = csvString.Replace("\"", "");
            using (var sanitizedStringAsReader = new StringReader(sanitizedString))
            using (var textFieldParser2 = new TextFieldParser(sanitizedStringAsReader))
            {
                SetUpTextFieldParser(textFieldParser2, FieldType.Delimited, new[] {","}, false, true);
                try
                {
                    return textFieldParser2.ReadFields().Select(part => part.Trim()).ToArray();
                }
                catch (MalformedLineException)
                {
                    // still not parseable: return the whole value as a single field
                    return new string[] {csvString};
                }
            }
        }
    }
}
The one thing it won't handle is quoted local parts (account names) in an address, e.g. "Monkey Header"@stupidemailaddresses.com.
And here's the test:
[Subject(typeof(CSVProcessor))]
public class when_processing_an_email_recipient_header
{
    static string recipientHeaderToParse1 = @"""Lastname, Firstname"" <firstname_lastname@domain.com>" + "," +
        @"<testto@domain.com>, testto1@domain.com, testto2@domain.com" + "," +
        @"<testcc@domain.com>, test3@domain.com" + "," +
        @"""""Yes, this is valid""""@[emails are hard to parse!]" + "," +
        @"First, Last <name@domain.com>, name@domain.com, First Last <name@domain.com>";
    static string[] results1;
    static string[] expectedResults1;
    Establish context = () =>
    {
        expectedResults1 = new string[]
        {
            @"Lastname",
            @"Firstname <firstname_lastname@domain.com>",
            @"<testto@domain.com>",
            @"testto1@domain.com",
            @"testto2@domain.com",
            @"<testcc@domain.com>",
            @"test3@domain.com",
            @"Yes",
            @"this is valid@[emails are hard to parse!]",
            @"First",
            @"Last <name@domain.com>",
            @"name@domain.com",
            @"First Last <name@domain.com>"
        };
    };
    Because of = () =>
    {
        results1 = CSVProcessor.GetFieldsFromString(recipientHeaderToParse1);
    };
    It should_parse_the_email_parts_properly = () => results1.ShouldBeLike(expectedResults1);
}
I use the following regular expression in Java to get the bare email address out of an RFC-compliant address string:
[A-Za-z0-9]+[A-Za-z0-9._-]+@[A-Za-z0-9]+[A-Za-z0-9._-]+[.][A-Za-z0-9]{2,3}
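The same pattern should work unchanged with .NET's Regex. The C# sketch below (SimpleAddressExtractor and Extract are hypothetical names) just collects every match in a header value. Because the pattern is a pragmatic approximation rather than the full RFC 2822 grammar, it needs a local part of at least two characters and ends the match at a 2- or 3-character final label, so a .info address, for instance, comes back truncated to .inf.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SimpleAddressExtractor
{
    // The pattern from the answer above, unanchored, so it finds addresses
    // anywhere in the header value (inside <...> or bare).
    private static readonly Regex AddressPattern = new Regex(
        @"[A-Za-z0-9]+[A-Za-z0-9._-]+@[A-Za-z0-9]+[A-Za-z0-9._-]+[.][A-Za-z0-9]{2,3}");

    // Returns every substring of the header value that matches the pattern, in order.
    public static List<string> Extract(string headerValue)
    {
        var results = new List<string>();
        foreach (Match m in AddressPattern.Matches(headerValue))
            results.Add(m.Value);
        return results;
    }
}
```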
You could use a regular expression to try to separate these out; try this one:
^(?<name1>[a-zA-Z0-9]+?),? (?<name2>[a-zA-Z0-9]+?),? (?<address1>[a-zA-Z0-9.-_<>]+?)$
It will match:
- Last, First test@test.com
- Last, First <test@test.com>
- First last test@test.com
- First Last <test@test.com>
You can add another optional group at the end of the regex to pick up the last segment of First, Last <name@domain.com>, name@domain.com, i.e. the address that follows the one enclosed in angle brackets.
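A small C# harness for trying that pattern (NameAddressSplitter and Split are hypothetical names). One detail worth knowing: inside the last character class, .-_ is parsed as a range from '.' to '_', which happens to include '@', ':', the digits and the upper-case letters (but not '-' itself), and that is why an address like test@test.com matches at all.

```csharp
using System.Text.RegularExpressions;

public static class NameAddressSplitter
{
    // The answer's pattern, unchanged. Note ".-_" is a character range, not three literals.
    private static readonly Regex Pattern = new Regex(
        @"^(?<name1>[a-zA-Z0-9]+?),? (?<name2>[a-zA-Z0-9]+?),? (?<address1>[a-zA-Z0-9.-_<>]+?)$");

    // Returns the three named groups, or null when the whole line does not match.
    public static (string Name1, string Name2, string Address)? Split(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
            return null;
        return (m.Groups["name1"].Value, m.Groups["name2"].Value, m.Groups["address1"].Value);
    }
}
```

Because the pattern is anchored with ^ and $, it is meant for one recipient at a time; a bare address with no name in front, such as name@domain.com, does not match.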
Hope this helps somewhat!
EDIT:
And of course you can add more characters to each of the groups to accept quotation marks etc. for whatever format is being read in. As sjbotha mentioned, this could be difficult, since the submitted string is not necessarily in a fixed format.
This link can give you more information about matching AND validating email addresses using regular expressions.
The clean and short solution is to use MailAddressCollection:
var collection = new MailAddressCollection();
collection.Add(addresses);
This approach parses a list of comma-separated addresses and validates it according to the RFC. It throws a FormatException if the addresses are invalid. As suggested in other posts, if you need to deal with invalid addresses you have to pre-process or parse the value yourself; otherwise I recommend using what .NET offers rather than resorting to reflection.
using System.Net.Mail;
var collection = new MailAddressCollection();
collection.Add("Joe Doe <doe@example.com>, postmaster@example.com");
foreach (var addr in collection)
{
    // addr.DisplayName, addr.User, addr.Host
}
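If you want just the bare addresses and a graceful failure instead of an exception, a sketch along these lines should work in C#. The RecipientHeader class and TryGetAddresses method are hypothetical, and returning null on FormatException is only one possible policy. MailAddressCollection handles quoted display names that contain commas; an unquoted "Last, First <name@domain.com>" is expected to be rejected, which is where the pre-processing mentioned above comes in.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

public static class RecipientHeader
{
    // Hypothetical helper: returns the bare addresses from a header value,
    // or null when MailAddressCollection rejects it with a FormatException.
    public static List<string> TryGetAddresses(string headerValue)
    {
        // Add() throws ArgumentException for an empty string, so handle that up front.
        if (string.IsNullOrWhiteSpace(headerValue))
            return new List<string>();

        var collection = new MailAddressCollection();
        try
        {
            collection.Add(headerValue);
        }
        catch (FormatException)
        {
            return null; // caller decides how to pre-process or fall back
        }

        var result = new List<string>();
        foreach (MailAddress addr in collection)
            result.Add(addr.Address); // drops the display name
        return result;
    }
}
```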