We have an internal .NET case management application that automatically creates a new case from an email. I want to be able to identify other emails that are related to the original email so we can prevent duplicate cases from being created.
I have observed that many, but not all, emails have a thread-index header that looks useful.
Does anybody know of a straightforward algorithm or package that we could use?
Use the JWZ threading algorithm.
As far as I know, there's not going to be a 100% foolproof solution, as not all email clients or gateways preserve or respect all headers.
However, you'll get a pretty high hit rate with the following:
Every email message should have a unique "Message-ID" field. Find this, and keep a record of it as a part of the case. (See RFC-822)
If you receive two messages with the same Message-ID, discard the second one as it's a duplicate.
Check for the "In-Reply-To" field; if the ID it contains matches a known Message-ID, then you know the email is related.
The "References" and "Original-Message-ID" headers have similar meanings, so check any IDs they contain against your known Message-IDs as well.
If your system ever generates emails, include a CaseID# in the subject line in a way that you can search for if you get an email back (e.g. [Case#20081114-01]); most people don't edit subject lines when replying.
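The checks above could be sketched roughly as follows. This is Python rather than .NET, purely for illustration, and it is only a minimal sketch: `CaseIndex`, `record`, `classify` and the in-memory dictionary are hypothetical names standing in for whatever storage your case system uses, and the subject tag pattern assumes the `[Case#20081114-01]` format from the example above.

```python
import re
from email import message_from_string

# A Message-ID looks like <something@host>; several can appear in one header.
MESSAGE_ID_RE = re.compile(r"<[^<>\s]+>")
# Assumed tag format, taken from the example [Case#20081114-01].
CASE_TAG_RE = re.compile(r"\[Case#([0-9]{8}-[0-9]+)\]", re.IGNORECASE)
PARENT_HEADERS = ("In-Reply-To", "References", "Original-Message-ID")


class CaseIndex:
    """Hypothetical in-memory store mapping Message-IDs to case IDs."""

    def __init__(self):
        self.case_by_message_id = {}
        self.known_cases = set()

    def record(self, message_id, case_id):
        """Remember that a message belongs to a case."""
        if message_id:
            self.case_by_message_id[message_id] = case_id
        self.known_cases.add(case_id)

    def classify(self, raw_email):
        """Return (status, case_id, message_id); status is
        'duplicate', 'related' or 'new'."""
        msg = message_from_string(raw_email)
        ids = MESSAGE_ID_RE.findall(str(msg.get("Message-ID", "")))
        message_id = ids[0] if ids else None

        # Same Message-ID seen before: this is a duplicate delivery.
        if message_id in self.case_by_message_id:
            return "duplicate", self.case_by_message_id[message_id], message_id

        # Any referenced ID that we already know links the email to a case.
        for header in PARENT_HEADERS:
            for value in msg.get_all(header, []):
                for parent_id in MESSAGE_ID_RE.findall(str(value)):
                    if parent_id in self.case_by_message_id:
                        return ("related",
                                self.case_by_message_id[parent_id],
                                message_id)

        # Fall back to the case tag in the subject line.
        match = CASE_TAG_RE.search(str(msg.get("Subject", "")))
        if match and match.group(1) in self.known_cases:
            return "related", match.group(1), message_id

        return "new", None, message_id
```

When `classify` reports `new`, the caller would create the case and then call `record` with the new message's ID; for `related`, calling `record` as well keeps later replies in the same thread matchable even if they only reference the newest message.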
The internet standards RFC-822 (since superseded by RFC 5322), RFC-2076 and RFC-4021 may be useful further reading.
Given that there will always be messages that are missed (for whatever reason), you'll also probably want related features in your case management system - say, "Close as Duplicate Case" or "Merge with Duplicate Case", along with tools to make it easier to find duplicates.
Source: https://stackoverflow.com/questions/288757/how-to-identify-email-belongs-to-existing-thread-or-conversation