I am currently working on a very large legacy application which handles a large amount of string data gathered from various sources (IE, names, identifiers, common codes relatin
This is already built-in .NET, it's called String.Intern, no need to reinvent.
This is essentially what string interning is, except you don't have to worry how it works. In your example you are still creating a string, then comparing it, then leaving the copy to be disposed of. .NET will do this for you in runtime.
See also String.Intern and Optimizing C# String Performance (C Calvert)
If a new string is created with code like (
String goober1 = "foo"; String goober2 = "foo";
) shown in lines 18 and 19, then the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table.
So, you don't have to roll your own - it won't really provide any advantage. EDIT UNLESS: your strings don't usually live for as long as your AppDomain - interned strings live for the lifetime of the AppDomain, which is not necessarily great for GC. If you want short lived strings, then you want a pool. From String.Intern
:
If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. ...
EDIT 2 Also see Jon Skeets SO answer here
You can acheive this using the built in .Net functionality.
When you initialise your string, make a call to string.Intern() with your string.
For example:
dataList.Add(string.Intern("AAAA"));
Every subsequent call with the same string will use the same reference in memory. So if you have 1000 AAAAs, only 1 copy of AAAA is stored in memory.