I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array or should I use some kind of List ?
Since arrays keep
If you have thousands, consider using a trie. A trie is a tree-like structure that merges the common prefixes of the stored string.
For example, if the strings were
intern
international
internationalize
internet
internets
The trie would store:
intern
-> \0
international
-> \0
-> ize\0
net
->\0
->s\0
The strings requires 57 characters (including the null terminator, '\0') for storage, plus whatever the size of the String object that holds them. (In truth, we should probably round all sizes up to multiples of 16, but...) Call it 57 + 5 = 62 bytes, roughly.
The trie requires 29 (including the null terminator, '\0') for storage, plus sizeof the trie nodes, which are a ref to an array and a list of child trie nodes.
For this example, that probably comes out about the same; for thousands, it probably comes out less as long as you do have common prefixes.
Now, when using the trie in other code, you'll have to convert to String, probably using a StringBuffer as an intermediary. If many of the strings are in use at once as Strings, outside the trie, it's a loss.
But if you're only using a few at the time -- say, to look up things in a dictionary -- the trie can save you a lot of space. Definitely less space than storing them in a HashSet.
You say you're accessing them "serially" -- if that means sequentially an alphabetically, the trie also obviously gives you alphabetical order for free, if you iterate it depth-first.