I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array or should I use some kind of List ?
Since arrays keep
It depends on how you have to access it.
After storing, if you mainly want to do search operation, with little or no insert/delete, then go for Array (as search is done in O(1) in arrays, whereas add/delete may need re-ordering of the elements).
After storing, if your main purpose is to add/delete strings, with little or no search operation, then go for List.
Arrays - It would always be better when we have to achieve faster fetching of results
Lists- Performs results on insertion and deletion since they can be done in O(1) and this also provides methods to add, fetch and delete data easily. Much easier to use.
But always remember that the fetching of data would be fast when the index position in array where the data is stored - is known.
This could be achieved well by sorting the array. Hence this increases the time to fetch the data (ie; storing the data + sorting the data + seek for the position where the data is found). Hence this increases additional latency to fetch the data from the array even they may be good at fetching the data sooner.
Hence this could be solved with trie data structure or ternary data structure. As discussed above the trie data structure would be very efficient in searching for the data the search for a particularly word can be done in O(1) magnitude. When time matters ie; if you have to search and retrieve data quickly you may go with trie data structure.
If you want your memory space to be consumed less and you wish to have a better performance then go with ternary data structure. Both these are suitable for storing huge number of strings (eg; like words contained in dictionary).
I don't think it makes a real difference for Strings. What is contiguous in an array of strings is the references to the strings, the strings themselves are stored at random places in memory.
Arrays vs. Lists can make a difference for primitive types, not for objects. IF you know in advance the number of elements, and don't need flexibility, an array of millions of integers or doubles will be more efficient in memory and marginally in speed than a list, because indeed they will be stored contiguously and accessed instantly. That's why Java still uses arrays of chars for strings, arrays of ints for image data, etc.
I suggest that you use a profiler to test which is faster.
My personal opinion is that you should use Lists.
I work on a large codebase and a previous group of developers used arrays everywhere. It made the code very inflexible. After changing large chunks of it to Lists we noticed no difference in speed.
The Java way is that you should consider what data abstraction most suits your needs. Remember that in Java a List is an abstract, not a concrete data type. You should declare the strings as a List, and then initialize it using the ArrayList implementation.
List<String> strings = new ArrayList<String>();
This separation of Abstract Data Type and specific implementation is one the key aspects of object oriented programming.
An ArrayList implements the List Abstract Data Type using an array as its underlying implementation. Access speed is virtually identical to an array, with the additional advantages of being able to add and subtract elements to a List (although this is an O(n) operation with an ArrayList) and that if you decide to change the underlying implementation later on you can. For example, if you realize you need synchronized access, you can change the implementation to a Vector without rewriting all your code.
In fact, the ArrayList was specifically designed to replace the low-level array construct in most contexts. If Java was being designed today, it's entirely possible that arrays would have been left out altogether in favor of the ArrayList construct.
Since arrays keep all the data in a contiguous chunk of memory (unlike Lists), would the use of an array to store thousands of strings cause problems ?
In Java, all collections store only references to objects, not the objects themselves. Both arrays and ArrayList will store a few thousand references in a contiguous array, so they are essentially identical. You can consider that a contiguous block of a few thousand 32-bit references will always be readily available on modern hardware. This does not guarantee that you will not run out of memory altogether, of course, just that the contiguous block of memory requirement is not difficult to fufil.
Well firstly it's worth clarifying do you mean "list" in the classical comp sci data structures sense (ie a linked list) or do you mean java.util.List? If you mean a java.util.List, it's an interface. If you want to use an array just use the ArrayList implementation and you'll get array-like behaviour and semantics. Problem solved.
If you mean an array vs a linked list, it's a slightly different argument for which we go back to Big O (here is a plain English explanation if this is an unfamiliar term.
Array;
Linked List:
So you choose whichever one best suits how you resize your array. If you resize, insert and delete a lot then maybe a linked list is a better choice. Same goes for if random access is rare. You mention serial access. If you're mainly doing serial access with very little modification then it probably doesn't matter which you choose.
Linked lists have a slightly higher overhead since, like you say, you're dealing with potentially non-contiguous blocks of memory and (effectively) pointers to the next element. That's probably not an important factor unless you're dealing with millions of entries however.