Array or List in Java. Which is faster?

后端 未结 30 1653
滥情空心
滥情空心 2020-11-22 04:30

I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array or should I use some kind of List ?

Since arrays keep

相关标签:
30条回答
  • 2020-11-22 04:54

    If you have thousands, consider using a trie. A trie is a tree-like structure that merges the common prefixes of the stored string.

    For example, if the strings were

    intern
    international
    internationalize
    internet
    internets
    

    The trie would store:

    intern
     -> \0
     international
     -> \0
     -> ize\0
     net
     ->\0
     ->s\0
    

    The strings requires 57 characters (including the null terminator, '\0') for storage, plus whatever the size of the String object that holds them. (In truth, we should probably round all sizes up to multiples of 16, but...) Call it 57 + 5 = 62 bytes, roughly.

    The trie requires 29 (including the null terminator, '\0') for storage, plus sizeof the trie nodes, which are a ref to an array and a list of child trie nodes.

    For this example, that probably comes out about the same; for thousands, it probably comes out less as long as you do have common prefixes.

    Now, when using the trie in other code, you'll have to convert to String, probably using a StringBuffer as an intermediary. If many of the strings are in use at once as Strings, outside the trie, it's a loss.

    But if you're only using a few at the time -- say, to look up things in a dictionary -- the trie can save you a lot of space. Definitely less space than storing them in a HashSet.

    You say you're accessing them "serially" -- if that means sequentially an alphabetically, the trie also obviously gives you alphabetical order for free, if you iterate it depth-first.

    0 讨论(0)
  • 2020-11-22 04:57

    A lot of microbenchmarks given here have found numbers of a few nanoseconds for things like array/ArrayList reads. This is quite reasonable if everything is in your L1 cache.

    A higher level cache or main memory access can have order of magnitude times of something like 10nS-100nS, vs more like 1nS for L1 cache. Accessing an ArrayList has an extra memory indirection, and in a real application you could pay this cost anything from almost never to every time, depending on what your code is doing between accesses. And, of course, if you have a lot of small ArrayLists this might add to your memory use and make it more likely you'll have cache misses.

    The original poster appears to be using just one and accessing a lot of contents in a short time, so it should be no great hardship. But it might be different for other people, and you should watch out when interpreting microbenchmarks.

    Java Strings, however, are appallingly wasteful, especially if you store lots of small ones (just look at them with a memory analyzer, it seems to be > 60 bytes for a string of a few characters). An array of strings has an indirection to the String object, and another from the String object to a char[] which contains the string itself. If anything's going to blow your L1 cache it's this, combined with thousands or tens of thousands of Strings. So, if you're serious - really serious - about scraping out as much performance as possible then you could look at doing it differently. You could, say, hold two arrays, a char[] with all the strings in it, one after another, and an int[] with offsets to the starts. This will be a PITA to do anything with, and you almost certainly don't need it. And if you do, you've chosen the wrong language.

    0 讨论(0)
  • 2020-11-22 04:58

    No, because technically, the array only stores the reference to the strings. The strings themselves are allocated in a different location. For a thousand items, I would say a list would be better, it is slower, but it offers more flexibility and it's easier to use, especially if you are going to resize them.

    0 讨论(0)
  • 2020-11-22 04:58

    Array vs. List choice is not so important (considering performance) in the case of storing string objects. Because both array and list will store string object references, not the actual objects.

    1. If the number of strings is almost constant then use an array (or ArrayList). But if the number varies too much then you'd better use LinkedList.
    2. If there is (or will be) a need for adding or deleting elements in the middle, then you certainly have to use LinkedList.
    0 讨论(0)
  • 2020-11-22 05:01

    You should prefer generic types over arrays. As mentioned by others, arrays are inflexible and do not have the expressive power of generic types. (They do however support runtime typechecking, but that mixes badly with generic types.)

    But, as always, when optimizing you should always follow these steps:

    • Don't optimize until you have a nice, clean, and working version of your code. Changing to generic types could very well be motivated at this step already.
    • When you have a version that is nice and clean, decide if it is fast enough.
    • If it isn't fast enough, measure its performance. This step is important for two reasons. If you don't measure you won't (1) know the impact of any optimizations you make and (2) know where to optimize.
    • Optimize the hottest part of your code.
    • Measure again. This is just as important as measuring before. If the optimization didn't improve things, revert it. Remember, the code without the optimization was clean, nice, and working.
    0 讨论(0)
  • 2020-11-22 05:02

    I'm guessing the original poster is coming from a C++/STL background which is causing some confusion. In C++ std::list is a doubly linked list.

    In Java [java.util.]List is an implementation-free interface (pure abstract class in C++ terms). List can be a doubly linked list - java.util.LinkedList is provided. However, 99 times out of 100 when you want a make a new List, you want to use java.util.ArrayList instead, which is the rough equivalent of C++ std::vector. There are other standard implementations, such as those returned by java.util.Collections.emptyList() and java.util.Arrays.asList().

    From a performance standpoint there is a very small hit from having to go through an interface and an extra object, however runtime inlining means this rarely has any significance. Also remember that String are typically an object plus array. So for each entry, you probably have two other objects. In C++ std::vector<std::string>, although copying by value without a pointer as such, the character arrays will form an object for string (and these will not usually be shared).

    If this particular code is really performance-sensitive, you could create a single char[] array (or even byte[]) for all the characters of all the strings, and then an array of offsets. IIRC, this is how javac is implemented.

    0 讨论(0)
提交回复
热议问题