array of structures, or structure of arrays?

甜味超标 2021-01-01 01:32

Hmmm. I have a table which is an array of structures I need to store in Java. The naive don't-worry-about-memory approach says do this:

public class Record          


        
11 Answers
  • 2021-01-01 02:00

    Whenever I have tried doing number crunching in Java, I have always had to revert to C-style coding (i.e. close to your option 2). It minimises the number of objects floating around in your system: instead of 1,000,000 objects, you only have 3 (the arrays). I was able to do a bit of FFT analysis of real-time sound data using the C style, whereas it was far too slow using objects.

  • 2021-01-01 02:01

    I'd choose the first method (array of structures) unless you access the store relatively infrequently and are running into serious memory pressure issues.

    First version basically stores the objects in their "natural" form (+1 BTW for using immutable records). This uses a little more memory because of the per-object overhead (probably around 8-16 bytes depending on your JVM) but is very good for accessing and returning objects in a convenient and human-understandable form in one simple step.

    Second version uses less memory overall, but the allocation of a new object on every "get" is a pretty ugly solution that will not perform well if accesses are frequent.
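
    For reference, the two layouts being compared might look roughly like the sketch below. The question's code is not fully shown here, so the class and field names (`Record`, `RecordArrays`, `field1` to `field3`) are assumptions for illustration only.

```java
// Option 1: array of structures. One immutable object per row.
// (All names here are hypothetical; the original code is not shown in full.)
final class Record {
    final int field1;
    final int field2;
    final int field3;

    Record(int field1, int field2, int field3) {
        this.field1 = field1;
        this.field2 = field2;
        this.field3 = field3;
    }
}

// Option 2: structure of arrays. One primitive array per column.
final class RecordArrays {
    private final int[] field1;
    private final int[] field2;
    private final int[] field3;

    RecordArrays(int capacity) {
        field1 = new int[capacity];
        field2 = new int[capacity];
        field3 = new int[capacity];
    }

    void set(int i, int f1, int f2, int f3) {
        field1[i] = f1;
        field2[i] = f2;
        field3[i] = f3;
    }

    // Allocates a fresh Record on every call; this is the cost discussed above.
    Record get(int i) {
        return new Record(field1[i], field2[i], field3[i]);
    }
}
```

    With option 1 the table would simply be a `Record[]` or a `List<Record>`; with option 2 it is a single `RecordArrays` instance holding three arrays.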

    Some other possibilities to consider:

    An interesting "extreme" variant would be to take the second version but write your algorithms / access methods to interact with the underlying arrays directly. This is clearly going to result in complex inter-dependencies and some ugly code, but would probably give you the absolute best performance if you really needed it. It's quite common to use this approach for intensive graphics applications such as manipulating a large array of 3D coordinates.
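
    As a rough illustration of working on the underlying arrays directly, here is a minimal sketch for the 3D-coordinates case. `PointCloud` and its methods are hypothetical names, not something from the question.

```java
// Hypothetical structure-of-arrays store for 3D points.
final class PointCloud {
    final float[] x;
    final float[] y;
    final float[] z;

    PointCloud(int n) {
        x = new float[n];
        y = new float[n];
        z = new float[n];
    }

    // Translates every point in place; no per-point objects are created.
    void translate(float dx, float dy, float dz) {
        for (int i = 0; i < x.length; i++) {
            x[i] += dx;
            y[i] += dy;
            z[i] += dz;
        }
    }

    // Reads only one column, so the y and z arrays are never touched.
    double meanX() {
        double sum = 0.0;
        for (float v : x) {
            sum += v;
        }
        return x.length == 0 ? 0.0 : sum / x.length;
    }
}
```

    The trade-off is visible even in this small case: the algorithms know about the storage layout, so changing the layout later means touching every method.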

    A "hybrid" option would be to store the underlying data in a structure of arrays as in the second version, but cache the accessed objects in a HashMap so that you only generate the object the first time a particular index is accessed. Might make sense if only a small fraction of objects are ever likely to be accessed, but all data is needed "just in case".
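
    The hybrid idea could be sketched as follows, assuming the data is read-only once loaded (a cached object would go stale if the arrays changed). `CachingStore` and `Row` are made-up names for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hybrid store: columns hold the data, objects are built lazily.
final class CachingStore {
    // Simple immutable row view (the name is an assumption).
    static final class Row {
        final int field1;
        final int field2;

        Row(int field1, int field2) {
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    private final int[] field1;
    private final int[] field2;
    private final Map<Integer, Row> cache = new HashMap<>();

    CachingStore(int[] field1, int[] field2) {
        this.field1 = field1;
        this.field2 = field2;
    }

    // Builds the Row on first access and reuses the same instance afterwards.
    Row get(int i) {
        return cache.computeIfAbsent(i, k -> new Row(field1[k], field2[k]));
    }

    // Number of rows that have been materialized so far.
    int cachedCount() {
        return cache.size();
    }
}
```

    Each cached index costs a boxed `Integer` key and a map entry on top of the `Row` itself, so this only pays off when few indices are ever touched.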

  • 2021-01-01 02:02

    Because you are making the int[] fields final, you are stuck with the one initialization of each array. Thus, if you wanted 10^6 field1's, Java would need to allocate that much memory for each of those int[] arrays up front, because arrays cannot be resized and the final fields cannot be pointed at larger ones. With an ArrayList, if you do not know the number of records beforehand and may be removing records, you save a lot of space up front, and again later when you remove records.
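
    For comparison, a structure-of-arrays store can still grow if the array reference is not final, at the cost of occasional copying, which is essentially what ArrayList does internally with its backing array. A minimal sketch, where `IntColumn` is a hypothetical helper and the starting capacity of 16 is an arbitrary choice:

```java
import java.util.Arrays;

// Hypothetical growable column of ints backed by a plain array.
final class IntColumn {
    private int[] data = new int[16]; // not final, so it can be replaced
    private int size = 0;

    void add(int value) {
        if (size == data.length) {
            // Double the capacity so the copying cost is amortized over many adds.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return data[i];
    }

    int size() {
        return size;
    }
}
```

    One such column per field gives a growable structure of arrays without boxing each value into an `Integer`.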

  • 2021-01-01 02:02

    I would go for the ArrayList version too, so I don't need to worry about growing it. Do you need column-like access to the values? What is the scenario behind your question?

    Edit: You could also use a common long[][] matrix. I don't know how you pass the columns to Matlab, but I guess you don't gain much speed with column-based storage; more likely you lose speed in the Java computation.
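
    To make the matrix option concrete: a long[][] can be indexed either as [column][row] or as [row][column], and only the first gives you a whole column without copying. The helper class below is a hypothetical sketch and says nothing about how the data would actually be handed to Matlab.

```java
// Hypothetical helpers contrasting the two ways to lay out a long[][] table.
final class MatrixLayouts {
    // Column-major: table[c] is already a contiguous column.
    static long[] columnOfColumnMajor(long[][] table, int c) {
        return table[c]; // no copy, the caller sees the live array
    }

    // Row-major: table[r] is one record, so a column has to be gathered.
    static long[] columnOfRowMajor(long[][] table, int c) {
        long[] column = new long[table.length];
        for (int r = 0; r < table.length; r++) {
            column[r] = table[r][c];
        }
        return column;
    }
}
```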

  • 2021-01-01 02:04

    I was curious so I actually ran a benchmark. If you don't re-create the object like you are[1], then SoA beats AoS by 5-100% depending on workload[2]. See my code here:

    https://gist.github.com/twolfe18/8168262c5420c7a62d39

    [1] I didn't add that because if you are concerned enough about speed to consider this refactor, it would be silly to do that.

    [2] This also doesn't account for re-allocation, but again, this is often something you can either amortize away or know statically. This is a reasonable assumption for a pure-speed benchmark.
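
    For anyone who wants to try a comparison of this kind locally, here is a small sketch. It is not the code from the gist above; the class and method names are made up, and the timing loop is naive (no controlled warm-up), so a proper harness such as JMH would be the better tool for real measurements.

```java
// Hypothetical micro-comparison; not the code from the linked gist.
final class LayoutTiming {
    static final class Item {
        final int a;
        final int b;

        Item(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Array of structures: follows one reference per element.
    static long sumAos(Item[] items) {
        long sum = 0;
        for (Item it : items) {
            sum += it.a;
        }
        return sum;
    }

    // Structure of arrays: scans one contiguous int[].
    static long sumSoa(int[] a) {
        long sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Item[] items = new Item[n];
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            items[i] = new Item(i, -i);
            a[i] = i;
        }
        // Naive timing: treat the printed numbers as indicative only.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long s1 = sumAos(items);
            long t1 = System.nanoTime();
            long s2 = sumSoa(a);
            long t2 = System.nanoTime();
            System.out.println("AoS " + (t1 - t0) + " ns, SoA " + (t2 - t1)
                    + " ns, equal=" + (s1 == s2));
        }
    }
}
```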
