Using array fields instead of massive number of objects

无人共我 2021-02-06 07:41

In light of this article, I am wondering what people's experiences are with storing massive datasets (say, >10,000,000 objects) in-memory using arrays to store data fields instead of instantiating millions of objects and racking up the memory overhead...

8 Answers
  • 2021-02-06 08:19

    It depends on your concrete scenario. Depending on how often your objects are created, you can:

    1. If the objects are serializable, save them in a MemoryMappedFile, trading some performance for low memory consumption (see the sketch after this list).

    2. Map the fields between different objects: if objects initially have default values, keep all of those defaults in one shared base and only allocate new space when a value actually differs from the default (this naturally makes sense for reference types).

    3. Another option is to save the objects to a SQLite database, which is much easier to manage than MemoryMappedFiles because you can use plain SQL.
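    A minimal sketch of option 1, assuming each object can be flattened to a fixed-size, blittable struct rather than fully serialized; ThingRecord, its fields and MappedStore are illustrative names, not types from the question:

        using System;
        using System.IO.MemoryMappedFiles;
        using System.Runtime.InteropServices;

        // Hypothetical fixed-size record; a blittable struct can be read/written directly.
        struct ThingRecord
        {
            public long Id;
            public double Value;
            public int Flags;
        }

        class MappedStore : IDisposable
        {
            private static readonly int RecordSize = Marshal.SizeOf<ThingRecord>();
            private readonly MemoryMappedFile _file;
            private readonly MemoryMappedViewAccessor _view;

            public MappedStore(string path, long recordCount)
            {
                // The backing file is created (or grown) to hold all records;
                // only the pages actually touched stay resident in RAM.
                _file = MemoryMappedFile.CreateFromFile(path, System.IO.FileMode.OpenOrCreate,
                                                        null, recordCount * RecordSize);
                _view = _file.CreateViewAccessor();
            }

            public ThingRecord Read(long index)
            {
                _view.Read(index * RecordSize, out ThingRecord record);
                return record;
            }

            public void Write(long index, ref ThingRecord record)
            {
                _view.Write(index * RecordSize, ref record);
            }

            public void Dispose() { _view.Dispose(); _file.Dispose(); }
        }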

    The choice is up to you, as it depends on your concrete project requirements.

    Regards.

  • 2021-02-06 08:25

    Actually, the ADO.NET DataTable uses a similar approach to store its data; maybe you should look at how it is implemented there. You'll need a DataRow-like object that internally holds a pointer to the table and the index of the row data. I believe this would be the most lightweight solution.

    In your case: a) If you construct a Thing each time you call the GetThingAtPosition method, you create an object on the heap that duplicates information already in your table, plus the "object overhead" data.

    b) If you need to access each item in your ContainerOfThings at once, the required memory will be doubled, plus roughly 12 bytes of overhead per object. In that scenario it would be better to have a simple array of things rather than creating them on the fly.
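    A minimal sketch of that idea, assuming the table keeps each field in a parallel array; ThingTable, ThingRow, GetThingAtPosition and the field names are illustrative, not the actual types from the question:

        // Columns are stored as parallel arrays; a "row" is just (table, index).
        class ThingTable
        {
            public long[] Ids;
            public double[] Values;
            public string[] Names;

            public ThingTable(int capacity)
            {
                Ids = new long[capacity];
                Values = new double[capacity];
                Names = new string[capacity];
            }

            // Cheap, allocation-free handle (a struct, so no heap object per row).
            public ThingRow GetThingAtPosition(int index) => new ThingRow(this, index);
        }

        readonly struct ThingRow
        {
            private readonly ThingTable _table;
            private readonly int _index;

            public ThingRow(ThingTable table, int index) { _table = table; _index = index; }

            public long Id => _table.Ids[_index];
            public double Value => _table.Values[_index];
            public string Name => _table.Names[_index];
        }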

  • 2021-02-06 08:26

    You make an Array of System.Array with an element for each property in your type. The size of these sub-arrays is equal to the number of objects you have. Property access would be:

    masterArray[propertyIndex][objectIndex]

    This will allow you to use value type arrays instead of arrays of object.
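    A minimal sketch of that layout, assuming a type with a long, a double and a bool property; the column indices, names and accessors are made up for illustration:

        using System;

        class ColumnStore
        {
            // One strongly typed column per property, held behind a common System.Array reference.
            private const int IdColumn = 0, ValueColumn = 1, ActiveColumn = 2;
            private readonly Array[] masterArray;

            public ColumnStore(int objectCount)
            {
                masterArray = new Array[]
                {
                    new long[objectCount],    // property 0
                    new double[objectCount],  // property 1
                    new bool[objectCount]     // property 2
                };
            }

            // masterArray[propertyIndex][objectIndex], cast back to the concrete column type.
            public long GetId(int objectIndex) => ((long[])masterArray[IdColumn])[objectIndex];
            public double GetValue(int objectIndex) => ((double[])masterArray[ValueColumn])[objectIndex];
            public bool GetActive(int objectIndex) => ((bool[])masterArray[ActiveColumn])[objectIndex];

            public void Set(int objectIndex, long id, double value, bool active)
            {
                ((long[])masterArray[IdColumn])[objectIndex] = id;
                ((double[])masterArray[ValueColumn])[objectIndex] = value;
                ((bool[])masterArray[ActiveColumn])[objectIndex] = active;
            }
        }

    Because each column is a concrete long[], double[] or bool[], the values stay unboxed; only the outer Array[] holds references to the columns themselves.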

  • 2021-02-06 08:27

    Unfortunately, OO can't abstract away the performance issues (saturation of bandwidth being one). It's a convenient paradigm, but it comes with limitations.

    I like your idea, and I use this as well... and guess what, we're not the first to think of this ;-). I've found that it does require a bit of a mind shift though.

    May I refer you to the J community? See:

    http://www.JSoftware.com.

    That's not a C# (or Java) group, but they're a good bunch. There, the array is treated as a first-class object; in C# it's not nearly as flexible, and it can be a frustrating structure to work with.

    There are various OO patterns for large dataset problems... but if you are asking a question like this, it is probably time to go a little more functional. Or at least functional for problem solving / prototyping.

  • 2021-02-06 08:34

    I've done such a thing for the rapidSTORM project, where several million sparsely populated objects need to be cached (localization microscopy). While I can't really give you good code snippets (too many dependencies), I found the implementation very quick and straightforward with Boost Fusion: fusionize the structure, build a vector for each element type, and then write a fairly straightforward accessor over those vectors that reconstructs each element.

    (D'oh, I just noticed that you tagged the question, but maybe my C++ answer helps as well)

  • 2021-02-06 08:35

    In light of this article, I am wondering what people's experiences are with storing massive datasets (say, >10,000,000 objects) in-memory using arrays to store data fields instead of instantiating millions of objects and racking up the memory overhead...

    I guess there are several ways to approach this, and indeed you are onto a possible solution for limiting the data in memory. However, I'm not sure that shaving even 24(?) bytes off your structure is going to do you a whole lot of good. Your structure is around 79 bytes (for a 15-char string) = 8 + 8 + 4 + 24? + 4 + 1 + (2 * character length), so your total gain is at best about 25%. That doesn't seem very useful, since you'd have to be in a position where 10 million * 80 bytes fits in memory but 10 million * 100 bytes does not. That would mean you're designing a solution on the edge of disaster: a few too many large strings, or too many records, or some other program hogging memory, and your machine is out of memory.
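    A back-of-the-envelope check of that arithmetic; the field layout is assumed here (two longs, two ints and a byte of value fields, plus string overhead and UTF-16 characters), since the original structure isn't shown in this thread:

        // Rough check of the per-record estimate above.
        const int fixedFields    = 8 + 8 + 4 + 4 + 1;  // 25 bytes of value fields
        const int stringOverhead = 24;                  // approximate CLR string overhead
        int BytesPerRecord(int chars) => fixedFields + stringOverhead + 2 * chars;

        System.Console.WriteLine(BytesPerRecord(15));                      // 79 bytes per record
        System.Console.WriteLine(10_000_000L * BytesPerRecord(15) >> 20);  // ~753 MB for 10 million records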

    If you need to support random access to n small records, where n = 10 million, then you should aim to design for at least 2n or 10n. Perhaps you're already considering this in your 10 million? Either way, there are plenty of technologies that can support this type of data access.

    One possibility, if the string is limited to a maximum length of a reasonable size (say 255), is to go to a simple ISAM store. Each record would be 8 + 8 + 4 + 255 bytes, and you can simply offset into a flat file to read them. If the record size is variable or possibly large, then you will want a different storage format and to store offsets into the file.
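    A minimal sketch of that fixed-size-record approach, seeking to recordIndex * recordSize in a flat file; the record layout and field names are assumptions for illustration:

        using System;
        using System.IO;
        using System.Text;

        class IsamStore : IDisposable
        {
            // 8 + 8 + 4 bytes of numeric fields plus a fixed 255-byte string slot.
            private const int StringSlot = 255;
            private const int RecordSize = 8 + 8 + 4 + StringSlot;
            private readonly FileStream _file;

            public IsamStore(string path)
            {
                _file = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
            }

            public void Write(long index, long id, long timestamp, int flags, string name)
            {
                var buffer = new byte[RecordSize];
                BitConverter.GetBytes(id).CopyTo(buffer, 0);
                BitConverter.GetBytes(timestamp).CopyTo(buffer, 8);
                BitConverter.GetBytes(flags).CopyTo(buffer, 16);
                // Assumes the UTF-8 bytes of name fit in the 255-byte slot; unused bytes stay zero.
                Encoding.UTF8.GetBytes(name, 0, name.Length, buffer, 20);

                _file.Seek(index * RecordSize, SeekOrigin.Begin);
                _file.Write(buffer, 0, RecordSize);
            }

            public (long Id, long Timestamp, int Flags, string Name) Read(long index)
            {
                var buffer = new byte[RecordSize];
                _file.Seek(index * RecordSize, SeekOrigin.Begin);
                // A production version would loop until RecordSize bytes have been read.
                _file.Read(buffer, 0, RecordSize);

                long id = BitConverter.ToInt64(buffer, 0);
                long timestamp = BitConverter.ToInt64(buffer, 8);
                int flags = BitConverter.ToInt32(buffer, 16);
                string name = Encoding.UTF8.GetString(buffer, 20, StringSlot).TrimEnd('\0');
                return (id, timestamp, flags, name);
            }

            public void Dispose() => _file.Dispose();
        }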

    Another possibility, if you're looking up values by some key, is something like an embedded database or a B-tree, ideally one where you can relax some of the on-disk consistency guarantees to gain performance. As it happens, I wrote a BPlusTree for client-side caching of large volumes of data. Detailed information on using the B+Tree is here.
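    The BPlusTree mentioned above isn't shown in this thread, so purely as an illustration of the embedded-database route, here is a sketch using SQLite through the Microsoft.Data.Sqlite package; the table and column names are made up, and PRAGMA synchronous = OFF is one way to relax disk consistency for speed:

        using Microsoft.Data.Sqlite;

        using var conn = new SqliteConnection("Data Source=things.db");
        conn.Open();

        using (var pragma = conn.CreateCommand())
        {
            // Trades durability for write speed, in the spirit of relaxing disk consistency.
            pragma.CommandText = "PRAGMA synchronous = OFF";
            pragma.ExecuteNonQuery();
        }

        using (var create = conn.CreateCommand())
        {
            create.CommandText = "CREATE TABLE IF NOT EXISTS things (key INTEGER PRIMARY KEY, name TEXT, value REAL)";
            create.ExecuteNonQuery();
        }

        using (var insert = conn.CreateCommand())
        {
            insert.CommandText = "INSERT OR REPLACE INTO things (key, name, value) VALUES ($key, $name, $value)";
            insert.Parameters.AddWithValue("$key", 42L);
            insert.Parameters.AddWithValue("$name", "example");
            insert.Parameters.AddWithValue("$value", 3.14);
            insert.ExecuteNonQuery();
        }

        using (var select = conn.CreateCommand())
        {
            // The primary-key index makes this a seek rather than a scan,
            // and only the matching row is pulled into memory.
            select.CommandText = "SELECT name, value FROM things WHERE key = $key";
            select.Parameters.AddWithValue("$key", 42L);
            using var reader = select.ExecuteReader();
            if (reader.Read())
                System.Console.WriteLine($"{reader.GetString(0)} = {reader.GetDouble(1)}");
        }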
