Multimaps in ChronicleMap

时光总嘲笑我的痴心妄想 提交于 2019-12-03 03:10:36

If actual data indeed resemble pizzas, toppings and crusts, i. e. there are only a handful of distinct toppings/crusts, and thousands of pizzas contain each of them, I would say that having a proper multimap is overkill for this case and you would better have pepperoni_pizzas.dat, onions_pizzas.dat, ... distinct appendable shared lists with UUIDs, you can use Chronicle Queue for access and update them from multiple processes conveniently.

If there are 10s-100s of thousands of toppings/crusts, only 10s-100s of pizzas have a specific topping on average, you should use multimap indeed.

Essentially there are 3 kinds of "problems" with Chronicle-Maps-as-multimaps:

Excessive garbage allocation on each query

If you create a Chronicle Map with List<UUID> or Set<UUID> type of value without specifying custom value serializers, it will work, but it will be utterly inefficient, because it will default to built-in Java serialization for serializing and deserializing the whole value collection on each request, without reusing neither collection heap objects, nor individual UUID heap objects for elements. Hence a lot of garbage will be generated on each request to the ChronicleMap.

Solution However, if you specify value serializer as ListMarshaller or SetMarshaller (or your custom collection marshaller, which you could write based on ListMarshaller and SetMarshaller implementations) in conjunction with reusable UUID heap object, it will resolve this garbage problem:

ListMarshaller<ReusableUuid> valueMarshaller = ListMarshaller.of(
     ReusableUuidReader.INSTANCE, ReusableUuidWriter.INSTANCE);
List<ReusableUuid> averageValue = Stream
    .generate(() -> ReusableUuid.random())
    .limit(averagePizzasForTopping)
    .collect(Collectors.toList());
 ChronicleMap<Topping, List<ReusableUuid>> map = ChronicleMap
     .of(Topping.class, (Class<List<ReusableUuid>>) (Class) List.class)
     .averageKey(pepperoni)
     .valueMarshaller(valueMarshaller)
     .averageValue(averageValue)
     .entries(numberOfToppings)
     .createPersistedTo(new File("toppings_to_pizza_ids.dat"));

Inefficient value updates and replication

When you append another pizza UUID to a list of 100 UUIDs, and insert the new value back to Chronicle Map, Chronicle Map will re-write the whole list again, instead of appending one UUID to the end of off-heap memory chunk. And if you use replication, it will send the whole list of 100 UUIDs as an updated value to other nodes, instead of sending just one added UUID.

Both (value update and replication) could be optimized via terrible hacks, but it requires very deep knowledge of Chronicle Map implementation and will be very fragile.

Fragmentation of Chronicle-Map's memory

If you plan add new pizzas during data store lifetime, memory areas initially allocated for entires will become too small to hold new values with more UUIDs, so memory areas will be re-allocated (possibly several times for each list of UUIDs). Chronicle Map's data stucture design implies simplified memory allocation scheme, which suffers badly from fragmentation, if entries are re-allocated many times.

If you have a lot of UUIDs in lists, and you run your application on Linux, you can mitigate this problem by pre-allocating a lot of memory (more than will practically be needed by any list) for each entry (by specifying .actualChunkSize() in ChronicleMapBuilder configuration) and relying on Linux's feature of lazy mapped memory allocation (page-by-page, as needed). So you will lose at most 4KB of memory for each UUID list, that might be OK if lists are many KBs of size.

On the other hand, if your lists are so long (and they are lists of UUIDs i. e. small structures), and you have only 100 000 pizzas in total, you don't need multimap in the first place, see the beginning of this answer.

The trick with memory overcommit and relying on lazy mapped memory allocation in Linux also would work for short list (collections) of values, but only if elements themselves are big, so that the average total value size is many KBs.

Fragmentation is also less an issue when you can avoid entry memory re-allocation any other way, i. e. new pizza UUIDs are added in time but removed as well, so topping-to-uuids list sizes float around some average and re-allocation is rarely hitted.

Memory fragmentation is never an issue if values are never updated (or never change in size) after entry is inserted into Chronicle Map.

Conclusion

In some use cases and with proper configuration, Chronicle Map could serve as a multimap well. In other cases Chronicle Map as multimap is inherently inefficient.

Factors that matter:

  • Total number of key -> List<Value> entries in a multimap
  • Total number of values
  • Average and distribution of key sizes
  • Average and distribution of distinct value sizes
  • Average and distribution of value list sizes
  • Value lists dynamics over Chronicle Map lifetime (never updated, append only, remove and append. Removes from beginning and middle of lists are more expensive.)
  • If Chronicle Map is replicated, or not
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!