When to use Cassandra vs. Solr in DSE?


没有蜡笔的小新 2021-02-06 07:04

I'm using DSE for Cassandra/Solr integration so that data is stored in Cassandra and indexed in Solr. It's very natural to use Cassandra to handle CRUD operations and use Solr

1 answer

予麋鹿 (original poster)

2021-02-06 07:27
Cassandra secondary indexes have limited use cases:
1. No more than a couple of columns indexed.
2. Only a single indexed column in a query.
3. Too much inter-node traffic for high cardinality data (relatively unique column values)
4. Too much inter-node traffic for low cardinality data (high percentage of rows will match)
5. Queries need to be known in advance so data model can be optimized around them.
Because of these limitations, it is common for apps to create "index tables" keyed by whatever column is desired. This requires either duplicating data from the main table into each index table, or an extra query: read the main key from the index table, then read the actual row from the main table. Queries on multiple columns have to be manually indexed in advance, making ad hoc queries problematic. And any duplicated data has to be manually updated by the app in each index table.
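
To make the bookkeeping concrete, here is a minimal in-memory sketch of the index-table pattern in Python. It is not Cassandra code: the `UserStore` class, the `users_by_id` / `ids_by_email` names, and the email column are all made up for illustration, and plain dicts stand in for the two tables.

```python
class UserStore:
    """In-memory stand-in for a main table plus one manually maintained index table.

    Hypothetical names throughout; plain dicts play the role of Cassandra tables.
    """

    def __init__(self):
        self.users_by_id = {}    # "main table": user_id -> row
        self.ids_by_email = {}   # "index table": email -> set of user_id

    def upsert(self, user_id, email, name):
        old = self.users_by_id.get(user_id)
        if old is not None and old["email"] != email:
            # The app itself has to remove the stale index entry.
            ids = self.ids_by_email[old["email"]]
            ids.discard(user_id)
            if not ids:
                del self.ids_by_email[old["email"]]
        # Every write goes to the main table and to the index table.
        self.users_by_id[user_id] = {"email": email, "name": name}
        self.ids_by_email.setdefault(email, set()).add(user_id)

    def find_by_email(self, email):
        # Two-step read: keys from the index table, then rows from the main table.
        ids = sorted(self.ids_by_email.get(email, ()))
        return [{"user_id": uid, **self.users_by_id[uid]} for uid in ids]


store = UserStore()
store.upsert(1, "ann@example.com", "Ann")
store.upsert(2, "bob@example.com", "Bob")
store.upsert(1, "ann@new.example.com", "Ann")  # email change: index must be fixed up
print(store.find_by_email("ann@new.example.com"))
```

Each additional column you want to query by needs another index dict and another fix-up branch in `upsert`, which is the maintenance cost described above.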

Other than that, secondary indexes work fine in cases where a "modest" number of rows will be selected from a modest number of nodes, and queries are well specified in advance and not ad hoc.

DSE/Solr is better for:
1. A moderate number of columns are indexed.
2. Complex queries with a number of columns/fields referenced - Lucene matches all specified fields in a query in parallel. Lucene indexes the data on each node, so nodes query in parallel.
3. Ad hoc queries in general, where the precise queries are not known in advance.
4. Rich text queries such as keyword search, wildcard, fuzzy/like, range, inequality.
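
As a rough illustration of what an ad hoc, multi-field query can look like, the Python sketch below only builds a CQL string that uses DSE Search's `solr_query` column with Lucene-style clauses. The helper name `solr_cql`, the table `demo.articles`, and its fields are hypothetical, and the sketch assumes your DSE version exposes `solr_query` in CQL; it does not connect to a cluster.

```python
def solr_cql(table, clauses):
    """Build a CQL statement that passes a Lucene-style query through solr_query.

    `clauses` are raw Lucene clauses (e.g. "title:cassandra*"); they are ANDed
    together. Single quotes are doubled so the result is a valid CQL string literal.
    """
    if not clauses:
        raise ValueError("at least one clause is required")
    query = " AND ".join(clauses)
    escaped = query.replace("'", "''")
    return f"SELECT * FROM {table} WHERE solr_query = '{escaped}'"


# Wildcard, fuzzy, and range clauses combined in one query (hypothetical fields).
print(solr_cql("demo.articles", [
    "title:cassandra*",      # wildcard
    "author:smith~",         # fuzzy match
    "year:[2010 TO 2015]",   # range
]))
```

The point of the sketch is that the set of fields in the query can change from call to call without any new index table being created in advance.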
There is a performance and capacity cost to using Solr indexing, so a proof-of-concept implementation is recommended to evaluate how much additional RAM, storage, and nodes are needed. That depends on how many columns you index, the amount of text indexed, and any text filtering complexity (e.g., n-grams need more). It could range from a 25% increase for a relatively small number of indexed columns to 100% if all columns are indexed. Also, you need enough nodes so that the per-node Solr index fits in RAM, or mostly in RAM if using SSD. Finally, vnodes are not currently recommended for Solr data centers.
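
For a first back-of-the-envelope pass before the proof of concept, the arithmetic can be sketched in Python as below. The 25% and 100% bounds come from the range quoted above; the function names, the example sizes, and the idea of dividing total index size by the RAM each node can spare are assumptions for illustration only, not a DSE sizing formula.

```python
import math


def solr_overhead_range(base, low=0.25, high=1.00):
    """Return (low, high) estimates for a resource after adding Solr indexing.

    `base` is the current figure (GB of storage, GB of RAM, ...). The default
    bounds are the 25%-100% range mentioned in the answer.
    """
    return base * (1 + low), base * (1 + high)


def min_nodes_for_index(index_gb, ram_for_index_gb_per_node):
    """Smallest node count at which the per-node share of the index fits in RAM.

    Assumes the index is spread evenly across nodes (an assumption).
    """
    if ram_for_index_gb_per_node <= 0:
        raise ValueError("per-node RAM budget must be positive")
    return math.ceil(index_gb / ram_for_index_gb_per_node)


# Hypothetical numbers: 400 GB of data today, 300 GB of Solr index, 32 GB RAM budget per node.
print(solr_overhead_range(400))      # storage estimate with indexing
print(min_nodes_for_index(300, 32))  # nodes needed to keep the index in RAM
```

Treat the output as a starting point for the proof of concept; the measured index size for your own schema should replace the guessed inputs.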