Question
I'm relatively new to NoSQL, but I've done a fair bit of toying with relational databases.
We are evaluating Cassandra for use in an environment where our data model might need to evolve fairly aggressively. I've seen it written in multiple places that Cassandra can store "structured, semi-structured and unstructured" data.
I understand the structured claim. It's obvious: a table has defined columns.
I think I understand the semi-structured claim. A row does not need to populate all columns.
But I'm not clear on the unstructured claim. Certainly you could store everything as a key-value blob, but you'd have no means of searching by value efficiently.
I've failed to find any resource on the net that describes best practices using unstructured data with Cassandra. Ideally, for our application semi-structured data would be sufficient; but I want to understand the unstructured claim in the event that it can add value for us.
Thanks.
Answer 1:
At best, Cassandra makes semi-structured data searchable, and only through clustering keys and secondary indexes. Clustering keys are definitely an efficient way to search semi-structured data.
Searching secondary-indexed data without specifying the partition key is not efficient. There are a few solutions that help here, namely DSE Search (Solr with Cassandra) and Stargate. Both of these solutions may also help when one of the columns is unstructured text.
Otherwise, storing unstructured data in Cassandra isn't a great idea, as it may not be searchable without a key.
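The difference between a clustering-key lookup and a search without a key can be sketched with a toy in-memory model in Python. This is not Cassandra's storage engine or any driver API; `PartitionedTable`, `range_query`, and `scan_by_value` are hypothetical names made up for illustration, and the model assumes a single node with rows kept sorted by clustering key inside each partition.

```python
import bisect
from collections import defaultdict


class PartitionedTable:
    """Toy model: each partition keeps its rows sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted clustering keys, and the values in the same order
        self._keys = defaultdict(list)
        self._values = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        keys = self._keys[partition_key]
        values = self._values[partition_key]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            values[i] = value  # same primary key: overwrite (upsert)
        else:
            keys.insert(i, clustering_key)
            values.insert(i, value)

    def range_query(self, partition_key, low, high):
        """Rows with low <= clustering key < high, found by binary search."""
        keys = self._keys.get(partition_key, [])
        values = self._values.get(partition_key, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_left(keys, high)
        return list(zip(keys[start:end], values[start:end]))

    def scan_by_value(self, predicate):
        """Search without a key: every row of every partition is visited."""
        return [
            (pk, ck, v)
            for pk, keys in self._keys.items()
            for ck, v in zip(keys, self._values[pk])
            if predicate(v)
        ]
```

In this sketch `range_query` touches one partition and narrows the range by binary search, while `scan_by_value` has to walk all the data, which is the cost the answer warns about for blobs that are not reachable through a key.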
Answer 2:
Unstructured means that you have a schema-less column family. Each row has (obviously) a row-key. But the rest of each row can contain arbitrary key/value pairs - even the data types do not need to match between rows.
But as trulite correctly notes, it is generally a bad idea to use a schema-less data model. See http://planetcassandra.org/blog/post/the-myth-of-schema-less/
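To make the schema-less shape concrete, here is a minimal Python sketch that models such rows as plain dictionaries. The row keys, column names, and the `ages` helper are all invented for illustration and do not come from Cassandra or any client library.

```python
# Each row key maps to arbitrary key/value pairs; nothing forces rows to agree.
rows = {
    "user:1": {"name": "Ada", "age": 36},
    "user:2": {"name": "Bob", "age": "unknown", "tags": ["admin"]},
    "user:3": {"email": "c@example.com"},
}


def ages(rows):
    """Collect integer ages, skipping rows where 'age' is missing or not an int."""
    result = {}
    for row_key, columns in rows.items():
        age = columns.get("age")
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(age, int) and not isinstance(age, bool):
            result[row_key] = age
    return result
```

Because neither the presence nor the type of a column is guaranteed, every reader ends up carrying checks like the ones in `ages`, which is one practical reason a schema-less model tends to be discouraged.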
Source: https://stackoverflow.com/questions/24806170/cassandra-and-unstructured-data