问题
I want to use Cassandra in a .Net application. My objective is to store some data in a column family, but each row of data will have varying schema.
Example (A very simple one) I want to have a 'Toys' column family to store the following objects, (Notice how they have very different properties other than the ID property)
Toy object 1 { "id":"1", "name":"Car", "number_of_doors":4, "likes":3}
Toy object 2 { "id":"2", "type":"Plane", "flying_range":"100m"}
Toy object 3 { "id":"3", "category":"Train", "number_of_carriages":10}
From my initial understanding and using of Datastax CSharp driver I have to always alter the table (Column family) which does not sit right with me. I would like each row to have its own schema. Thrift API might be able to solve this but it seems HectorSharp is all but dead.
A question similar to my requirement but it doesn't have the answer I want
Cassandra for a schemaless db, 10's of millions order tables and millions of queries per day
Am I barking up the wrong tree by expecting each row to have its own schema or is there a way to do this using Cassandra+Csharp ?
Thanks in advance for your answers.
回答1:
Older versions of Cassandra were Schema-less, meaning that you didn't have anywhere a definition of what a row could contain. What you need now could be partially done with a Map
on Cassandra 2.1
CREATE TABLE toys (
id text PRIMARY KEY,
toy map<text, text>
)
Put some data ...
INSERT INTO toys (id, toy) VALUES ( '1', {'name':'Car', 'number_of_doors':'4', 'likes':'3'});
INSERT INTO toys (id, toy) VALUES ( '2', {'type':'Plane', 'flying_range':'100m'});
INSERT INTO toys (id, toy) VALUES ( '3', {'category':'Train', 'number_of_carriages':'10'});
Table content ...
id | toy
----+-------------------------------------------------------
3 | {'category': 'Train', 'number_of_carriages': '10'}
2 | {'flying_range': '100m', 'type': 'Plane'}
1 | {'likes': '3', 'name': 'Car', 'number_of_doors': '4'}
We can now create an index on keys ...
CREATE INDEX toy_idx ON toys (KEYS(toy));
... and perform queries on Map keys ...
SELECT * FROM toys WHERE toy CONTAINS KEY 'name';
id | toy
----+-------------------------------------------------------
1 | {'likes': '3', 'name': 'Car', 'number_of_doors': '4'}
Now you can update or delete map entries like you would do with normal columns, without reading before writing
DELETE toy['name'] FROM toys WHERE id='1';
UPDATE toys set toy = toy + {'name': 'anewcar'} WHERE id = '1';
SELECT * FROM toys;
id | toy
----+-----------------------------------------------------------
3 | {'category': 'Train', 'number_of_carriages': '10'}
2 | {'flying_range': '100m', 'type': 'Plane'}
1 | {'likes': '3', 'name': 'anewcar', 'number_of_doors': '4'}
A few limitations
- you can not retrieve part of a collection: even if internally each entry of a map is stored as a column you can only retrieve the whole collection
- you have to choose whether creating an index on keys or on values, both simultaneously are not supported.
- since maps are typed you can't put mixed values -- in my examples all integers are now strings
I personally consider an extensive usage of this approach an anti-pattern.
HTH, Carlo
回答2:
To add to Carlo's answer:
- Indexes on collections are not available on older cassandra versions (pre 2.1). Secondary indices have limitations as well, and are eventually consistent. Dig deeper into this.
- Don't go looking for "give me all toys that are cars" type queries with this. As with most things cassandra, think of how you're going to access the data (queries) and model accordingly. Depending on queries, it's perfectly acceptable to have multiple tables storing toy data with different structures to facilitate different queries.
来源:https://stackoverflow.com/questions/25098451/cql3-each-row-to-have-its-own-schema