Question
I need to implement a database for a testing system. It is designed to store test data for future statistical analysis. It has to be Cassandra based.
I've designed a schema, but since this is my first attempt at NoSQL design, I would like to get some feedback.
I will first describe the data I wish to save, then describe two basic queries and finally present my suggested design.
I intend to use Cassandra 1.1, so I tried to use composite columns in my design. However, feel free to suggest super columns or whatever seems right.
Data:
The basic unit we are testing is an alien. Each alien has a unique ID. Each alien has a number of bodyparts. Also, each alien is part of a family of aliens. The families have unique names.
When we run a test, we run it on a few bodyparts of an alien group. For example, we take a few families and run a test on all of their eyes and mouths.
There are a few kinds of tests. We log each test with its own unique test ID.
When we run a test, we sample all relevant alien bodyparts every couple of minutes and gather some statistics.
Basic Queries:
- For each family, alien, or unique bodypart: which tests did it participate in?
- For each test ID: which families, aliens, or unique bodyparts participated in it?
- In the future, statistical analysis of all data...
My attempt at design:
GeneralAliensData : { // Column Family - general data on aliens.
[FamilyID][AlienID][Bodypart] : { //Composite Columns as Row keys
Race: 'Blurgons' // column
Shoesize: 5 // column
Favorite probe: 'fun, toy' // column
}
}
TestsData : { // Column Family - we sample each test every couple of minutes...
[TestID][AlienID][Bodypart][MinutesFromTestStart]: { //Composite Columns as Rowkeys
Temperture: 30 // column
Size: 5 // column
}
}
BodypartTestParticipation : { // Column Family - all the tests a unique bodypart passed...
[FamilyID][AlienID][Bodypart]: { //Composite Columns as Row keys
TestID: 105 // column
TestID: 564 // column
...
}
}
This is it. Since I'm a real beginner in databases and Cassandra in particular, I'd appreciate any input.
Thank you for your time.
Answer 1:
How many rows will your dataset eventually hold? We sometimes use PlayOrm to store relational data in NoSQL, which works great, and tables can go into the millions of rows. If you are going into the billions or trillions of rows, we use PlayOrm to partition the same data so it scales.
So, do you need the ability to scale? You may want to check out the wide row pattern (PlayOrm makes heavy use of it). Wide rows can help you index data for very fast lookups.
I don't really follow this part of your design:
TestsData : { // Column Family - we sample each test every couple of minutes...
[TestID][AlienID][Bodypart][MinutesFromTestStart]: { //Composite Columns as Rowkeys
Temperture: 30 // column
Size: 5 // column
}
}
Shouldn't this be more of a wide row, where the test ID is the row key and the other data goes into many composite column names? Wide rows should not be larger than 10 million columns, so make sure no test's row would go over that. So a wide row might be:
testid -> alienId:fk23=null, alienId:fk25=null, etc. etc. temperture=30, size=5
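As a rough illustration of that pattern, here is a small Python sketch of one wide row whose column names are composites kept in sorted order, so that everything sharing a prefix can be read as one contiguous slice. It is an in-memory model, not Cassandra or PlayOrm code. The `WideRow` class and the `(alienId, bodypart, minute, metric)` name layout are assumptions made for the example, one possible layout among several.

```python
from bisect import bisect_left

MAX_COLUMNS_PER_ROW = 10_000_000  # the rule of thumb mentioned above


class WideRow:
    """Hypothetical model of one row: composite column names kept sorted."""

    def __init__(self):
        self._names = []   # sorted list of composite names (tuples)
        self._values = {}  # composite name -> value

    def insert(self, name, value):
        if name not in self._values:
            if len(self._names) >= MAX_COLUMNS_PER_ROW:
                raise ValueError("row too wide; split it across several rows")
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, prefix):
        """Return (name, value) pairs whose composite name starts with prefix."""
        start = bisect_left(self._names, prefix)
        out = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # names are sorted, so the matching run has ended
            out.append((name, self._values[name]))
        return out


rows = {}  # row key (test ID) -> WideRow
row = rows.setdefault(105, WideRow())
# Assumed layout: (alienId, bodypart, minutesFromTestStart, metric) = value
row.insert(("alien-23", "eye", 0, "Temperture"), 30)
row.insert(("alien-23", "eye", 0, "Size"), 5)
row.insert(("alien-25", "mouth", 2, "Size"), 6)

print(row.slice(("alien-23",)))          # every sample for one alien in test 105
print(row.slice(("alien-25", "mouth")))  # narrowed to one bodypart
```

With the test ID as the row key, "which aliens or bodyparts took part in test 105" becomes a read of a single row, and narrowing by alien or bodypart is a prefix slice rather than a scan over row keys.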
later, Dean
Source: https://stackoverflow.com/questions/13131254/cassandra-database-data-model-critic-my-schema-design