问题
Let assume 10 devices(dev01,dev02,dev03..etc).
It send data with some interval time,we collect those data,so our data schema is
dev01 :int
signalname :string
signaltime :date/time[with YY-MM-DD HHMMSS.mm]
Extradata :String
I want to push data into cassandra ,which way is best to store those data?
My Query is Like ,
1 Need to retrive device based current day data,or with some date range?
2 5 Device current day data?
I am not sure the following way to store data into cassadra is best model
Standard columnfamily Name:signalname
row key :dev01
columnname :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue :Json data
columnname :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue :Json data
row key :dev02
columnname :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue :Json data
columnname :timeseries(20120801124205)[YYMMDD HHMMSS][next second data]
columnvalue :Json data
Or
Super columnfamily :signalname
row key :Clientid1
supercolumnname :dev01
columnname :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue :Json data
supercolumnname :dev02
columnname :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue :Json data
row key :Clientid2
supercolumnname :dev03
columnname :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue :Json data
supercolumnname :dev04
columnname :timeseries(20120801124204)[YYMMDD HHMMSS]
columnvalue :Json data
kindly help me out regarding this issue, Any other Way?
Thanks&Regards, Kannadhasan
回答1:
I see 3 issues with your approach here which I will address below:
- super column families,
- thrift vs cql3,
- json data as cell values.
Before you go ahead: the use super column families is discouraged. Read more here. Composite keys (as described below) are the way to go.
Also, you might need to read up on CQL3, since thrift is a legacy API since 1.2.
Instead of storing json data, you may make use of native collection data types like lists, and maps etc. If you still want to work with JSON, there is improved JSON support in in Cassandra since version 2.2.
In general, it is pretty straightforward to query per device and per timeperiod:
- you row key would be the device id and the column key a timeuuid
- To avoid hot spots, you could add "bucket" counters to the row key (create a composite row/partition key) to rotate the nodes
- You can then query for time ranges if you know the row/device id.
Alternatively you could use your signal type as a row key (and timeuuid/timestamp as a column key) if you want to query data for multiple devices (but one event type) at once. Read more on timeseries data in cassandra in this blog entry.
Hope that helps!
来源:https://stackoverflow.com/questions/17987921/cassandra-timeseries-datamodel