Best way to store last-touched time in Cassandra

匿名 (未验证) 提交于 2019-12-03 09:02:45

问题:

I'm storing a last-touched time in a User table in Postgres, but there are many frequent updates and enough contention that I can see examples of 3 of the same updates deadlocking.

Cassandra seems a better fit for this - but should I devote a table to just this purpose? And I don't need old timestamps, just the latest. Should I use something other than Cassandra? If I should use Cassandra, any tips on table properties?

The table I have in mind:

CREATE TABLE ksp1.user_last_job_activities (     user_id bigint,     touched_at timeuuid,     PRIMARY KEY (user_id, touched_at) ) WITH CLUSTERING ORDER BY (touched_at DESC)     AND bloom_filter_fp_chance = 0.01     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'     AND comment = ''     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}     AND dclocal_read_repair_chance = 0.1     AND default_time_to_live = 0     AND gc_grace_seconds = 864000     AND max_index_interval = 2048     AND memtable_flush_period_in_ms = 0     AND min_index_interval = 128     AND read_repair_chance = 0.0     AND speculative_retry = '99.0PERCENTILE'; 

Update

Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.

Like so:

CREATE TABLE simple_user_last_activity (     user_id bigint,     touched_at timestamp,     PRIMARY KEY (user_id) ); 

Then:

INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now())); SELECT touched_at from simple_user_last_activity WHERE user_id = 6; 

Since touched_at is no longer in the primary key, only one record per user is stored.

Update 2

There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:

CREATE TABLE final_user_last_job_activities (     user_id bigint,     touched_at timestamp,     job_id bigint,     PRIMARY KEY (user_id, touched_at) )  WITH CLUSTERING ORDER BY (touched_at DESC) AND default_time_to_live = 604800; 

Adding the 1-week TTL takes care of expiring records - if there are none I return current time.

INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5); INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6); INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7); INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);  SELECT * FROM final_user_last_job_activities LIMIT 1; 

Which gives me:

 user_id | touched_at               | job_id ---------+--------------------------+--------        5 | 2015-06-17 12:43:30+1200 |      6 

Simple benchmarks show no significant performance difference in storing or reading from the bigger table.

回答1:

Because c* is last write wins, you can simply keep the latest versions of each row.

You could, as MSD suggests, use writetime to pull the time of the write. But be careful because this is column specific and you can't use write time on your primary key columns. For example in a table as follows:

cqlsh> create TABLE test.test ( a int, b int, c int, d int, primary key (a))    ... ; cqlsh> insert INTO  test.test (a, b, c, d) VALUES ( 1,2,3,4)    ... ;  cqlsh> select * from test.test    ... ;   a | b    | c | d ---+------+---+------  1 |    2 | 3 |    4  (2 rows)  cqlsh> insert into test.test (a,c) values (1, 6); cqlsh> select * from test.test ;   a | b    | c | d ---+------+---+------  1 |    2 | 6 |    4  (2 rows) cqlsh> select writetime(a), writetime(b), writetime(c), writetime(d) from test.test    ... ; InvalidRequest: code=2200 [Invalid query] message="Cannot use selection function writeTime on PRIMARY KEY part a"  cqlsh> select  writetime(b), writetime(c), writetime(d) from test.test  ;   writetime(b)     | writetime(c)     | writetime(d) ------------------+------------------+------------------  1434424690700887 | 1434424690700887 | 1434424702420929 

Otherwise you can add a cql column with the timestamp:

create TABLE test.test ( a int, b int, c int, d int, touched_at timeuuid, primary key (a)) ; 

Some quick benchmarking would help you determine which is more performant.



回答2:

Cassandra has implicit support for writetime per each column. See this, looks like that is what you are looking for here.



易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!