I'm storing a last-touched time in a User table in Postgres, but the updates are frequent and contended enough that I see deadlocks, in some cases among three identical updates.
Cassandra seems a better fit for this - but should I devote a table to just this purpose? And I don't need old timestamps, just the latest. Should I use something other than Cassandra? If I should use Cassandra, any tips on table properties?
The table I have in mind:
    CREATE TABLE ksp1.user_last_job_activities (
        user_id bigint,
        touched_at timeuuid,
        PRIMARY KEY (user_id, touched_at)
    ) WITH CLUSTERING ORDER BY (touched_at DESC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment = ''
        AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
        AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99.0PERCENTILE';
Update
Thanks! I did some experiments around writetime and since I had to write a value anyway, I just wrote the time.
Like so:
    CREATE TABLE simple_user_last_activity (
        user_id bigint,
        touched_at timestamp,
        PRIMARY KEY (user_id)
    );
Then:
    INSERT INTO simple_user_last_activity (user_id, touched_at) VALUES (6, dateof(now()));
    SELECT touched_at FROM simple_user_last_activity WHERE user_id = 6;
Since touched_at is no longer in the primary key, each insert for a user overwrites the previous row, so only one record per user is stored.
Update 2
There's another option that I am going to go with. I can store the job_id too, which gives more data for analytics:
    CREATE TABLE final_user_last_job_activities (
        user_id bigint,
        touched_at timestamp,
        job_id bigint,
        PRIMARY KEY (user_id, touched_at)
    ) WITH CLUSTERING ORDER BY (touched_at DESC)
        AND default_time_to_live = 604800;
The 1-week TTL takes care of expiring old records. If a user has no unexpired records left, I return the current time.
    INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 5);
    INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
    INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 7);
    INSERT INTO final_user_last_job_activities (user_id, touched_at, job_id) VALUES (5, dateof(now()), 6);
    SELECT * FROM final_user_last_job_activities WHERE user_id = 5 LIMIT 1;
Which gives me:
     user_id | touched_at               | job_id
    ---------+--------------------------+--------
           5 | 2015-06-17 12:43:30+1200 |      6
Simple benchmarks show no significant performance difference when writing to or reading from the bigger table.
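The "return the current time" fallback mentioned for the TTL table happens in application code rather than in CQL. A minimal Python sketch, assuming the caller has already run the LIMIT 1 query and passes in either the touched_at of the newest row or None when every row has expired. The name resolve_last_touched and its parameters are hypothetical and not part of any Cassandra driver:

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_last_touched(
    touched_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> datetime:
    """Return the stored last-touched time, or the current time if none exists.

    touched_at: value from the newest row for the user, or None when the
                query returned no rows (for example, all rows hit their TTL).
    now:        optional override for the fallback clock, useful in tests.
    """
    if touched_at is not None:
        return touched_at
    # No unexpired row for this user: fall back to the current time (UTC).
    return now if now is not None else datetime.now(timezone.utc)
```

Passing the clock in as a parameter keeps the fallback easy to test without touching the database.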