Delete records in Cassandra table based on time range

随声附和 posted on 2021-02-05 06:49:46

Question


I have a Cassandra table with schema:

CREATE TABLE IF NOT EXISTS TestTable(
    documentId text,
    sequenceNo bigint,
    messageData blob,
    clientId text,
    PRIMARY KEY(documentId, sequenceNo))
WITH CLUSTERING ORDER BY(sequenceNo DESC);

Is there a way to delete the records that were inserted within a given time range? I know that internally Cassandra must be using some timestamp to track the insertion time of each record, which would be used by features like TTL.

Since there is no explicit column for insertion timestamp in the given schema, is there a way to use the implicit timestamp or is there any better approach?

There is never any update to the records after insertion.


Answer 1:


It's an interesting question...

All columns that aren't part of the primary key have a so-called WriteTime that can be retrieved using the writetime(column_name) function of CQL (warning: it doesn't work with collection columns, and returns null for UDTs!). But because CQL has no nested queries, you will need to write a program that fetches the data, filters the entries by WriteTime, and deletes the entries whose WriteTime is older than your threshold. (Note that the value of writetime is in microseconds, not milliseconds as in CQL's timestamp type.)

The easiest way is to use Spark Cassandra Connector's RDD API, something like this:

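import com.datastax.spark.connector._

// writetime is in microseconds since the epoch, so convert seconds to microseconds
val timestamp = someDate.toInstant.getEpochSecond * 1000000L
// fetch the primary key columns plus the write time of one regular column
val oldData = sc.cassandraTable(srcKeyspace, srcTable)
      .select("prk1", "prk2", "reg_col".writeTime as "writetime")
      .filter(row => row.getLong("writetime") < timestamp)
// delete the matching rows by primary key
oldData.deleteFromCassandra(srcKeyspace, srcTable,
      keyColumns = SomeColumns("prk1", "prk2"))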

where prk1, prk2, ... are all components of the primary key (documentId and sequenceNo in your case), and reg_col is any of the "regular" columns of the table that isn't a collection or UDT (for example, clientId). It's important that the list of primary key columns is the same in select and deleteFromCassandra.
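
The question asks about a time range rather than a single cutoff, so the filter needs both a lower and an upper bound, both in microseconds. Below is a minimal sketch of such a predicate in plain Scala. The object WriteTimeRange and its methods toMicros and inRange are hypothetical names introduced here for illustration; they are not part of the Spark Cassandra Connector. The half-open range [from, until) is also an assumption, so adjust the comparison if both ends should be inclusive.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper for illustration; not part of the Spark Cassandra Connector.
object WriteTimeRange {
  // Convert an Instant to microseconds since the epoch, the unit used by writetime().
  def toMicros(instant: Instant): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, instant)

  // Build a predicate that is true for write times in the half-open range [from, until).
  def inRange(from: Instant, until: Instant): Long => Boolean = {
    val lower = toMicros(from)
    val upper = toMicros(until)
    writeTime => writeTime >= lower && writeTime < upper
  }
}
```

Assuming the same select as above, the predicate could be built once on the driver, for example val keep = WriteTimeRange.inRange(from, until), and then used as .filter(row => keep(row.getLong("writetime"))) in place of the single-threshold comparison.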



Source: https://stackoverflow.com/questions/59859771/delete-records-in-cassandra-table-based-on-time-range
