Delete records in Cassandra table based on time range

耶瑟儿~ 2021-01-21 22:30

I have a Cassandra table with schema:

CREATE TABLE IF NOT EXISTS TestTable(
    documentId text,
    sequenceNo bigint,
    messageData blob,
    clientId text,
    PRIMARY KEY (documentId, sequenceNo)
);
1 Answer
  • 2021-01-21 23:10

    It's an interesting question...

    All columns that aren't part of the primary key have a so-called WriteTime that can be retrieved with the writetime(column_name) function of CQL (warning: it doesn't work with collection columns, and returns null for UDTs!). But because CQL doesn't support nested queries, you will need to write a program to fetch the data, filter out entries by WriteTime, and delete entries whose WriteTime is older than your threshold. (Note that the writetime value is in microseconds, not milliseconds as in CQL's timestamp type.)
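    The microseconds-vs-milliseconds mismatch is the easiest thing to get wrong here. A minimal sketch (plain Python, with hypothetical sample rows; the key tuples mirror the documentId/sequenceNo primary key from the question) of converting a cutoff date to the microsecond scale that writetime uses, then filtering rows by it:

    ```python
    from datetime import datetime, timezone

    # writetime(...) values are microseconds since the Unix epoch,
    # while CQL timestamps are milliseconds -- scale the cutoff accordingly.
    cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
    cutoff_us = int(cutoff.timestamp() * 1_000_000)

    # hypothetical rows fetched from the table: (documentId, sequenceNo, writetime)
    rows = [
        ("doc-1", 1, 1_600_000_000_000_000),  # Sep 2020 -> older than cutoff
        ("doc-1", 2, 1_620_000_000_000_000),  # May 2021 -> newer than cutoff
    ]

    # keep only the primary keys of entries written before the cutoff
    old_keys = [(doc, seq) for doc, seq, wt in rows if wt < cutoff_us]
    print(old_keys)  # [('doc-1', 1)]
    ```

    If you had multiplied by 1 000 instead, every writetime would look "newer" than the cutoff and nothing would be selected for deletion.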

    The easiest way is to use Spark Cassandra Connector's RDD API, something like this:

    val timestamp = someDate.toInstant.getEpochSecond * 1000000L  // writetime is in microseconds
    val oldData = sc.cassandraTable(srcKeyspace, srcTable)
          .select("prk1", "prk2", "reg_col".writeTime as "writetime")
          .filter(row => row.getLong("writetime") < timestamp)
    oldData.deleteFromCassandra(srcKeyspace, srcTable,
          keyColumns = SomeColumns("prk1", "prk2"))
    

    where prk1, prk2, ... are all components of the primary key (documentId and sequenceNo in your case), and reg_col is any "regular" column of the table that isn't a collection or UDT (for example, clientId). It's important that the list of primary-key columns in select and in deleteFromCassandra is the same.
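    If Spark is not available, the same approach can be driven from any client by issuing one DELETE per full primary key. A hedged sketch (plain Python, only building the statements from hypothetical key values; a real run would execute prepared statements with bound parameters through a driver rather than interpolating strings):

    ```python
    # primary keys selected by the writetime filter -- hypothetical sample values
    old_keys = [("doc-1", 1), ("doc-2", 7)]

    # one DELETE per full primary key (documentId, sequenceNo)
    statements = [
        f"DELETE FROM TestTable WHERE documentId = '{doc}' AND sequenceNo = {seq}"
        for doc, seq in old_keys
    ]
    for stmt in statements:
        print(stmt)
    ```

    In production code, prefer a prepared `DELETE ... WHERE documentId = ? AND sequenceNo = ?` and bind the values, which avoids quoting issues and lets the server cache the statement.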
