I have a Cassandra table with schema:
CREATE TABLE IF NOT EXISTS TestTable(
    documentId text,
    sequenceNo bigint,
    messageData blob,
    clientId text,
    PRIMARY KEY (documentId, sequenceNo)
);

How can I delete all rows whose data was written before a given date?
It's an interesting question...
All columns that aren't part of the primary key have a so-called WriteTime that can be retrieved with CQL's writetime(column_name) function (warning: it doesn't work for collection columns, and returns null for UDTs!). But because CQL has no nested queries, you will need to write a program that fetches the data, filters entries by WriteTime, and deletes those whose WriteTime is older than your threshold (note that the writetime value is in microseconds, not milliseconds as in CQL's timestamp type).
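For example, you can check the write time of a regular column directly in cqlsh; against your table (assuming clientId is always populated) the query could look like this:

SELECT documentId, sequenceNo, writetime(clientId)
FROM TestTable;

The values it returns are microseconds since the Unix epoch, so a threshold expressed in milliseconds needs to be multiplied by 1000 before comparing.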
The easiest way is to use the Spark Cassandra Connector's RDD API, something like this:

import com.datastax.spark.connector._  // provides cassandraTable, writeTime, SomeColumns, ...

// writetime is in microseconds, so convert the threshold from epoch millis
val timestamp = someDate.toInstant.toEpochMilli * 1000L

// read the primary key columns plus the write time of one regular column
val oldData = sc.cassandraTable(srcKeyspace, srcTable)
  .select("prk1", "prk2", "reg_col".writeTime as "writetime")
  .filter(row => row.getLong("writetime") < timestamp)

// delete the matching rows by their primary key
oldData.deleteFromCassandra(srcKeyspace, srcTable,
  keyColumns = SomeColumns("prk1", "prk2"))
where prk1, prk2, ... are all components of the primary key (documentId and sequenceNo in your case), and reg_col is any "regular" column of the table that is neither a collection nor a UDT (for example, clientId). It's important that the list of primary key columns in select and in deleteFromCassandra is the same.
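Put together for your schema, a complete job could look like the sketch below. It's just an illustration: the keyspace name ks, the connection host, and the cutoff date are placeholders to replace with your own values, and note that Cassandra lower-cases unquoted identifiers, so the columns are addressed as documentid, sequenceno, and clientid:

import java.time.ZonedDateTime

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object DeleteOldRows {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("delete-old-rows")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host
    val sc = new SparkContext(conf)

    // delete everything written before this moment;
    // writetime is in microseconds, hence * 1000L on epoch millis
    val cutoff = ZonedDateTime.parse("2020-01-01T00:00:00Z")
      .toInstant.toEpochMilli * 1000L

    sc.cassandraTable("ks", "testtable")  // "ks" is a placeholder keyspace
      .select("documentid", "sequenceno", "clientid".writeTime as "writetime")
      .filter(row => row.getLong("writetime") < cutoff)
      .deleteFromCassandra("ks", "testtable",
        keyColumns = SomeColumns("documentid", "sequenceno"))

    sc.stop()
  }
}

Each matched row is deleted by its full primary key, and both the scan and the deletes run distributed across the Spark executors, so this works for large tables too.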