Question
I'm running a Cassandra cluster:
Software version: 2.0.9
Nodes: 3
Replication factor: 2
I have a very simple table where I insert and update data:
CREATE TABLE link_list (
url text,
visited boolean,
PRIMARY KEY ((url))
);
No expiration is set on the rows and I'm not doing any DELETEs. As soon as I run my application, it quickly slows down due to the increasing number of tombstoned cells:
Read 3 live and 535 tombstoned cells
It gets up to thousands within a few minutes.
My question is: what is generating those cells if I'm not doing any deletions?
// Update
This is the implementation I'm using to talk to Cassandra with com.datastax.driver.
public class LinkListDAOCassandra implements DAO {
public void save(Link link) {
save(new VisitedLink(link.getUrl(), false));
}
@Override
public void save(Model model) {
save((Link) model);
}
public void update(VisitedLink link) {
String cql = "UPDATE link_list SET visited = ? WHERE url = ?";
Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getVisited(), link.getUrl());
}
public void save(VisitedLink link) {
String cql = "SELECT url FROM link_list_inserted WHERE url = ?";
if(Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl()).all().size() == 0) {
cql = "INSERT INTO link_list_inserted (url) VALUES (?)";
Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl());
cql = "INSERT INTO link_list (url, visited) VALUES (?,?)";
Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, link.getUrl(), link.getVisited());
}
}
public VisitedLink getByUrl(String url) {
String cql = "SELECT * FROM link_list WHERE url = ?";
for(Row row : Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, url)) {
return new VisitedLink(row.getString("url"), row.getBool("visited"));
}
return null;
}
public List<Link> getLinks(int limit) {
List<Link> links = new ArrayList<>();
String cql = "SELECT * FROM link_list WHERE visited = False LIMIT ?";
for(Row row : Cassandra.DB.execute(cql, ConsistencyLevel.QUORUM, limit)) {
try {
links.add(new Link(new URL(row.getString("url"))));
}
catch(MalformedURLException e) { }
}
return links;
}
}
This is the execute implementation
public ResultSet execute(String cql, ConsistencyLevel cl, Object... values) {
PreparedStatement statement = getSession().prepare( cql ).setConsistencyLevel(cl);
BoundStatement boundStatement = new BoundStatement( statement );
boundStatement.bind(values);
return session.execute(boundStatement);
}
// Update 2
An interesting finding from cfstats: only one table has tombstones, the index link_list_visited. Does that mean that updating a column with a secondary index creates tombstones?
Table (index): link_list.link_list_visited
SSTable count: 2
Space used (live), bytes: 5055920
Space used (total), bytes: 5055991
SSTable Compression Ratio: 0.3491883995187955
Number of keys (estimate): 256
Memtable cell count: 15799
Memtable data size, bytes: 1771427
Memtable switch count: 1
Local read count: 85703
Local read latency: 2.805 ms
Local write count: 484690
Local write latency: 0.028 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 32
Compacted partition minimum bytes: 8240
Compacted partition maximum bytes: 7007506
Compacted partition mean bytes: 3703162
Average live cells per slice (last five minutes): 3.0
Average tombstones per slice (last five minutes): 674.0
Answer 1:
A secondary index differs from an extra column family that you maintain by hand as an index in two main ways: the secondary index is local to each node (it only covers the data held by that node, not data on other nodes), and it is updated atomically together with the write to the primary table.
Other than that, you can see it as a regular column family with the same weak spots. Every update to an indexed column on the primary table is translated into a delete of the old entry plus an insert of the new entry in the index table, so a high number of updates on the primary column family leads to a high number of deletes on the index table. Those deletes in the index table are the source of the tombstones.
Cassandra deletes are logical deletes: a tombstone is kept until a compaction runs after the table's gc_grace_seconds has elapsed, and only then is it removed.
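To make the delete/insert translation concrete, here is a minimal in-memory sketch in Java. It is a toy model, not Cassandra's storage engine: the class IndexTombstoneSketch and its methods write and count are hypothetical names invented for illustration, and compaction is not modeled at all. It only shows why a query on visited = false ends up scanning tombstones even though the application never issues a DELETE.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only (assumption): this is NOT how Cassandra stores data internally.
class IndexTombstoneSketch {
    // Base table: url -> visited.
    private final Map<String, Boolean> baseTable = new HashMap<>();
    // Index "table": indexed value -> (url -> true if live, false if tombstone).
    private final Map<Boolean, Map<String, Boolean>> index = new HashMap<>();

    // Models an INSERT or UPDATE of the indexed column on the base table.
    void write(String url, boolean visited) {
        Boolean old = baseTable.put(url, visited);
        if (old != null && old != visited) {
            // The old index entry is not erased, it is shadowed by a tombstone.
            partition(old).put(url, false);
        }
        // The new index entry is written as a live cell.
        partition(visited).put(url, true);
    }

    private Map<String, Boolean> partition(boolean value) {
        return index.computeIfAbsent(value, k -> new LinkedHashMap<>());
    }

    // Number of live (or tombstoned) cells scanned for "WHERE visited = value".
    long count(boolean value, boolean live) {
        return partition(value).values().stream().filter(v -> v == live).count();
    }

    public static void main(String[] args) {
        IndexTombstoneSketch idx = new IndexTombstoneSketch();
        // Five links are saved as not visited.
        for (int i = 0; i < 5; i++) {
            idx.write("http://example.com/" + i, false);
        }
        // Two of them are then marked as visited: no DELETE is issued.
        idx.write("http://example.com/0", true);
        idx.write("http://example.com/1", true);
        System.out.println("Read " + idx.count(false, true) + " live and "
                + idx.count(false, false) + " tombstoned cells");
        // prints: Read 3 live and 2 tombstoned cells
    }
}
```

In the question's workload every link is first written with visited = false and later updated to true, so under this model the visited = false side of the index collects one tombstone per visited link, which is consistent with the "Read 3 live and 535 tombstoned cells" message.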
Hope it helps!
Source: https://stackoverflow.com/questions/25443979/tombstoned-cells-without-delete