We recently deployed to production a distributed system that uses Cassandra as its persistent storage.
Not long after, we noticed many warnings about tombstones in the Cassandra logs:
WARN [SharedPool-Worker-2] 2017-01-20 16:14:45,153 ReadCommand.java:508 -
Read 5000 live rows and 4771 tombstone cells for query
SELECT * FROM warehouse.locations WHERE token(address) >= token(D3-DJ-21-B-02) LIMIT 5000
(see tombstone_warn_threshold)
We found this quite surprising at first, because we had only inserted data so far and didn’t expect to see that many tombstones in our database. We asked around, but no one seemed to have a clear explanation of what was going on in Cassandra.
In fact, the main misconception about tombstones is that people associate them only with delete operations. While it’s true that tombstones are generated when data is deleted, that is not the only case, as we shall see.