I have a keyspace populated with data that was expensive to generate. I want two copies of this data within my cluster. I would like to end up with two keyspaces: let's call
" Is there a better way?"
All Cassandra data is stored in the data/ folder (check the data_file_directories setting in cassandra.yaml). You may also want to check the saved_caches_directory and commitlog_directory settings, which point to the saved caches and the commit log.
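As a rough illustration of looking those settings up, here is a minimal Python sketch that pulls the three directory settings out of the text of a cassandra.yaml. The function name `read_storage_dirs` is made up for this example, and the line-based parsing is a simplifying assumption that only handles the usual layout of these keys; a real YAML parser would be more robust.

```python
def read_storage_dirs(yaml_text):
    """Extract the storage directory settings from cassandra.yaml text.

    Hypothetical helper: handles only top-level "key: value" pairs and the
    "- item" list that normally follows data_file_directories.
    """
    dirs = {
        "data_file_directories": [],
        "commitlog_directory": None,
        "saved_caches_directory": None,
    }
    in_data_list = False
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        stripped = line.strip()
        if not stripped:
            continue
        if in_data_list and stripped.startswith("- "):
            dirs["data_file_directories"].append(stripped[2:].strip())
            continue
        in_data_list = False
        # Only look at top-level keys (no leading indentation).
        if not raw.startswith((" ", "\t")) and ":" in stripped:
            key, _, value = stripped.partition(":")
            if key == "data_file_directories":
                in_data_list = True
            elif key in ("commitlog_directory", "saved_caches_directory"):
                dirs[key] = value.strip()
    return dirs
```

You would call it with the contents of your own cassandra.yaml, for example `read_storage_dirs(open(path_to_yaml).read())`.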
Inside the data folder, you'll have
One folder per keyspace
One folder for system keyspace
Some folders for authentication, etc.
Inside each keyspace folder, you'll have
*-Data.db files which contain your real data
*-Filter.db files (Bloom filters)
*-Index.db files, which index the row keys stored in the data files
...
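To see which of these files a keyspace folder actually contains, something like the following Python sketch can help. It groups the .db files by the component name at the end of the file name. The function name `sstable_components` is invented for this example, and it assumes file names end in "-<Component>.db" as in the list above. It searches subfolders too (some versions keep one subfolder per column family) and skips anything under a "snapshots" folder.

```python
from collections import defaultdict
from pathlib import Path


def sstable_components(keyspace_dir):
    """Group the *.db files of a keyspace folder by component suffix."""
    root = Path(keyspace_dir)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.db")):
        # Ignore files that belong to existing snapshots.
        if "snapshots" in path.relative_to(root).parts:
            continue
        # "something-Data.db" -> "Data"
        component = path.stem.rsplit("-", 1)[-1]
        groups[component].append(path.name)
    return dict(groups)
```

The result is a plain dict such as `{"Data": [...], "Filter": [...], "Index": [...]}`, which makes it easy to check that every Data file has its companions before copying anything.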
To replicate the data, you can make a plain copy of those folders.
On our team, ops use a crontab to schedule regular backups of the Cassandra data this way.
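A plain copy like this could be scripted roughly as follows. This is only a sketch of the idea, not the script our ops actually run. The function name `copy_keyspace_dir`, the timestamped label, and the backup layout (`<backup_root>/<label>/<keyspace>`) are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def copy_keyspace_dir(data_dir, keyspace, backup_root, label=None):
    """Copy <data_dir>/<keyspace> to <backup_root>/<label>/<keyspace>."""
    # Default label is a timestamp so repeated (e.g. cron) runs do not collide.
    label = label or time.strftime("%Y%m%d-%H%M%S")
    src = Path(data_dir) / keyspace
    dest = Path(backup_root) / label / keyspace
    # copytree creates missing parent folders and fails if dest already exists.
    shutil.copytree(src, dest)
    return dest
```

A script that calls this function is the kind of thing you would schedule from the crontab, one call per keyspace you want to back up.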
Note: a plain copy can miss live data that is still in memory (in a memtable) and has not been flushed to disk yet. You can run nodetool flush to force the memtables to disk before backing up the data files. Some people also trigger a full compaction first, but a full compaction may hurt your performance, so be careful.
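One way to make sure in-memory data reaches disk before copying is `nodetool flush <keyspace>`. Here is a small Python sketch that wraps it. The helper names are invented for this example, and it assumes nodetool is on the PATH and that the node is reachable at the given host.

```python
import subprocess


def flush_command(keyspace, host="localhost"):
    """Build the nodetool command line that flushes one keyspace."""
    return ["nodetool", "-h", host, "flush", keyspace]


def flush_keyspace(keyspace, host="localhost"):
    """Run nodetool flush; raises CalledProcessError if nodetool fails."""
    subprocess.run(flush_command(keyspace, host), check=True)
```

Calling `flush_keyspace("demo")` just before the file copy narrows the window in which writes live only in memory, although writes arriving during the copy can still be missed.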
Better answer: use the provided tool (nodetool snapshot) to take a snapshot of your DB:
http://www.datastax.com/docs/1.0/operations/backup_restore
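To give an idea of what driving the snapshot tool from a script might look like, here is a hedged Python sketch. The helper names are made up, and it assumes nodetool is on the PATH and that your version accepts the `-t` option for naming the snapshot. The lookup of the resulting folders assumes snapshots end up in a `snapshots/<tag>` folder somewhere under the keyspace folder; check the documentation for your version.

```python
import subprocess
from pathlib import Path


def snapshot_command(keyspace, tag, host="localhost"):
    """Build the nodetool command line that snapshots one keyspace."""
    return ["nodetool", "-h", host, "snapshot", "-t", tag, keyspace]


def take_snapshot(keyspace, tag, host="localhost"):
    """Run nodetool snapshot; raises CalledProcessError if nodetool fails."""
    subprocess.run(snapshot_command(keyspace, tag, host), check=True)


def find_snapshot_dirs(data_dir, keyspace, tag):
    """Return every .../snapshots/<tag> folder under the keyspace folder."""
    root = Path(data_dir) / keyspace
    return sorted(
        p for p in root.rglob(tag)
        if p.is_dir() and p.parent.name == "snapshots"
    )
```

After `take_snapshot("demo", "copy1")`, the folders returned by `find_snapshot_dirs` are the ones you would copy, instead of the live data files.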