Question
I'm trying to figure out how to migrate data from one Cassandra cluster to another Cassandra cluster of a different ring size, say from a 5-node cluster to a 7-node cluster.
I started looking at sstable2json, since it creates a JSON file for the SSTable on that specific Cassandra node. My thought was to do this for a column family on each node in the ring. So on a 5-node ring, this would give me 5 JSON files, one for the column family data that resides on each node.
Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of size, let's say, 7. I was hoping that Cassandra would then replicate/balance the data out evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just mentioned, I'd end up with a ring with all the data in my column family on one node.
So can anyone help me figure out the process for migrating data from one cluster to a different cluster of a different ring size?
Answer 1:
Better: use bin/sstableloader on the sstables from the old ring, to stream to the new one.
Normally sstableloader is used in a sequence like this:
- Create sstables locally using SSTableWriter
- Use sstableloader to stream the data in the sstables to the right nodes (bin/sstableloader path-to-directory-full-of-sstables). The directory name is assumed to be the keyspace, which will be the case if you point it at an existing Cassandra data directory.
Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.
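As a rough illustration of running the loader once per keyspace directory on an old node, here is a minimal Python sketch. The function names, the `data_root` and `extra_args` parameters, and the example paths are hypothetical and only for illustration; the only thing taken from the answer is the `bin/sstableloader path-to-directory` form, where the directory name is the keyspace. Your Cassandra version may need additional options, which you can pass through `extra_args`.

```python
import os
import subprocess


def sstableloader_command(data_root, keyspace, loader="bin/sstableloader", extra_args=()):
    """Build the argument list for one sstableloader run.

    The last path component must be the keyspace name, which is already
    true when data_root is an existing Cassandra data directory.
    """
    keyspace_dir = os.path.join(data_root, keyspace)
    return [loader] + list(extra_args) + [keyspace_dir]


def stream_keyspace(data_root, keyspace, dry_run=True, **kwargs):
    """Print the command (dry run) or actually invoke the loader."""
    cmd = sstableloader_command(data_root, keyspace, **kwargs)
    if dry_run:
        print(" ".join(cmd))
    else:
        # Raises CalledProcessError if the loader exits with a non-zero status.
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    # Hypothetical data directory and keyspace name; adjust to your install.
    stream_keyspace("/var/lib/cassandra/data", "MyKeyspace")
```

With `dry_run=True` (the default here) nothing is streamed, so you can check the generated command on each node of cluster A before running it for real.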
More details on using sstableloader in this blog post.
Answer 2:
You don't need to use sstable2json. If you have the space you can:
- get all the sstables from all of the nodes on the old ring
- put them all together on each of the new servers (renaming any which have the same names)
- run nodetool cleanup on each node in the new ring and they will throw away the data that doesn't belong to them.
Answer 3:
You could take the following steps:
1. Join the 7 new nodes to the existing 5-node cluster, setting up each node with its own ring token. At this point you have a 12-node cluster.
2. Remove the 5 original nodes from the cluster created in step 1.
3. Set the ring token for each remaining node once the 5 nodes have been removed.
4. Run repair on the 7-node cluster.
Answer 4:
I would venture to say that this isn't as big of a problem as it may seem.
- Create your new ring and define the tokens for each node appropriately as per http://wiki.apache.org/cassandra/Operations#Token_selection
- Import data into the new ring.
- The ring will balance itself based on the tokens you have defined http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export
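For the token-definition step above, a minimal sketch of computing evenly spaced initial tokens might look like the following. It assumes the RandomPartitioner (token range 0 to 2**127) and one token per node; the function name `balanced_tokens` is made up for illustration, and other partitioners use a different token space.

```python
def balanced_tokens(node_count, ring_size=2 ** 127):
    """Return evenly spaced tokens, one per node.

    Assumes RandomPartitioner, whose tokens range from 0 to 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the tokens exact for very large values.
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Tokens for the 7-node target ring from the question.
    for node, token in enumerate(balanced_tokens(7)):
        print("node %d: initial_token = %d" % (node, token))
```

Each printed value would go into the `initial_token` setting of the corresponding node before that node first joins the new ring.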
Source: https://stackoverflow.com/questions/6781132/how-to-migrate-data-from-cassandra-cluster-of-size-n-to-a-different-cluster-of-s