Best practice Cassandra setup on EC2 with a large amount of data

刺人心 2021-01-31 21:23

I am doing a large migration from physical machines to EC2 instances.

As of right now I have 3 x.large nodes each with 4 instance store drives (raid-0 1.6TB). After I se

2 Answers
  •  醉梦人生
    2021-01-31 21:49

    It really depends on your data. But first, consider that Cassandra has its own backup/replication mechanism. If one of your nodes goes down, the other nodes still hold replicas of your data, provided your replication factor is greater than 1. The higher your replication factor, the "safer" your data will be, but the more Cassandra nodes you will need.
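    To make that trade-off concrete, here is a rough sketch of the arithmetic. It assumes QUORUM reads and writes (the consistency level is my assumption, it isn't stated in the question) and uses the 3 nodes x 1.6 TB from the question. The function names are made up for illustration, and the capacity figure ignores compaction and snapshot headroom, so treat it as an upper bound.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a given row that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


def unique_data_capacity_tb(nodes: int, disk_tb_per_node: float,
                            replication_factor: int) -> float:
    """Upper bound on unique data: total raw disk divided by the copies kept.

    Ignores compaction/snapshot headroom, so real usable space is lower.
    """
    return nodes * disk_tb_per_node / replication_factor


if __name__ == "__main__":
    # Assumed cluster shape, taken from the question: 3 nodes, 1.6 TB each.
    for rf in (1, 2, 3):
        print(
            f"RF={rf}: quorum={quorum(rf)}, "
            f"replicas that may be down={tolerated_replica_failures(rf)}, "
            f"unique data <= {unique_data_capacity_tb(3, 1.6, rf):.2f} TB"
        )
```

    Note that under QUORUM, RF=2 tolerates no more lost replicas than RF=1, while RF=3 tolerates one. That is why raising the replication factor means you need more nodes (or bigger disks) to hold the same amount of unique data.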

    If your data is very critical, ask yourself: if the ephemeral storage is lost, can you effectively rebuild your data without a backup? Are you looking for better performance? Ephemeral storage performs much better than EBS, and it works great if your application is read/write intensive. In our case, we used Cassandra on ephemeral storage, populated with data that we were already storing in Amazon S3.

    If you can't rebuild your data, your data is very critical, and you don't trust Cassandra's replication, you can always use EBS at a performance penalty. The issue with Cassandra is that it works best when all the nodes in your cluster are the same, so it's not easy to have, say, some nodes backed by ephemeral storage and some backed by EBS. You could completely replicate your ephemeral cluster with an EBS-backed cluster, but that's not straightforward.

    You can more easily replicate MySQL or CouchDB instances from ephemeral storage instances to EBS-backed instances because of their master/slave setup. For example, you can run your MySQL master on an ephemeral storage instance and your MySQL slave on an EBS-backed instance.

    There's another discussion about ephemeral storage vs. EBS here:

    How do I take a backup of aws ec2 instance/ephemeral storage?

    Hope it helps.
