Best practice Cassandra setup on EC2 with a large amount of data

刺人心 2021-01-31 21:23

I am doing a large migration from physical machines to EC2 instances.

As of right now I have 3 x.large nodes, each with 4 instance store drives (RAID-0, 1.6TB). After I se

2 answers
  •  星月不相逢
    2021-01-31 21:41

    I have been running Cassandra on EC2 for over 2 years. To address your concerns, you need to design a proper availability architecture on EC2 for your Cassandra cluster. Here is a list of points to consider:

    1. Spread your cluster across at least 3 availability zones;
    2. Use NetworkTopologyStrategy with Ec2Snitch/Ec2MultiRegionSnitch to place a replica of your data in each zone; this means that the machines in each zone, taken together, hold your full data set; for example, the strategy_options would look like {us-east:3}.
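
    As a rough sketch of point 2, the replication settings can be expressed as a CQL `CREATE KEYSPACE` statement. The helper name `create_keyspace_cql` and the keyspace name `app_data` are made up for illustration, and the CQL 3 map syntax is assumed here in place of the older strategy_options form. With Ec2Snitch the data center name is derived from the region (for example `us-east`) and the rack from the availability zone.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name (as reported by the snitch,
    e.g. "us-east" under Ec2Snitch) to its replication factor.
    This helper is hypothetical; it only formats a string.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")

    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        if rf < 1:
            raise ValueError("replication factor must be >= 1 for " + dc)
        options.append("'" + dc + "': " + str(rf))

    body = ", ".join(options)
    return "CREATE KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


if __name__ == "__main__":
    # Three replicas in the us-east data center, one per availability zone.
    print(create_keyspace_cql("app_data", {"us-east": 3}))
```

    The statement still has to be run through cqlsh or a driver, and the snitch itself is selected with `endpoint_snitch` in cassandra.yaml.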

    The two tips above should give you basic availability in AWS, and if your queries are sent using LOCAL_QUORUM, your application will be fine even if one zone goes down.
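
    The arithmetic behind that claim can be checked with a small sketch. A quorum is floor(RF / 2) + 1 replicas; the function names below are illustrative, and the model assumes replicas are spread as evenly as possible across zones and takes the worst case for the zones that fail.

```python
import math


def quorum(replication_factor):
    """Number of replicas that must respond for a quorum read or write."""
    return replication_factor // 2 + 1


def survives_zone_loss(replication_factor, zones, zones_down=1):
    """Return True if a quorum is still reachable after zones_down zones fail.

    Assumes replicas are spread as evenly as possible across zones, and that
    the failed zones are the ones holding the most replicas (worst case).
    """
    per_zone = math.ceil(replication_factor / zones)
    lost = min(replication_factor, zones_down * per_zone)
    remaining = replication_factor - lost
    return remaining >= quorum(replication_factor)


if __name__ == "__main__":
    # RF=3 over 3 zones: one zone down leaves 2 replicas, quorum is 2.
    print(survives_zone_loss(3, 3, zones_down=1))
    # Two zones down leaves a single replica, which is below quorum.
    print(survives_zone_loss(3, 3, zones_down=2))
```

    With a replication factor of 3 over 3 zones, losing one zone leaves 2 of 3 replicas, which still meets the quorum of 2. Losing two zones does not.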

    If you are concerned about 2 zones going down (I don't recall that happening in AWS in my past 2 years of use), then you can also add another region to your cluster.

    With the above, if any node dies for any reason, you can restore it from nodes in other zones. After all, Cassandra was designed to provide you with this kind of availability.

    About EBS vs Ephemeral:

    I have always been against using EBS volumes for anything in production because EBS is one of the worst AWS services in terms of availability. It goes down several times a year, and its outages usually cascade to other AWS services such as ELB and RDS. EBS volumes are also network-attached storage, so every read and write has to go over the network. Don't use them. Even DataStax doesn't recommend them:

    http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/../../cassandra/architecture/architecturePlanningEC2_c.html

    About Backups:

    I use a solution called Priam (https://github.com/Netflix/Priam), which was written by Netflix. It can take a nightly snapshot of your cluster and copy everything to S3. If you enable incremental_backups, it also uploads incremental backups to S3. If a node goes down, you can trigger a restore on that specific node with a simple API call. This restores much faster than rebuilding the node from its peers and does not put a lot of streaming load on your other nodes. I also added a patch to it which lets you do fancy things like bringing up multiple DCs inside one AWS region.

    You can read about my setup here: http://aryanet.com/blog/shrinking-the-cassandra-cluster-to-fewer-nodes

    Hope the above helps.
