I am doing a large migration from physical machines to EC2 instances.
As of right now I have 3 x.large nodes each with 4 instance store drives (raid-0 1.6TB). After I se
I have been running Cassandra on EC2 for over 2 years. To address your concerns, you need to form a proper availability architecture on EC2 for your Cassandra cluster. Here is a bullet list for you to consider:
The above two tips should satisfy basic availability in AWS, and if your queries use LOCAL_QUORUM, your application will be fine even if one zone goes down.
If you are concerned about 2 zones going down (I don't recall that happening in AWS in my 2 years of use), then you can also add another region to your cluster.
With the above, if any node dies for any reason, you can restore it from nodes in other zones. After all, Cassandra was designed to provide you with this kind of availability.
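As a rough illustration of why one zone can fail without breaking LOCAL_QUORUM, here is a minimal Python sketch. It assumes a replication factor of 3 with one replica in each of three zones; that layout and the zone names are my assumptions for the example. A quorum is floor(RF / 2) + 1 replicas.

```python
from typing import Dict, Set


def quorum(replication_factor: int) -> int:
    """Replicas that must respond to satisfy QUORUM / LOCAL_QUORUM."""
    return replication_factor // 2 + 1


def survives_zone_loss(replicas_per_zone: Dict[str, int], zones_down: Set[str]) -> bool:
    """True if a quorum of replicas is still reachable after the given zones fail."""
    total = sum(replicas_per_zone.values())
    alive = sum(n for zone, n in replicas_per_zone.items() if zone not in zones_down)
    return alive >= quorum(total)


# Hypothetical layout: one replica per zone, replication factor 3.
layout = {"us-east-1a": 1, "us-east-1b": 1, "us-east-1c": 1}

print(survives_zone_loss(layout, {"us-east-1a"}))                # True
print(survives_zone_loss(layout, {"us-east-1a", "us-east-1b"}))  # False
```

With two of three replicas still up, the quorum of 2 is met; with only one left, it is not.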
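On the Cassandra side, adding a region mostly comes down to declaring a second data center in the keyspace's replication settings. Here is a hedged Python sketch that builds such a statement. The keyspace name `app` and the data center names `us-east` and `us-west` are placeholders; use whatever names `nodetool status` reports for your cluster.

```python
from typing import Dict


def keyspace_cql(name: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {opts}}};"
    )


# Placeholder names: three replicas in each of two regions.
print(keyspace_cql("app", {"us-east": 3, "us-west": 3}))
```

LOCAL_QUORUM then only waits on replicas in the data center the client is connected to, so losing one region does not block requests served by the other.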
About EBS vs Ephemeral:
I have always been against using EBS volumes for anything in production because it is one of the worst AWS services in terms of availability. They go down several times a year, and their outages usually cascade to other AWS services like ELB and RDS. They are also network-attached storage, so every read and write has to go over the network. Don't use them. Even DataStax doesn't recommend them:
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/../../cassandra/architecture/architecturePlanningEC2_c.html
About Backups:
I use a solution called Priam (https://github.com/Netflix/Priam), which was written by Netflix. It can take a nightly snapshot of your cluster and copy everything to S3. If you enable incremental_backups, it also uploads incremental backups to S3. If a node goes down, you can trigger a restore on that specific node with a simple API call. Restoring this way is a lot faster and does not put a lot of streaming load on your other nodes. I also added a patch to it which lets you do fancy things like bringing up multiple DCs inside one AWS region.
You can read about my setup here: http://aryanet.com/blog/shrinking-the-cassandra-cluster-to-fewer-nodes
Hope the above helps.
It really depends on your data. But first, consider that Cassandra has its own backup/replication mechanism. If one of your nodes goes down, the other nodes will still have your data. The higher your replication factor, the "safer" your data will be, but the more Cassandra nodes you will need.
If your data is very critical, ask yourself: could you rebuild the data on ephemeral storage without a backup? Are you looking for better performance? Ephemeral storage performs much better than EBS, and it works great if your application is read/write intensive. In our case we used Cassandra with ephemeral storage, populated with data that we were already storing in Amazon S3.
If you can't rebuild your data, your data is very critical, and you don't trust Cassandra, you can always use EBS at a performance penalty. The issue with Cassandra is that it works best if all the nodes in your cluster are the same, so it's not easy to, say, have some nodes ephemeral-backed and some nodes EBS-backed. You could completely replicate your ephemeral cluster with an EBS-backed cluster, but that's not straightforward.
You can more easily replicate MySQL or CouchDB instances using EBS-backed instances (from ephemeral storage instances) because of their master-slave setup. For example, you can run your MySQL master on an ephemeral storage instance and your MySQL slave on an EBS-backed instance.
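To make that trade-off concrete, here is a back-of-the-envelope sizing sketch in Python. The 1.6 TB per node comes from the question; the 2 TB dataset and the 50% disk fill target are my own illustrative assumptions, not recommendations.

```python
import math


def nodes_needed(data_tb: float, replication_factor: int,
                 disk_tb_per_node: float, max_fill: float = 0.5) -> int:
    """Estimate node count: every row is stored replication_factor times."""
    stored_tb = data_tb * replication_factor   # total bytes on disk across the cluster
    usable_tb = disk_tb_per_node * max_fill    # assumed headroom for compaction etc.
    # You also need at least as many nodes as replicas.
    return max(replication_factor, math.ceil(stored_tb / usable_tb))


# Hypothetical 2 TB dataset on 1.6 TB nodes, for increasing replication factors.
for rf in (1, 2, 3):
    print(f"RF={rf}: {nodes_needed(2.0, rf, 1.6)} nodes")
```

The node count grows roughly linearly with the replication factor, which is the cost of the extra safety.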
There's another discussion about Ephemeral vs EBS here:
How do I take a backup of aws ec2 instance/ephemeral storage?
Hope it helps.