I'm using Cassandra 2.0.9 to store quite large amounts of data, let's say 100 GB, in one column family. I would like to export this data to CSV in a fast way. I tried:
Inspired by @user1859675's answer, here is how we can export data from Cassandra using Spark:
```scala
val cassandraHostNode = "10.xxx.xxx.x5,10.xxx.xxx.x6,10.xxx.xxx.x7"

val spark = org.apache.spark.sql.SparkSession
  .builder
  .config("spark.cassandra.connection.host", cassandraHostNode)
  .appName("Awesome Spark App")
  .master("local[*]")
  .getOrCreate()

// Read the Cassandra table into a DataFrame via the connector's data source
val dataSet = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "xxxxxxx", "keyspace" -> "xxxxxxx"))
  .load()

val targetfilepath = "/opt/report_values/"
dataSet.write.format("csv").save(targetfilepath) // Spark 2.x
```
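If you want a header row, a custom delimiter, or a single output file instead of one part-file per partition, the writer accepts a few standard CSV options. This is a sketch under the same assumptions as above (a running Spark 2.x session with the Cassandra connector configured); `coalesce(1)` forces a single partition and thus a single CSV file, at the cost of funneling all data through one task, so avoid it for very large exports:

```scala
// Sketch: CSV export with header and explicit delimiter.
// `dataSet` and `targetfilepath` are the values defined above.
dataSet
  .coalesce(1)                       // single output file; omit for 100 GB-scale data
  .write
  .mode(org.apache.spark.sql.SaveMode.Overwrite) // replace any previous export
  .option("header", "true")          // write column names as the first row
  .option("delimiter", ",")          // field separator
  .csv(targetfilepath)               // shorthand for format("csv").save(...)
```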
You will need "spark-cassandra-connector" on your classpath for this to work.
The version I am using is below:

```xml
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.3.2</version>
</dependency>
```
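If you build with sbt rather than Maven, the equivalent dependency line for the same artifact and version would be (the `%%` operator appends the Scala binary version, here `_2.11`, assuming your project's `scalaVersion` is set to 2.11.x):

```scala
// build.sbt — same connector coordinates as the Maven snippet above
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.3.2"
```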