Export large amount of data from Cassandra to CSV

Asked by 陌清茗 · 2021-02-05 05:46 · 3 answers · 772 views

I'm using Cassandra 2.0.9 to store quite a large amount of data, let's say 100 GB, in one column family. I would like to export this data to CSV in a fast way. I tried:

3 Answers

  •  愿得一人 · 2021-02-05 06:13

    Inspired by @user1859675's answer, here is how you can export data from Cassandra to CSV using Spark:

    // Comma-separated list of Cassandra contact points
    val cassandraHostNode = "10.xxx.xxx.x5,10.xxx.xxx.x6,10.xxx.xxx.x7"

    // Build a SparkSession configured to connect to the Cassandra cluster
    val spark = org.apache.spark.sql.SparkSession
                                        .builder
                                        .config("spark.cassandra.connection.host", cassandraHostNode)
                                        .appName("Awesome Spark App")
                                        .master("local[*]")
                                        .getOrCreate()

    // Read the whole table as a DataFrame via the Cassandra data source
    val dataSet = spark.read.format("org.apache.spark.sql.cassandra")
                            .options(Map("table" -> "xxxxxxx", "keyspace" -> "xxxxxxx"))
                            .load()

    // Write the DataFrame out as CSV files under the target directory
    val targetfilepath = "/opt/report_values/"
    dataSet.write.format("csv").save(targetfilepath)  // Spark 2.x
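
    A note on the write step: Spark's built-in CSV source also accepts options such as a header row, and you can reduce the number of output part-files with coalesce. The sketch below is a variation on the save call above, reusing the same dataSet and targetfilepath; coalescing to a single file is only illustrative and can be slow for an export of ~100 GB.

    // Write CSV with a header row.
    // coalesce(1) forces a single output file -- convenient, but it funnels
    // all data through one task, so drop it for very large exports.
    dataSet.coalesce(1)
           .write
           .option("header", "true")
           .format("csv")
           .save(targetfilepath)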
    

    You will need "spark-cassandra-connector" on your classpath for this to work.
    The version I am using is below (Maven coordinates):

        <dependency>
            <groupId>com.datastax.spark</groupId>
            <artifactId>spark-cassandra-connector_2.11</artifactId>
            <version>2.3.2</version>
        </dependency>
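
    If the project is built with sbt instead of Maven, the equivalent dependency line (for a Scala 2.11 build, matching the artifact above) would look roughly like this:

    // build.sbt -- same connector coordinates as the Maven snippet above;
    // %% appends the project's Scala binary version (_2.11 here)
    libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "2.3.2"

    When launching with spark-submit, the same coordinates can also be pulled in with --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.2.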
    
