Question
Can someone point me to a working example of saving a CSV file to an HBase table using Spark 2.2? Options I tried that failed (note: all of them work for me with Spark 1.6):
- phoenix-spark
- hbase-spark
- it.nerdammer.bigdata : spark-hbase-connector_2.10
All of them, after everything else is fixed, eventually fail with an error similar to the one in this Spark HBase question.
Thanks
Answer 1:
Add the parameters below to your Spark job, so that the Hadoop output format used by the connector has a staging/output directory to work with:
spark-submit \
--conf "spark.yarn.stagingDir=/somelocation" \
--conf "spark.hadoop.mapreduce.output.fileoutputformat.outputdir=/somelocation" \
--conf "spark.hadoop.mapred.output.dir=/somelocation"
Answer 2:
Phoenix has a Spark plugin and a JDBC thin client that can connect to (read/write) HBase; examples are at https://phoenix.apache.org/phoenix_spark.html
Option 1: connect via ZooKeeper URL using the phoenix-spark plugin
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._

val sc = new SparkContext("local", "phoenix-test")
val sqlContext = new SQLContext(sc)

// Load the Phoenix table TABLE1 through the phoenix-spark data source.
// sqlContext.load was removed in Spark 2.x, so use the DataFrameReader API.
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))
  .load()

df.filter(df("COL1") === "test_row_1" && df("ID") === 1L)
  .select(df("ID"))
  .show()
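Since the question asks about saving, note that the same data source also writes DataFrames back to Phoenix. A minimal sketch, assuming the CSV has a header row and that a Phoenix table OUTPUT_TABLE with a matching schema already exists (the file path and table name are placeholders):

// Read the CSV with the built-in Spark 2.x csv source
val csvDf = sqlContext.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/input.csv")

// phoenix-spark requires SaveMode.Overwrite; writes are executed as
// Phoenix UPSERTs against the existing table
csvDf.write
  .format("org.apache.phoenix.spark")
  .mode("overwrite")
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", "phoenix-server:2181")
  .save()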
Option 2: use the JDBC thin client provided by the Phoenix Query Server
(more info at https://phoenix.apache.org/server.html). The connection URL looks like:
jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
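As a minimal sketch of writing through the thin client (assuming the Phoenix Query Server is running on localhost:8765, the thin-client JAR is on the classpath, and the TABLE1 schema from the example above):

import java.sql.DriverManager

// Load the thin driver explicitly (class name per the Phoenix docs)
Class.forName("org.apache.phoenix.queryserver.client.Driver")

val conn = DriverManager.getConnection(
  "jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF")

// Phoenix uses UPSERT rather than INSERT
val stmt = conn.prepareStatement("UPSERT INTO TABLE1 (ID, COL1) VALUES (?, ?)")
stmt.setLong(1, 1L)
stmt.setString(2, "test_row_1")
stmt.executeUpdate()

conn.commit()  // Phoenix connections do not autocommit by default
conn.close()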
Source: https://stackoverflow.com/questions/46477932/save-csv-file-to-hbase-table-using-spark-and-phoenix