Question
Can someone point me to a working example of saving a CSV file to an HBase table using Spark 2.2? Options I tried that failed (note: all of them work for me with Spark 1.6):
- phoenix-spark
- hbase-spark
- it.nerdammer.bigdata : spark-hbase-connector_2.10
All of them, after everything else is fixed, eventually fail with an error similar to the one in this Spark HBase question.
Thanks
Answer 1:
Add the parameters below to your Spark job, so that the Hadoop output format used by the connector has a staging/output directory to work with:
spark-submit \
--conf "spark.yarn.stagingDir=/somelocation" \
--conf "spark.hadoop.mapreduce.output.fileoutputformat.outputdir=/somelocation" \
--conf "spark.hadoop.mapred.output.dir=/somelocation"
Answer 2:
Phoenix has a Spark plugin and a JDBC thin client that can connect to (read/write) HBase; examples are at https://phoenix.apache.org/phoenix_spark.html
Option 1: connect via ZooKeeper URL using the phoenix-spark plugin
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._

val sc = new SparkContext("local", "phoenix-test")
val sqlContext = new SQLContext(sc)

// Load the Phoenix table TABLE1 through the phoenix-spark data source.
// sqlContext.load was removed in Spark 2.x, so use the DataFrameReader API.
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "TABLE1", "zkUrl" -> "phoenix-server:2181"))
  .load()

df.filter(df("COL1") === "test_row_1" && df("ID") === 1L)
  .select(df("ID"))
  .show()
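Since the question asks about saving, note that the same data source also writes DataFrames back to Phoenix. A minimal sketch, assuming the CSV has a header row and that a Phoenix table OUTPUT_TABLE with a matching schema already exists (the file path and table name are placeholders):

// Read the CSV with the built-in Spark 2.x csv source
val csvDf = sqlContext.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/input.csv")

// phoenix-spark requires SaveMode.Overwrite; writes are executed as
// Phoenix UPSERTs against the existing table
csvDf.write
  .format("org.apache.phoenix.spark")
  .mode("overwrite")
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", "phoenix-server:2181")
  .save()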
Option 2: use the JDBC thin client provided by the Phoenix Query Server
(more info at https://phoenix.apache.org/server.html). The connection URL looks like:
jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
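As a minimal sketch of writing through the thin client (assuming the Phoenix Query Server is running on localhost:8765, the thin-client JAR is on the classpath, and the TABLE1 schema from the example above):

import java.sql.DriverManager

// Load the thin driver explicitly (class name per the Phoenix docs)
Class.forName("org.apache.phoenix.queryserver.client.Driver")

val conn = DriverManager.getConnection(
  "jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF")

// Phoenix uses UPSERT rather than INSERT
val stmt = conn.prepareStatement("UPSERT INTO TABLE1 (ID, COL1) VALUES (?, ?)")
stmt.setLong(1, 1L)
stmt.setString(2, "test_row_1")
stmt.executeUpdate()

conn.commit()  // Phoenix connections do not autocommit by default
conn.close()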
Source: https://stackoverflow.com/questions/46477932/save-csv-file-to-hbase-table-using-spark-and-phoenix