I am new to Spark and Cassandra. When I try to submit a Spark job, I get an error while connecting to Cassandra.
Details:
Versions:
Spa
You did not specify spark.cassandra.connection.host.
By default, Spark assumes that the Cassandra host is the same as the Spark master node.
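import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

var sc: SparkContext = _
val conf = new SparkConf().setAppName("Cassandra Demo").setMaster(master)
  .set("spark.cassandra.connection.host", "192.168.101.11") // address of a Cassandra node
sc = new SparkContext(conf)
val rdd = sc.cassandraTable("test", "words")
rdd.collect().foreach(println) // collect() replaces the deprecated toArray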
It should work if you have properly set the seed node in cassandra.yaml.
The issue is resolved. It was caused by a mix-up in the dependencies. I built a jar with dependencies and passed it to spark-submit, instead of specifying the dependent jars separately.
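For anyone wanting to try the same approach, here is a minimal sketch of building a jar with dependencies, assuming the project uses sbt with the sbt-assembly plugin (the post above does not say which build tool was used). The project name and all version numbers are placeholders, not values from the original setup.

```scala
// project/plugins.sbt (assumed plugin version; use one that matches your sbt)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt (hypothetical project name and versions)
name := "cassandra-demo"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Spark itself is supplied by the cluster, so keep it out of the fat jar
  "org.apache.spark" %% "spark-core" % "1.4.0" % "provided",
  // The connector and its transitive dependencies are bundled into the jar
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0"
)
```

Running `sbt assembly` should then produce a single jar under `target/` that can be passed to spark-submit. If the build reports duplicate-file conflicts, a merge strategy may need to be configured.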
It finally worked.
Steps:
This is an issue with the version of the cassandra-driver-core jar dependency.
The Cassandra version provided is 2.0.
The cassandra-driver-core jar version provided is 2.1.5.
The jar version should match the version of the running Cassandra.
In this case, the included jar file should be cassandra-driver-core-2.0.0.jar.
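If the jar comes in through an sbt build rather than being added by hand, one possible way to pin the driver version is a dependency override. This is only a sketch and assumes sbt is the build tool; the version shown is the one suggested in this answer, not a verified combination.

```scala
// build.sbt fragment (assumes an sbt build)
// Force the transitive cassandra-driver-core dependency to the version
// suggested above instead of whatever the connector pulls in by default.
dependencyOverrides += "com.datastax.cassandra" % "cassandra-driver-core" % "2.0.0"
```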
I struggled with this issue overnight and finally found a combination that works. I am writing it down for anyone who runs into a similar issue.
First of all, this is a version issue with the cassandra-driver-core dependency. But tracking down the exact combination that works took me quite a bit of time.
Secondly, this is the combination that works for me:
"com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0",
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.5"
Thirdly, let me explain my frustration. With spark-cassandra-connector 1.5.0, I can run the assembly through spark-submit with --master "local[2]" on the same machine, against a remote Cassandra, without any problem. Any combination of connector 1.5.0 or 1.6.0 with Cassandra 2.0, 2.1, 2.2, or 3.4 works well. But if I submit the job to a cluster from the same machine (a NodeManager) with --master yarn --deploy-mode cluster, I always run into this problem: Failed to open native connection to Cassandra at {192.168.122.12}:9042
What is going on here? Can anyone from DataStax take a look at this issue? I can only guess that it has something to do with "cqlversion", which should match the version of the Cassandra cluster.
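Since the failure only appears in cluster mode, where the driver runs on a cluster node rather than on the submitting machine, a plain reachability check might help separate a network problem from a version mismatch. The sketch below is not part of the connector; the object and method names are hypothetical, and it only tests whether a TCP connection to the native port can be opened from the machine it runs on.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object CassandraPortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers refused connections, timeouts and unresolvable host names
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Default host is the address from the error message above
    val host = if (args.nonEmpty) args(0) else "192.168.122.12"
    println(s"$host:9042 reachable: ${canConnect(host, 9042)}")
  }
}
```

If this succeeds on the cluster nodes while the job still fails, the network is probably not the cause and the dependency versions remain the more likely suspect.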
Does anybody know a better solution? [cassandra], [apache-spark]