Question
I have a Spark Streaming job that uses Cassandra as its datastore. I have a stream that needs to be joined with a Cassandra table. I am using spark-cassandra-connector, which has a great method, joinWithCassandraTable, that, as far as I can understand, implements an inner join with a Cassandra table:
val source: DStream[...] = ...
source.foreachRDD { rdd =>
  rdd.joinWithCassandraTable( "keyspace", "table" ).map{ ...
  }
}
So the question is: how can I implement a left outer join with a Cassandra table?
Thanks in advance.
Answer 1:
This is currently not supported, but there is a ticket to introduce the functionality. Please vote on it if you would like it introduced in the future.
https://datastax-oss.atlassian.net/browse/SPARKC-181
A workaround is suggested in the ticket.
Answer 2:
As RussS mentioned, this feature is not yet available in the spark-cassandra-connector driver. As a workaround, I propose the following code snippet:
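import com.datastax.spark.connector.cql.CassandraConnector
import com.datastax.spark.connector.rdd.reader.PrefetchingResultSetIterator

// Build the connector once on the driver; it is serializable and can be used inside the closure.
val connector = CassandraConnector(rdd.context.getConf)
// mapPartitions rather than foreachPartition, so that the joined pairs are returned instead of discarded.
val joined = rdd.mapPartitions { partition =>
  connector.withSessionDo { session =>
    val pairs = for (
      leftSide <- partition;
      rightSide <- {
        // Bind the key as a parameter (assumes leftSide._2 is a String); in CQL, double quotes denote identifiers, not string values.
        val rs = session.execute("""SELECT * FROM "keyspace".table WHERE id = ?""", leftSide._2)
        val iterator = new PrefetchingResultSetIterator(rs, 100)
        if (iterator.isEmpty) Seq(None) // no match: keep the left row and pair it with None
        else iterator.map(r => Some(r.getString(1))) // match: read the second column (index 1) of each row
      }
    ) yield (leftSide, rightSide)
    pairs.toList.iterator // run the lazy queries while the session is still open
  }
}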
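To make the intended result shape easier to see, here is a minimal sketch of the same left outer join logic on plain Scala collections, with no Spark or Cassandra involved. The names LeftJoinSketch, table, lookup and leftOuterJoin are hypothetical and only for illustration; the in-memory Map is an assumed stand-in for the Cassandra table, and lookup plays the role of the per-key SELECT.

```scala
object LeftJoinSketch {
  // Hypothetical stand-in for the Cassandra table: id -> values stored under that id.
  val table: Map[String, Seq[String]] = Map(
    "a" -> Seq("row-a1", "row-a2"),
    "b" -> Seq("row-b1")
  )

  // Hypothetical lookup that plays the role of the per-key SELECT.
  def lookup(id: String): Seq[String] = table.getOrElse(id, Seq.empty)

  // Left outer join: every left element is kept.
  // A left element with no match is emitted once, paired with None;
  // a left element with n matches is emitted n times, paired with Some(match).
  def leftOuterJoin[L](
      left: Iterator[(L, String)],
      find: String => Seq[String]
  ): Iterator[((L, String), Option[String])] =
    left.flatMap { leftSide =>
      val matches = find(leftSide._2)
      if (matches.isEmpty) Iterator(leftSide -> Option.empty[String])
      else matches.iterator.map(r => leftSide -> Option(r))
    }
}
```

For example, LeftJoinSketch.leftOuterJoin(Iterator((1, "a"), (2, "z")), LeftJoinSketch.lookup).toList keeps the element with key "z" and pairs it with None, whereas an inner join would drop it.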
Source: https://stackoverflow.com/questions/32520182/how-implement-left-or-right-join-using-spark-cassandra-connector