Question
I am trying to persist a Spark stream to Cassandra; here is my code:
JavaDStream<BusinessPointNYCT> studentFileDStream = m_JavaStreamingContext
        .textFileStream(new File(fileDir, "BUSINESSPOINTS_NY_CT.csv").getAbsolutePath())
        .map(new BusinessPointMapFunction());

// Save it to Cassandra
CassandraStreamingJavaUtil.javaFunctions(studentFileDStream)
        .writerBuilder("spatial_keyspace", "businesspoints_ny_ct", mapToRow(BusinessPointNYCT.class))
        .saveToCassandra();
My application starts without any error or warning, but the data is not persisted to Cassandra. According to the log, it is deleted right after being stored:
16/04/14 14:54:30 INFO JobScheduler: Added jobs for time 1460625870000 ms
16/04/14 14:54:30 INFO JobScheduler: Starting job streaming job 1460625870000 ms.0 from job set of time 1460625870000 ms
16/04/14 14:54:31 INFO SparkContext: Starting job: runJob at DStreamFunctions.scala:54
16/04/14 14:54:31 INFO DAGScheduler: Job 0 finished: runJob at DStreamFunctions.scala:54, took 0.001267 s
16/04/14 14:54:31 INFO JobScheduler: Finished job streaming job 1460625870000 ms.0 from job set of time 1460625870000 ms
16/04/14 14:54:31 INFO JobScheduler: Total delay: 1.028 s for time 1460625870000 ms (execution: 0.058 s)
16/04/14 14:54:31 INFO FileInputDStream: Cleared 0 old files that were older than 1460625810000 ms:
16/04/14 14:54:31 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/04/14 14:54:31 INFO ReceiverTracker: Cleanup old received batch data: 1460625810000 ms
16/04/14 14:54:31 INFO InputInfoTracker: remove old batch metadata:
16/04/14 14:54:40 INFO FileInputDStream: Finding new files took 0 ms
16/04/14 14:54:40 INFO FileInputDStream: New files at time 1460625880000 ms:
16/04/14 14:54:40 INFO JobScheduler: Added jobs for time 1460625880000 ms
16/04/14 14:54:40 INFO JobScheduler: Starting job streaming job 1460625880000 ms.0 from job set of time 1460625880000 ms
16/04/14 14:54:40 INFO SparkContext: Starting job: runJob at DStreamFunctions.scala:54
16/04/14 14:54:40 INFO DAGScheduler: Job 1 finished: runJob at DStreamFunctions.scala:54, took 0.000018 s
16/04/14 14:54:40 INFO JobScheduler: Finished job streaming job 1460625880000 ms.0 from job set of time 1460625880000 ms
16/04/14 14:54:40 INFO JobScheduler: Total delay: 0.022 s for time 1460625880000 ms (execution: 0.010 s)
16/04/14 14:54:40 INFO MapPartitionsRDD: Removing RDD 2 from persistence list
16/04/14 14:54:40 INFO MapPartitionsRDD: Removing RDD 1 from persistence list
16/04/14 14:54:40 INFO BlockManager: Removing RDD 2
16/04/14 14:54:40 INFO FileInputDStream: Cleared 0 old files that were older than 1460625820000 ms:
16/04/14 14:54:40 INFO BlockManager: Removing RDD 1
16/04/14 14:54:40 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()
16/04/14 14:54:40 INFO ReceiverTracker: Cleanup old received batch data: 1460625820000 ms
16/04/14 14:54:40 INFO InputInfoTracker: remove old batch metadata:
16/04/14 14:54:41 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
16/04/14 14:54:50 INFO FileInputDStream: Finding new files took 1 ms
16/04/14 14:54:50 INFO FileInputDStream: New files at time 1460625890000 ms:
I also verified from a Cassandra client; it is not returning any data:
CassandraSimpleClient client = new CassandraSimpleClient();
client.connect("127.0.0.1");
//Session session = cluster.connect("Your keyspace name");
Session session = client.getActiveCluster().connect("spatial_keyspace");
ResultSet result = session.execute("SELECT * FROM spatial_keyspace.BUSINESSPOINTS_NY_CT");
I am stuck here. Is Spark Streaming not getting any data from the text file? Need help! Thanks.
textFileStream() did not work for me; I think it works only with HDFS, so I changed it to socketTextStream(), and that is working fine.
m_JavaStreamingContext.socketTextStream("IN-6WX6152", 9090);
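A likely reason for the original failure: textFileStream() expects a directory path and only picks up files that appear in that directory after the streaming context has started, whereas the question passes the path of a single pre-existing file. The log entries "New files at time ... ms:" with nothing listed after the colon are consistent with no new files ever being detected. Below is a minimal sketch of the working socket-based pipeline, assuming the same BusinessPointNYCT POJO and BusinessPointMapFunction from the question; the batch interval and app name are illustrative, and the socket host/port are the ones used above. It cannot be run without a Spark runtime, a Cassandra node, and a process writing CSV lines to the socket.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import com.datastax.spark.connector.japi.CassandraJavaUtil;
import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;

public class BusinessPointsStreamJob {
    public static void main(String[] args) throws InterruptedException {
        // Point the connector at the local Cassandra node used in the question.
        SparkConf conf = new SparkConf()
                .setAppName("BusinessPointsToCassandra")
                .set("spark.cassandra.connection.host", "127.0.0.1");

        // 10-second batches (illustrative; pick what fits your data rate).
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Each line received on the socket is one CSV record, mapped to the POJO.
        JavaDStream<BusinessPointNYCT> points = jssc
                .socketTextStream("IN-6WX6152", 9090)
                .map(new BusinessPointMapFunction());

        // Write each batch to the existing keyspace/table.
        CassandraStreamingJavaUtil.javaFunctions(points)
                .writerBuilder("spatial_keyspace", "businesspoints_ny_ct",
                        CassandraJavaUtil.mapToRow(BusinessPointNYCT.class))
                .saveToCassandra();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

If you want to keep the file-based approach, the documented alternative is to pass the directory to textFileStream() and move or copy new CSV files into it while the job is running, so each file is atomically visible after the context starts.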
Source: https://stackoverflow.com/questions/36619211/spark-streaming-to-cassandra-not-persisiting