Question
I'm new to Apache Spark Streaming and I'm trying to build a Spark Streaming job that reads values from a Kinesis stream. This is my Python script:
import settings
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream
spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME)
streaming_context = StreamingContext(sparkContext=spark_context, batchDuration=settings.BATCH_DURATION)
kinesis_good_stream = KinesisUtils.createStream(
    ssc=streaming_context, kinesisAppName=settings.KINESIS_APP_NAME,
    streamName=settings.KINESIS_GOOD_STREAM, endpointUrl=settings.KINESIS_ENDPOINT,
    awsAccessKeyId=settings.AWS_ACCESS_KEY, awsSecretKey=settings.AWS_SECRET_KEY,
    checkpointInterval=settings.KINESIS_CHECKPOINT_INTERVAL, regionName=settings.KINESIS_REGION,
    initialPositionInStream=InitialPositionInStream.LATEST)
counts = kinesis_good_stream.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.pprint()
streaming_context.start()
streaming_context.awaitTermination()
The settings file:
# Kinesis Configuration
KINESIS_REGION = 'ap-southeast-1'
KINESIS_ENDPOINT = 'kinesis.ap-southeast-1.amazonaws.com'
KINESIS_GOOD_STREAM = 'GoodStream'
KINESIS_BAD_STREAM = 'BadStream'
KINESIS_CHECKPOINT_INTERVAL = 2000
KINESIS_APP_NAME = 'test-spark'
# Spark context
BATCH_DURATION = 2
# AWS Credential
AWS_ACCESS_KEY = ''
AWS_SECRET_KEY = ''
I run the script with this command:
spark-submit --jars spark-streaming-kinesis-asl-assembly.jar kinesis.py
From my Django project:
INFO:snowplow_tracker.emitters:GET request finished with status code: 200
INFO:snowplow_tracker.emitters:POST request finished with status code: 200
From my collector, I can see that writing to Kinesis succeeds:
08:00:19.720 [pool-1-thread-9] INFO c.s.s.c.s.sinks.KinesisSink - Successfully wrote 2 out of 2 records
From my Spark Streaming job, I get:
-------------------------------------------
Time: 2016-11-25 07:59:25
-------------------------------------------
16/11/25 07:59:30 ERROR Executor: Exception in task 0.0 in stage 345.0 (TID 173)
java.lang.NoSuchMethodError: org.apache.spark.storage.BlockManager.get(Lorg/apache/spark/storage/BlockId;)Lscala/Option;
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.getBlockFromBlockManager$1(KinesisBackedBlockRDD.scala:104)
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.compute(KinesisBackedBlockRDD.scala:117)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:390)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
For my Kinesis stream I'm using 1 shard, and the Spark context is set up with 2 cores.
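(Aside: the script creates a single Kinesis receiver via its one createStream call, and a receiver occupies one core, so local[2] leaves one core free to process the received batches. A minimal sketch of the relevant line with that assumption spelled out:)
# one createStream call => one Kinesis receiver; each receiver pins one core,
# so the master must offer more cores than receivers: local[2] = 1 receiver core + 1 processing core
spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME)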
Answer 1:
Managed to solve the error. I'm running Spark 2.0.2, but I was using spark-streaming-kinesis-asl-assembly_2.10-2.0.0.jar, which is what caused the java.lang.NoSuchMethodError: the Kinesis ASL assembly jar has to be built for the same Spark version that runs the job.
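A sketch of the corrected submit command, assuming an assembly jar built against Spark 2.0.2 has been downloaded; the exact file name (including the Scala suffix, _2.10 or _2.11) depends on the build you pick:
# the jar's Spark version (2.0.2 here) and Scala suffix must match the running Spark build
spark-submit --jars spark-streaming-kinesis-asl-assembly_2.11-2.0.2.jar kinesis.py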
Source: https://stackoverflow.com/questions/40800457/spark-streaming-error-when-reading-from-kinesis