Spark Streaming - Error when reading from Kinesis

扶醉桌前 submitted on 2019-12-25 16:44:57

Question


I'm new to Apache Spark Streaming. I'm trying to build a Spark job that reads values from a Kinesis stream. This is my Python script:

import settings
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME)

# batchDuration is given in seconds
streaming_context = StreamingContext(sparkContext=spark_context, batchDuration=settings.BATCH_DURATION)

# Receiver-based Kinesis stream; records are decoded as UTF-8 strings by default
kinesis_good_stream = KinesisUtils.createStream(
    ssc=streaming_context, kinesisAppName=settings.KINESIS_APP_NAME,
    streamName=settings.KINESIS_GOOD_STREAM, endpointUrl=settings.KINESIS_ENDPOINT,
    awsAccessKeyId=settings.AWS_ACCESS_KEY, awsSecretKey=settings.AWS_SECRET_KEY,
    checkpointInterval=settings.KINESIS_CHECKPOINT_INTERVAL, regionName=settings.KINESIS_REGION,
    initialPositionInStream=InitialPositionInStream.LATEST)

# Word count over each batch
counts = kinesis_good_stream.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.pprint()

streaming_context.start()
streaming_context.awaitTermination()

The settings file (settings.py):

# Kinesis Configuration
KINESIS_REGION = 'ap-southeast-1'
KINESIS_ENDPOINT = 'kinesis.ap-southeast-1.amazonaws.com'
KINESIS_GOOD_STREAM = 'GoodStream'
KINESIS_BAD_STREAM = 'BadStream'
KINESIS_CHECKPOINT_INTERVAL = 2000
KINESIS_APP_NAME = 'test-spark'

# Spark context
BATCH_DURATION = 2

# AWS Credential
AWS_ACCESS_KEY = ''
AWS_SECRET_KEY = ''

I run the script with this command:

spark-submit --jars spark-streaming-kinesis-asl-assembly.jar kinesis.py  
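The `--jars` flag ships whichever assembly file is on disk, and nothing checks that it was built for the Spark release being run. For Python applications, Spark's Kinesis integration guide also describes adding `spark-streaming-kinesis-asl_<scala>` with `--packages`, so spark-submit resolves the connector and its dependencies by Maven coordinate. A minimal sketch that builds such a command for a given Spark version; the helper names are hypothetical, and the Scala 2.11 default is an assumption that must match your Spark build:

```python
def kinesis_asl_coordinate(spark_version, scala_binary="2.11"):
    """Return the Maven coordinate of the Kinesis connector for a Spark version.

    scala_binary must match the Scala version your Spark build was compiled
    with (2.11 is assumed here as the default for Spark 2.0.x downloads).
    """
    return "org.apache.spark:spark-streaming-kinesis-asl_{0}:{1}".format(
        scala_binary, spark_version)


def spark_submit_command(script, spark_version, scala_binary="2.11"):
    """Build a spark-submit command line that pulls the matching connector."""
    return "spark-submit --packages {0} {1}".format(
        kinesis_asl_coordinate(spark_version, scala_binary), script)


print(spark_submit_command("kinesis.py", "2.0.2"))
# spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.0.2 kinesis.py
```

Because the version appears in the coordinate, upgrading Spark means changing one string rather than hunting for a matching assembly jar.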

From my Django project, the Snowplow tracker reports that its requests succeed:

INFO:snowplow_tracker.emitters:GET request finished with status code: 200
INFO:snowplow_tracker.emitters:POST request finished with status code: 200

From my collector, I can see that writing to Kinesis succeeds:

08:00:19.720 [pool-1-thread-9] INFO  c.s.s.c.s.sinks.KinesisSink - Successfully wrote 2 out of 2 records

But my Spark Streaming job fails with this error:

-------------------------------------------
Time: 2016-11-25 07:59:25
-------------------------------------------

16/11/25 07:59:30 ERROR Executor: Exception in task 0.0 in stage 345.0 (TID 173)
java.lang.NoSuchMethodError: org.apache.spark.storage.BlockManager.get(Lorg/apache/spark/storage/BlockId;)Lscala/Option;
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.getBlockFromBlockManager$1(KinesisBackedBlockRDD.scala:104)
at org.apache.spark.streaming.kinesis.KinesisBackedBlockRDD.compute(KinesisBackedBlockRDD.scala:117)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:390)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

My Kinesis stream has 1 shard, and the Spark context runs with 2 cores (local[2]).
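Receiver-based streams such as the one returned by `KinesisUtils.createStream` keep one core busy per receiver, which is why the Spark Streaming programming guide says to use `local[n]` with n greater than the number of receivers when running locally. One `createStream` call creates a single receiver regardless of how many shards the stream has, so `local[2]` leaves one core for the word count. A minimal sketch of that rule for local master URLs; the helper names are hypothetical:

```python
import multiprocessing
import re


def local_cores(master):
    """Number of cores a local master URL grants, e.g. 'local[2]' -> 2.

    Handles 'local', 'local[N]' and 'local[*]'. Anything else, including
    cluster masters such as 'yarn' and 'local[N,F]', returns None.
    """
    if master == "local":
        return 1
    match = re.fullmatch(r"local\[(\d+|\*)\]", master)
    if match is None:
        return None
    value = match.group(1)
    # 'local[*]' uses every core on the machine
    return multiprocessing.cpu_count() if value == "*" else int(value)


def has_core_for_processing(master, num_receivers=1):
    """True if a local master leaves at least one core after the receivers."""
    cores = local_cores(master)
    return cores is not None and cores > num_receivers


print(has_core_for_processing("local[2]", num_receivers=1))  # True
print(has_core_for_processing("local", num_receivers=1))     # False
```

With plain `local` the single core would be taken by the receiver and no batch would ever be processed, so the check is worth doing before debugging anything else.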


Answer 1:


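Managed to solve the error. I was running Spark 2.0.2 but using streaming-kinesis-asl-assembly.2.10-2.0.0.jar, which was built for Spark 2.0.0, and that version mismatch caused the java.lang.NoSuchMethodError. The Kinesis connector jar has to match the Spark version you run.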
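A cheap guard against this kind of mismatch is to compare the Spark version embedded in the assembly's file name with the version the job actually runs on (`SparkContext.version` in PySpark). A minimal sketch, assuming the jar keeps the usual `..._<scala>-<spark>.jar` naming; the helper names are hypothetical:

```python
import os
import re

# Matches e.g. spark-streaming-kinesis-asl-assembly_2.11-2.0.2.jar; the
# separator before the Scala version is accepted as '_' or '.' since jar
# files are often renamed by hand.
ASSEMBLY_NAME = re.compile(
    r"kinesis-asl-assembly[_.](?P<scala>\d+\.\d+)-(?P<spark>\d+\.\d+\.\d+)\.jar$")


def assembly_versions(jar_path):
    """Return (scala_version, spark_version) parsed from the jar file name."""
    match = ASSEMBLY_NAME.search(os.path.basename(jar_path))
    if match is None:
        raise ValueError("unrecognised jar name: %s" % jar_path)
    return match.group("scala"), match.group("spark")


def check_assembly(jar_path, running_spark_version):
    """Raise RuntimeError if the jar was built for a different Spark release."""
    _, built_for = assembly_versions(jar_path)
    if built_for != running_spark_version:
        raise RuntimeError(
            "%s targets Spark %s but Spark %s is running"
            % (os.path.basename(jar_path), built_for, running_spark_version))
```

In kinesis.py this could be called as `check_assembly(path_to_jar, spark_context.version)` right after the context is created. It cannot help when the file has lost its version, as with the `spark-streaming-kinesis-asl-assembly.jar` passed to spark-submit in the question; there it raises `ValueError` instead.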



Source: https://stackoverflow.com/questions/40800457/spark-streaming-error-when-reading-from-kinesis
