spark-streaming

Is there a way to dynamically stop Spark Structured Streaming?

Submitted by 丶灬走出姿态 on 2020-08-19 04:37:22
Question: In my scenario I have several datasets that arrive every now and then and need to be ingested into our platform. The ingestion process involves several transformation steps, one of them being Spark; in particular, I use Spark Structured Streaming so far. The infrastructure also involves Kafka, from which Spark Structured Streaming reads data. I wonder if there is a way to detect when there has been nothing to consume from a topic for a while, in order to decide to stop the job. That is, I want to run it for the
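
One pattern that fits this (a sketch only, not from the question; the poll interval, idle timeout, and the way the query handle is obtained are assumptions) is to watch the running query's progress from the driver and stop it once no input rows have been reported for a configured idle period:

    import org.apache.spark.sql.streaming.StreamingQuery

    object IdleShutdown {
      // Stop the query after it has reported zero input rows for idleTimeoutMs.
      // pollIntervalMs and idleTimeoutMs are illustrative values, not from the question.
      def stopWhenIdle(query: StreamingQuery,
                       pollIntervalMs: Long = 10000L,
                       idleTimeoutMs: Long = 300000L): Unit = {
        var lastDataSeen = System.currentTimeMillis()
        while (query.isActive) {
          val progress = query.lastProgress          // null until the first batch completes
          if (progress != null && progress.numInputRows > 0) {
            lastDataSeen = System.currentTimeMillis()
          }
          if (System.currentTimeMillis() - lastDataSeen > idleTimeoutMs) {
            query.stop()                             // ends the streaming query
          } else {
            Thread.sleep(pollIntervalMs)
          }
        }
      }
    }

Calling this from the driver in place of a plain query.awaitTermination() gives the job a way to shut itself down once the topic has gone quiet.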

Getting : Error importing Spark Modules : No module named 'pyspark.streaming.kafka'

Submitted by 丶灬走出姿态 on 2020-08-08 20:22:18
Question: I have a requirement to push logs created from a pyspark script to Kafka. I am doing a POC, so I am using the Kafka binaries on a Windows machine. My versions are: Kafka 2.4.0, Spark 3.0 and Python 3.8.1. I am using the PyCharm editor.

    import sys
    import logging
    from datetime import datetime
    try:
        from pyspark import SparkContext
        from pyspark.streaming import StreamingContext
        from pyspark.streaming.kafka import KafkaUtils
    except ImportError as e:
        print("Error importing Spark Modules :", e)
        sys.exit(1)
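
The module pyspark.streaming.kafka (the DStream-based KafkaUtils integration) was removed in Spark 3.0, so this import cannot succeed on Spark 3.x; the supported route is the Structured Streaming Kafka source/sink from the spark-sql-kafka-0-10 package. A minimal sketch of streaming records out to Kafka, shown in Scala (the PySpark API mirrors it; the rate source, broker address, topic name and checkpoint path are placeholders):

    import org.apache.spark.sql.SparkSession

    // Requires the Kafka connector on the classpath, e.g.
    //   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 ...
    object LogsToKafka {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("logs-to-kafka").getOrCreate()

        // Placeholder source: the built-in "rate" source stands in for the real log stream.
        val logs = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

        // The Kafka sink expects string or binary "key" and "value" columns.
        val query = logs
          .selectExpr("CAST(value AS STRING) AS key", "to_json(struct(*)) AS value")
          .writeStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
          .option("topic", "logs")                             // placeholder topic
          .option("checkpointLocation", "/tmp/kafka-sink-checkpoint")
          .start()

        query.awaitTermination()
      }
    }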

Spark Streaming Saving data to MySQL with foreachRDD() in Scala

Submitted by 怎甘沉沦 on 2020-08-03 02:22:54
Question: Spark Streaming saving data to MySQL with foreachRDD() in Scala. Please, can somebody give me a working example of saving a Spark Streaming stream to a MySQL database using foreachRDD() in Scala? I have the code below, but it's not working. I just need a simple example, not syntax or theory. Thank you!

    package examples
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark._
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
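
A common shape for this (a sketch under assumptions, not the asker's missing code: the socket source, the JDBC URL, credentials and table are placeholders, and the MySQL JDBC driver must be on the classpath) is to open one connection per partition inside foreachRDD, so the connection is created on the executor rather than serialized from the driver:

    import java.sql.DriverManager

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamToMySql {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("stream-to-mysql").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Placeholder source: plain-text lines from a socket.
        val lines = ssc.socketTextStream("localhost", 9999)

        lines.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One connection per partition, opened on the executor.
            val conn = DriverManager.getConnection(
              "jdbc:mysql://localhost:3306/testdb", "user", "password")
            val stmt = conn.prepareStatement("INSERT INTO logs (line) VALUES (?)")
            try {
              records.foreach { line =>
                stmt.setString(1, line)
                stmt.executeUpdate()
              }
            } finally {
              stmt.close()
              conn.close()
            }
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

Opening the connection inside foreachPartition matters: a connection created on the driver cannot be serialized to the executors, which is the usual reason such examples fail.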

Copy current row, modify it and add a new row in Spark

Submitted by 耗尽温柔 on 2020-07-30 04:25:55
Question: I am using spark-sql 2.4.1 with Java 8. I have a scenario where I need to copy the current row and create another row, modifying a few columns' data. How can this be achieved in Spark SQL? For example, given

    val data = List(
      ("20", "score", "school", 14, 12),
      ("21", "score", "school", 13, 13),
      ("22", "rate", "school", 11, 14)
    )
    val df = data.toDF("id", "code", "entity", "value1", "value2")

Current output:

    +---+-----+------+------+------+
    | id| code|entity|value1|value2|
    +---+-----+------+------+------
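
The excerpt is cut off before the expected output, so the exact modification is unknown; as one possible sketch (which columns change and the new values are assumptions made for illustration), the copies can be built by transforming the original DataFrame and unioning them back onto it:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    object CopyAndModifyRows {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("copy-rows").master("local[*]").getOrCreate()
        import spark.implicits._

        val data = List(
          ("20", "score", "school", 14, 12),
          ("21", "score", "school", 13, 13),
          ("22", "rate", "school", 11, 14)
        )
        val df = data.toDF("id", "code", "entity", "value1", "value2")

        // Build a copy of each row with a few columns changed (illustrative only:
        // the code column is overwritten and value2 is doubled), then append the copies.
        val copies = df
          .withColumn("code", lit("derived"))
          .withColumn("value2", $"value2" * 2)

        df.union(copies).show(false)
      }
    }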

The group member's supported protocols are incompatible with those of existing members

Submitted by 隐身守侯 on 2020-07-18 11:52:12
Question: I'm facing an issue related to Kafka. My current service (the producer) sends messages to a Kafka topic (events). The service uses kafka_2.12 v1.0.0 and is written in Java. I'm trying to integrate it with the spark-streaming sample project as a consumer service (here using kafka_2.11 v0.10.0, written in Scala). The message is sent successfully from the producer to the Kafka topic. However, I always receive the error stack below: Exception in thread "main" org.apache.kafka
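
Kafka raises this error when members of one consumer group advertise partition-assignment protocols with no overlap, for example when clients in the group are configured with disjoint partition.assignment.strategy lists, or when a plain consumer shares a group id with a different kind of client such as a Kafka Streams application. A sketch of the Spark consumer side (assuming the spark-streaming-kafka-0-10 integration; broker address, topic and group id are placeholders) that keeps a dedicated group.id for the streaming job:

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object EventsConsumer {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("events-consumer").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "localhost:9092",   // placeholder broker
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          // A group.id not shared with any other kind of client avoids mixing
          // incompatible assignment protocols inside one group.
          ConsumerConfig.GROUP_ID_CONFIG -> "spark-events-consumer",
          ConsumerConfig.AUTO_OFFSET_RESET_CONFIG -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        stream.map(record => record.value).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }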