spark-streaming

How to call an Oracle stored procedure in Spark?

心已入冬 · Submitted on 2019-12-26 08:15:49
Question: In my Spark project I am using spark-sql-2.4.1v. As part of my code, I need to call Oracle stored procedures in my Spark job. How do I call Oracle stored procedures? Answer 1: You can try doing something like this, though I have never tried this personally in any implementation: query = "exec SP_NAME" empDF = spark.read \ .format("jdbc") \ .option("url", "jdbc:oracle:thin:username/password@//hostname:portnumber/SID") \ .option("dbtable", query) \ .option("user", "db_user_name") \ .option("password",
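
The quoted answer is cut off and untested, so here is a separately hedged sketch of another common pattern: skipping the JDBC read path entirely and invoking the procedure through a plain JDBC CallableStatement on the driver. The URL, credentials, procedure name MY_SCHEMA.MY_PROC and its parameter are placeholders, not anything from the original question.

```scala
// Hedged sketch: call an Oracle stored procedure over plain JDBC from the
// driver. Everything concrete here (URL, user, procedure, parameter) is a
// placeholder; the Oracle JDBC driver must be on the classpath.
import java.sql.DriverManager

val url  = "jdbc:oracle:thin:@//hostname:portnumber/SID"
val conn = DriverManager.getConnection(url, "db_user_name", "db_password")
try {
  // {call ...} is the standard JDBC escape syntax for stored procedures
  val stmt = conn.prepareCall("{call MY_SCHEMA.MY_PROC(?)}")
  stmt.setInt(1, 42)   // example IN parameter
  stmt.execute()
  stmt.close()
} finally {
  conn.close()
}
```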

SQLContext.gerorCreate is not a value

徘徊边缘 · Submitted on 2019-12-25 17:58:27
Question: I am getting the error SQLContext.gerorCreate is not a value of object org.apache.spark.SQLContext. This is my code: import org.apache.spark.SparkConf import org.apache.spark.streaming.StreamingContext import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.kafka.KafkaUtils import org.apache.spark.sql.functions import org.apache.spark.sql.SQLContext import org.apache.spark.sql.types import org.apache.spark.SparkContext import java.io.Serializable case class Sensor(id:String
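
The snippet is truncated before the failing call, but the error message itself points at a misspelled method name: the companion object provides SQLContext.getOrCreate(sparkContext), not gerorCreate. A minimal sketch of the corrected call, with an assumed app name and master:

```scala
// Minimal sketch: the method is SQLContext.getOrCreate(sc), not "gerorCreate".
// The app name and master are assumptions for illustration.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("SensorApp").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = SQLContext.getOrCreate(sc)   // note the spelling

// In Spark 2.x, SparkSession is the preferred entry point:
// val spark = org.apache.spark.sql.SparkSession.builder().config(conf).getOrCreate()
```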

Why does foreachRDD not populate DataFrame with new content using StreamingContext.textFileStream?

别说谁变了你拦得住时间么 · Submitted on 2019-12-25 16:59:10
Question: My problem is that when I change my code into streaming mode and put my DataFrame into the foreach loop, the DataFrame shows an empty table! It doesn't fill! I also cannot pass it into assembler.transform(). The error is: Error:(38, 40) not enough arguments for method map: (mapFunc: String => U)(implicit evidence$2: scala.reflect.ClassTag[U])org.apache.spark.streaming.dstream.DStream[U]. Unspecified value parameter mapFunc. val dataFrame = Train_DStream.map() My train.csv file looks like this:
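
The compiler error says map() was called without a function, and the snippet is cut off before the CSV sample, so the sketch below only illustrates the general shape: convert each micro-batch RDD to a DataFrame inside foreachRDD. The schema, column parsing and directory path are assumptions.

```scala
// Sketch with an assumed 3-column numeric schema and a placeholder directory;
// each non-empty micro-batch RDD is converted to a DataFrame inside foreachRDD.
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class TrainRow(f1: Double, f2: Double, label: Double)   // hypothetical schema

val spark = SparkSession.builder().master("local[*]").appName("TrainStream").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

val lines = ssc.textFileStream("/path/to/train/dir")          // placeholder path
lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    import spark.implicits._
    val df = rdd.map(_.split(","))
      .map(a => TrainRow(a(0).toDouble, a(1).toDouble, a(2).toDouble))
      .toDF()
    df.show()   // only batches that see new files produce rows
    // df could now be fed to something like assembler.transform(df)
  }
}
ssc.start()
ssc.awaitTermination()
```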

Spark Streaming - Error when reading from Kinesis

扶醉桌前 · Submitted on 2019-12-25 16:44:57
Question: I'm new to Apache Spark Streaming. I'm trying to build a Spark job that reads values from a Kinesis stream. This is my Python script: import settings from pyspark import SparkContext from pyspark.streaming import StreamingContext from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME) streaming_context = StreamingContext(sparkContext=spark_context, batchDuration=settings.BATCH_DURATION) kinesis_good_stream
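
The question is cut off before the actual error, so nothing definitive can be said about the failure. For reference, here is a hedged Scala sketch of the equivalent KinesisUtils.createStream setup; the stream name, endpoint, region and intervals are placeholders, and the spark-streaming-kinesis-asl artifact must be on the classpath.

```scala
// Sketch of the classic receiver-based Kinesis source. All names, the
// endpoint and the region are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val conf = new SparkConf().setMaster("local[2]").setAppName("kinesis-reader")
val ssc = new StreamingContext(conf, Seconds(10))

val stream = KinesisUtils.createStream(
  ssc,
  "kinesis-reader",                             // Kinesis application (checkpoint table) name
  "my-stream",                                  // placeholder stream name
  "https://kinesis.us-east-1.amazonaws.com",    // placeholder endpoint
  "us-east-1",                                  // placeholder region
  InitialPositionInStream.LATEST,
  Seconds(10),                                  // checkpoint interval
  StorageLevel.MEMORY_AND_DISK_2)

stream.map(bytes => new String(bytes, java.nio.charset.StandardCharsets.UTF_8)).print()
ssc.start()
ssc.awaitTermination()
```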

Trying to understand Spark Streaming windowing

谁说我不能喝 · Submitted on 2019-12-25 14:06:28
Question: I'm investigating Spark Streaming as a solution for an anti-fraud service I am building, but I am struggling to figure out exactly how to apply it to my use case. The use case is: data from a user session is streamed, and a risk score is calculated for a given user after 10 seconds of data have been collected for that user. I am planning on using a batch interval of 2 seconds, but I need to use data from the full 10-second window. At first, updateStateByKey() seemed to be the perfect solution,
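
The question is truncated before it explains why updateStateByKey() fell short, but the 10-second window with a 2-second batch interval described above maps naturally onto windowed pair operations. The sketch below is a rough illustration only; the socket source, event type and scoring function are all placeholders.

```scala
// Rough sketch: 2-second batches, grouping the last 10 seconds of events per
// user and scoring them. SessionEvent and riskScore are stand-ins.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class SessionEvent(userId: String, payload: String)                      // hypothetical event
def riskScore(events: Iterable[SessionEvent]): Double = events.size.toDouble  // stand-in scorer

val conf = new SparkConf().setMaster("local[2]").setAppName("anti-fraud-window")
val ssc = new StreamingContext(conf, Seconds(2))                              // 2-second batch interval

val events = ssc.socketTextStream("localhost", 9999)                          // placeholder source
  .map { line => val Array(user, payload) = line.split(",", 2); SessionEvent(user, payload) }

val scores = events
  .map(e => (e.userId, e))
  .groupByKeyAndWindow(Seconds(10), Seconds(2))   // full 10-second window, sliding every batch
  .mapValues(riskScore)

scores.print()
ssc.start()
ssc.awaitTermination()
```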

slice function on a DStream in Spark Streaming does not work

陌路散爱 · Submitted on 2019-12-25 10:02:10
Question: Spark Streaming provides a sliding window function to get the RDDs for the last k intervals. But I want to try the slice function to get the RDDs for the last k intervals, because I want to query RDDs over a time range before the current time. delta = timedelta(seconds=30) datates = datamap.slice(datetime.now()-delta, datetime.now()) And I get this error when I execute the code: Py4JJavaError Traceback (most recent call last) /home/hduser/spark-1.5.0/<ipython
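
The traceback is cut off, so the exact cause isn't visible here. Two points worth checking, stated as assumptions rather than a diagnosis: slicing into the past only works if the context has been told to remember enough history, and the range must fall within the batches that have actually been generated. A Scala sketch of that pattern, with a placeholder source:

```scala
// Hedged sketch: slice a DStream over the last 30 seconds. The file source is
// a placeholder; ssc.remember(...) keeps enough past RDDs around to slice, and
// slice takes org.apache.spark.streaming.Time values.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext, Time}

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[2]").setAppName("slice-demo"), Seconds(5))
ssc.remember(Minutes(2))                            // retain the last 2 minutes of generated RDDs

val datamap = ssc.textFileStream("/tmp/stream-in")  // placeholder stand-in for the original DStream

ssc.start()
// ...later, once some batches have been generated...
val now  = Time(System.currentTimeMillis())
val rdds = datamap.slice(now - Seconds(30), now)    // Seq of the RDDs covering the last 30 seconds
```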

How to read a file using Spark Streaming and write it to a simple file using Scala?

限于喜欢 · Submitted on 2019-12-25 09:18:35
Question: I'm trying to read a file using a Scala Spark Streaming program. The file is stored in a directory on my local machine, and I am trying to write it out as a new file on my local machine as well. But whenever I write my stream and store it as Parquet I end up with blank folders. This is my code: Logger.getLogger("org").setLevel(Level.ERROR) val spark = SparkSession .builder() .master("local[*]") .appName("StreamAFile") .config("spark.sql.warehouse.dir", "file:///C:/temp") .getOrCreate() import spark
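
The snippet stops right after the SparkSession is built, so the read/write part is not visible. Assuming Structured Streaming (which the SparkSession suggests), empty output folders most often mean the query was never started or awaited, or that no new files landed in the watched directory after the query began. A sketch under those assumptions, with a made-up schema and paths:

```scala
// Sketch: file-source structured stream written out as Parquet. The schema
// and all paths are assumptions for illustration.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("StreamAFile")
  .getOrCreate()

val schema = new StructType()            // hypothetical schema; file sources require one
  .add("id", StringType)
  .add("value", DoubleType)

val input = spark.readStream
  .schema(schema)
  .csv("file:///C:/temp/in")             // placeholder input directory

val query = input.writeStream
  .format("parquet")
  .option("path", "file:///C:/temp/out") // placeholder output directory
  .option("checkpointLocation", "file:///C:/temp/chk")
  .start()

query.awaitTermination()                 // without start()/awaitTermination() nothing is written
```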

Spark Streaming: creating an RDD and doing a union in a transform operation with ssc.checkpoint() gives an error

牧云@^-^@ · Submitted on 2019-12-25 08:20:09
Question: I am trying to transform an RDD in a DStream by changing the log with the maximum timestamp and adding a duplicate copy of it with some modifications. Please note that I am using ssc.checkpoint(), and the error seems to go away if I comment it out. The following is the example code: JavaDStream<LogMessage> logMessageWithHB = logMessageMatched.transform(new Function<JavaRDD<LogMessage>, JavaRDD<LogMessage>>() { @Override public JavaRDD<LogMessage> call(JavaRDD<LogMessage> logMessageJavaRDD)
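
The error itself is truncated away, but problems that appear only when checkpointing is enabled are typically serialization problems in the transform closure, for example capturing the StreamingContext or SparkContext from the enclosing class. One common way to avoid that is to build the extra RDD from the incoming RDD's own SparkContext. The Scala sketch below mirrors the Java shape above; LogMessage and the duplicate-building logic are placeholders.

```scala
// Scala sketch of the transform-plus-union shape. The extra RDD is created
// from rdd.sparkContext rather than a captured outer context, which keeps the
// closure serializable when ssc.checkpoint() is enabled. LogMessage, the
// timestamp comparison and the "heartbeat" copy are placeholders.
import org.apache.spark.streaming.dstream.DStream

case class LogMessage(timestamp: Long, body: String)

def withHeartbeat(logMessageMatched: DStream[LogMessage]): DStream[LogMessage] =
  logMessageMatched.transform { rdd =>
    if (rdd.isEmpty()) rdd
    else {
      val latest    = rdd.reduce((a, b) => if (a.timestamp >= b.timestamp) a else b)
      val heartbeat = latest.copy(body = latest.body + " [HB]")   // modified duplicate
      rdd.union(rdd.sparkContext.parallelize(Seq(heartbeat)))
    }
  }
```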

Spark textFileStream on S3

老子叫甜甜 · Submitted on 2019-12-25 08:15:58
Question: Should the file name contain a number for textFileStream to pick it up? My program is picking up new files only if the file name contains a number, and it ignores all other files even if they are new. Is there any setting I need to change to pick up all the files? Please help. Answer 1: No. It scans the directory for new files which appear within the window. If you are writing to S3, do a direct write with your code, as the file doesn't appear until the final close(), so there is no need to rename. In contrast,
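
The answer is cut off mid-sentence, but the mechanism it describes is plain directory monitoring. As a point of reference, here is a minimal sketch of the monitoring side with an assumed s3a:// path; file names do not need to contain a number for new objects to be picked up.

```scala
// Minimal sketch: textFileStream watches a directory (here an assumed s3a://
// prefix) and picks up any object that newly appears during a batch, with no
// requirement that the name contain a number. S3 credentials and the s3a
// connector must be configured separately.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[2]").setAppName("s3-file-stream"), Seconds(30))

val lines = ssc.textFileStream("s3a://my-bucket/incoming/")   // placeholder bucket and prefix
lines.count().print()                                         // lines arriving in each batch

ssc.start()
ssc.awaitTermination()
```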