spark-streaming

How to call an Oracle stored procedure in Spark?

心已入冬 · Submitted on 2019-12-26 08:15:49
Question: In my Spark project I am using spark-sql-2.4.1v. As part of my code, I need to call Oracle stored procedures in my Spark job. How do I call Oracle stored procedures? Answer 1: You can try doing something like this, though I have never tried this personally in any implementation: query = "exec SP_NAME" empDF = spark.read \ .format("jdbc") \ .option("url", "jdbc:oracle:thin:username/password@//hostname:portnumber/SID") \ .option("dbtable", query) \ .option("user", "db_user_name") \ .option("password",
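
The quoted answer is cut off and untested, so here is a separately hedged sketch of another common pattern: skipping the JDBC read path entirely and invoking the procedure through a plain JDBC CallableStatement on the driver. The URL, credentials, procedure name MY_SCHEMA.MY_PROC and its parameter are placeholders, not anything from the original question.

```scala
// Hedged sketch: call an Oracle stored procedure over plain JDBC from the
// driver. Everything concrete here (URL, user, procedure, parameter) is a
// placeholder; the Oracle JDBC driver must be on the classpath.
import java.sql.DriverManager

val url  = "jdbc:oracle:thin:@//hostname:portnumber/SID"
val conn = DriverManager.getConnection(url, "db_user_name", "db_password")
try {
  // {call ...} is the standard JDBC escape syntax for stored procedures
  val stmt = conn.prepareCall("{call MY_SCHEMA.MY_PROC(?)}")
  stmt.setInt(1, 42)   // example IN parameter
  stmt.execute()
  stmt.close()
} finally {
  conn.close()
}
```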

SQLContext.gerorCreate is not a value

徘徊边缘 · Submitted on 2019-12-25 17:58:27
Question: I am getting the error SQLContext.gerorCreate is not a value of object org.apache.spark.SQLContext. This is my code: import org.apache.spark.SparkConf import org.apache.spark.streaming.StreamingContext import org.apache.spark.streaming.Seconds import org.apache.spark.streaming.kafka.KafkaUtils import org.apache.spark.sql.functions import org.apache.spark.sql.SQLContext import org.apache.spark.sql.types import org.apache.spark.SparkContext import java.io.Serializable case class Sensor(id:String
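
The snippet is truncated before the failing call, but the error message itself points at a misspelled method name: the companion object provides SQLContext.getOrCreate(sparkContext), not gerorCreate. A minimal sketch of the corrected call, with an assumed app name and master:

```scala
// Minimal sketch: the method is SQLContext.getOrCreate(sc), not "gerorCreate".
// The app name and master are assumptions for illustration.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("SensorApp").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = SQLContext.getOrCreate(sc)   // note the spelling

// In Spark 2.x, SparkSession is the preferred entry point:
// val spark = org.apache.spark.sql.SparkSession.builder().config(conf).getOrCreate()
```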

Why does foreachRDD not populate DataFrame with new content using StreamingContext.textFileStream?

别说谁变了你拦得住时间么 · Submitted on 2019-12-25 16:59:10
Question: My problem is that when I change my code into streaming mode and put my DataFrame into the foreach loop, the DataFrame shows an empty table! It doesn't fill! I also cannot pass it into assembler.transform(). The error is: Error:(38, 40) not enough arguments for method map: (mapFunc: String => U)(implicit evidence$2: scala.reflect.ClassTag[U])org.apache.spark.streaming.dstream.DStream[U]. Unspecified value parameter mapFunc. val dataFrame = Train_DStream.map() My train.csv file looks like this:
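
The compiler error says map() was called without a function, and the snippet is cut off before the CSV sample, so the sketch below only illustrates the general shape: convert each micro-batch RDD to a DataFrame inside foreachRDD. The schema, column parsing and directory path are assumptions.

```scala
// Sketch with an assumed 3-column numeric schema and a placeholder directory;
// each non-empty micro-batch RDD is converted to a DataFrame inside foreachRDD.
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class TrainRow(f1: Double, f2: Double, label: Double)   // hypothetical schema

val spark = SparkSession.builder().master("local[*]").appName("TrainStream").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

val lines = ssc.textFileStream("/path/to/train/dir")          // placeholder path
lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    import spark.implicits._
    val df = rdd.map(_.split(","))
      .map(a => TrainRow(a(0).toDouble, a(1).toDouble, a(2).toDouble))
      .toDF()
    df.show()   // only batches that see new files produce rows
    // df could now be fed to something like assembler.transform(df)
  }
}
ssc.start()
ssc.awaitTermination()
```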

Spark Streaming - Error when reading from Kinesis

扶醉桌前 · Submitted on 2019-12-25 16:44:57
Question: I'm new to Apache Spark Streaming. I'm trying to build a Spark job that reads values from a Kinesis stream. This is my Python script: import settings from pyspark import SparkContext from pyspark.streaming import StreamingContext from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream spark_context = SparkContext(master="local[2]", appName=settings.KINESIS_APP_NAME) streaming_context = StreamingContext(sparkContext=spark_context, batchDuration=settings.BATCH_DURATION) kinesis_good_stream
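
The question is cut off before the actual error, so nothing definitive can be said about the failure. For reference, here is a hedged Scala sketch of the equivalent KinesisUtils.createStream setup; the stream name, endpoint, region and intervals are placeholders, and the spark-streaming-kinesis-asl artifact must be on the classpath.

```scala
// Sketch of the classic receiver-based Kinesis source. All names, the
// endpoint and the region are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val conf = new SparkConf().setMaster("local[2]").setAppName("kinesis-reader")
val ssc = new StreamingContext(conf, Seconds(10))

val stream = KinesisUtils.createStream(
  ssc,
  "kinesis-reader",                             // Kinesis application (checkpoint table) name
  "my-stream",                                  // placeholder stream name
  "https://kinesis.us-east-1.amazonaws.com",    // placeholder endpoint
  "us-east-1",                                  // placeholder region
  InitialPositionInStream.LATEST,
  Seconds(10),                                  // checkpoint interval
  StorageLevel.MEMORY_AND_DISK_2)

stream.map(bytes => new String(bytes, java.nio.charset.StandardCharsets.UTF_8)).print()
ssc.start()
ssc.awaitTermination()
```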

Trying to understand Spark Streaming windowing

谁说我不能喝 · Submitted on 2019-12-25 14:06:28
Question: I'm investigating Spark Streaming as a solution for an anti-fraud service I am building, but I am struggling to figure out exactly how to apply it to my use case. The use case is: data from a user session is streamed, and a risk score is calculated for a given user after 10 seconds of data have been collected for that user. I am planning on using a batch interval of 2 seconds, but I need to use data from the full 10-second window. At first, updateStateByKey() seemed to be the perfect solution,
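
The question is truncated before it explains why updateStateByKey() fell short, but the 10-second window with a 2-second batch interval described above maps naturally onto windowed pair operations. The sketch below is a rough illustration only; the socket source, event type and scoring function are all placeholders.

```scala
// Rough sketch: 2-second batches, grouping the last 10 seconds of events per
// user and scoring them. SessionEvent and riskScore are stand-ins.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class SessionEvent(userId: String, payload: String)                      // hypothetical event
def riskScore(events: Iterable[SessionEvent]): Double = events.size.toDouble  // stand-in scorer

val conf = new SparkConf().setMaster("local[2]").setAppName("anti-fraud-window")
val ssc = new StreamingContext(conf, Seconds(2))                              // 2-second batch interval

val events = ssc.socketTextStream("localhost", 9999)                          // placeholder source
  .map { line => val Array(user, payload) = line.split(",", 2); SessionEvent(user, payload) }

val scores = events
  .map(e => (e.userId, e))
  .groupByKeyAndWindow(Seconds(10), Seconds(2))   // full 10-second window, sliding every batch
  .mapValues(riskScore)

scores.print()
ssc.start()
ssc.awaitTermination()
```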

slice function on a DStream in Spark Streaming does not work

陌路散爱 · Submitted on 2019-12-25 10:02:10
Question: Spark Streaming provides a sliding window function to get the RDDs for the last k intervals. But I want to try the slice function to get the RDDs for the last k intervals, because I want to query RDDs over a time range before the current time. delta = timedelta(seconds=30) datates = datamap.slice(datetime.now()-delta, datetime.now()) And I get this error when I execute the code: Py4JJavaError Traceback (most recent call last) /home/hduser/spark-1.5.0/<ipython
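
The traceback is cut off, so the exact cause isn't visible here. Two points worth checking, stated as assumptions rather than a diagnosis: slicing into the past only works if the context has been told to remember enough history, and the range must fall within the batches that have actually been generated. A Scala sketch of that pattern, with a placeholder source:

```scala
// Hedged sketch: slice a DStream over the last 30 seconds. The file source is
// a placeholder; ssc.remember(...) keeps enough past RDDs around to slice, and
// slice takes org.apache.spark.streaming.Time values.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext, Time}

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[2]").setAppName("slice-demo"), Seconds(5))
ssc.remember(Minutes(2))                            // retain the last 2 minutes of generated RDDs

val datamap = ssc.textFileStream("/tmp/stream-in")  // placeholder stand-in for the original DStream

ssc.start()
// ...later, once some batches have been generated...
val now  = Time(System.currentTimeMillis())
val rdds = datamap.slice(now - Seconds(30), now)    // Seq of the RDDs covering the last 30 seconds
```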

How to read a file using Spark Streaming and write it to a simple file using Scala?

限于喜欢 · Submitted on 2019-12-25 09:18:35
Question: I'm trying to read a file using a Scala Spark Streaming program. The file is stored in a directory on my local machine, and I am trying to write it out as a new file on my local machine as well. But whenever I write my stream and store it as Parquet I end up with blank folders. This is my code: Logger.getLogger("org").setLevel(Level.ERROR) val spark = SparkSession .builder() .master("local[*]") .appName("StreamAFile") .config("spark.sql.warehouse.dir", "file:///C:/temp") .getOrCreate() import spark
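
The snippet stops right after the SparkSession is built, so the read/write part is not visible. Assuming Structured Streaming (which the SparkSession suggests), empty output folders most often mean the query was never started or awaited, or that no new files landed in the watched directory after the query began. A sketch under those assumptions, with a made-up schema and paths:

```scala
// Sketch: file-source structured stream written out as Parquet. The schema
// and all paths are assumptions for illustration.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("StreamAFile")
  .getOrCreate()

val schema = new StructType()            // hypothetical schema; file sources require one
  .add("id", StringType)
  .add("value", DoubleType)

val input = spark.readStream
  .schema(schema)
  .csv("file:///C:/temp/in")             // placeholder input directory

val query = input.writeStream
  .format("parquet")
  .option("path", "file:///C:/temp/out") // placeholder output directory
  .option("checkpointLocation", "file:///C:/temp/chk")
  .start()

query.awaitTermination()                 // without start()/awaitTermination() nothing is written
```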

Spark Streaming: creating an RDD and doing a union in a transform operation with ssc.checkpoint() gives an error

牧云@^-^@ · Submitted on 2019-12-25 08:20:09
Question: I am trying to transform an RDD in a DStream by changing the log with the maximum timestamp and adding a duplicate copy of it with some modifications. Please note that I am using ssc.checkpoint(), and the error seems to go away if I comment it out. The following is the example code: JavaDStream<LogMessage> logMessageWithHB = logMessageMatched.transform(new Function<JavaRDD<LogMessage>, JavaRDD<LogMessage>>() { @Override public JavaRDD<LogMessage> call(JavaRDD<LogMessage> logMessageJavaRDD)
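
The error itself is truncated away, but problems that appear only when checkpointing is enabled are typically serialization problems in the transform closure, for example capturing the StreamingContext or SparkContext from the enclosing class. One common way to avoid that is to build the extra RDD from the incoming RDD's own SparkContext. The Scala sketch below mirrors the Java shape above; LogMessage and the duplicate-building logic are placeholders.

```scala
// Scala sketch of the transform-plus-union shape. The extra RDD is created
// from rdd.sparkContext rather than a captured outer context, which keeps the
// closure serializable when ssc.checkpoint() is enabled. LogMessage, the
// timestamp comparison and the "heartbeat" copy are placeholders.
import org.apache.spark.streaming.dstream.DStream

case class LogMessage(timestamp: Long, body: String)

def withHeartbeat(logMessageMatched: DStream[LogMessage]): DStream[LogMessage] =
  logMessageMatched.transform { rdd =>
    if (rdd.isEmpty()) rdd
    else {
      val latest    = rdd.reduce((a, b) => if (a.timestamp >= b.timestamp) a else b)
      val heartbeat = latest.copy(body = latest.body + " [HB]")   // modified duplicate
      rdd.union(rdd.sparkContext.parallelize(Seq(heartbeat)))
    }
  }
```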

Spark textFileStream on S3

老子叫甜甜 · Submitted on 2019-12-25 08:15:58
Question: Should the file name contain a number for textFileStream to pick it up? My program is picking up new files only if the file name contains a number, and it ignores all other files even if they are new. Is there any setting I need to change to pick up all the files? Please help. Answer 1: No. It scans the directory for new files which appear within the window. If you are writing to S3, do a direct write with your code, as the file doesn't appear until the final close(), so there is no need to rename. In contrast,
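
The answer is cut off mid-sentence, but the mechanism it describes is plain directory monitoring. As a point of reference, here is a minimal sketch of the monitoring side with an assumed s3a:// path; file names do not need to contain a number for new objects to be picked up.

```scala
// Minimal sketch: textFileStream watches a directory (here an assumed s3a://
// prefix) and picks up any object that newly appears during a batch, with no
// requirement that the name contain a number. S3 credentials and the s3a
// connector must be configured separately.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[2]").setAppName("s3-file-stream"), Seconds(30))

val lines = ssc.textFileStream("s3a://my-bucket/incoming/")   // placeholder bucket and prefix
lines.count().print()                                         // lines arriving in each batch

ssc.start()
ssc.awaitTermination()
```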