Unable to get any data when running a Spark Streaming program with textFileStream as the source

徘徊边缘 submitted on 2020-03-24 00:02:28

Question


I am running the following code in the Spark shell:

$ spark-shell
scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> import org.apache.spark._
import org.apache.spark._

scala> object sparkClient{
 | def main(args : Array[String])
 | {
 | val ssc = new StreamingContext(sc,Seconds(1))
 | val Dstreaminput = ssc.textFileStream("hdfs:///POC/SPARK/DATA/*")
 | val transformed = Dstreaminput.flatMap(word => word.split(" "))
 | val mapped = transformed.map(word => if(word.contains("error"))(word,"defect")else(word,"non-defect"))
 | mapped.print()
 | ssc.start()
 | ssc.awaitTermination()
 | }
 | }
defined object sparkClient

scala> sparkClient.main(null)

The output is blank, as shown below. No file is read and no streaming takes place.


Time: 1510663547000 ms


Time: 1510663548000 ms


Time: 1510663549000 ms


Time: 1510663550000 ms


Time: 1510663551000 ms


Time: 1510663552000 ms


Time: 1510663553000 ms


Time: 1510663554000 ms


Time: 1510663555000 ms


The path which I have given as input in the above code is as follows:

[hadoopadmin@master ~]$ hadoop fs -ls /POC/SPARK/DATA/
17/11/14 18:04:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   2 hadoopadmin supergroup      17881 2017-09-21 11:02 /POC/SPARK/DATA/LICENSE
-rw-r--r--   2 hadoopadmin supergroup      24645 2017-09-21 11:04 /POC/SPARK/DATA/NOTICE
-rw-r--r--   2 hadoopadmin supergroup        845 2017-09-21 12:35 /POC/SPARK/DATA/confusion.txt

Could anyone please explain where I am going wrong? I am new to Spark, so is there anything wrong with the syntax (although I did not encounter any error)?


Answer 1:


textFileStream does not read files that already exist in the directory when the stream starts. It includes only new files, which according to the Spark documentation must be:

created in the dataDirectory by atomically moving or renaming them into the data directory.

https://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources
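
As a rough sketch of what "atomically moving or renaming" can look like from Scala, the helper below writes a new file into a staging directory that the stream does not watch and then renames it into the monitored directory with Hadoop's FileSystem API. The helper name publishToStream, the /POC/SPARK/STAGING directory, and the sample file name and content are assumptions made for illustration; they are not part of Spark or of the original question.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of Spark: write the file somewhere the stream
// does not watch, then rename it into the monitored directory in one step.
def publishToStream(fs: FileSystem, stagingDir: Path, monitoredDir: Path,
                    name: String, content: String): Path = {
  val staged = new Path(stagingDir, name)
  val out = fs.create(staged, true) // overwrite any stale staged copy
  try out.write(content.getBytes(StandardCharsets.UTF_8))
  finally out.close()

  // The monitored directory must already exist for the rename to succeed.
  val target = new Path(monitoredDir, name)
  require(fs.rename(staged, target), s"could not move $staged to $target")
  target
}

// Possible use from a second spark-shell session while a stream created with
//   ssc.textFileStream("hdfs:///POC/SPARK/DATA")
// is running (the staging path is an assumed, unmonitored directory):
//   val fs = FileSystem.get(sc.hadoopConfiguration)
//   publishToStream(fs, new Path("/POC/SPARK/STAGING"), new Path("/POC/SPARK/DATA"),
//                   "sample.txt", "error in line one")
```

Because ssc.awaitTermination() blocks the shell, the helper would have to be called from a second session or another process while the stream is running. The file is written immediately before the rename so that its modification time is recent when it appears in the monitored directory. The commented usage also points the stream at the directory itself rather than at /POC/SPARK/DATA/*; the Spark guide describes glob patterns as matching directories rather than files, so the trailing /* may be a second reason nothing is picked up.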



Source: https://stackoverflow.com/questions/47286564/unable-to-get-any-data-when-spark-streaming-program-in-run-taking-source-as-text
