Print the content of streams (Spark streaming) in Windows system

Submitted by 廉价感情 on 2019-12-20 03:33:16

Question


I just want to print the content of a stream to the console. I wrote the following code, but it does not print anything. Can anyone help me read a text file as a stream in Spark? Is there a problem related to the Windows system?

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public static void main(String[] args) throws Exception {

    SparkConf sparkConf = new SparkConf().setAppName("My app")
        .setMaster("local[2]")
        .setSparkHome("C:\\Spark\\spark-1.5.1-bin-hadoop2.6")
        .set("spark.executor.memory", "2g");

    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

    JavaDStream<String> dataStream = jssc.textFileStream("C://testStream//copy.csv");
    dataStream.print();

    jssc.start();
    jssc.awaitTermination();
}

UPDATE: The content of copy.csv is

0,0,12,5,0
0,0,12,5,0
0,1,2,0,42
0,0,0,0,264
0,0,12,5,0

Answer 1:


textFileStream is for monitoring Hadoop-compatible directories. This operation watches the provided directory, and as you add new files to it, it reads/streams the data from those newly added files.

You cannot read existing text/CSV files using textFileStream; or rather, I would say that you do not need streaming at all if you are just reading files.

My suggestion would be to monitor a directory (on HDFS or the local file system), then add files to it and capture the content of those new files using textFileStream.

In your code, perhaps you can replace "C://testStream//copy.csv" with "C://testStream", and once your Spark Streaming job is up and running, add the file copy.csv to the C://testStream folder and watch for the output on the Spark console.
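The "watch a directory, react only to newly added files" semantics that textFileStream relies on can be illustrated with the plain JDK's WatchService. This is a minimal sketch of the monitoring behavior only, not Spark itself; the directory and file names are illustrative:

```java
import java.io.IOException;
import java.nio.file.*;

// Sketch of directory monitoring: only files that appear AFTER the watch
// starts are seen, which is why pointing textFileStream at an existing
// file prints nothing.
public class DirWatchSketch {

    // Blocks until a new file is created in dir, then returns its name.
    public static String awaitNewFile(Path dir) throws IOException, InterruptedException {
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                    return event.context().toString();
                }
            }
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("testStream");
        // Simulate "drop a file into the monitored folder" from another thread.
        new Thread(() -> {
            try {
                Thread.sleep(500);
                Files.write(dir.resolve("copy.csv"), "0,0,12,5,0\n".getBytes());
            } catch (Exception e) { throw new RuntimeException(e); }
        }).start();
        System.out.println("New file detected: " + awaitNewFile(dir));
    }
}
```

The key point: the watcher (like textFileStream) never revisits files that already existed when monitoring began.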

OR

Alternatively, you could write another command-line Scala/Java program that reads the files and sends their content over a socket (on a certain port), and then leverage socketTextStream to capture and read the data. Once you have read the data, you can apply further transformations or output operations.
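A minimal sketch of such a sender, using only the JDK: it reads a text file and writes its lines to the first client that connects. On the Spark side you would consume it with jssc.socketTextStream("localhost", port); the port number and file name here are arbitrary choices, not anything mandated by Spark:

```java
import java.io.*;
import java.net.*;
import java.nio.file.*;

// Small server that streams the lines of a text file over a socket,
// suitable as a source for socketTextStream.
public class FileSocketServer {

    // Accepts one client and writes each line of the file to it.
    public static void serveOnce(ServerSocket server, Path file) throws IOException {
        try (Socket client = server.accept();
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            for (String line : Files.readAllLines(file)) {
                out.println(line); // one line per record, newline-delimited
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Paths.get(args.length > 0 ? args[0] : "copy.csv");
        try (ServerSocket server = new ServerSocket(9999)) { // port is arbitrary
            serveOnce(server, file);
        }
    }
}
```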

You could also consider leveraging Flume.

Refer to the API documentation for more details.




Answer 2:


This worked for me on Windows 7 and Spark 1.6.3 (omitting the rest of the code; the important part is how the folder to monitor is defined):

val ssc = ...
val lines = ssc.textFileStream("file:///D:/tmp/data")
...
lines.print()
...

This monitors the directory D:/tmp/data; ssc is my streaming context.

Steps:

  1. Create a file, say 1.txt, in D:/tmp/data
  2. Enter some text
  3. Start the Spark application
  4. Rename the file to data.txt (I believe any arbitrary name will do, as long as it is changed while the directory is being monitored by Spark)
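The rename in step 4 works because the file only "appears" in the monitored directory, from Spark's point of view, at the moment of the rename. The equivalent programmatic pattern is to write the file somewhere else and atomically move it into the watched folder, so the stream never sees a half-written file. A plain-JDK sketch (directory names are illustrative):

```java
import java.io.IOException;
import java.nio.file.*;

// Write a file in a staging directory, then atomically move it into the
// directory being monitored, so it appears there fully formed.
public class DropIntoWatchedDir {

    public static Path dropFile(Path stagingDir, Path watchedDir,
                                String name, String content) throws IOException {
        Path staged = stagingDir.resolve(name);
        Files.write(staged, content.getBytes());          // write outside the watched dir
        Path target = watchedDir.resolve(name);
        return Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws Exception {
        Path staging = Files.createTempDirectory("staging");
        Path watched = Files.createTempDirectory("watched");
        Path p = dropFile(staging, watched, "data.txt", "0,0,12,5,0\n");
        System.out.println("Moved into watched dir: " + p);
    }
}
```

Note that ATOMic_MOVE as used here requires both directories to be on the same filesystem.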

One other thing I noticed: I had to change the line separator to Unix style (using Notepad++); otherwise the file wasn't getting picked up.




Answer 3:


Try the code below; it works:

JavaDStream<String> dataStream = jssc.textFileStream("file:///C:/testStream/");


Source: https://stackoverflow.com/questions/35143402/print-the-content-of-streams-spark-streaming-in-windows-system
