Question
Two people tested Apache Spark on their computers...
We downloaded the version of Spark prebuilt for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory, and ran:
$ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp
Then I added two arbitrary files, content1 and content2dssdgdg, to that "tmp" directory.
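Judging by the word counts in the streaming output below, the two files simply contained those strings. A minimal sketch of the setup (the file names here are my assumption; HdfsWordCount counts the words inside new files, so only the contents matter):

```shell
# Sketch of the question's setup: create the watched directory and drop in
# two files whose contents are the words seen in the streaming output.
# File names a.txt / b.txt are assumptions, not from the question.
mkdir -p tmp
echo "content1" > tmp/a.txt
echo "content2dssdgdg" > tmp/b.txt
cat tmp/a.txt tmp/b.txt
```

Files created (or moved) into the watched directory after the job starts are what the streaming example picks up.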
-------------------------------------------
Time: 1467921704000 ms
-------------------------------------------
(content1,1)
(content2dssdgdg,1)
-------------------------------------------
Time: 1467921706000 ms
Spark detected those files and produced the terminal output above on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.
Does Spark's file system watcher not work on Windows?
Answer 1:
John, I would suggest using the Hadoop binaries compiled for 64-bit Windows 7 hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use that Hadoop build, you need a Spark version that is pre-built with user-provided Hadoop. Make sure to set SPARK_DIST_CLASSPATH as described at https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on the PATH. Once that is set up, follow steps 3.1, 3.3, 3.4, and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, pass hdfs:///tmp as the directory-path argument. All the best.
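Put together, the environment setup on the Windows side might look roughly like this cmd.exe batch fragment. This is a hedged sketch, not a tested recipe: the install paths (C:\hadoop-2.7.1, the Spark directory name) are assumptions, and populating SPARK_DIST_CLASSPATH from `hadoop classpath` follows the Spark "Hadoop free build" docs linked above.

```
:: Batch-file sketch; C:\hadoop-2.7.1 and the Spark folder name are assumed paths.
set HADOOP_HOME=C:\hadoop-2.7.1
set PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\lib\native;%PATH%

:: Spark built "with user-provided Hadoop" needs Hadoop's jars on its classpath;
:: the Spark docs populate SPARK_DIST_CLASSPATH from the output of "hadoop classpath".
for /f "delims=" %%i in ('hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i

:: After formatting the NameNode and starting HDFS per the wiki steps,
:: point the example at HDFS rather than the local file system:
cd C:\spark-1.6.2-bin-without-hadoop
bin\run-example org.apache.spark.examples.streaming.HdfsWordCount hdfs:///tmp
```

Note that `%%i` is the batch-file form; in an interactive cmd session it would be `%i`.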
Source: https://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows