Question
Two people tested Apache Spark on their computers...
We downloaded the version of Spark prebuilt for Hadoop 2.6, went to the folder /spark-1.6.2-bin-hadoop2.6/, created a "tmp" directory, and ran:
$ bin/run-example org.apache.spark.examples.streaming.HdfsWordCount tmp
Then I added two arbitrary files, content1 and content2dssdgdg, to that "tmp" directory.
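Judging by the word counts in the streaming output below, the two files simply contained those strings. A minimal sketch of the setup (the file names here are my assumption; HdfsWordCount counts the words inside new files, so only the contents matter):

```shell
# Sketch of the question's setup: create the watched directory and drop in
# two files whose contents are the words seen in the streaming output.
# File names a.txt / b.txt are assumptions, not from the question.
mkdir -p tmp
echo "content1" > tmp/a.txt
echo "content2dssdgdg" > tmp/b.txt
cat tmp/a.txt tmp/b.txt
```

Files created (or moved) into the watched directory after the job starts are what the streaming example picks up.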
-------------------------------------------
Time: 1467921704000 ms
-------------------------------------------
(content1,1)
(content2dssdgdg,1)
-------------------------------------------
Time: 1467921706000 ms
Spark detected those files and produced the terminal output above on my Ubuntu 15.10 laptop, but not on my colleague's Windows 7 Enterprise laptop.
Does Spark's file system watcher not work on Windows?
Answer 1:
John, I would suggest using the Hadoop binaries compiled for 64-bit Windows 7 hosted at https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries. To use that Hadoop build, you need a Spark version that is pre-built with user-provided Hadoop. Make sure to set SPARK_DIST_CLASSPATH as described at https://spark.apache.org/docs/latest/hadoop-provided.html, and also put %HADOOP_HOME%\lib\native on the PATH. Once that is set up, follow steps 3.1, 3.3, 3.4, and 3.5 at https://wiki.apache.org/hadoop/Hadoop2OnWindows to start a local HDFS. When running HdfsWordCount, pass hdfs:///tmp as the directory-path argument. All the best.
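Put together, the environment setup on the Windows side might look roughly like this cmd.exe batch fragment. This is a hedged sketch, not a tested recipe: the install paths (C:\hadoop-2.7.1, the Spark directory name) are assumptions, and populating SPARK_DIST_CLASSPATH from `hadoop classpath` follows the Spark "Hadoop free build" docs linked above.

```
:: Batch-file sketch; C:\hadoop-2.7.1 and the Spark folder name are assumed paths.
set HADOOP_HOME=C:\hadoop-2.7.1
set PATH=%HADOOP_HOME%\bin;%HADOOP_HOME%\lib\native;%PATH%

:: Spark built "with user-provided Hadoop" needs Hadoop's jars on its classpath;
:: the Spark docs populate SPARK_DIST_CLASSPATH from the output of "hadoop classpath".
for /f "delims=" %%i in ('hadoop classpath') do set SPARK_DIST_CLASSPATH=%%i

:: After formatting the NameNode and starting HDFS per the wiki steps,
:: point the example at HDFS rather than the local file system:
cd C:\spark-1.6.2-bin-without-hadoop
bin\run-example org.apache.spark.examples.streaming.HdfsWordCount hdfs:///tmp
```

Note that `%%i` is the batch-file form; in an interactive cmd session it would be `%i`.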
Source: https://stackoverflow.com/questions/38254405/spark-file-system-watcher-not-working-on-windows