I know there are many threads already on 'spark streaming connection refused' issues. But most of these are on Linux or at least pointing to HDFS. I am running this on my loca
Within the code for socketTextStream, Spark creates an instance of SocketInputDStream, which uses java.net.Socket:
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L73
java.net.Socket is a client socket, which means it expects a server to already be running at the address and port you specify. Unless some service is listening on port 7777 of your local machine, the error you are seeing is expected.
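If you want to confirm this outside of Spark, a plain java.net.Socket shows the same behaviour. The following is only a minimal sketch: SocketProbe and canConnect are names I made up for illustration, and the 2 second timeout is an arbitrary choice.

```scala
import java.net.{ConnectException, InetSocketAddress, Socket}

object SocketProbe {
  // Returns true if a server accepted the TCP connection and false if the
  // connection was refused. Other I/O errors (for example a timeout) propagate.
  def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: ConnectException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // With nothing listening on port 7777 this should report false.
    println(s"127.0.0.1:7777 reachable: ${canConnect("127.0.0.1", 7777)}")
  }
}
```

If this reports false for port 7777, Spark's socket receiver will be refused for the same reason.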
To see what I mean, try the following (you may not need to set master or appName in your environment).
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.Seconds
    import org.apache.spark.streaming.StreamingContext

    object MyStream {
      def main(args: Array[String]): Unit = {
        // local[2]: the socket receiver occupies one thread, so a second one is needed to process batches
        val sc = new StreamingContext(new SparkConf().setMaster("local[2]").setAppName("socketstream"), Seconds(10))
        // Connect as a client to a host and port where a server is known to be listening
        val mystreamRDD = sc.socketTextStream("bbc.co.uk", 80)
        mystreamRDD.print()
        sc.start()
        sc.awaitTermination()
      }
    }
This doesn't return any content, because the app never sends an HTTP request to the BBC website, but it also does not throw a connection refused exception: a server is listening on that port, so the connection itself succeeds.
To run a local server on Linux, I would use netcat with a simple command such as:
    cat data.txt | ncat -l -p 7777
I'm not sure what your best approach is on Windows. You could write another application that listens as a server on that port and sends some data.
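As a rough sketch of such an application, the following uses java.net.ServerSocket, which should behave the same on Windows as on Linux. LineServer, serveOnce and the sample lines are hypothetical names and data of my own. It accepts a single client, sends a few lines, and then exits.

```scala
import java.io.PrintWriter
import java.net.ServerSocket

object LineServer {
  // Waits for one client on the given server socket, sends each line
  // terminated by a line separator, then closes the client connection.
  def serveOnce(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept()
    try {
      val out = new PrintWriter(client.getOutputStream, true) // autoflush on println
      lines.foreach(line => out.println(line))
    } finally {
      client.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Port 7777 matches the port from the question; the lines are sample data.
    val server = new ServerSocket(7777)
    try serveOnce(server, Seq("hello", "from", "a local server"))
    finally server.close()
  }
}
```

Start this first, then run the streaming job with socketTextStream("localhost", 7777). For a longer-lived test you would probably want to loop over accept() and keep sending data instead of exiting after one client.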