'Connection Refused' error while running Spark Streaming on local machine

Asked by 猫巷女王i on 2021-02-14 19:35 · 1 answer · 1772 views

I know there are many threads already on 'spark streaming connection refused' issues. But most of these are about Linux, or at least point to HDFS. I am running this on my local machine.

1 Answer

    小蘑菇 (OP) · 2021-02-14 19:52

    Within the code for socketTextStream, Spark creates an instance of SocketInputDStream, which uses java.net.Socket: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L73

    java.net.Socket is a client socket, which means it is expecting there to be a server already running at the address and port you specify. Unless you have some service running a server on port 7777 of your local machine, the error you are seeing is as expected.
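    The same point can be shown without Spark at all. A minimal sketch (the object name ClientSocketDemo is made up for illustration): a java.net.Socket connect throws ConnectException when nothing is listening on the port, and the identical connect succeeds once a ServerSocket is bound.

```scala
import java.net.{ConnectException, ServerSocket, Socket}

object ClientSocketDemo {
  def main(args: Array[String]): Unit = {
    // Assuming nothing on this machine listens on 7777, the client
    // connect is refused immediately.
    val refused =
      try { new Socket("localhost", 7777).close(); false }
      catch { case _: ConnectException => true }
    println(s"refused without a server: $refused")

    // Bind a server socket first (port 0 asks the OS for a free port),
    // and the same client connect now succeeds.
    val server = new ServerSocket(0)
    val client = new Socket("localhost", server.getLocalPort)
    println(s"connected with a server: ${client.isConnected}")
    client.close()
    server.close()
  }
}
```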

    To see what I mean, try the following (you may not need to set master or appName in your environment).

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    
    object MyStream {
      def main(args: Array[String]): Unit = {
        // A receiver occupies one thread, so "local" alone (one thread)
        // leaves nothing to process the data; use at least local[2].
        val ssc = new StreamingContext(
          new SparkConf().setMaster("local[2]").setAppName("socketstream"),
          Seconds(10))
        val mystreamRDD = ssc.socketTextStream("bbc.co.uk", 80)
        mystreamRDD.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }
    

    This doesn't print any content, because the app doesn't speak HTTP to the BBC website, but it does not get a connection refused exception.

    To run a local server on Linux, I would use netcat with a simple command such as

    cat data.txt | ncat -l -p 7777
    

    I'm not sure what your best approach is on Windows. You could write another application which listens as a server on that port and sends some data.
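    One portable option (it works on Windows too) is a few lines of plain socket code. A hedged sketch — LineServer and the sample lines are invented for illustration. In real use you would run only the server part on port 7777 and let socketTextStream be the client; a throwaway client is included here just so the example is self-contained:

```scala
import java.io.PrintWriter
import java.net.{ServerSocket, Socket}
import scala.io.Source

object LineServer {
  // Accept one client, send it some lines, then close the connection.
  def serve(server: ServerSocket, lines: Seq[String]): Unit = {
    val client = server.accept()            // blocks until a client connects
    val out    = new PrintWriter(client.getOutputStream, true)
    lines.foreach(out.println)
    out.close()
    client.close()
  }

  def main(args: Array[String]): Unit = {
    val server = new ServerSocket(7777)     // the port the Spark job dials
    val t = new Thread(() => serve(server, Seq("hello", "from", "the", "server")))
    t.start()

    // Throwaway client standing in for socketTextStream("localhost", 7777):
    val sock     = new Socket("localhost", 7777)
    val received = Source.fromInputStream(sock.getInputStream).getLines().toList
    println(received.mkString(" "))
    sock.close()
    t.join()
    server.close()
  }
}
```

    In practice you would delete the throwaway client, start LineServer first, and only then start the Spark job, so that socketTextStream has something to connect to.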
