Spark Streaming - processing binary data file
Question: I'm using PySpark 1.6.0. I have existing PySpark code that reads binary data files from an AWS S3 bucket. Other Spark/Python code parses the bits in the data and converts them into int, string, boolean, etc. Each binary file holds one record of data. In PySpark I read the binary files using: sc.binaryFiles("s3n://.......") This works great, as it gives a tuple of (filename, data), but I'm trying to find an equivalent PySpark Streaming API to read binary files as a stream (hopefully the