Not able to read JSON files: Spark Structured Streaming using Java

臣服心动 2021-01-23 08:41

I have a Python script that fetches stock data (as below) from the NYSE every minute and writes it to a new single-line file. It contains data for 4 stocks - MSFT, ADBE, GOOGL and FB, as the b

1 answer
  • 2021-01-23 09:04

    Just figured it out. Keep the following two things in mind:

    1. When defining the schema, make sure the field names and their order exactly match those in your JSON file.

    2. Initially, use only StringType for all your fields; you can apply a transformation later to cast them to more specific data types.

    This is what worked for me:

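        import org.apache.spark.sql.Dataset;
        import org.apache.spark.sql.Row;
        import org.apache.spark.sql.types.DataTypes;
        import org.apache.spark.sql.types.StructType;

        // Nested struct: every field is read as a string first.
        StructType priceData = new StructType()
                .add("open", DataTypes.StringType)
                .add("high", DataTypes.StringType)
                .add("low", DataTypes.StringType)
                .add("close", DataTypes.StringType)
                .add("volume", DataTypes.StringType);

        // Top-level schema; field names match the keys in the JSON file.
        StructType schema = new StructType()
                .add("symbol", DataTypes.StringType)
                .add("timestamp", DataTypes.StringType)
                .add("priceData", priceData);

        // `session` is an existing SparkSession. json(...) already selects the JSON
        // source, so a separate format("json") call is not needed.
        Dataset<Row> rawData = session.readStream()
                .schema(schema)
                .json("/home/abhinavrawat/streamingData/data/*");

        // Print each micro-batch to the console; awaitTermination() blocks until the query stops.
        rawData.writeStream().format("console").start().awaitTermination();
        session.close();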
    

    Here is the output:

    +------+-------------------+--------------------+
    |symbol|          timestamp|           priceData|
    +------+-------------------+--------------------+
    |  MSFT|2019-05-02 15:59:00|[126.0800, 126.10...|
    |  ADBE|2019-05-02 15:59:00|[279.2900, 279.34...|
    | GOOGL|2019-05-02 15:59:00|[1166.4100, 1166....|
    |    FB|2019-05-02 15:59:00|[192.4200, 192.50...|
    |  MSFT|2019-05-02 15:59:00|[126.0800, 126.10...|
    |  ADBE|2019-05-02 15:59:00|[279.2900, 279.34...|
    | GOOGL|2019-05-02 15:59:00|[1166.4100, 1166....|
    |    FB|2019-05-02 15:59:00|[192.4200, 192.50...|
    |  MSFT|2019-05-02 15:59:00|[126.0800, 126.10...|
    |  ADBE|2019-05-02 15:59:00|[279.2900, 279.34...|
    | GOOGL|2019-05-02 15:59:00|[1166.4100, 1166....|
    |    FB|2019-05-02 15:59:00|[192.4200, 192.50...|
    |  MSFT|2019-05-02 15:59:00|[126.0800, 126.10...|
    |  ADBE|2019-05-02 15:59:00|[279.2900, 279.34...|
    | GOOGL|2019-05-02 15:59:00|[1166.4100, 1166....|
    |    FB|2019-05-02 15:59:00|[192.4200, 192.50...|
    |  MSFT|2019-05-02 15:59:00|[126.0800, 126.10...|
    |  ADBE|2019-05-02 15:59:00|[279.2900, 279.34...|
    | GOOGL|2019-05-02 15:59:00|[1166.4100, 1166....|
    |    FB|2019-05-02 15:59:00|[192.4200, 192.50...|
    +------+-------------------+--------------------+
    

    You can now flatten the priceData column using priceData.open, priceData.close, etc.
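
    A minimal sketch of that flattening step, combined with the cast away from StringType mentioned in point 2. The class name PriceFlattener and the method flatten are hypothetical helpers, not part of the original answer, and the target types (double for prices, long for volume, timestamp for the time column) are assumptions about the data. With Spark's default (non-ANSI) settings, a value that cannot be cast becomes null rather than raising an error.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.apache.spark.sql.functions.col;

// Hypothetical helper class, introduced only for illustration.
public class PriceFlattener {

    // Expands the nested priceData struct into top-level columns and casts
    // the string values to assumed target types. A plain select like this
    // can be applied to batch and streaming Datasets alike.
    public static Dataset<Row> flatten(Dataset<Row> rawData) {
        return rawData.select(
                col("symbol"),
                col("timestamp").cast("timestamp").alias("timestamp"),
                col("priceData.open").cast("double").alias("open"),
                col("priceData.high").cast("double").alias("high"),
                col("priceData.low").cast("double").alias("low"),
                col("priceData.close").cast("double").alias("close"),
                // Assumes volume holds whole numbers.
                col("priceData.volume").cast("long").alias("volume"));
    }
}
```

    With the rawData stream from above, this could be used as PriceFlattener.flatten(rawData).writeStream().format("console").start(), which prints one column per price field instead of the single truncated priceData struct.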
