Reading the data written to S3 by an Amazon Kinesis Firehose stream

感情败类 2021-02-18 15:17

I am writing records to a Kinesis Firehose stream, and Amazon Kinesis Firehose eventually writes them to a file in S3.
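A minimal simulation of what ends up in the S3 object helps explain where the `}{` pattern comes from. It assumes the delivery stream is not configured to add a delimiter of its own and that the producer sends each record as compact JSON with nothing appended. The sample records and the helper name `simulate_s3_object` are hypothetical, since the record type in the question is truncated.

```python
import json

# Hypothetical sample records; the real record type in the question is truncated.
records = [{"item": "a", "qty": 1}, {"item": "b", "meta": {"qty": 2}}]


def simulate_s3_object(records, delimiter=""):
    """Lay serialized records end to end, as they would appear in one S3 object.

    Assumption: the stream adds no separator, so the only delimiter is
    whatever the producer appends to each record's data.
    """
    return "".join(json.dumps(r) + delimiter for r in records)


# No delimiter: objects run together as ...}{...
print(simulate_s3_object(records))
# prints {"item": "a", "qty": 1}{"item": "b", "meta": {"qty": 2}}

# Producer-side "\n": each object lands on its own line
print(simulate_s3_object(records, delimiter="\n"))
```

If you control the producer, appending a newline to each record's data before sending it (the `delimiter="\n"` case) yields newline-delimited JSON that line-based readers can split directly.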

My record object looks like this:

ItemPurcha         


        
9 answers
  •  囚心锁ツ
    2021-02-18 15:38

    I had the same problem. Here is how I solved it:

    1. Replace "}{" with "}\n{".
    2. Split each line on "\n".

      import re
      # input_json_rdd: RDD of strings holding back-to-back JSON objects
      records_rdd = (input_json_rdd.map(lambda x: re.sub(r"\}\{", "}\n{", x))
                     .flatMap(lambda line: line.split("\n")))
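The same two steps can be checked on a plain Python string without Spark. This is a sketch under stated assumptions: `split_concatenated_json` is a hypothetical helper name, and it assumes each object is serialized compactly on one line (no pretty-printing, no whitespace between `}` and `{`).

```python
import json
import re


def split_concatenated_json(blob):
    """Apply the two steps from the answer to a single string.

    Assumes compact, single-line JSON objects written back to back.
    Like the Spark version, it would also split a "}{" that occurs
    inside a JSON string value.
    """
    # Step 1: put a newline between back-to-back objects
    with_breaks = re.sub(r"\}\{", "}\n{", blob)
    # Step 2: split on newlines, dropping empty pieces (e.g. from a trailing "\n")
    return [piece for piece in with_breaks.split("\n") if piece]


blob = '{"id": 1, "tags": {"x": 1}}{"id": 2}'
rows = [json.loads(s) for s in split_concatenated_json(blob)]
print(rows)  # [{'id': 1, 'tags': {'x': 1}}, {'id': 2}]
```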
      

    Splitting on "}" alone doesn't work, because a nested JSON object contains several "}" characters. Splitting at the "}{" boundary between objects avoids that.
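A different technique, not the method in this answer, avoids string matching entirely by letting Python's JSON parser find where each document ends. `json.JSONDecoder.raw_decode(s, idx)` parses one JSON value starting at index `idx` and returns the value plus the index where it stopped, so nested objects and a literal `}{` inside a string value are handled correctly. `iter_concatenated_json` is a hypothetical helper name in this sketch.

```python
import json


def iter_concatenated_json(blob):
    """Yield each JSON value from a string of back-to-back JSON documents.

    raw_decode does not skip leading whitespace, so whitespace between
    documents (e.g. newlines) is skipped manually. Malformed input raises
    json.JSONDecodeError.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(blob)
    while pos < length:
        # Skip any whitespace between documents
        while pos < length and blob[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Parse one value starting at pos; pos moves to just past it
        value, pos = decoder.raw_decode(blob, pos)
        yield value


blob = '{"note": "a}{b", "n": {"x": 1}}{"n": 2}\n{"n": 3}'
for doc in iter_concatenated_json(blob):
    print(doc)
```

Assuming each element of `input_json_rdd` holds whole JSON documents, this could stand in for the map/flatMap pair as `input_json_rdd.flatMap(iter_concatenated_json)`.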
