How do I apply a schema with nullable = false when reading JSON?

前端未结


I'm trying to write some test cases using JSON files for DataFrames (whereas production would be Parquet). I'm using the spark-testing-base framework and I'm running into a s


2 answers

野趣味

2021-01-11 18:14
There is a workaround: rather than reading the JSON directly from the file, read the file as an RDD of strings and pass that RDD to the JSON reader, which then applies the schema as given. The code is below:
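```
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Schema the test expects: "a" must be non-nullable, "b" may be null.
val expectedSchema = StructType(
  List(
    StructField("a", IntegerType, nullable = false),
    StructField("b", IntegerType, nullable = true)
  )
)

test("testJSON") {
  // Load the file as an RDD[String] first instead of handing the path to the JSON reader.
  val jsonRdd = spark.sparkContext.textFile("src/test/resources/test.json")
  // Reading the path directly does not keep nullable = false:
  // val readJson = spark.read.schema(expectedSchema).json("src/test/resources/test.json")
  val readJson = spark.read.schema(expectedSchema).json(jsonRdd)
  readJson.printSchema()
  assert(readJson.schema == expectedSchema)
}
```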
The test case passes, and printSchema() prints:
```
root
 |-- a: integer (nullable = false)
 |-- b: integer (nullable = true)
```
There is a JIRA ticket for this issue in Apache Spark, https://issues.apache.org/jira/browse/SPARK-10848. It was closed as not a problem, with this comment:

This should be resolved in the latest file format refactoring in Spark 2.0. Please reopen it if you still hit the problem. Thanks!

If you still get the error, you can reopen the JIRA ticket. I tested with Spark 2.1.0 and still see the same issue.
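
As an alternative to going through an RDD of strings, another commonly used approach is to read the file normally and then rebuild the DataFrame against the strict schema with `createDataFrame`. The following is only a sketch under assumptions: the object name `StrictSchemaSketch` and the helper `readWithStrictSchema` are hypothetical, a local SparkSession is assumed, and the sample JSON is written to a temporary file just to keep the example runnable. Note that Spark does not check here that the data really honours `nullable = false`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical helper object, not part of Spark or spark-testing-base.
object StrictSchemaSketch {
  val expectedSchema: StructType = StructType(List(
    StructField("a", IntegerType, nullable = false),
    StructField("b", IntegerType, nullable = true)
  ))

  // Read the JSON file, then re-wrap the same rows under the caller's schema
  // so that its nullable flags are the ones reported by the DataFrame.
  def readWithStrictSchema(spark: SparkSession, path: String, schema: StructType): DataFrame = {
    val relaxed = spark.read.schema(schema).json(path)
    spark.createDataFrame(relaxed.rdd, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strict-schema-sketch")
      .getOrCreate()
    try {
      // Sample data invented for this sketch.
      val path = Files.createTempFile("strict-schema", ".json")
      Files.write(path, "{\"a\": 1, \"b\": 2}\n".getBytes(StandardCharsets.UTF_8))
      val df = readWithStrictSchema(spark, path.toString, expectedSchema)
      df.printSchema()
    } finally spark.stop()
  }
}
```

In a test, `readWithStrictSchema(spark, "src/test/resources/test.json", expectedSchema).schema` could then be compared with `expectedSchema` in the same way as in the test above.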
自闭症患者

2021-01-11 18:17

The workaround above ensures that the schema is correct, but null values are replaced with default values. In my case, when an Int is missing from the JSON string, it is set to 0.
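
One way to avoid silently getting a default such as 0 is to read the data with an all-nullable copy of the schema first, fail if any column that should be non-nullable actually contains nulls, and only then apply the strict schema. This is a sketch under assumptions, not a Spark feature: `NullGuardSketch`, `relax`, `nullCounts` and `readStrictJson` are hypothetical names, and the check costs one extra pass over the data per non-nullable column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, not part of Spark or spark-testing-base.
object NullGuardSketch {
  // Copy of the schema in which every top-level field is nullable.
  def relax(schema: StructType): StructType =
    StructType(schema.fields.map(_.copy(nullable = true)))

  // Number of null values found in each field the strict schema declares non-nullable.
  def nullCounts(df: DataFrame, schema: StructType): Map[String, Long] =
    schema.fields.filterNot(_.nullable).map { field =>
      field.name -> df.filter(col(field.name).isNull).count()
    }.toMap

  // Read with the relaxed schema, reject data that violates nullable = false,
  // then rebuild the DataFrame under the strict schema.
  def readStrictJson(spark: SparkSession, path: String, schema: StructType): DataFrame = {
    val relaxed = spark.read.schema(relax(schema)).json(path)
    val offending = nullCounts(relaxed, schema).filter { case (_, n) => n > 0 }
    require(offending.isEmpty, s"null values in non-nullable columns: $offending")
    spark.createDataFrame(relaxed.rdd, schema)
  }
}
```

With this guard, a JSON line that omits a required field makes `readStrictJson` throw an `IllegalArgumentException` instead of producing a row with a default value.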
