How to create a custom Encoder in Spark 2.X Datasets?


Spark Datasets move away from Rows to Encoders for POJOs/primitives. The Catalyst engine uses an ExpressionEncoder to convert column

3 Answers

别那么骄傲

2021-02-01 06:37

Did you import the implicit encoders?

import spark.implicits._

http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.sql.Encoder
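
As a rough illustration of what that import does, here is a minimal sketch, assuming a local SparkSession. The `Person` case class and the object name are hypothetical and only serve the example. With `spark.implicits._` in scope, encoders for case classes and primitives are resolved implicitly, so `toDS()` and `map` compile without any hand-written Encoder.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type used only for illustration.
// Defined at top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object ImplicitEncodersExample {
  def main(args: Array[String]): Unit = {
    // `spark` must be a stable identifier (a val) for the import below to compile.
    val spark = SparkSession.builder()
      .appName("implicit-encoders-example")
      .master("local[*]") // assumption: local mode, for a quick test
      .getOrCreate()

    // Brings implicit Encoders for primitives and product types into scope.
    import spark.implicits._

    // Uses the implicit Encoder[Person].
    val people: Dataset[Person] = Seq(Person("Ann", 31), Person("Bob", 45)).toDS()

    // Uses the implicit Encoder[String] for the result type.
    val names: Dataset[String] = people.map(_.name)

    names.show()
    spark.stop()
  }
}
```

Note that the import refers to the SparkSession instance, not to a package, so it has to appear after the session is created.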

忘掉有多难

2021-02-01 06:39
As far as I am aware, nothing really changed since 1.6, and the solutions described in "How to store custom objects in Dataset?" are the only available options. Nevertheless, your current code should work just fine with the default encoders for product types.
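
To make the two routes concrete, here is a minimal sketch, assuming a local SparkSession. `Point` and `Legacy` are hypothetical types invented for the example. It contrasts the default product encoder for a case class with a Kryo-based encoder, which is one of the commonly used workarounds for classes Spark cannot derive an encoder for.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical product type: Spark can derive a columnar encoder for it.
case class Point(x: Double, y: Double)

// Hypothetical non-case class: no default encoder is available for it.
class Legacy(val id: Int, val label: String) extends Serializable

object ExplicitEncodersExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("explicit-encoders-example")
      .master("local[*]") // assumption: local mode, for a quick test
      .getOrCreate()

    // Default encoder for a product type, passed explicitly.
    val pointEncoder: Encoder[Point] = Encoders.product[Point]
    val points: Dataset[Point] =
      spark.createDataset(Seq(Point(0.0, 1.0), Point(2.0, 3.0)))(pointEncoder)

    // Kryo encoder for an arbitrary class, picked up implicitly.
    implicit val legacyEncoder: Encoder[Legacy] = Encoders.kryo[Legacy]
    val legacy: Dataset[Legacy] =
      spark.createDataset(Seq(new Legacy(1, "a"), new Legacy(2, "b")))

    points.printSchema() // one column per field of Point
    legacy.printSchema() // a single binary column holding the serialized object

    spark.stop()
  }
}
```

The trade-off is visible in the schemas: the Kryo-encoded Dataset stores each object as an opaque binary blob, so its fields cannot be queried as columns.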

To get some insight into why your code worked in 1.x and may not work in 2.0.0, you'll have to check the signatures. In 1.x, DataFrame.map is a method which takes a function Row => T and transforms RDD[Row] into RDD[T].

In 2.0.0, DataFrame.map takes a function of type Row => T as well, but transforms Dataset[Row] (a.k.a. DataFrame) into Dataset[T], hence T requires an Encoder. If you want to get the "old" behavior, you should use RDD explicitly:
```
df.rdd.map(row => ???)
```
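
The difference between the two paths can be sketched as follows. This is an illustrative example under assumptions: the column names, the sample data, and the object name are made up, and a local SparkSession is used.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object RddVersusDatasetMap {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-dataset-map")
      .master("local[*]") // assumption: local mode, for a quick test
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df: DataFrame = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // RDD path ("old" behavior): no Encoder is involved, the result is an RDD[Int].
    val viaRdd: RDD[Int] = df.rdd.map(row => row.getInt(1) * 2)

    // Dataset path: the result type needs an Encoder[Int],
    // which spark.implicits._ supplies here.
    val viaDataset: Dataset[Int] = df.map(row => row.getInt(1) * 2)

    println(viaRdd.collect().toList)
    println(viaDataset.collect().toList)

    spark.stop()
  }
}
```

If the result type were a class without an available Encoder, only the `df.rdd.map` line would compile.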
For mapping a Dataset[Row], see "Encoder error while trying to map dataframe row to updated row".
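
For the Row-to-Row case, one approach often used with Spark 2.x is to pass a RowEncoder built from the output schema. The sketch below assumes Spark 2.x, where `RowEncoder(schema)` is available in the internal `catalyst` package (later releases changed this API). The schema, column names, and object name are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object MapRowToRow {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("map-row-to-row")
      .master("local[*]") // assumption: local mode, for a quick test
      .getOrCreate()
    import spark.implicits._ // only needed for toDF below

    // Hypothetical sample data.
    val df: DataFrame = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // Schema of the rows produced by the map function.
    val outSchema = StructType(Seq(
      StructField("key", StringType, nullable = true),
      StructField("doubled", IntegerType, nullable = false)
    ))

    // There is no implicit Encoder[Row], so one is passed explicitly.
    val mapped: DataFrame =
      df.map(row => Row(row.getString(0), row.getInt(1) * 2))(RowEncoder(outSchema))

    mapped.show()
    spark.stop()
  }
}
```

The rows returned by the function have to match `outSchema` in field order and types, since the encoder has no other way to know their structure.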
闹比i

2021-02-01 06:46

I imported spark.implicits._, where spark is the SparkSession. That solved the error and brought the implicit encoders into scope.

Writing a custom encoder is another way out, though I have not tried it.

Working solution: create a SparkSession and import the following:

import spark.implicits._
