I want to start by saying that I am forced to use Spark 1.6. I am generating a DataFrame from a JSON file like this:

{"id" : "1201", "nam
The output from the map is of type (String, Row), so it cannot be encoded using RowEncoder alone. You have to provide a matching tuple encoder:
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Encoder, Encoders, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val encoder = Encoders.tuple(
  Encoders.STRING,
  RowEncoder(
    // The same as df.schema in your case
    StructType(Seq(
      StructField("age", StringType),
      StructField("id", StringType),
      StructField("name", StringType)))))

// filterd and PrintOne are defined in the question
filterd.map { row => (
  row.getAs[String]("age"),
  PrintOne(row.getAs[Seq[Row]](0), row.getAs[String]("age")))
}(encoder)
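PrintOne and filterd come from the question; for a self-contained test, a hypothetical stand-in for PrintOne that returns a Row matching the schema above could look like this:

// Hypothetical stand-in for the question's PrintOne: returns the first nested
// person whose age matches, falling back to the first element.
def PrintOne(people: Seq[Row], age: String): Row =
  people.find(_.getAs[String]("age") == age).getOrElse(people.head)

Either way, the result of the map is a Dataset[(String, Row)].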
Overall this approach looks like an anti-pattern. If you want to use a more functional style you should avoid Dataset[Row]:
case class Person(age: String, id: String, name: String)

filterd.as[(Seq[Person], String)].map {
  case (people, age) => (age, (age, people(0).id, people(1).name))
}
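For the as[...] conversion and the map to find their encoders implicitly, the SQL implicits have to be in scope; in Spark 1.6 that would typically be (a sketch, assuming your SQLContext is named sqlContext):

// Brings the implicit encoders for case classes and tuples into scope.
import sqlContext.implicits._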
or use a udf, as sketched below.
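A rough sketch of the udf variant, assuming the nested array-of-structs column is called "people" (the actual column name in filterd may differ):

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical udf doing the same extraction as the map above; an array of
// structs reaches a Scala udf as Seq[Row], and the returned tuple is encoded
// back into a struct column.
val extract = udf((people: Seq[Row], age: String) =>
  (age, people(0).getAs[String]("id"), people(1).getAs[String]("name")))

filterd.select(col("age"), extract(col("people"), col("age")).as("result"))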
Also please note that the o.a.s.sql.catalyst package, including GenericRowWithSchema, is intended mostly for internal use. Unless it is strictly necessary, prefer o.a.s.sql.Row.
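If you do need to build rows by hand, the public Row factory is usually enough (a minimal sketch with made-up values):

import org.apache.spark.sql.Row

// Public API: construct a Row with the Row(...) factory instead of
// instantiating catalyst's GenericRowWithSchema directly.
val r: Row = Row("25", "1201", "someone")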