How to create DataFrame from Scala's List of Iterables?

孤独总比滥情好 2020-12-02 10:31

I have the following Scala value:

val values: List[Iterable[Any]] = Traces().evaluate(features).toList

and I want to convert it to a DataFrame.

5 Answers
  • 2020-12-02 11:00

    As zero323 mentioned, we first need to convert the List[Iterable[Any]] to a List[Row], then put the rows in an RDD and prepare a schema for the Spark DataFrame.

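    To convert the List[Iterable[Any]] to a List[Row], we can write the following. An Iterable has to be turned into a Seq before it can be expanded as varargs:

    val rows = values.map(x => Row(x.toSeq: _*))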
    

    Then, given a schema (a StructType bound to the name schema), we can build the RDD:

    val rdd = sparkContext.makeRDD[Row](rows)
    

    and finally create the Spark DataFrame:

    val df = sqlContext.createDataFrame(rdd, schema)
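
    The steps above leave the schema and the imports implicit. A minimal end-to-end sketch follows, assuming Spark 2.x or later with a SparkSession. The sample values and the column names id and name are hypothetical stand-ins for whatever Traces().evaluate(features) actually returns, and the schema has to match the runtime types of your own data:

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("list-of-iterables-to-df")
      .getOrCreate()

    // Hypothetical stand-in for Traces().evaluate(features).toList
    val values: List[Iterable[Any]] = List(
      List(1, "One"),
      Vector(2, "Two"),
      List(3, "Three")
    )

    // Assumed schema: field order and types must match the values in each row
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = true)
    ))

    // Row.fromSeq(x.toSeq) builds the same Row as Row(x.toSeq: _*)
    val rows = values.map(x => Row.fromSeq(x.toSeq))
    val rdd = spark.sparkContext.makeRDD(rows)
    val df = spark.createDataFrame(rdd, schema)

    df.printSchema()
    df.show()
    ```

    If a value does not match its declared type (for example a String in the id column), Spark typically reports the error only when the DataFrame is evaluated, not when it is created.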
    
  • 2020-12-02 11:05

    The most concise way I've found:

    val df = spark.createDataFrame(List("A", "B", "C").map(Tuple1(_)))
    
  • 2020-12-02 11:05

    In Spark 2 we can get a Dataset by converting the list directly with the toDS API:

    val ds = list.flatMap(_.split(",")).toDS() // Records split by comma 
    

    or

    val ds = list.toDS()
    

    This is more convenient than going through an RDD or a DataFrame.
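
    For completeness, here is a self-contained sketch of the toDS approach. The contents of list are hypothetical, since the snippets above do not define it. Note that toDS only becomes available after import spark.implicits._, and it needs an Encoder for the element type. As far as I know there is no such encoder for Any, so this route suits a List[String] or a list of tuples or case classes rather than a List[Iterable[Any]]:

    ```scala
    import org.apache.spark.sql.{Dataset, SparkSession}

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("list-to-ds")
      .getOrCreate()

    // Brings toDS() and the implicit Encoder[String] into scope
    import spark.implicits._

    // Hypothetical input: comma-separated records
    val list = List("a,b", "c,d,e")

    // One Dataset element per comma-separated token
    val tokens: Dataset[String] = list.flatMap(_.split(",")).toDS()

    // One Dataset element per original string
    val lines: Dataset[String] = list.toDS()

    tokens.show()
    lines.show()
    ```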

  • 2020-12-02 11:17

    That's what the spark.implicits object is for. It lets you convert common Scala collection types into a DataFrame / Dataset / RDD. Here is an example with Spark 2.0, but it exists in older versions too:

    import org.apache.spark.sql.SparkSession
    val values = List(1,2,3,4,5)
    
    val spark = SparkSession.builder().master("local").getOrCreate()
    import spark.implicits._
    val df = values.toDF()
    

    Edit: I just realised you were after a 2D list. Here is something I tried in spark-shell. I converted a 2D List to a List of tuples and used the implicit conversion to DataFrame:

    val values = List(List("1", "One") ,List("2", "Two") ,List("3", "Three"),List("4","4")).map(x =>(x(0), x(1)))
    import spark.implicits._
    val df = values.toDF
    

    Edit 2: The original question by MTT was "How to create spark dataframe from a scala list" for a 2D list, for which this is a correct answer. The original question is at https://stackoverflow.com/revisions/38063195/1. The question was later changed to match an accepted answer. I'm adding this edit so that anyone looking for something similar to the original question can find it.

  • 2020-12-02 11:22

    Simplest approach:

    val newList = yourList.map(Tuple1(_))
    val df = spark.createDataFrame(newList).toDF("stuff")
    