Add new rows to pyspark Dataframe

悲&欢浪女 2020-12-19 08:56

I'm very new to pyspark but familiar with pandas. I have a pyspark DataFrame:

from pyspark.sql import SparkSession

# instantiate Spark
spark = SparkSession.builder.getOrCreate()

# make some test data
columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]
df = spark.createDataFrame(vals, columns)
2 Answers
  • 2020-12-19 09:47

As thebluephantom has already said, union is the way to go. I'm just answering your question to give you a pyspark example:

    # if not already created automatically, instantiate the SparkSession
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
    
    columns = ['id', 'dogs', 'cats']
    vals = [(1, 2, 0), (2, 0, 1)]
    
    df = spark.createDataFrame(vals, columns)
    
    newRow = spark.createDataFrame([(4,5,7)], columns)
    appended = df.union(newRow)
    appended.show()
    

    Please also have a look at the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html
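    One caveat worth adding (my note, not from the answer above): `union` resolves columns by position, not by name. If the new row's DataFrame lists its columns in a different order, the values are silently mis-assigned; `unionByName` matches columns by name instead. A minimal sketch:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    df = spark.createDataFrame([(1, 2, 0), (2, 0, 1)], ['id', 'dogs', 'cats'])

    # columns deliberately in a different order than df
    newRow = spark.createDataFrame([(5, 7, 4)], ['dogs', 'cats', 'id'])

    # union() would pair columns by position here and mis-assign values;
    # unionByName() pairs them by column name instead
    appended = df.unionByName(newRow)
    appended.show()
    ```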

  • 2020-12-19 09:54

    Here is a partial code block from something I did, using union - you will of course need to adapt it to your own situation:

    // imports needed by this snippet
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{col, explode}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val dummySchema = StructType(
      StructField("phrase", StringType, true) :: Nil)
    var dfPostsNGrams2 = spark.createDataFrame(sc.emptyRDD[Row], dummySchema)
    for (i <- i_grams_Cols) {
      val nameCol = col(i)
      dfPostsNGrams2 = dfPostsNGrams2.union(dfPostsNGrams.select(explode(nameCol).as("phrase")).toDF)
    }
    

    Union of the DF with itself is the way to go.
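    Since the question is about pyspark, the Scala loop above can be sketched roughly as follows. `dfPostsNGrams` and `i_grams_cols` here are hypothetical stand-ins for the author's data (each column is assumed to hold arrays of n-grams):

    ```python
    from functools import reduce
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # hypothetical stand-in: each column holds arrays of n-gram strings
    dfPostsNGrams = spark.createDataFrame(
        [(["a b"], ["a b c"])], ["two_grams", "three_grams"])
    i_grams_cols = ["two_grams", "three_grams"]

    # explode each n-gram column into a single 'phrase' column, then union all
    parts = [dfPostsNGrams.select(explode(col(c)).alias("phrase"))
             for c in i_grams_cols]
    phrases = reduce(lambda a, b: a.union(b), parts)
    phrases.show()
    ```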
