Concatenate two PySpark dataframes

独厮守ぢ 2020-12-02 16:28

I'm trying to concatenate two PySpark dataframes with some columns that appear in only one of them:

from pyspark.sql.functions import randn, rand

df_1 = sqlC         


        
10 Answers
  •  有刺的猬
    2020-12-02 17:14

    You may want to concatenate more than two DataFrames. I found an approach that goes through a pandas DataFrame conversion.

    Suppose you have 3 Spark DataFrames that you want to concatenate.

    The code is the following:

    import pandas as pd

    list_dfs = []
    list_dfs_ = []

    df = spark.read.json('path_to_your_jsonfile.json', multiLine=True)
    df2 = spark.read.json('path_to_your_jsonfile2.json', multiLine=True)
    df3 = spark.read.json('path_to_your_jsonfile3.json', multiLine=True)

    list_dfs.extend([df, df2, df3])

    # Convert each Spark DataFrame to a pandas DataFrame
    for df in list_dfs:
        list_dfs_.append(df.toPandas())

    list_dfs.clear()

    # Concatenate the pandas DataFrames, then convert back to Spark
    df_ = sqlContext.createDataFrame(pd.concat(list_dfs_))
    
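    Since this route ultimately relies on `pd.concat`, it helps to see how pandas handles frames whose columns only partially overlap, which is the situation in the question. A minimal sketch with hypothetical toy frames standing in for the `toPandas()` results:

        import pandas as pd

        # Hypothetical toy frames: each has one column the other lacks.
        pdf_1 = pd.DataFrame({"id": [1, 2], "a": [10.0, 20.0]})
        pdf_2 = pd.DataFrame({"id": [3, 4], "b": [30.0, 40.0]})

        # pd.concat aligns on column names and fills missing cells with NaN,
        # which is why this approach tolerates non-overlapping columns.
        combined = pd.concat([pdf_1, pdf_2], ignore_index=True)
        print(combined.columns.tolist())        # ['id', 'a', 'b']
        print(combined["a"].isna().tolist())    # [False, False, True, True]

    Note that in Spark 3.1+ you can stay inside Spark and get the same null-filling behavior with `df_1.unionByName(df_2, allowMissingColumns=True)`, avoiding the pandas round trip entirely.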
