Writing R data frames returned from SparkR:::map

Asked by 暖寄归人, 2020-12-21 11:24

I am using SparkR:::map, and my function returns a large-ish R data frame for each input row, each of the same shape. I would like to write these data frames out as Parquet files.

1 Answer
  • 2020-12-21 11:40

    Assuming your data looks more or less like this:

    rdd <- SparkR:::parallelize(sc, 1:5)
    dfs <- SparkR:::map(rdd, function(x) mtcars[(x * 5):((x + 1) * 5), ])
    

    and all columns are of supported types, you can convert it to a row-wise format:

    rows <- SparkR:::flatMap(dfs, function(x) {
      # Split the data frame into a list of column vectors, then zip the
      # columns together so each output element is a list of one row's values
      data <- as.list(x)
      args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE)
      do.call(mapply, append(args, data))
    })
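
    The `mapply` trick above can be checked locally without Spark; it is plain base R, zipping column vectors into per-row lists. A minimal sketch (using `mtcars` as in the rest of the answer):

    x <- head(mtcars, 2)
    data <- as.list(x)                   # named list of 11 column vectors
    args <- list(FUN = list, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    rows <- do.call(mapply, append(args, data))
    length(rows)                         # 2  -- one element per row
    length(rows[[1]])                    # 11 -- one value per column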
    

    and then call createDataFrame:

    sdf <- createDataFrame(sqlContext, rows)
    head(sdf)
    
    ##    mpg cyl  disp  hp drat   wt  qsec vs am gear carb
    ## 1 18.7   8 360.0 175 3.15 3.44 17.02  0  0    3    2
    ## 2 18.1   6 225.0 105 2.76 3.46 20.22  1  0    3    1
    ## 3 14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
    ## 4 24.4   4 146.7  62 3.69 3.19 20.00  1  0    4    2
    ## 5 22.8   4 140.8  95 3.92 3.15 22.90  1  0    4    2
    ## 6 19.2   6 167.6 123 3.92 3.44 18.30  1  0    4    4
    
    printSchema(sdf)
    
    ## root
    ##  |-- mpg: double (nullable = true)
    ##  |-- cyl: double (nullable = true)
    ##  |-- disp: double (nullable = true)
    ##  |-- hp: double (nullable = true)
    ##  |-- drat: double (nullable = true)
    ##  |-- wt: double (nullable = true)
    ##  |-- qsec: double (nullable = true)
    ##  |-- vs: double (nullable = true)
    ##  |-- am: double (nullable = true)
    ##  |-- gear: double (nullable = true)
    ##  |-- carb: double (nullable = true)
    

    and simply use write.df / saveDF.
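
    For example, a sketch using the Spark 1.x SparkR API shown above (the output path and mode are illustrative, not from the question):

    # Requires a running SparkR session; adjust the path to your storage
    write.df(sdf, path = "cars.parquet", source = "parquet", mode = "overwrite")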

    The problem is that you shouldn't be using an internal API in the first place. One of the reasons it was removed from the public API before the initial release is that it is not robust enough to be used directly. Not to mention it is still not clear whether it will be supported or even available in the future. Just saying...
