I'm unable to send each group of the dataframe at a time to the executor.
I have data as below in the company_model_vals_df dataframe.
-----
There are a few options here:
val dist_company_model_vals_list = company_model_vals_df
  .select("model_id", "fiscal_quarter", "fiscal_year").distinct().collect()
Then filter company_model_vals_df with the rows of dist_company_model_vals_list, which gives you several datasets that you can work on independently, like:
def rowList = {
  import org.apache.spark.sql._
  import spark.implicits._  // for the $"colName" column syntax
  var dfList: Seq[DataFrame] = Seq()
  // build one filtered DataFrame per distinct (model_id, fiscal_quarter, fiscal_year) row
  for (row <- dist_company_model_vals_list) {
    val filterCol = $"model_id" === row.getInt(0) &&
      $"fiscal_quarter" === row.getInt(1) &&
      $"fiscal_year" === row.getInt(2)
    val resultDf = company_model_vals_df.filter(filterCol)
    dfList = dfList :+ resultDf
  }
  dfList
}
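For example, assuming each group should then be written out on its own (the output path and Parquet format here are assumptions, not from the question), you could drive the writes from the returned sequence:

rowList.zipWithIndex.foreach { case (groupDf, idx) =>
  // each groupDf holds the rows for one (model_id, fiscal_quarter, fiscal_year) key
  groupDf.write
    .mode("overwrite")
    .parquet(s"/tmp/company_model_vals/group_$idx")  // hypothetical output path
}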
partitionBy("model_id","fiscal_quarter","fiscal_year")
method on dataframeWriterto write them separately.
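A minimal sketch of that approach, assuming a hypothetical Parquet output path; Spark writes one subdirectory per distinct key combination, so each group lands in its own location:

company_model_vals_df.write
  .partitionBy("model_id", "fiscal_quarter", "fiscal_year")
  .mode("overwrite")
  .parquet("/tmp/company_model_vals_partitioned")  // hypothetical output path
// produces directories like .../model_id=1/fiscal_quarter=2/fiscal_year=2018/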