Analytic in Spark Dataframe


In this problem we have two managers, M1 and M2. M1's team has two employees, e1 and e2, and M2's team has two employees, e4 and e5. Following is the Manager

1 answer

生来不讨喜

2021-01-17 02:58

As I understand your question, here is what I suggest.

First, create one dataframe per manager that lists the employees under that manager:

manager1

+---+------+
|sn |emp_id|
+---+------+
|a  |e1    |
|b  |e2    |
+---+------+

manager2

+---+------+
|sn |emp_id|
+---+------+
|a  |e4    |
|b  |e5    |
+---+------+
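One possible way to build these two dataframes is sketched below. The names `m1` and `m2` match the calls used later in this answer; the local `SparkSession` settings (`local[*]`, the app name) are assumptions for illustration and are not needed if you are already in spark-shell, where `spark` exists.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; getOrCreate() reuses an existing one (e.g. in spark-shell)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("manager-frames")
  .getOrCreate()
import spark.implicits._ // enables .toDF on local Seq collections

// One dataframe per manager, with the same columns as the tables above
val m1 = Seq(("a", "e1"), ("b", "e2")).toDF("sn", "emp_id")
val m2 = Seq(("a", "e4"), ("b", "e5")).toDF("sn", "emp_id")
```

Calling `m1.show(false)` and `m2.show(false)` should print the two tables shown above.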

Then write a function that returns the list of employee ids under a manager:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Collect every emp_id of the manager dataframe into a local List on the driver
def getEmployees(df: DataFrame): List[String] = {
  df.select(collect_list("emp_id")).first().getAs[Seq[String]](0).toList
}

The final step is a function that keeps only the rows whose emp_id is in the list passed to it:

// Keep only the rows of df that belong to the given employee ids
def getEmployeeDetails(df: DataFrame, list: List[String]): DataFrame = {
  df.filter(df("emp_id").isin(list: _*))
}

Now, to see the records of the employees under manager1 (m1), where df is the employee dataframe from your question, call

getEmployeeDetails(df, getEmployees(m1)).show(false)

which returns

+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1    |1       |66000 |22       |
|e1    |2       |48000 |16       |
|e1    |3       |87000 |29       |
|e2    |1       |75000 |25       |
|e2    |4       |69000 |23       |
|e2    |5       |66000 |22       |
+------+--------+------+---------+
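If a manager's team is large, collecting the ids to the driver may become a bottleneck. A possible alternative, sketched here and not part of the approach above, is a left semi join, which keeps the rows of `df` that have a matching `emp_id` in the manager dataframe without collecting anything. The function name `getEmployeeDetailsByJoin` is hypothetical.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical alternative to getEmployees + getEmployeeDetails:
// a left semi join returns only columns of df, for rows whose emp_id
// also appears in the manager dataframe.
def getEmployeeDetailsByJoin(df: DataFrame, manager: DataFrame): DataFrame =
  df.join(manager.select("emp_id"), Seq("emp_id"), "left_semi")
```

With the dataframes used in this answer, `getEmployeeDetailsByJoin(df, m1).show(false)` should list the same e1 and e2 rows as the table above.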

You can do the same for the other manager by passing its dataframe instead of m1.

The same function also works for individual employees. For example,

getEmployeeDetails(df, List("e1")).show(false)

returns only the rows of employee e1:

+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1    |1       |66000 |22       |
|e1    |2       |48000 |16       |
|e1    |3       |87000 |29       |
+------+--------+------+---------+

I hope this answer is helpful.
