In this problem we have two managers, M1 and M2. Manager M1's team has two employees, e1 and e2, and manager M2's team has two employees, e4 and e5. Following is the Manager
Based on my understanding of your question, here's what I suggest you do.
First, create a DataFrame for each manager that lists the employees under them:
manager1
+---+------+
|sn |emp_id|
+---+------+
|a |e1 |
|b |e2 |
+---+------+
manager2
+---+------+
|sn |emp_id|
+---+------+
|a |e4 |
|b |e5 |
+---+------+
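As a minimal sketch, the two manager DataFrames above could be built like this. The names m1 and m2 and the local SparkSession setup are assumptions for illustration; in spark-shell the `spark` session already exists, so the builder line can be skipped.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for experimenting; reuse your own session if you have one
val spark = SparkSession.builder().master("local[*]").appName("managers").getOrCreate()
import spark.implicits._ // enables .toDF on local Seqs

// One DataFrame per manager, with the same columns as the tables above
val m1 = Seq(("a", "e1"), ("b", "e2")).toDF("sn", "emp_id")
val m2 = Seq(("a", "e4"), ("b", "e5")).toDF("sn", "emp_id")

m1.show(false)
m2.show(false)
```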
Then write a function that returns the list of employees under a manager:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
// Collect every emp_id of the manager's DataFrame into one List on the driver
def getEmployees(df: DataFrame): List[String] = {
  df.select(collect_list("emp_id")).first().getAs[Seq[String]](0).toList
}
The final step is to write a function that keeps only the rows of the employees passed in the list:
// Keep only the rows whose emp_id appears in the given list
def getEmployeeDetails(df: DataFrame, list: List[String]): DataFrame = {
  df.filter(df("emp_id").isin(list: _*))
}
Now, if you want to see the employees under manager1 (here m1 is the manager1 DataFrame and df is the employee details DataFrame from your question), call
getEmployeeDetails(df, getEmployees(m1)).show(false)
This returns:
+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1 |1 |66000 |22 |
|e1 |2 |48000 |16 |
|e1 |3 |87000 |29 |
|e2 |1 |75000 |25 |
|e2 |4 |69000 |23 |
|e2 |5 |66000 |22 |
+------+--------+------+---------+
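The df passed to getEmployeeDetails is the employee details DataFrame from the question and is not defined in this answer. As a rough sketch, assuming it holds just the six rows printed above (the real one presumably also has rows for e4 and e5) and that the numeric columns are integers, it could be built like this:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell `spark` already exists
val spark = SparkSession.builder().master("local[*]").appName("employee-details").getOrCreate()
import spark.implicits._

// Only the rows visible in the output above; Int column types are assumed
val df = Seq(
  ("e1", 1, 66000, 22),
  ("e1", 2, 48000, 16),
  ("e1", 3, 87000, 29),
  ("e2", 1, 75000, 25),
  ("e2", 4, 69000, 23),
  ("e2", 5, 66000, 22)
).toDF("emp_id", "month_id", "salary", "work_days")
```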
You can do the same for the other managers.
It also works for individual employees, for example
getEmployeeDetails(df, List("e1")).show(false)
This returns the rows for employee e1:
+------+--------+------+---------+
|emp_id|month_id|salary|work_days|
+------+--------+------+---------+
|e1 |1 |66000 |22 |
|e1 |2 |48000 |16 |
|e1 |3 |87000 |29 |
+------+--------+------+---------+
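One note on the approach: getEmployees pulls all employee ids to the driver, which is fine for small teams. If a manager DataFrame could be large, a left semi join is a possible alternative that keeps the filtering inside Spark. The sketch below is a different technique from the isin filter above, not a replacement the question asked for; the helper name getEmployeeDetailsByJoin, the variable names, and the sample rows (including the made-up e4 row) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: keep the rows of `details` whose emp_id appears in
// `managerDf`, without collecting the ids to the driver first
def getEmployeeDetailsByJoin(details: DataFrame, managerDf: DataFrame): DataFrame =
  details.join(managerDf.select("emp_id"), Seq("emp_id"), "left_semi")

// Assumption: a local session and a few sample rows, for illustration only
val spark = SparkSession.builder().master("local[*]").appName("semi-join").getOrCreate()
import spark.implicits._

val m1 = Seq(("a", "e1"), ("b", "e2")).toDF("sn", "emp_id")
val details = Seq(
  ("e1", 1, 66000, 22),
  ("e2", 1, 75000, 25),
  ("e4", 1, 60000, 20) // made-up row so that something gets filtered out
).toDF("emp_id", "month_id", "salary", "work_days")

val m1Details = getEmployeeDetailsByJoin(details, m1)
m1Details.show(false)
```

A left semi join returns only columns from the left side and does not duplicate rows when an id appears more than once in the manager DataFrame, so the result should match what the isin filter gives for the same ids.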
I hope the answer is helpful