How to compare two DataFrames and print the columns that differ in Scala

时光取名叫无心 2020-12-02 10:59

We have two data frames here:

the expected dataframe:

+------+---------+--------+----------+-------+--------+
|emp_id| emp_city|emp_name| emp_phone|e         


        
2 Answers
  • 2020-12-02 11:19
    
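    # PySpark: build one single-column difference DataFrame per column.
    # subtract() keeps the distinct values of df1's column that are missing from df2's.
    diffs = [df1.select(c).subtract(df2.select(c)) for c in df1.columns]
    
    # Show only the columns where such values exist.
    for diff in diffs:
        if diff.count() > 0:
            diff.show()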
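
    One caveat with a per-column `subtract`: it compares sets of distinct values, so if two employees' names were swapped between rows, nothing would be reported. If the rows should be matched on a key instead, a rough sketch in Scala (the language the question asks about) could look like the following. It assumes `emp_id` uniquely identifies a row in both DataFrames; the object name `KeyedColumnDiff`, the helper `mismatchesByColumn`, and the sample rows are hypothetical. Rows whose key appears in only one DataFrame are ignored by the inner join.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    object KeyedColumnDiff {
      // For each non-key column, the rows (matched on `key`) whose values differ.
      // Columns without any mismatch are left out of the result.
      def mismatchesByColumn(expected: DataFrame, actual: DataFrame, key: String): Map[String, DataFrame] = {
        val joined = expected.alias("e")
          .join(actual.alias("a"), col(s"e.$key") === col(s"a.$key"))

        expected.columns.toSeq
          .filter(_ != key)
          .map { name =>
            val rows = joined
              .filter(!(col(s"e.$name") <=> col(s"a.$name"))) // <=> is null-safe equality
              .select(
                col(s"e.$key").as(key),
                col(s"e.$name").as("expected"),
                col(s"a.$name").as("actual"))
            name -> rows
          }
          .filter { case (_, rows) => rows.head(1).nonEmpty }
          .toMap
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("KeyedColumnDiff").getOrCreate()
        import spark.implicits._

        // Hypothetical sample rows, not the data from the question.
        val expected = Seq((1, "delhi", "romino"), (2, "pune", "asha")).toDF("emp_id", "emp_city", "emp_name")
        val actual   = Seq((1, "delhi", "romin"), (2, "pune", "asha")).toDF("emp_id", "emp_city", "emp_name")

        mismatchesByColumn(expected, actual, "emp_id").foreach { case (name, rows) =>
          println(s"Column with differences: $name")
          rows.show()
        }
        spark.stop()
      }
    }
    ```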
    
  • 2020-12-02 11:38

    From the scenario described in the question, it looks like the difference has to be found between columns, not rows.

    To do that, we apply a selective difference: we compare the two DataFrames one column at a time, which gives us the columns that have different values, along with those values.

    The code looks something like this:

    1. First, get the column names of the expected DataFrame (the actual DataFrame must have the same columns).

      val columns = df1.schema.fields.map(_.name)

    2. Then compute the difference column by column.

      val selectiveDifferences = columns.map(col => df1.select(col).except(df2.select(col)))

    3. Finally, show only the differences that are not empty.

      selectiveDifferences.foreach(diff => if (diff.count > 0) diff.show())

    This shows only the columns that contain different values, like this:

    +--------+
    |emp_name|
    +--------+
    |  romino|
    +--------+
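
    The three steps can also be combined into one runnable sketch that prints only the names of the differing columns. The object name `ColumnDiff`, the helper `differingColumns`, and the sample rows are hypothetical, and a local SparkSession is assumed. `head(1).nonEmpty` stands in for `count > 0` so that Spark does not have to count every differing value.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object ColumnDiff {
      // Names of the columns of `expected` that hold at least one value
      // missing from the same column of `actual`.
      def differingColumns(expected: DataFrame, actual: DataFrame): Seq[String] =
        expected.columns.toSeq.filter { name =>
          expected.select(name).except(actual.select(name)).head(1).nonEmpty
        }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("ColumnDiff").getOrCreate()
        import spark.implicits._

        // Hypothetical sample rows, not the data from the question.
        val expected = Seq((1, "delhi", "romino"), (2, "pune", "asha")).toDF("emp_id", "emp_city", "emp_name")
        val actual   = Seq((1, "delhi", "romin"), (2, "pune", "asha")).toDF("emp_id", "emp_city", "emp_name")

        differingColumns(expected, actual).foreach(println) // prints: emp_name
        spark.stop()
      }
    }
    ```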
    

    I hope this helps!
