Iterate over different dataframe

半腔热情 提交于 2021-01-27 18:37:48

问题


I am trying to iterate over three data frames to find the difference between them. I have a master data frame which contains everything and two other data frames which contains partial of master data frame. I am trying to write a python code to identify what is missing in the other two files. Master file looks like following:

ID  Name
1   Mike
2   Dani
3   Scott
4   Josh
5   Nate
6   Sandy

second data frame looks like following:

ID  Name
1   Mike
2   Dani
3   Scott
6   Sandy

Third data frame looks like following:

ID  Name
1   Mike
2   Dani
3   Scott
4   Josh
5   Nate

So there will be two output data frame. Desired output for looks like following for second data frame:

ID  Name
4   Josh
5   Nate

desired output for third data frame looks like following:

ID  Name
6   Sandy

I didn't find anything similar on Google. I tried this:

for i in second['ID'], third['ID']:
   if i not in master['ID']:
     print(i)

It returns all the data in master file.

Also if I try this code :

import pandas as pd

names = ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]
ids = [1, 2, 3, 4, 5, 6]
master = pd.DataFrame({"ID": ids, "Name": names})
# print(master)

names_second = ["Mike", "Dani", "Scott", "Sandy"]
ids_second = [1, 2, 3, 6]
second = pd.DataFrame({"ID": ids_second, "Name": names_second})
# print(second)

names_third = ["Mike", "Dani", "Scott", "Josh", "Nate"]
ids_third = [1, 2, 3, 4, 5]
third = pd.DataFrame({"ID": ids_third, "Name": names_third})
# print(third)
for i in master['ID']:
    if i not in second["ID"]:
        print("NOT IN SECOND", i)
    if i not in third["ID"]:
        print("NOT IN THIRD", i)

OUTPUT ::

NOT IN SECOND 4
NOT IN SECOND 5
NOT IN THIRD 5
NOT IN SECOND 6
NOT IN THIRD 6

Why it says NOT IN SECOND 6 and NOT IN THIRD 5?

Any suggestion? Thanks in advance.


回答1:


You can try using .isin with ~ to filter dataframes. To compare with second you can use master[~master.ID.isin(second.ID)] and similar for third:

cmp_master_second, cmp_master_third = master[~master.ID.isin(second.ID)],  master[~master.ID.isin(third.ID)]

print(cmp_master_second)
print('\n-------- Seperate dataframes -----------\n')
print(cmp_master_third)

Result:

    Name
ID      
4   Josh
5   Nate

-------- Seperate dataframes -----------

     Name
ID       
6   Sandy



回答2:


You could do a set difference on the master and the other DataFrames

In [315]: set(d1[0]) - set(d2[0])
Out[315]: {'Josh', 'Nate'}

In [316]: set(d1[0]) - set(d3[0])
Out[316]: {'Sandy'}


来源:https://stackoverflow.com/questions/50959624/iterate-over-different-dataframe

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!