问题
I am trying to iterate over three data frames to find the difference between them. I have a master data frame which contains everything and two other data frames which contains partial of master data frame. I am trying to write a python code to identify what is missing in the other two files. Master file looks like following:
ID Name
1 Mike
2 Dani
3 Scott
4 Josh
5 Nate
6 Sandy
second data frame looks like following:
ID Name
1 Mike
2 Dani
3 Scott
6 Sandy
Third data frame looks like following:
ID Name
1 Mike
2 Dani
3 Scott
4 Josh
5 Nate
So there will be two output data frame. Desired output for looks like following for second data frame:
ID Name
4 Josh
5 Nate
desired output for third data frame looks like following:
ID Name
6 Sandy
I didn't find anything similar on Google. I tried this:
for i in second['ID'], third['ID']:
if i not in master['ID']:
print(i)
It returns all the data in master file.
Also if I try this code :
import pandas as pd
names = ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]
ids = [1, 2, 3, 4, 5, 6]
master = pd.DataFrame({"ID": ids, "Name": names})
# print(master)
names_second = ["Mike", "Dani", "Scott", "Sandy"]
ids_second = [1, 2, 3, 6]
second = pd.DataFrame({"ID": ids_second, "Name": names_second})
# print(second)
names_third = ["Mike", "Dani", "Scott", "Josh", "Nate"]
ids_third = [1, 2, 3, 4, 5]
third = pd.DataFrame({"ID": ids_third, "Name": names_third})
# print(third)
for i in master['ID']:
if i not in second["ID"]:
print("NOT IN SECOND", i)
if i not in third["ID"]:
print("NOT IN THIRD", i)
OUTPUT ::
NOT IN SECOND 4
NOT IN SECOND 5
NOT IN THIRD 5
NOT IN SECOND 6
NOT IN THIRD 6
Why it says NOT IN SECOND 6
and NOT IN THIRD 5
?
Any suggestion? Thanks in advance.
回答1:
You can try using .isin
with ~
to filter dataframes
. To compare with second you can use master[~master.ID.isin(second.ID)]
and similar for third:
cmp_master_second, cmp_master_third = master[~master.ID.isin(second.ID)], master[~master.ID.isin(third.ID)]
print(cmp_master_second)
print('\n-------- Seperate dataframes -----------\n')
print(cmp_master_third)
Result:
Name
ID
4 Josh
5 Nate
-------- Seperate dataframes -----------
Name
ID
6 Sandy
回答2:
You could do a set
difference on the master and the other DataFrame
s
In [315]: set(d1[0]) - set(d2[0])
Out[315]: {'Josh', 'Nate'}
In [316]: set(d1[0]) - set(d3[0])
Out[316]: {'Sandy'}
来源:https://stackoverflow.com/questions/50959624/iterate-over-different-dataframe