I have two pandas DataFrames defined as follows:
    import pandas as pd

    _columns = ["ID", "Name", "GPA"]

    # Original data set
    _data_orig = [
        [1, "Bob", 3.0],
        [2, "Sam", 2.0],
        [3, "Jane", 4.0],
    ]

    # New data set
    _data_new = [
        [1, "Bob", 3.2],
        [3, "Jane", 3.9],
        [4, "John", 1.2],
        [5, "Lisa", 2.2],
    ]

    df1 = pd.DataFrame(data=_data_orig, columns=_columns)
    df2 = pd.DataFrame(data=_data_new, columns=_columns)
I need to find the following information:
- Find deletes, where df1 is the original data set and df2 is the new data set. Example: return [2, "Sam", 2.0], which is in df1 but not in df2.
- Find the row changes for records that exist in both. Example: the row with ID == 1 in df1 should be compared to the row with ID == 1 in df2 to see whether any column value changed.
- Find any adds in df2 versus df1. Example: return [4, "John", 1.2] and [5, "Lisa", 2.2].
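For the deletes and adds, one possible approach is sketched below. It assumes that ID uniquely identifies a record in both frames (the question implies this but does not state it), and the names `deleted` and `added` are only illustrative. It relies on `Series.isin` and boolean indexing, which compare values rather than labels and so do not need the two frames to have the same shape.

```python
import pandas as pd

_columns = ["ID", "Name", "GPA"]
df1 = pd.DataFrame(
    data=[[1, "Bob", 3.0], [2, "Sam", 2.0], [3, "Jane", 4.0]],
    columns=_columns,
)
df2 = pd.DataFrame(
    data=[[1, "Bob", 3.2], [3, "Jane", 3.9], [4, "John", 1.2], [5, "Lisa", 2.2]],
    columns=_columns,
)

# Assumption: "ID" is a unique key in both frames.
# Deletes: rows whose ID appears in df1 but not in df2.
deleted = df1[~df1["ID"].isin(df2["ID"])]

# Adds: rows whose ID appears in df2 but not in df1.
added = df2[~df2["ID"].isin(df1["ID"])]

print(deleted)
print(added)
```

With the sample data, `deleted` should hold the row for Sam and `added` the rows for John and Lisa.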
For the operation that finds changed rows, I figured I could loop through df2 and check each row against df1, but that seems slow, so I'm hoping to find a faster solution there.
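A vectorized alternative to looping might look like the sketch below. It assumes ID is a unique key in both frames and that both frames have the same columns; the names `orig`, `new`, `common`, and `changed` are illustrative. The idea is to index both frames by ID, restrict them to the IDs they share, and only then compare, so that both sides carry identical labels.

```python
import pandas as pd

_columns = ["ID", "Name", "GPA"]
df1 = pd.DataFrame(
    data=[[1, "Bob", 3.0], [2, "Sam", 2.0], [3, "Jane", 4.0]],
    columns=_columns,
)
df2 = pd.DataFrame(
    data=[[1, "Bob", 3.2], [3, "Jane", 3.9], [4, "John", 1.2], [5, "Lisa", 2.2]],
    columns=_columns,
)

# Assumption: "ID" is a unique key, so it can serve as the index.
orig = df1.set_index("ID")
new = df2.set_index("ID")

# IDs present in both frames.
common = orig.index.intersection(new.index)

# Both operands now share the same index and columns,
# so element-wise comparison is allowed.
orig_common = orig.loc[common]
new_common = new.loc[common]

# True for every row where at least one column differs.
# Caveat: NaN != NaN is True, so rows with NaN in both frames are flagged too.
row_changed = (orig_common != new_common).any(axis=1)

# New values of the rows that changed.
changed = new_common[row_changed]
print(changed)
```

With the sample data, the rows for IDs 1 and 3 should be reported, since their GPA values differ between the two frames.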
For the other two operations, I really do not know what to do because when I try to compare the two dataframes I get:
ValueError: Can only compare identically-labeled DataFrame objects
Pandas version: '0.16.1'
Suggestions?