As part of a unit test, I need to test two DataFrames for equality. The order of the columns in the DataFrames is not important to me. However, it seems to matter to pandas.
Sorting the columns only works if the row and column labels match across the frames. Say you have two DataFrames with identical cell values but different labels; then the sort solution will not work. I ran into this scenario when implementing k-modes clustering using pandas.
I got around it with a simple equals function that checks cell equality by position (code below):
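from pandas import DataFrame

def frames_equal(df1, df2):
    # Compare cell values by position, ignoring row and column labels
    if not isinstance(df1, DataFrame) or not isinstance(df2, DataFrame):
        raise Exception(
            "dataframes should be an instance of pandas.DataFrame")
    if df1.shape != df2.shape:
        return False
    num_rows, num_cols = df1.shape
    for i in range(num_rows):
        # .values drops the labels; comparing the Series directly
        # raises when the two rows are labelled differently
        match = sum(df1.iloc[i].values == df2.iloc[i].values)
        if match != num_cols:
            return False
    return True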
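A more compact way to express the same positional, label-agnostic check might look like the sketch below. It is not the code from the answer above: the name `values_equal` is hypothetical, and it assumes NumPy is available and pandas is recent enough to provide `DataFrame.to_numpy()`. Like the loop version, it treats NaN as unequal to NaN.

```python
import numpy as np
import pandas as pd


def values_equal(df1, df2):
    """Return True if both frames hold the same values in the same positions.

    Row and column labels are ignored. (Hypothetical helper for illustration.)
    """
    if df1.shape != df2.shape:
        return False
    # to_numpy() discards the labels, so the comparison is purely positional
    return np.array_equal(df1.to_numpy(), df2.to_numpy())


a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
# Same cell values as `a`, but different row and column labels
b = pd.DataFrame({"p": [1, 2], "q": [3, 4]}, index=["r0", "r1"])
# One cell differs from `a`
c = pd.DataFrame({"p": [1, 2], "q": [3, 5]})

same = values_equal(a, b)
different = values_equal(a, c)
```

Here `same` should come out true and `different` false, since only the cell values are compared.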
The most common intent is handled like this:
def assertFrameEqual(df1, df2, **kwds):
    """Assert that two dataframes are equal, ignoring ordering of columns"""
    # pandas.util.testing was deprecated and later removed; pandas.testing is the public location
    from pandas.testing import assert_frame_equal
    return assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True, **kwds)
See pandas.testing.assert_frame_equal (pandas.util.testing.assert_frame_equal in older pandas versions) for the other parameters you can pass.
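One of those parameters is `check_like`, which tells `assert_frame_equal` to ignore the order of the index and columns, so the explicit sort can be skipped. The following is a minimal sketch with made-up example frames (`left` and `right` are illustrative names), assuming a pandas version that exposes `pandas.testing`:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
right = left[["b", "a"]]  # same data, columns in a different order

# Passes: check_like=True ignores the order of index and columns
assert_frame_equal(left, right, check_like=True)

# Without check_like the column order is compared, so this raises
try:
    assert_frame_equal(left, right)
    order_matters = False
except AssertionError:
    order_matters = True
```

Note that `check_like` still requires the labels themselves to match; it only relaxes their order.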