comparing two DataFrames, specific questions

南方客 2020-12-06 14:56

I was reading Andy's answer to the question "Outputting difference in two Pandas dataframes side by side - highlighting the difference".

I have two questions regarding th

1 answer
  • 2020-12-06 15:55

    Question 1

    ne_stacked is a pd.Series that consists of True and False values that indicate where df1 and df2 are not equal.

    ne_stacked[boolean_array] is a way to filter the series ne_stacked by eliminating the rows of ne_stacked where boolean_array is False and keeping the rows of ne_stacked where boolean_array is True.

    It so happens that ne_stacked is itself a boolean array, so it can be used to filter itself. Why would we want to do this? So that we can see which index values remain after filtering.

    So ne_stacked[ne_stacked] is a subset of ne_stacked with only True values.
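
    A minimal sketch of this self-filtering, assuming two small made-up frames (the values in df1 and df2 are illustrative and not from the original question) and assuming ne_stacked was built by stacking the element-wise comparison:

```python
import pandas as pd

# Hypothetical sample frames; only the cell at row 1, column "b" differs.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 0, 6]})

# Stack the element-wise comparison into a boolean Series
# indexed by (row label, column label).
ne_stacked = (df1 != df2).stack()

# Use the Series as its own mask: only the True entries survive.
changed = ne_stacked[ne_stacked]

# The surviving index lists the labels of the differing cells.
print(changed.index.tolist())
```

    The values of changed are all True by construction, so the useful part is its index.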

    Question 2

    np.where

    np.where does two things. If you pass only a condition, as in np.where(df1 != df2), you get a tuple of arrays: the first holds the row positions and the second holds the matching column positions of the cells where the condition is True. I usually use it like this

    i, j = np.where(df1 != df2)
    

    Now I can get at all elements of df1 or df2 in which there are differences like

    df.values[i, j]
    

    Or I can assign to those cells

    df.values[i, j] = -99
    

    Or lots of other useful things.
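
    As a rough, self-contained illustration of the condition-only form (df1 and df2 here are made-up frames, and the names old, new, rows and cols are only for this sketch):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames differing at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 7]})

# With a condition only, np.where returns one array of positions per axis.
i, j = np.where(df1 != df2)

# Indexing with both arrays pairs them up element by element.
old = df1.values[i, j]  # values from df1 at the differing cells
new = df2.values[i, j]  # values from df2 at the same cells

# Translate integer positions back into row and column labels.
rows = df1.index[i]
cols = df1.columns[j]
print(list(zip(rows, cols, old, new)))
```

    The positions in i and j are integer offsets, not labels, which is why they are applied to .values here rather than to .loc.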

    You can also use np.where as an if, then, else for arrays

    np.where(df1 != df2, -99, 99)
    

    This produces an array with the same shape as df1 or df2, holding -99 wherever df1 != df2 and 99 everywhere else.
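
    A small sketch of the three-argument form, again on made-up frames (the data is an assumption for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames differing at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 7]})

# condition, value where True, value where False
flags = np.where(df1 != df2, -99, 99)

print(type(flags).__name__, flags.shape)
print(flags)
```

    Note that the result is a plain NumPy array, so the row and column labels of the frames are not carried over.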

    df.where

    df.where, on the other hand, takes a boolean condition as its first argument and returns an object of the same shape as df. Cells where the condition is True are kept; the rest become np.nan, or the value passed as the second argument of df.where

    df1.where(df1 != df2)
    

    Or

    df1.where(df1 != df2, -99)
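
    Both calls can be tried on a pair of made-up frames; this is only a sketch, and the names kept and filled are hypothetical:

```python
import pandas as pd

# Hypothetical sample frames differing at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 7]})

# Keep df1's value where the frames differ; everything else becomes NaN.
kept = df1.where(df1 != df2)

# Same, but replace the non-matching cells with -99 instead of NaN.
filled = df1.where(df1 != df2, -99)

print(kept)
print(filled)
```

    Unlike np.where, the result is still a DataFrame with df1's index and columns.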
    

    Are they the same?
    Clearly they are not the "same". But you can use them similarly

    np.where(df1 != df2, df1, -99)
    

    Should be the same as

    df1.where(df1 != df2, -99).values
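
    One way to check that claim, sketched on made-up frames (the data and the names via_numpy and via_pandas are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames differing at (row 1, "a") and (row 2, "b").
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 7]})

# NumPy version: returns an ndarray.
via_numpy = np.where(df1 != df2, df1, -99)

# pandas version: returns a DataFrame, so take .values to compare.
via_pandas = df1.where(df1 != df2, -99).values

print(np.array_equal(via_numpy, via_pandas))
```

    The two agree cell by cell; the practical difference is the return type and whether the labels are preserved.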
    