Question
My goal is to collapse the table below into one single column. For this question specifically, I am asking how I can intelligently delete the yellow row, because it duplicates the gray row but carries less information.
The table has three categorical variables and six analysis/quantitative variables. Columns C1 and C2 are the only variables that need to match for a successful join. All blank cells are NaNs, and Python code for copying the table is below.
Question 1. (Yellow) All of the quantitative information stored in the yellow row is also stored in the gray row, and the gray row has more information. Is there a way to intelligently delete a row of this type, similar to the pandas drop_duplicates function? A hypothetical option would be
df.drop_duplicates(subset=df.columns[4:], ignoreNaNs=True)
Related Question (Blue) How to join two rows that have the same keys and complementary values
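For the related (blue) case, one possible sketch is to group on the key columns and take the first non-NaN value in each quantitative column. The small frame below is made up for illustration, with shortened column names, and the conflict check is an added assumption: `GroupBy.first()` silently keeps the earliest value when two rows disagree, so it seems worth testing for disagreement first.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row example: same keys, complementary quantitative values.
df = pd.DataFrame({
    'C1': ['S1', 'S1'],
    'C2': ['P3', 'P3'],
    'Q1': [-15.0, np.nan],
    'Q5': [-15.0, -15.0],
    'Q6': [np.nan, -8.0],
})
keys = ['C1', 'C2']
value_cols = ['Q1', 'Q5', 'Q6']

# nunique() ignores NaN, so a count above 1 means two rows disagree on a value.
has_conflict = bool(df.groupby(keys)[value_cols].nunique().gt(1).any().any())

# first() returns the first non-NaN entry of each column within each group.
merged = df.groupby(keys, as_index=False).first()
print(has_conflict)
print(merged)
```

If `has_conflict` is True, the rows are not purely complementary and merging them this way would discard data.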
Data table
Current Progress
My current code includes this line to drop all rows where all quantitative variables are NaN:
df.dropna(subset=df.columns[4:],how='all', inplace=True)
It also includes this line to delete all rows where all quantitative variables are the same:
df.drop_duplicates(subset=df.columns[4:], inplace=True)
Example code that can be copied into an IDE.
import pandas as pd
from numpy import nan
from pandas import Timestamp
dfO = [['S1','P3','H1',Timestamp('2004-12-04 00:00:00'),-15.0,-27.4,nan,-10.0,-15.0,nan],
['S1','P3','H1',Timestamp('2004-12-20 00:00:00'),nan,nan,nan,nan,nan,nan],
['S1','P3','H2',Timestamp('2004-12-20 00:00:00'),-15.0,nan,nan,-10.0,nan,nan],
['S1','P3','H3',Timestamp('2004-12-07 00:00:00'),nan,nan,nan,nan,-15.0,-8.0],
['S1','P3','H1', Timestamp('2004-12-04 00:00:00'), -15.0,-27.4,nan,-10.0, -15.0, nan]]
cols = ['C1 (PK)', 'C2 (FK)', 'C3', 'C4', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6']
df = pd.DataFrame(data=dfO,columns=cols)
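# Drop rows that are exact duplicates across every column.
df.drop_duplicates(inplace=True)
# Drop rows where every quantitative column (Q1-Q6) is NaN.
df.dropna(subset=df.columns[4:],how='all', inplace=True)
# Drop rows whose quantitative columns all match an earlier row.
df.drop_duplicates(subset=df.columns[4:], inplace=True)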
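To make the behavior asked for in Question 1 concrete, here is a minimal sketch of a NaN-aware duplicate drop. `drop_subsumed_rows` is a hypothetical helper, not a pandas function. It assumes the quantitative columns are numeric, that rows are only compared when their key columns (C1 and C2 here) match, and that a row is redundant when every non-NaN value it holds also appears in another row of the same group. The pairwise comparison is quadratic in group size, so it is only meant for small groups.

```python
import numpy as np
import pandas as pd


def drop_subsumed_rows(df, key_cols, value_cols):
    """Drop rows whose non-NaN values all appear in another row with the same keys.

    Hypothetical helper (not part of pandas). Assumes numeric value columns.
    """
    values = df[value_cols].to_numpy(dtype=float)
    counts = (~np.isnan(values)).sum(axis=1)  # non-NaN values per row
    keep = np.ones(len(df), dtype=bool)

    # .indices maps each key combination to the positions of its rows.
    for positions in df.groupby(key_cols, sort=False).indices.values():
        for i in positions:
            present = ~np.isnan(values[i])
            for j in positions:
                if i == j:
                    continue
                # Row j covers row i if it matches every non-NaN value of row i.
                covers = np.array_equal(values[i][present], values[j][present])
                # Among identical rows, keep only the earliest one.
                if covers and (counts[j] > counts[i] or j < i):
                    keep[i] = False
                    break
    return df[keep]


nan = np.nan
rows = [
    ['S1', 'P3', 'H1', pd.Timestamp('2004-12-04'), -15.0, -27.4, nan, -10.0, -15.0, nan],
    ['S1', 'P3', 'H1', pd.Timestamp('2004-12-20'), nan, nan, nan, nan, nan, nan],
    ['S1', 'P3', 'H2', pd.Timestamp('2004-12-20'), -15.0, nan, nan, -10.0, nan, nan],
    ['S1', 'P3', 'H3', pd.Timestamp('2004-12-07'), nan, nan, nan, nan, -15.0, -8.0],
    ['S1', 'P3', 'H1', pd.Timestamp('2004-12-04'), -15.0, -27.4, nan, -10.0, -15.0, nan],
]
cols = ['C1 (PK)', 'C2 (FK)', 'C3', 'C4', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6']
df = pd.DataFrame(rows, columns=cols)

result = drop_subsumed_rows(df, key_cols=cols[:2], value_cols=cols[4:])
print(result)
```

On the sample data this keeps the first row and the H3 row. The all-NaN row, the H2 row, and the exact duplicate are dropped because each is covered by the first row, while the H3 row stays because its Q6 value appears nowhere else.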
Source: https://stackoverflow.com/questions/59772372/how-to-drop-rows-that-are-not-exact-duplicates-but-contain-no-new-information-m