Question
How to combine duplicate rows in pandas, filling in missing values?
In the example below, some rows have missing values in the c1 column, but the c2 column has duplicates that can be used as an index to look up and fill in those missing values.
The input data looks like this:
      c1 c2
id
0   10.0  a
1    NaN  b
2   30.0  c
3   10.0  a
4   20.0  b
5    NaN  c
Desired output:
   c1 c2
0  10  a
1  20  b
2  30  c
How can I do this?
Here is the code to generate the example data:
import pandas as pd
df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})
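A minimal sketch of the lookup the question describes, treating c2 as the key. This is an alternative approach, not the accepted answer: `groupby('c2')['c1'].transform('first')` gives every row the first non-missing c1 of its c2 group, `fillna` uses that to fill the gaps, and `drop_duplicates` then keeps one row per c2. It assumes every c2 group has at least one non-missing c1; otherwise a NaN would remain and the cast to int would fail.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

# GroupBy.first skips NaN, so each c2 group yields its first real c1 value;
# transform broadcasts that value back onto every row of the group.
lookup = df.groupby('c2')['c1'].transform('first')
filled = df['c1'].fillna(lookup)

result = (
    df.assign(c1=filled)
      .drop_duplicates('c2')      # keep one row per c2 value
      .astype({'c1': int})        # assumes no NaN is left after filling
      .reset_index(drop=True)     # renumber rows 0..n-1
)
print(result)
```

With the example data this yields c1 = 10, 20, 30 against c2 = 100, 200, 300 with a fresh 0..2 index, the same shape as the desired output.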
Answer 1:
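I think you need sort_values with drop_duplicates. Because sort_values puts NaN last by default, the first row kept for each c2 is one that has a value in c1: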
df = df.sort_values(['c1','c2']).drop_duplicates(['c2'])
print (df)
     c1   c2
0  10.0  100
4  20.0  200
2  30.0  300
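A small variant of the same idea, written out to make the two defaults it relies on explicit: `na_position='last'` in `sort_values` and `keep='first'` in `drop_duplicates`. Sorting by c1 alone is assumed to be enough, since drop_duplicates only cares which row comes first within each c2; `kind='stable'` and the final `sort_index()` (which restores the original row order) are additions not in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan')],
    'c2': [100, 200, 300, 100, 200, 300],
})

result = (
    df.sort_values('c1', na_position='last', kind='stable')  # NaN rows go to the end
      .drop_duplicates('c2', keep='first')  # first row per c2 now has a real c1
      .sort_index()                          # back to the original row order
)
print(result)
```

One consequence of this design: a c2 whose c1 values are all NaN is still kept, with NaN in c1, because there is no non-missing row to prefer.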
Or, starting again from the original df, first remove the rows with NaN in c1 using dropna:
df = df.dropna(subset=['c1']).drop_duplicates(['c2'])
print (df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200
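The sort-based and dropna-based versions agree on the question's data but can differ when some c2 has no non-missing c1 at all. The sketch below uses hypothetical data (an extra c2 value, 400, whose only c1 is NaN) to show that difference; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data: the question's rows plus c2 == 400 with only a NaN in c1.
df = pd.DataFrame({
    'c1': [10, float('nan'), 30, 10, 20, float('nan'), float('nan')],
    'c2': [100, 200, 300, 100, 200, 300, 400],
})

via_sort = df.sort_values(['c1', 'c2']).drop_duplicates(['c2'])
via_dropna = df.dropna(subset=['c1']).drop_duplicates(['c2'])

print(sorted(via_sort['c2'].tolist()))    # [100, 200, 300, 400], 400 kept with NaN
print(sorted(via_dropna['c2'].tolist()))  # [100, 200, 300], 400 dropped entirely
```

Which behavior is preferable depends on whether a key with no known c1 should survive in the output.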
Deduplicating on both c1 and c2 gives the same result here:
df = df.dropna(subset=['c1']).drop_duplicates(['c1','c2'])
print (df)
     c1   c2
0  10.0  100
2  30.0  300
4  20.0  200
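The two subsets only coincide because each c2 in the example maps to a single non-missing c1. The sketch below uses hypothetical data in which c2 == 100 appears with two different c1 values, showing that keying on `['c1', 'c2']` then keeps one row per pair rather than one row per c2.

```python
import pandas as pd

# Hypothetical data: c2 == 100 has two conflicting non-NaN c1 values.
df = pd.DataFrame({
    'c1': [10, 15, 20, float('nan')],
    'c2': [100, 100, 200, 200],
})
clean = df.dropna(subset=['c1'])

by_c2 = clean.drop_duplicates(['c2'])           # one row per c2 value
by_both = clean.drop_duplicates(['c1', 'c2'])   # one row per (c1, c2) pair

print(len(by_c2), len(by_both))  # 2 3
```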
Source: https://stackoverflow.com/questions/51302813/how-to-combine-duplicate-rows-in-pandas