Question
I have a dataframe storing a date, car_brand, color and a city:
date car_brand color city
"2020-01-01" porsche red paris
"2020-01-02" porsche red paris
"2020-01-03" porsche red london
"2020-01-04" porsche red paris
"2020-01-05" porsche red london
"2020-01-01" audi blue munich
"2020-01-02" audi red munich
"2020-01-03" audi red london
"2020-01-04" audi red london
"2020-01-05" audi red london
From that I want to build a new dataframe by merging rows where the car_brand, color and city match on consecutive days. For the example above, I want to end up with this dataframe:
date car_brand color city
["2020-01-01","2020-01-02"] porsche red paris
["2020-01-03"] porsche red london
["2020-01-04"] porsche red paris
["2020-01-05"] porsche red london
["2020-01-01"] audi blue munich
["2020-01-02"] audi red munich
["2020-01-03","2020-01-05"] audi red london
How can I achieve that? I tried with pd.concat and pd.merge but nothing worked so far. Thanks!
Answer 1:
If it matters that the dates are consecutive, you can check for that in a list comprehension. This extends the technique of building a list with a lambda function applied to each group.
import io

import pandas as pd

# sample data from the question, split on whitespace
df = pd.read_csv(io.StringIO(""" date car_brand color city
"2020-01-01" porsche red paris
"2020-01-02" porsche red paris
"2020-01-03" porsche red london
"2020-01-04" porsche red paris
"2020-01-05" porsche red london
"2020-01-01" audi blue munich
"2020-01-02" audi red munich
"2020-01-03" audi red london
"2020-01-04" audi red london
"2020-01-05" audi red london"""), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    # group on every column except the date
    .groupby([c for c in df.columns if c != "date"])["date"]
    # only include a date if it is the first in its group or exactly one day after the previous date
    .agg(lambda x: [xx for i, xx in enumerate(x) if i == 0 or xx == x.iloc[i - 1] + pd.Timedelta(days=1)])
    .reset_index()
)
Output:
car_brand color city date
audi blue munich [2020-01-01 00:00:00]
audi red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
audi red munich [2020-01-02 00:00:00]
porsche red london [2020-01-03 00:00:00]
porsche red paris [2020-01-01 00:00:00, 2020-01-02 00:00:00]
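Note that the output above has one row per (car_brand, color, city) combination, so dates that do not directly follow the previous one (for example porsche / red / paris on 2020-01-04) are dropped rather than placed in a row of their own. If each run of consecutive days should become a separate row, as in the question's desired result, one possible alternative is to label runs with shift/diff and cumsum before grouping. This is only a sketch and not part of the original answer; the names keys, run_id and merged are made up for illustration, and it assumes the rows are already sorted by date within each brand, as in the sample data.

```python
import io

import pandas as pd

raw = """date car_brand color city
2020-01-01 porsche red paris
2020-01-02 porsche red paris
2020-01-03 porsche red london
2020-01-04 porsche red paris
2020-01-05 porsche red london
2020-01-01 audi blue munich
2020-01-02 audi red munich
2020-01-03 audi red london
2020-01-04 audi red london
2020-01-05 audi red london"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])

keys = ["car_brand", "color", "city"]  # columns that must match (illustrative name)

# Assumption: rows are ordered by date within each brand, as in the question.
# A new run starts when any key differs from the previous row ...
key_changed = (df[keys] != df[keys].shift()).any(axis=1)
# ... or when the date is not exactly one day after the previous row's date.
gap = df["date"].diff() != pd.Timedelta(days=1)
# The cumulative sum of "run starts here" flags gives every run its own id.
run_id = (key_changed | gap).cumsum()

groups = df.groupby(run_id, sort=False)
merged = groups[keys].first()                       # one row of key values per run
merged.insert(0, "date", groups["date"].agg(list))  # all dates belonging to that run
merged = merged.reset_index(drop=True)

print(merged)
```

With the sample data this gives seven rows: the two porsche / red / paris runs stay separate, and the audi / red / london row holds all three dates from 2020-01-03 to 2020-01-05.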
Source: https://stackoverflow.com/questions/65690254/merge-values-of-a-dataframe-where-other-columns-match