Merge values of a dataframe where other columns match

旧街凉风 submitted on 2021-02-11 18:24:48

Question


I have a dataframe storing a date, car_brand, color and a city:

 date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london

From this I want to build a new dataframe that merges rows where car_brand, color and city match on consecutive days. For the example above, I want to end up with this dataframe:

 date                             car_brand    color     city
 ["2020-01-01","2020-01-02"]      porsche      red       paris
 ["2020-01-03"]                   porsche      red       london
 ["2020-01-04"]                   porsche      red       paris
 ["2020-01-05"]                   porsche      red       london
 ["2020-01-01"]                   audi         blue      munich
 ["2020-01-02"]                   audi         red       munich
 ["2020-01-03","2020-01-05"]      audi         red       london

How can I achieve that? I tried pd.concat and pd.merge, but neither has worked so far. Thanks!


Answer 1:


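If the dates must be consecutive, you can check for that inside a list comprehension. This extends the technique of building a list with a lambda function applied to each group. Note that within each group, a date that does not directly follow the previous date is dropped rather than placed in a new row.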

import io
import pandas as pd

df = pd.read_csv(io.StringIO(""" date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london"""), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    .groupby([c for c in df.columns if c!="date"])["date"]
    # keep the first date, then each date exactly one day after the previous date in the group
    .agg(lambda x: [xx for i,xx in enumerate(x) if i==0 or xx==(list(x)[i-1]+pd.DateOffset(1))])
    .reset_index()
)

output

car_brand color   city                                                            date
     audi  blue munich                                           [2020-01-01 00:00:00]
     audi   red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
     audi   red munich                                           [2020-01-02 00:00:00]
  porsche   red london                                           [2020-01-03 00:00:00]
  porsche   red  paris                      [2020-01-01 00:00:00, 2020-01-02 00:00:00]
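
The output above has five rows, while the question asks for one row per run of consecutive days (seven rows for this data). A minimal sketch of one way to get that, using a different technique from the answer: within each group, flag every date that is not exactly one day after the previous one, then take a cumulative sum of those flags to label the runs. The function name `merge_consecutive_days` and the helper column `run` are hypothetical names chosen for illustration, and the dates are assumed to parse with `pd.to_datetime`.

```python
import io

import pandas as pd


def merge_consecutive_days(df, keys):
    """Collect dates into one list per run of consecutive days within each key group."""
    out = df.sort_values(keys + ["date"]).reset_index(drop=True)
    # A new run starts where the gap to the previous date in the same group
    # is not exactly one day. The first row of each group has a NaT gap,
    # which also compares as "not equal", so it starts a run too.
    new_run = out.groupby(keys)["date"].diff() != pd.Timedelta(days=1)
    # "run" is a hypothetical helper column: a running count of run starts,
    # so every run gets its own label.
    out["run"] = new_run.cumsum()
    out["date"] = out["date"].dt.strftime("%Y-%m-%d")
    return (
        out.groupby(keys + ["run"], sort=False)["date"]
        .agg(list)
        .reset_index()
        .drop(columns="run")
    )


raw = """date car_brand color city
2020-01-01 porsche red paris
2020-01-02 porsche red paris
2020-01-03 porsche red london
2020-01-04 porsche red paris
2020-01-05 porsche red london
2020-01-01 audi blue munich
2020-01-02 audi red munich
2020-01-03 audi red london
2020-01-04 audi red london
2020-01-05 audi red london
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])

result = merge_consecutive_days(df, ["car_brand", "color", "city"])
print(result)
```

Because the rows are sorted by the key columns first, the result is ordered by car_brand, color and city rather than by the original row order.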


Source: https://stackoverflow.com/questions/65690254/merge-values-of-a-dataframe-where-other-columns-match
