Pandas and Python Dataframes and Conditional Shift Function

Submitted by 馋奶兔 on 2020-01-01 17:44:07

Question


Is there a conditional "shift" parameter in data frames?

For example,

Assume I own a used car lot and have the following data:

SaleDate    Car
12/1/2016   Wrangler
12/2/2016   Camry
12/3/2016   Wrangler
12/7/2016   Prius
12/10/2016  Prius
12/12/2016  Wrangler

I want to find out two things from this list:

1) For each sale, when was the previous day that any car was sold? This is simple in pandas; a plain shift does it:

df['PriorSaleDate'] = df['SaleDate'].shift()

2) For each sale, when was the prior date that the same type of car was sold? For example, the Wrangler sale on 12/3 would point two rows back to 12/1 (the last time the Car value in row 3 was equal to the Car value in a prior row).

For the Wrangler sold on 12/12, I would want the value 12/3.

Is there a conditional shift parameter that would return the date from the most recent earlier row where df['Car'] equals the value of df['Car'] in the current row?

Thank you so much for your help!
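To make both questions concrete, here is a minimal sketch in plain Python, assuming the rows are already in date order and the dates are parsed with `pd.to_datetime`. The helper `prior_same_car` is hypothetical (it is not a pandas function); it loops row by row and remembers the last date seen for each car, which spells out what a "conditional shift" means but will be slower than a vectorized approach on large frames.

```python
import pandas as pd

# Sample data from the question, already sorted by SaleDate
df = pd.DataFrame({
    "SaleDate": pd.to_datetime(
        ["12/1/2016", "12/2/2016", "12/3/2016",
         "12/7/2016", "12/10/2016", "12/12/2016"],
        format="%m/%d/%Y",
    ),
    "Car": ["Wrangler", "Camry", "Wrangler", "Prius", "Prius", "Wrangler"],
})

# Question 1: shift() moves every value down one row; row 0 has no predecessor, so it gets NaT
df["PriorSaleDate"] = df["SaleDate"].shift()


def prior_same_car(frame):
    """Hypothetical helper: for each row, the SaleDate of the latest earlier row with the same Car."""
    last_seen = {}  # Car -> SaleDate of the most recent row seen so far for that car
    result = []
    for date, car in zip(frame["SaleDate"], frame["Car"]):
        result.append(last_seen.get(car, pd.NaT))  # NaT if this car has not appeared yet
        last_seen[car] = date
    return pd.Series(result, index=frame.index, dtype="datetime64[ns]")


# Question 2: the "conditional shift", done by hand
df["PriorSameCarDate"] = prior_same_car(df)
print(df)
```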


Answer 1:


You can use groupby and shift():

import io
import pandas as pd

text = """SaleDate    Car
12/1/2016   Wrangler
12/2/2016   Camry
12/3/2016   Wrangler
12/7/2016   Prius
12/10/2016  Prius
12/12/2016  Wrangler"""

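# sep=r"\s+" splits on any run of whitespace (delim_whitespace=True is deprecated since pandas 2.2)
df = pd.read_csv(io.StringIO(text), sep=r"\s+", parse_dates=[0])
# shift() inside each Car group returns the previous SaleDate of the same car (NaT for its first sale)
df["lastSaleDate"] = df.groupby("Car").SaleDate.shift()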

The output:

    SaleDate       Car lastSaleDate
0 2016-12-01  Wrangler          NaT
1 2016-12-02     Camry          NaT
2 2016-12-03  Wrangler   2016-12-01
3 2016-12-07     Prius          NaT
4 2016-12-10     Prius   2016-12-07
5 2016-12-12  Wrangler   2016-12-03

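One caveat worth checking: `groupby().shift()` follows row order, not the dates themselves, so it only gives the prior sale if the frame is sorted by `SaleDate`. A minimal sketch, assuming the same sales arrive out of order: sort first (with `kind="stable"` so same-day sales keep their original order), then assign the result back. Because the grouped shift keeps the original index labels, the assignment aligns by index.

```python
import pandas as pd

# The question's sales, deliberately listed out of date order (assumption for illustration)
df = pd.DataFrame({
    "SaleDate": pd.to_datetime(
        ["12/12/2016", "12/3/2016", "12/7/2016",
         "12/1/2016", "12/10/2016", "12/2/2016"],
        format="%m/%d/%Y",
    ),
    "Car": ["Wrangler", "Wrangler", "Prius", "Wrangler", "Prius", "Camry"],
})

# Sort by date so "previous row in the group" means "previous sale in time"
ordered = df.sort_values("SaleDate", kind="stable")

# The shifted Series carries the original index, so it lines up with df on assignment
df["lastSaleDate"] = ordered.groupby("Car")["SaleDate"].shift()
print(df)
```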


Answer 2:


I'm basically copying HYRY's answer and modifying it slightly. If you like this solution, choose HYRY's answer as the accepted answer.

from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd

text = """SaleDate    Car
12/1/2016   Wrangler
12/2/2016   Camry
12/3/2016   Wrangler
12/7/2016   Prius
12/10/2016  Prius
12/12/2016  Wrangler"""

df = pd.read_csv(StringIO(text), sep=r"\s+", parse_dates=[0])  # sep=r"\s+" replaces the deprecated delim_whitespace=True

# what you already did
df['PriorSaleDate'] = df['SaleDate'].shift()

# what HYRY did
df["CarSpecificPriorSaleDate"] = df.groupby("Car").SaleDate.shift()

The result looks like:

    SaleDate       Car PriorSaleDate CarSpecificPriorSaleDate
0 2016-12-01  Wrangler           NaT                      NaT
1 2016-12-02     Camry    2016-12-01                      NaT
2 2016-12-03  Wrangler    2016-12-02               2016-12-01
3 2016-12-07     Prius    2016-12-03                      NaT
4 2016-12-10     Prius    2016-12-07               2016-12-07
5 2016-12-12  Wrangler    2016-12-10               2016-12-03
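Because `CarSpecificPriorSaleDate` lines each sale up with the previous sale of the same model, subtracting the two columns gives the gap between them. A small sketch, assuming the same sample data; the column name `DaysSinceSameCar` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "SaleDate": pd.to_datetime(
        ["12/1/2016", "12/2/2016", "12/3/2016",
         "12/7/2016", "12/10/2016", "12/12/2016"],
        format="%m/%d/%Y",
    ),
    "Car": ["Wrangler", "Camry", "Wrangler", "Prius", "Prius", "Wrangler"],
})

# Same group-wise shift as in the answers above
df["CarSpecificPriorSaleDate"] = df.groupby("Car")["SaleDate"].shift()

# datetime minus datetime gives a timedelta; NaT propagates for each model's first sale
gap = df["SaleDate"] - df["CarSpecificPriorSaleDate"]

# .dt.days converts to whole days (float dtype, NaN where the gap is NaT)
df["DaysSinceSameCar"] = gap.dt.days
print(df)
```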


Source: https://stackoverflow.com/questions/36770814/pandas-and-python-dataframes-and-conditional-shift-function
