Question
I have a CSV that looks like this (and when brought into a pandas DataFrame with read_csv(), it looks the same).
I want to update the values in column ad_requests according to the following logic:
For a given row, if ad_requests has a value, leave it alone. Else, give it a value of the previous row's value for ad_requests minus the previous row's value for impressions. So in the first example, we would like to end up with:
I get partially there:
df["ad_requests"] = [i if not pd.isnull(i) else ??? for i in df["ad_requests"]]
And this is where I get stuck. After the else, I want to "go back" and access the previous "row", though I know that this is not how pandas is meant to be used.
Another thing to note is that the rows will always be grouped in threes, by column ad_tag_name. If I call df.groupby("ad_tag_name"), I can then turn the result into a list and start slicing and indexing, but again, I think there must be a better way to do this in pandas (as there is for many things).
Python: 2.7.10
Pandas: 0.18.0
Answer 1:
You'll want to do something like this:
pd.options.mode.chained_assignment = None  # suppresses "SettingWithCopyWarning"
for index, elem in enumerate(df['ad_requests']):
    if pd.isnull(elem):
        df['ad_requests'][index] = df['ad_requests'][index-1] - df['impressions'][index-1]
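The warning appears because df['ad_requests'][index] = ... is a chained assignment: pandas cannot guarantee whether df['ad_requests'] returns a view or a copy, so the write might not reach the original dataframe. With the versions below it does modify the original, which is what we want, so the warning doesn't really concern us.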
(Python 2.7.12 and Pandas 0.19.0)
EDIT:
Changing the last line of code from
df['ad_requests'][index]=df['ad_requests'][index-1]-df['impressions'][index-1]
to
df.at[index,'ad_requests']=df.at[index-1,'ad_requests']-df.at[index-1,'impressions']
removes the need to suppress any warnings:
# assumes the default 0..n-1 index and a non-null first row
for index, elem in enumerate(df['ad_requests']):
    if pd.isnull(elem):
        df.at[index, 'ad_requests'] = df.at[index-1, 'ad_requests'] - df.at[index-1, 'impressions']
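If the row-by-row loop becomes slow on a larger frame, the same result can probably be reached without iterating. The sketch below is not part of the original answer. It assumes that every run of missing ad_requests values is preceded by a row that has a value, and it uses a made-up sample frame (the question's actual data is not reproduced here). The names block, used and filled are illustrative only. The idea is that a filled value equals the last known ad_requests minus the impressions of all rows since that known value, which matches what the loop computes step by step.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real CSV from the question is not shown.
df = pd.DataFrame({
    "ad_tag_name": ["a", "a", "a", "b", "b", "b"],
    "impressions": [10, 5, 2, 20, 4, 1],
    "ad_requests": [100.0, np.nan, np.nan, 50.0, np.nan, np.nan],
})

# Every non-null ad_requests value starts a new block of rows.
block = df["ad_requests"].notna().cumsum()

# Impressions consumed by the earlier rows of the same block
# (running total within the block, excluding the current row).
used = df.groupby(block)["impressions"].cumsum() - df["impressions"]

# Last known ad_requests minus the impressions used since then.
filled = df["ad_requests"].ffill() - used

df["ad_requests"] = filled
print(df)
```

Rows that already have a value are left unchanged, because used is 0 on the first row of each block. If the frame starts with a missing value, that leading run stays NaN instead of raising an error.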
Source: https://stackoverflow.com/questions/40733560/using-the-values-of-a-previous-row-in-a-pandas-series