Need to fill NA values with the mean of the three preceding values in Python

让人想犯罪 __ posted on 2019-12-13 02:14:22

Question


I need to fill each NA value with the mean of the three values that come before it.

This is my dataset:

    RECEIPT_MONTH_YEAR  NET_SALES
0           2014-01-01  818817.20
1           2014-02-01  362377.20
2           2014-03-01  374644.60
3           2014-04-01         NA
4           2014-05-01         NA
5           2014-06-01         NA
6           2014-07-01         NA
7           2014-08-01   46382.50
8           2014-09-01   55933.70
9           2014-10-01  292303.40
10          2014-10-01  382928.60


Answer 1:


Is this dataset a .csv file or a DataFrame? Is the NA an actual NaN or the string 'NA'?

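import pandas as pd
import numpy as np
df = pd.read_csv('your dataset', sep=' ')
# replace() returns a new object, so assign the result back
df = df.replace('NA', np.nan)
# forward fill: copy the last observed value into each following NaN
df = df.ffill()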

You mention the mean of three values. The code above simply forward-fills the last observation before the NaNs begin. This is often a good approach for forecasting (better than taking means in certain cases, if persistence is important). To use the mean of the three preceding values instead:

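# run this instead of the forward fill; assumes a default integer index
# and at least three rows before the first NaN
ind = df.index[df['NET_SALES'].isna()]
mean_of_3 = df['NET_SALES'].iloc[ind[0]-3:ind[0]].mean(skipna=True)
df['NET_SALES'] = df['NET_SALES'].fillna(mean_of_3)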

The answer could be generalised and improved with more information about the dataset, for example whether you always want the mean of the last three measurements before any NA. The code above finds the indices of the NaNs, then takes the mean of the three rows before the first one, ignoring any NaNs among them.
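One possible generalisation, as a minimal sketch: fill every run of NaNs with the mean of the last three observed values before that run. The function name `fill_gaps_with_prior_mean` and the `window` parameter are made up for illustration, and the sketch assumes the NAs are already real NaN values and that the index is unique.

```python
import numpy as np
import pandas as pd

def fill_gaps_with_prior_mean(s: pd.Series, window: int = 3) -> pd.Series:
    """Fill each run of NaNs with the mean of the last `window` observed values before it."""
    # Drop the NaNs first so the rolling window only ever spans real measurements.
    observed = s.dropna()
    prior_mean = observed.rolling(window, min_periods=1).mean()
    # Put the means back on the full index and carry each one forward into the gap after it.
    candidates = prior_mean.reindex(s.index).ffill()
    # Only NaN positions are filled; observed values are left untouched.
    return s.fillna(candidates)

sales = pd.Series(
    [818817.2, 362377.2, 374644.6, np.nan, np.nan, np.nan, np.nan,
     46382.5, 55933.7, 292303.4, 382928.6],
    name="NET_SALES",
)
filled = fill_gaps_with_prior_mean(sales)
# Rows 3 to 6 all receive the mean of rows 0 to 2 (about 518613.0).
```

Imputed values are never fed back into later means here, so a long gap gets one constant value. NaNs at the very start of the series stay NaN because there is nothing to average.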




Answer 2:


This is simple, but it works:

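# 0 marks the NA rows, so genuine zero sales would be overwritten too
df_data.fillna(0, inplace=True)
# assumes a default integer index; start at 3 so three previous rows exist
for i in range(3, len(df_data)):
    if df_data.loc[i, 'NET_SALES'] == 0.00:
        # earlier imputed values are reused when they fall among the previous three rows
        total = df_data.loc[i-1, 'NET_SALES'] + df_data.loc[i-2, 'NET_SALES'] + df_data.loc[i-3, 'NET_SALES']
        # .loc avoids chained assignment, which may not write back to df_data
        df_data.loc[i, 'NET_SALES'] = total / 3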



Answer 3:


You could use fillna (assuming that your NA is already np.nan) and rolling mean:

import pandas as pd
import numpy as np

df = pd.DataFrame([818817.2,362377.2,374644.6,np.nan,np.nan,np.nan,np.nan,46382.5,55933.7,292303.4,382928.6], columns=["NET_SALES"])

df["NET_SALES"] = df["NET_SALES"].fillna(df["NET_SALES"].shift(1).rolling(3, min_periods=1).mean())

Out:

NET_SALES
0   818817.2
1   362377.2
2   374644.6
3   518613.0
4   368510.9
5   374644.6
6   NaN
7   46382.5
8   55933.7
9   292303.4
10  382928.6

If you want to include the imputed values I guess you'll need to use a loop.
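A minimal sketch of such a loop, assuming the NAs are already np.nan. The function name `fill_with_running_mean` and the `window` parameter are made up for illustration. Each NaN is replaced by the mean of the three values directly before it, and values filled earlier in the loop count as inputs for later ones.

```python
import numpy as np
import pandas as pd

def fill_with_running_mean(s: pd.Series, window: int = 3) -> pd.Series:
    """Fill NaNs in order, using the mean of the previous `window` values (imputed ones included)."""
    values = s.to_numpy(dtype=float, copy=True)
    for i in range(len(values)):
        if np.isnan(values[i]):
            # Look at up to `window` preceding values; some may have been filled earlier in this loop.
            prev = values[max(0, i - window):i]
            prev = prev[~np.isnan(prev)]
            if prev.size:  # leave the NaN in place if nothing precedes it
                values[i] = prev.mean()
    return pd.Series(values, index=s.index, name=s.name)

df = pd.DataFrame(
    [818817.2, 362377.2, 374644.6, np.nan, np.nan, np.nan, np.nan,
     46382.5, 55933.7, 292303.4, 382928.6],
    columns=["NET_SALES"],
)
df["NET_SALES"] = fill_with_running_mean(df["NET_SALES"])
```

Unlike the rolling-mean version, row 6 is filled as well, because rows 3 to 5 already hold imputed values by the time the loop reaches it.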



Source: https://stackoverflow.com/questions/50151197/need-to-fill-the-na-values-with-the-past-three-values-before-na-values-in-python
