I have calculated the differences between consecutive values in a series, but I cannot reverse / undifference them using diffinv()
:
ds_sqrt =
df.cumsum()
Example:
data = {'a':[1,6,3,9,5], 'b':[13,1,2,5,23]}
df = pd.DataFrame(data)
df =
a b
0 1 13
1 6 1
2 3 2
3 9 5
4 5 23
df.diff()
a b
0 NaN NaN
1 5.0 -12.0
2 -3.0 1.0
3 6.0 3.0
4 -4.0 18.0
df.cumsum()
a b
0 1 13
1 7 14
2 10 16
3 19 21
4 24 44
You can do this via numpy
. Algorithm courtesy of @Divakar.
Of course, you need to know the first item in your series for this to work.
df = pd.DataFrame({'A': np.random.randint(0, 10, 10)})
df['B'] = df['A'].diff()
x, x_diff = df['A'].iloc[0], df['B'].iloc[1:]
df['C'] = np.r_[x, x_diff].cumsum().astype(int)
# A B C
# 0 8 NaN 8
# 1 5 -3.0 5
# 2 4 -1.0 4
# 3 3 -1.0 3
# 4 9 6.0 9
# 5 7 -2.0 7
# 6 4 -3.0 4
# 7 0 -4.0 0
# 8 8 8.0 8
# 9 1 -7.0 1
You can use diff_inv from pmdarima.Docs link
# genarating random table
np.random.seed(10)
vals = np.random.randint(1, 10, 6)
df_t = pd.DataFrame({"a":vals})
#creating two columns with diff 1 and diff 2
df_t['dif_1'] = df_t.a.diff(1)
df_t['dif_2'] = df_t.a.diff(2)
df_t
a dif_1 dif_2
0 5 NaN NaN
1 1 -4.0 NaN
2 2 1.0 -3.0
3 1 -1.0 0.0
4 2 1.0 0.0
5 9 7.0 8.0
Then create a function that will return an array with inverse values of diff.
from pmdarima.utils import diff_inv
def inv_diff (df_orig_column,df_diff_column, periods):
# Generate np.array for the diff_inv function - it includes first n values(n =
# periods) of original data & further diff values of given periods
value = np.array(df_orig_column[:periods].tolist()+df_diff_column[periods:].tolist())
# Generate np.array with inverse diff
inv_diff_vals = diff_inv(value, periods,1 )[periods:]
return inv_diff_vals
Example of Use:
# df_orig_column - column with original values
# df_diff_column - column with differentiated values
# periods - preiods for pd.diff()
inv_diff(df_t.a, df_t.dif_2, 2)
Output:
array([5., 1., 2., 1., 2., 9.])