rolling-computation

Rolling window function for irregular time series that can handle duplicates

旧时模样 submitted on 2020-06-27 14:04:24
Question: I have the following data.frame:
        grp  nr   yr
     1:   A 1.0 2009
     2:   A 2.0 2009
     3:   A 1.5 2009
     4:   A 1.0 2010
     5:   B 3.0 2009
     6:   B 2.0 2010
     7:   B  NA 2011
     8:   C 3.0 2014
     9:   C 3.0 2019
    10:   C 3.0 2020
    11:   C 4.0 2021
Desired output:
       grp  nr   yr nr_roll_period_3
    1    A 1.0 2009               NA
    2    A 2.0 2009               NA
    3    A 1.5 2009               NA
    4    A 1.0 2010               NA
    5    B 3.0 2009               NA
    6    B 2.0 2010               NA
    7    B  NA 2011               NA
    8    C 3.0 2014               NA
    9    C 3.0 2019               NA
    10   C 3.0 2020               NA
    11   C 4.0 2021         3.333333
The logic: I want to calculate a rolling mean for the period of length k
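The excerpt is cut off before the full rule is stated, so the following is only a sketch in Python/pandas of one reading that reproduces the desired output: for each row, take all rows of the same group whose year lies in the k calendar years ending at that row's year, and return their mean only if all k years are present and no value is missing. The helper name `roll_period_mean` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "grp": list("AAAABBBCCCC"),
    "nr": [1.0, 2.0, 1.5, 1.0, 3.0, 2.0, np.nan, 3.0, 3.0, 3.0, 4.0],
    "yr": [2009, 2009, 2009, 2010, 2009, 2010, 2011, 2014, 2019, 2020, 2021],
})

def roll_period_mean(g, k=3):
    """Mean of `nr` over the k years ending at each row's year (hypothetical helper).

    Assumed rule: the result is NaN unless all k years occur in the group
    and none of the values in the window is missing.
    """
    out = []
    for yr in g["yr"]:
        window = g[(g["yr"] > yr - k) & (g["yr"] <= yr)]
        complete = window["yr"].nunique() == k and window["nr"].notna().all()
        out.append(window["nr"].mean() if complete else np.nan)
    return pd.Series(out, index=g.index)

# Apply per group; the pieces keep the original row index, so assignment aligns.
df["nr_roll_period_3"] = pd.concat(
    [roll_period_mean(g, k=3) for _, g in df.groupby("grp")]
)
print(df)
```

Duplicate years (as in group A) simply contribute several rows to the same window, which is why a plain positional rolling window would not be enough here.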

Taking first and last value in a rolling window

余生长醉 submitted on 2020-06-17 13:10:06
Question: Initial problem statement. Using pandas, I would like to apply functions that are available for resample() but not for rolling(). This works:
    df1 = df.resample(to_freq, closed='left', kind='period').agg(
        OrderedDict([('Open', 'first'), ('Close', 'last')]))
This doesn't:
    df2 = df.rolling(my_indexer).agg(
        OrderedDict([('Open', 'first'), ('Close', 'last')]))
    >>> AttributeError: 'first' is not a valid function for 'Rolling' object
    df3 = df.rolling(my_indexer).agg(
        OrderedDict([('Close', 'last')]))
    >>>
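One possible workaround, sketched here under assumptions: instead of the string aggregations 'first' and 'last', pass a function to `Rolling.apply` that picks the first or last element of each window. The fixed window of 2 rows stands in for the `my_indexer` object, which the excerpt does not define, and the toy values are illustrative.

```python
import pandas as pd

# Toy OHLC-like data (illustrative values, not from the question).
df = pd.DataFrame({
    "Open": [1.0, 2.0, 3.0, 4.0],
    "Close": [1.5, 2.5, 3.5, 4.5],
})

# A fixed-size window replaces the question's undefined `my_indexer`.
roll = df.rolling(2)

out = pd.DataFrame({
    # raw=True hands each window to the function as a NumPy array.
    "Open": roll["Open"].apply(lambda w: w[0], raw=True),    # first value in window
    "Close": roll["Close"].apply(lambda w: w[-1], raw=True),  # last value in window
})
print(out)
```

The first row is NaN because the window is not yet full; passing `min_periods=1` to `rolling` would change that.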

Taking first value in a rolling window that is not numeric

为君一笑 submitted on 2020-04-30 07:15:26
Question: This question follows one I previously asked here, which was answered for numeric values. I am now raising a second one about data of Period type. While the example given below appears simple, my actual windows are of variable size. Since I am interested in the first row of each window, I am looking for a technique that makes use of this definition.
    import pandas as pd
    from random import seed, randint
    # DataFrame
    pi1h = pd.period_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00
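Because `rolling().apply` only handles numeric data, one approach is to roll over the integer row positions and then use the resulting positions to look up the non-numeric values. The sketch below assumes a fixed window of 3 rows and daily periods purely for illustration; the question's variable-size windows are not reproduced here.

```python
import numpy as np
import pandas as pd

# Series of Period objects (daily frequency chosen for illustration).
s = pd.Series(pd.period_range(start="2020-01-01", periods=5, freq="D"))

# Roll over row positions instead of the Period values themselves.
pos = pd.Series(np.arange(len(s), dtype=float), index=s.index)
first_pos = (
    pos.rolling(3, min_periods=1)
       .apply(lambda w: w[0], raw=True)  # position of the first row in each window
       .astype(int)
)

# Map the positions back to the original, non-numeric values.
first_in_window = pd.Series(s.to_numpy()[first_pos.to_numpy()], index=s.index)
print(first_in_window)
```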

Pandas - Count frequency of value for last x amount of days

别等时光非礼了梦想. submitted on 2020-04-15 10:48:49
Question: I'm getting some unexpected results. What I am trying to do is create a column that looks at the ID number and the date and counts how many times that ID number has come up in the last 7 days (I'd also like to make that dynamic for x days, but I am just trying it out with 7 days). So given this dataframe:
    import pandas as pd
    df = pd.DataFrame(
        [['A', '2020-02-02 20:31:00'],
         ['A', '2020-02-03 00:52:00'],
         ['A', '2020-02-07 23:45:00'],
         ['A', '2020-02-08 13:19:00'],
         ['A', '2020-02-18 13:16
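A minimal sketch of one way to get such a count: group by the ID and use a time-based rolling window over a helper column of ones. The column names `ID` and `date`, the helper column `one`, and the extra 'B' row are assumptions added for illustration; only the first four rows come from the excerpt.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "2020-02-02 20:31:00"],
     ["A", "2020-02-03 00:52:00"],
     ["A", "2020-02-07 23:45:00"],
     ["A", "2020-02-08 13:19:00"],
     ["B", "2020-02-05 10:00:00"]],  # hypothetical extra row
    columns=["ID", "date"],          # assumed column names
)
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["ID", "date"]).reset_index(drop=True)

x = 7  # number of days to look back
df["one"] = 1
rolled = (
    df.set_index("date")
      .groupby("ID")["one"]
      .rolling(f"{x}D")  # window (t - x days, t], including the current row
      .sum()
)
# Rows are already sorted by ID and date, so positions line up with `rolled`.
df["count_last_x_days"] = rolled.to_numpy().astype(int)
df = df.drop(columns="one")
print(df)
```

The count includes the current row; subtract 1 if only earlier occurrences should be counted.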

Rolling regression with expanding window in R

陌路散爱 submitted on 2020-04-11 11:26:23
Question: I would like to do a rolling linear regression, with an expanding window, between two variables in a data frame, grouped by a third categorical column. For example, in the toy data frame below, I would like to extract the coefficient of lm(y~x), grouped by z, using all rows up to the row of interest. Thus for row 2 the data set for the regression will be rows 1:2, for row 3 it will be rows 1:3, and for row 4 it will be just row 4, as it is the first row with categorical variable z = b.
    dframe<-data.frame(x=c(1:10),y=c(8
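The question is about R, but the idea can be sketched in Python/pandas: within each group, refit a straight line on rows 1..i for every i and keep the slope. The toy data and the helper name `expanding_slope` below are hypothetical, since the original data frame is truncated. A single row cannot determine a slope, so the first row of each group is left as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: y = 2x + 1 in group "a", y = -x in group "b".
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [3.0, 5.0, 7.0, -4.0, -5.0, -6.0],
    "z": ["a", "a", "a", "b", "b", "b"],
})

def expanding_slope(g):
    """Slope of y ~ x fitted on all rows of the group up to each row."""
    x = g["x"].to_numpy()
    y = g["y"].to_numpy()
    slopes = [np.nan]  # one point does not define a line
    for i in range(2, len(g) + 1):
        slope, _intercept = np.polyfit(x[:i], y[:i], 1)
        slopes.append(slope)
    return pd.Series(slopes, index=g.index)

df["slope"] = pd.concat([expanding_slope(g) for _, g in df.groupby("z")])
print(df)
```

Refitting from scratch at every row is quadratic in the group size; for long groups, cumulative sums of x, y, xy and x² would give the same slopes in a single pass.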

Rolling regression with expanding window in R

一个人想着一个人 submitted on 2020-04-11 11:25:09
Question: I would like to do a rolling linear regression, with an expanding window, between two variables in a data frame, grouped by a third categorical column. For example, in the toy data frame below, I would like to extract the coefficient of lm(y~x), grouped by z, using all rows up to the row of interest. Thus for row 2 the data set for the regression will be rows 1:2, for row 3 it will be rows 1:3, and for row 4 it will be just row 4, as it is the first row with categorical variable z = b.
    dframe<-data.frame(x=c(1:10),y=c(8

How to find duplicate based upon multiple columns in a rolling window in pandas?

拜拜、爱过 submitted on 2020-04-05 06:43:58
Question: Sample Data
    {"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
    {"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
    {"transaction": {"merchant": "merchantC", "amount": 90, "time": "2019-02-13T11:00:10.000Z"}}
    {"transaction": {"merchant": "merchantD", "amount": 90, "time": "2019-02-13T11:00:20.000Z"}}
    {"transaction": {"merchant": "merchantE", "amount": 90, "time": "2019-02-13T11:01:30.000Z"}}
    {"transaction": {
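The excerpt ends before stating which columns define a duplicate or how long the window is, so the sketch below makes both up: a transaction is flagged when the previous transaction with the same `merchant` and `amount` happened at most 2 minutes earlier. The last two records are hypothetical additions so that a duplicate actually occurs.

```python
import pandas as pd

records = [
    {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"},
    {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"},
    # Hypothetical records, not from the sample data:
    {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:01:00.000Z"},
    {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:10:00.000Z"},
]
df = pd.DataFrame(records)
df["time"] = pd.to_datetime(df["time"])
df = df.sort_values("time").reset_index(drop=True)

window = pd.Timedelta(minutes=2)  # assumed window length

# Time since the previous transaction with the same merchant and amount.
gap = df.groupby(["merchant", "amount"])["time"].diff()

# NaT (no previous match) compares as False, so first occurrences are not flagged.
df["is_duplicate"] = gap.notna() & (gap <= window)
print(df)
```

Comparing each row only with its immediate predecessor in the group is enough here, because if any earlier match lies inside the window, the most recent one does too.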

How to find duplicate based upon multiple columns in a rolling window in pandas?

生来就可爱ヽ(ⅴ<●) submitted on 2020-04-05 06:42:20
Question: Sample Data
    {"transaction": {"merchant": "merchantA", "amount": 20, "time": "2019-02-13T10:00:00.000Z"}}
    {"transaction": {"merchant": "merchantB", "amount": 90, "time": "2019-02-13T11:00:01.000Z"}}
    {"transaction": {"merchant": "merchantC", "amount": 90, "time": "2019-02-13T11:00:10.000Z"}}
    {"transaction": {"merchant": "merchantD", "amount": 90, "time": "2019-02-13T11:00:20.000Z"}}
    {"transaction": {"merchant": "merchantE", "amount": 90, "time": "2019-02-13T11:01:30.000Z"}}
    {"transaction": {

1 Year Rolling mean pandas on column date

佐手、 submitted on 2020-03-18 12:37:47
Question: I would like to compute the 1-year rolling average for each line of the DataFrame test below:
    index    id        date     variation
    2313   7034  2018-03-14  4.139148e-06
    2314   7034  2018-03-13  4.953194e-07
    2315   7034  2018-03-12  2.854749e-06
    2316   7034  2018-03-09  3.907458e-06
    2317   7034  2018-03-08  1.662412e-06
    2318   7034  2018-03-07  1.346433e-06
    2319   7034  2018-03-06  8.731700e-06
    2320   7034  2018-03-05  7.145597e-06
    2321   7034  2018-03-02  4.893283e-06
    ...
For example, I would need to calculate: mean of variation of 7034
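A minimal sketch of a per-id, time-based rolling mean, assuming that "1 year" can be approximated by a fixed 365-day window (pandas time-based windows need a fixed length) and using only the first four rows of the excerpt. The output column name `variation_1y_mean` is made up for the example.

```python
import pandas as pd

test = pd.DataFrame({
    "id": [7034, 7034, 7034, 7034],
    "date": pd.to_datetime(["2018-03-14", "2018-03-13", "2018-03-12", "2018-03-09"]),
    "variation": [4.139148e-06, 4.953194e-07, 2.854749e-06, 3.907458e-06],
})

# Time-based rolling windows need dates in increasing order within each id.
test = test.sort_values(["id", "date"])

rolled = (
    test.set_index("date")
        .groupby("id")["variation"]
        .rolling("365D")  # assumption: 365 days stands in for one year
        .mean()
)
# `test` is sorted the same way as `rolled`, so values line up by position.
test["variation_1y_mean"] = rolled.to_numpy()
print(test)
```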

Pandas monthly rolling window

时间秒杀一切 submitted on 2020-02-23 03:42:31
Question: I am looking to do a 'monthly' rolling window on daily data grouped by a category. The code below does not work as is; it leads to the following error:
    ValueError: <DateOffset: months=1> is a non-fixed frequency
I know that I could use a '30D' offset, but this would shift the date over time. I'm looking for the sum of a window that spans from the x-th day of a month to that same x-th day of the J-th month. E.g. with J=1: 4th of July to 4th of August, 5th of July to 5th of August, 6th of
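Since `rolling()` rejects calendar-month offsets, one way around it is to compute each window's start with `pd.DateOffset` and sum the matching rows explicitly. The sketch below assumes the window is (start, end], that is, it excludes the start day and includes the end day; the excerpt does not say which ends are inclusive. The helper name `monthly_window_sum` and the toy series are hypothetical.

```python
import pandas as pd

def monthly_window_sum(s, months=1):
    """Sum of `s` over (t - months, t] for each timestamp t in its index.

    `s` must be a Series with a sorted DatetimeIndex.
    """
    out = []
    for end in s.index:
        start = end - pd.DateOffset(months=months)  # calendar-aware shift
        mask = (s.index > start) & (s.index <= end)
        out.append(s[mask].sum())
    return pd.Series(out, index=s.index)

# Toy daily data: a value of 1.0 per day, so each sum equals the window's day count.
idx = pd.date_range("2020-07-01", "2020-08-31", freq="D")
s = pd.Series(1.0, index=idx)

result = monthly_window_sum(s, months=1)
print(result.loc[pd.Timestamp("2020-08-04")])  # window is 5 July .. 4 August
```

For data grouped by a category, the same function could be applied to each group's series in turn. The explicit loop is quadratic, which is usually acceptable for daily data but may need a `searchsorted`-based variant for large inputs.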