rolling-computation

How to speed up creation of rolling sum (LTM) in pandas with large dataset?

爷,独闯天下 submitted on 2019-12-13 03:24:17
Question: I want to calculate the moving sum (rolling twelve months) of daily sales for a dataset with 400k rows and 7 columns. My current approach appears to work but is pretty slow (between 1 and 2 minutes). Columns include: date (daily entries), country, item name (product), customer city, customer number (ID) and customer name. As other datasets I work with are much larger (2+ million rows and more), it would be great if you have suggestions on how to speed up the current code: import pandas as pd import
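The asker's code is cut off in this excerpt, so the following is only a minimal sketch of one common way to compute a rolling twelve-month sum per group without Python-level loops. The column names (`date`, `country`, `sales`) and the sample data are assumptions, and a `"365D"` window is used as an approximation of twelve months.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real schema is not shown in the excerpt.
dates = pd.date_range("2018-01-01", periods=730, freq="D")
df = pd.DataFrame({
    "date": np.tile(dates, 2),
    "country": np.repeat(["DE", "FR"], len(dates)),
    "sales": 1.0,
})

# Sort once so the grouped result lines up row-for-row with the frame.
df = df.sort_values(["country", "date"]).reset_index(drop=True)

# Time-based window per group: each row sums the sales of the
# preceding 365 days (inclusive of the current day).
ltm = (
    df.set_index("date")
      .groupby("country")["sales"]
      .rolling("365D")
      .sum()
)

# The grouped result is ordered by country, then date, like df itself.
df["sales_ltm"] = ltm.to_numpy()
```

With more grouping columns (item, customer, and so on), the same pattern applies by passing a list of keys to both `sort_values` and `groupby`.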

Python pandas: apply a function to dataframe.rolling()

元气小坏坏 submitted on 2019-12-12 17:12:23
Question: I have this dataframe:
    In[1] df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
    In[2] df
    Out[2]:
        0   1   2   3   4
    0   1   2   3   4   5
    1   6   7   8   9  10
    2  11  12  13  14  15
    3  16  17  18  19  20
    4  21  22  23  24  25
I need to achieve this: for every row in my dataframe, if 2 or more values within any 3 consecutive cells are greater than 10, then the last of those 3 cells should be marked as True. The resulting dataframe df1 should be the same size, with True or False in it based on the
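One possible reading of the rule above can be sketched as follows: flag each cell that exceeds 10, count the flags in a 3-cell window running along each row, and mark the window's last cell when the count reaches 2. Rolling across columns is done here by transposing, which is a design choice of this sketch rather than something stated in the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [6, 7, 8, 9, 10],
                   [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20],
                   [21, 22, 23, 24, 25]])

# 1 where the cell is greater than 10, otherwise 0.
flags = (df > 10).astype(int)

# Roll along each row by transposing, rolling down, and transposing back.
counts = flags.T.rolling(3).sum().T

# Incomplete windows (the first two columns) yield NaN, which compares as False.
df1 = counts >= 2
```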

gmm estimation error

╄→гoц情女王★ submitted on 2019-12-11 23:20:59
Question: When estimating a GMM with more than one independent variable, the code is:
    do_gmm <- function(X) {
      DE <- X[, "DE"]
      rmrf_local <- X[, "rmrf_local"]
      SMB_L <- X[,"SMB_L"]
      h <- cbind(as.numeric(DE,rmrf_local,SMB_L))
      coef(gmm(DE ~ rmrf_local,~SMB_L, x = h))
    }
    r <- rollapplyr(ALLX0, 24, do_gmm, by.column = FALSE, fill = NA)
The code works, but in the output I have only the first variable, as follows:
    > r
         (Intercept) rmrf_local
    [1,]        0.21      -0.32
    [2,]        0.32      -0.04
    [3,]       -0.43      -0.03
    [4,]       -0.42      -0.23
I NEED

pandas get 30 day rolling window over n years

放肆的年华 submitted on 2019-12-11 17:51:57
Question: I'm trying to grab a 30-day window going backwards from all dates in a dataframe, but also look at the same 30-day window across all of the years in the dataset. The dates are from 2000-2019. For example, starting on 1st Feb 2000, I would like to grab the previous 30 days, and the 30 days before 1st Feb in all other years. I can get a rolling window to work over n days for a z-score: dt= pd.date_range(start='2000-01-01', end='2019-03-01') x=[randint(0,100) for x in range(len(dt))] DTX = pd
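As a rough illustration of the idea (not the asker's solution, which is cut off above), the values from the 30 days before a given month and day can be collected for every year with a simple loop over years. The helper name `window_across_years`, the shape of `DTX` as a Series, and the choice to exclude the anchor date itself are all assumptions of this sketch.

```python
import numpy as np
import pandas as pd

dt = pd.date_range(start="2000-01-01", end="2019-03-01")
rng = np.random.default_rng(0)
# Assumed structure: one random value per day, indexed by date.
DTX = pd.Series(rng.integers(0, 100, len(dt)), index=dt)

def window_across_years(series, month, day, days=30):
    """Collect the `days` days before month/day in every year of the index."""
    parts = []
    for year in series.index.year.unique():
        end = pd.Timestamp(year=int(year), month=month, day=day)
        start = end - pd.Timedelta(days=days)
        # Half-open interval [start, end): the anchor date is excluded.
        parts.append(series[(series.index >= start) & (series.index < end)])
    return pd.concat(parts)

# All values from the 30 days before 1st Feb, pooled over 2000-2019.
win = window_across_years(DTX, month=2, day=1)
zscore_base = (win.mean(), win.std())
```

Note that an anchor of 29 February would fail in non-leap years with this helper; handling that case is left out of the sketch.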

rolling.apply on custom function that requires multiple columns of dataframe to reduce single column

眉间皱痕 submitted on 2019-12-11 16:47:57
Question: I am trying to create an additional column df['newc'] through rolling.apply on df['cond'] with a custom function. The custom function requires two columns of df. I am not sure how to get it working. I tried df['newc'] = df['cond'].rolling(4).apply(T_correction, args = (df['temp'].rolling(4))) This is obviously not working and gives the following error: raise NotImplementedError('See issue #11704 {url}'.format(url=url)) NotImplementedError: See issue #11704 https://github.com
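Because `rolling.apply` hands the function one column at a time, one workaround is to slide the window manually over both columns. The sketch below assumes a stand-in `T_correction` (the real function is not shown in the excerpt) and a hypothetical helper `rolling_two_columns`; it trades speed for clarity.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cond": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "temp": [20.0, 21.0, 22.0, 23.0, 24.0, 25.0],
})

def T_correction(cond, temp):
    # Hypothetical stand-in: the asker's real function is not shown.
    return float(np.mean(cond) / np.mean(temp))

def rolling_two_columns(frame, window, func):
    """Apply func(cond_window, temp_window) over a trailing window."""
    out = np.full(len(frame), np.nan)
    cond = frame["cond"].to_numpy()
    temp = frame["temp"].to_numpy()
    for end in range(window, len(frame) + 1):
        # Rows end-window .. end-1 form the window ending at row end-1.
        out[end - 1] = func(cond[end - window:end], temp[end - window:end])
    return out

df["newc"] = rolling_two_columns(df, 4, T_correction)
```

The first `window - 1` rows stay NaN, matching what `rolling(4)` would produce with its default `min_periods`.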

How to compute moving (or rolling, if you will) percentile/quantile for a 1d array in numpy?

て烟熏妆下的殇ゞ submitted on 2019-12-11 07:32:24
Question: In pandas, we have pd.rolling_quantile(). And in numpy, we have np.percentile(), but I'm not sure how to do the rolling/moving version of it. To explain what I meant by moving/rolling percentile/quantile: given the array [1, 5, 7, 2, 4, 6, 9, 3, 8, 10], the moving quantile 0.5 (i.e. moving percentile 50%) with window size 3 is:
    1
    5 - 1 5 7  -> 0.5 quantile = 5
    7 - 5 7 2  -> 5
    2 - 7 2 4  -> 4
    4 - 2 4 6  -> 4
    6 - 4 6 9  -> 6
    9 - 6 9 3  -> 6
    3 - 9 3 8  -> 8
    8 - 3 8 10 -> 8
    10
So [5, 5, 4, 4, 6, 6, 8, 8]
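One way to get this result in plain NumPy is to build a view of all windows and take the percentile along the window axis. This sketch assumes NumPy 1.20 or later, where `sliding_window_view` is available; the helper name `rolling_percentile` is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_percentile(a, window, q):
    """Percentile q (0-100) of every full window of length `window`."""
    # Shape (len(a) - window + 1, window); no data is copied for the view.
    windows = sliding_window_view(np.asarray(a), window)
    return np.percentile(windows, q, axis=-1)

result = rolling_percentile([1, 5, 7, 2, 4, 6, 9, 3, 8, 10], window=3, q=50)
# result -> array([5., 5., 4., 4., 6., 6., 8., 8.])
```

Only full windows are returned, so the output has `len(a) - window + 1` elements, as in the example in the question.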

Pandas DataFrame: How to do Set Union Aggregation over a rolling window

六月ゝ 毕业季﹏ submitted on 2019-12-10 17:14:20
Question: I have a Dataframe that contains sets of ids in one column and dates in another:
    import pandas as pd
    df = pd.DataFrame([['2018-01-01', {1, 2, 3}], ['2018-01-02', {3}], ['2018-01-03', {3, 4, 5}], ['2018-01-04', {5, 6}]], columns=['timestamp', 'ids'])
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df.set_index('timestamp', inplace=True)

                      ids
    timestamp
    2018-01-01  {1, 2, 3}
    2018-01-02        {3}
    2018-01-03  {3, 4, 5}
    2018-01-04     {5, 6}
What I am looking for is a function that can give me the ids for the
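Since `rolling` only works on numeric data, a set union over a window usually has to be written by hand. The sketch below assumes a trailing window counted in rows (3 here, chosen arbitrarily because the excerpt is cut off before the window is specified) and uses a hypothetical helper `rolling_set_union`.

```python
import pandas as pd

df = pd.DataFrame(
    [["2018-01-01", {1, 2, 3}],
     ["2018-01-02", {3}],
     ["2018-01-03", {3, 4, 5}],
     ["2018-01-04", {5, 6}]],
    columns=["timestamp", "ids"],
)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df.set_index("timestamp", inplace=True)

def rolling_set_union(series, window):
    """Union of the sets in the current row and the window-1 rows before it."""
    values = series.tolist()
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(set().union(*values[start:i + 1]))
    return pd.Series(out, index=series.index)

# Assumed window of 3 rows; shorter windows are used at the start.
df["ids_rolling"] = rolling_set_union(df["ids"], window=3)
```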

Applying lambda function to a pandas rolling window series

五迷三道 submitted on 2019-12-10 11:03:10
Question: I have a function which takes an array and a value, and returns a value. I would like to apply it to my Series s on a rolling basis, so that the array is always the rolling window. Here's a minimal example of what I've tried (unsuccessfully), using np.random.choice in place of my real function. I find lots of examples for finding rolling means and other built-in functions, but can't get it to work for my arbitrary lambda function. s = pd.Series([1,2,3,4,5,6,7,8,9]) rolling_window = s.rolling(3)
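A minimal sketch of the pattern: `Rolling.apply` calls the function once per window and expects a single number back, so the extra value can be bound with a lambda. The function `count_above` is a hypothetical stand-in for the asker's real function.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

def count_above(window, value):
    # Hypothetical stand-in: how many window elements exceed `value`.
    return float((window > value).sum())

# raw=True passes each window as a NumPy array instead of a Series.
result = s.rolling(3).apply(lambda w: count_above(w, 4), raw=True)
```

The same call can be written without the lambda as `s.rolling(3).apply(count_above, raw=True, args=(4,))`.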

Rolling Average to calculate rainfall intensity

强颜欢笑 submitted on 2019-12-09 23:06:15
Question: I have some real rainfall data recorded as the date and time, and the accumulated number of tips on a tipping-bucket rain gauge. Each tip of the bucket represents 0.5 mm of rainfall. I want to cycle through the file and determine the variation in intensity (rainfall/time), so I need a rolling average over multiple fixed time frames. That is, I want to accumulate rainfall until 5 minutes of rain is accumulated, and determine the intensity in mm/hour. So if 3 mm is recorded in 5 min it is equal to 3/5*60 =
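The calculation described above can be sketched with a time-based rolling window in pandas. The timestamps and tip counts below are invented for illustration, and the sketch assumes the tip counter started from zero before the first record.

```python
import pandas as pd

MM_PER_TIP = 0.5       # each bucket tip represents 0.5 mm of rain
WINDOW_MINUTES = 5

# Hypothetical log: accumulated tip count at each recorded time.
times = pd.to_datetime([
    "2019-01-01 10:00:00", "2019-01-01 10:00:50", "2019-01-01 10:01:40",
    "2019-01-01 10:02:30", "2019-01-01 10:03:20", "2019-01-01 10:04:10",
    "2019-01-01 10:20:00",
])
tips = pd.Series([1, 2, 3, 4, 5, 6, 7], index=times)

# Rainfall per record: difference of the accumulated count, in mm.
rain_mm = tips.diff().fillna(tips.iloc[0]) * MM_PER_TIP

# Rain that fell in the 5 minutes up to and including each record.
rain_5min = rain_mm.rolling("5min").sum()

# Scale a 5-minute total to an hourly rate (mm/hour).
intensity = rain_5min * (60 / WINDOW_MINUTES)
```

Other time frames (10 minutes, 30 minutes, and so on) follow by changing the window string and the scaling factor together.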

Rolling regression over multiple columns

拥有回忆 submitted on 2019-12-09 11:57:48
Question: I have an issue finding the most efficient way to calculate a rolling linear regression over an xts object with multiple columns. I have searched and read several previous questions here on Stack Overflow. This question and answer comes close, but not close enough in my opinion, as I want to calculate multiple regressions with the dependent variable unchanged in all the regressions. I have tried to reproduce an example with random data: require(xts) require(RcppArmadillo) # Load libraries data <-