rolling-computation | 易学教程

How to speed up creation of rolling sum (LTM) in pandas with large dataset?

Question: I want to calculate the moving sum (rolling twelve months) of daily sales for a dataset with 400k rows and 7 columns. My current approach appears to work but is slow (between 1 and 2 minutes). The columns include: date (daily entries), country, item name (product), customer city, customer number (ID), and customer name. As other datasets I work with are much larger (2+ million rows), it would be great to have suggestions on how to speed up the current code: import pandas as pd import
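The excerpt cuts off before the asker's code, so the following is only a minimal sketch of one common vectorized approach: aggregate to one row per group and day, then use a time-based rolling window per group. The column names `date`, `country`, and `sales`, the helper name `rolling_ltm_sum`, and the use of a 365-day window as a stand-in for "twelve months" are all assumptions, not taken from the question.

```python
import pandas as pd

def rolling_ltm_sum(df, keys, date_col="date", value_col="sales", window="365D"):
    """Trailing-window sum of value_col per group (hypothetical helper)."""
    # Collapse to one row per group and day so each day is counted once.
    daily = (
        df.groupby(keys + [date_col], as_index=False)[value_col].sum()
          .sort_values(keys + [date_col])
    )
    # Time-based rolling window, evaluated separately inside each group.
    rolled = (
        daily.set_index(date_col)
             .groupby(keys)[value_col]
             .rolling(window)
             .sum()
    )
    return rolled.rename("ltm_sales").reset_index()

# Tiny illustrative dataset (made up for this sketch).
sales = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-06-01", "2021-01-15", "2020-01-01"]),
    "country": ["DE", "DE", "DE", "FR"],
    "sales": [10.0, 5.0, 2.0, 7.0],
})
ltm = rolling_ltm_sum(sales, ["country"])
```

Because the window is time-based, gaps in the daily data do not need to be filled first; an exact calendar-month window would need a different construction.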

Python pandas: apply a function to dataframe.rolling()

Question: I have this dataframe:

In[1] df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
In[2] df
Out[2]:
    0   1   2   3   4
0   1   2   3   4   5
1   6   7   8   9  10
2  11  12  13  14  15
3  16  17  18  19  20
4  21  22  23  24  25

I need to achieve this: for every row in my dataframe, if 2 or more values within any 3 consecutive cells are greater than 10, then the last of those 3 cells should be marked as True. The resulting dataframe df1 should be the same size, with True or False in it based on the
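One way to read the rule above is as a rolling count along each row. The sketch below compares every cell with 10, rolls a window of 3 across the columns (by transposing, rolling, and transposing back), and marks a cell True when at least 2 of the 3 cells ending there exceed 10. Treating the first two cells of each row as False, since they have no full window, is an assumption because the excerpt is truncated before it specifies that.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5],
                   [6, 7, 8, 9, 10],
                   [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20],
                   [21, 22, 23, 24, 25]])

# 1 where the cell is greater than 10, else 0.
above = (df > 10).astype(int)

# Rolling works down the rows, so transpose to roll across the columns.
counts = above.T.rolling(3).sum().T

# Incomplete windows give NaN, and NaN >= 2 evaluates to False.
df1 = counts >= 2
```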

gmm estimation error

Question: When estimating a GMM with more than one independent variable, the code is:

do_gmm <- function(X) {
  DE <- X[, "DE"]
  rmrf_local <- X[, "rmrf_local"]
  SMB_L <- X[,"SMB_L"]
  h <- cbind(as.numeric(DE,rmrf_local,SMB_L))
  coef(gmm(DE ~ rmrf_local,~SMB_L, x = h))
}
r <- rollapplyr(ALLX0, 24, do_gmm, by.column = FALSE, fill = NA)

The code works, but in the output I have only the first variable, as follows:

> r
     (Intercept) rmrf_local
[1,]        0.21      -0.32
[2,]        0.32      -0.04
[3,]       -0.43      -0.03
[4,]       -0.42      -0.23

I NEED

pandas get 30 day rolling window over n years

Question: I'm trying to grab a 30-day window going backwards from every date in a dataframe, but also look at the same 30-day window across all of the years in the dataset. The dates run from 2000 to 2019. For example, starting on 1st Feb 2000, I would like to grab the previous 30 days, and the 30 days before 1st Feb in all other years. I can get a rolling window to work over n days for a z-score: dt= pd.date_range(start='2000-01-01', end='2019-03-01') x=[randint(0,100) for x in range(len(dt))] DTX = pd
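As a rough sketch of the selection described above (not the asker's code, which is cut off), the helper below collects, for one target date, the 30 days before that calendar date in every year present in the data. The function name `same_window_all_years`, the choice to exclude the target day itself, and skipping 29 February in non-leap years are assumptions.

```python
import numpy as np
import pandas as pd

def same_window_all_years(series, target, days=30):
    """Values from the `days` days before target's month/day, in every year."""
    target = pd.Timestamp(target)
    pieces = []
    for year in series.index.year.unique():
        try:
            end = target.replace(year=int(year))
        except ValueError:
            # 29 Feb does not exist in this year; skip it (assumption).
            continue
        start = end - pd.Timedelta(days=days)
        mask = (series.index >= start) & (series.index < end)
        pieces.append(series[mask])
    return pd.concat(pieces)

dt = pd.date_range(start="2000-01-01", end="2019-03-01")
values = pd.Series(np.arange(len(dt), dtype=float), index=dt)

window = same_window_all_years(values, "2000-02-01")
# A z-score for the target date could then be computed against
# window.mean() and window.std().
```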

rolling.apply on custom function that requires multiple columns of dataframe to reduce single column

Question: I am trying to create an additional column, df['newc'], through rolling.apply on df['cond'] with a custom function. The custom function requires two columns of df. I am not sure how to get it working. I tried df['newc'] = df['cond'].rolling(4).apply(T_correction, args = (df['temp'].rolling(4))) This does not work and gives the following error: raise NotImplementedError('See issue #11704 {url}'.format(url=url)) NotImplementedError: See issue #11704 https://github.com
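Since `rolling.apply` hands the function one column's window at a time, one workaround is to build the windows for both columns with NumPy and call the function on each pair. This is a minimal sketch: `rolling_two_columns` is a hypothetical helper, and `weighted_sum` is only a stand-in for the asker's `T_correction`, whose body is not shown in the excerpt.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_two_columns(df, col_a, col_b, window, func):
    """Apply func(window_of_a, window_of_b) over aligned trailing windows."""
    # Requires len(df) >= window, otherwise sliding_window_view raises.
    a = sliding_window_view(df[col_a].to_numpy(), window)
    b = sliding_window_view(df[col_b].to_numpy(), window)
    out = np.full(len(df), np.nan)
    # The first window - 1 rows have no complete window and stay NaN.
    out[window - 1:] = [func(wa, wb) for wa, wb in zip(a, b)]
    return pd.Series(out, index=df.index)

def weighted_sum(cond, temp):
    # Stand-in for T_correction (assumption): any reduction to one number.
    return float((cond * temp).sum())

df = pd.DataFrame({"cond": [1, 2, 3, 4, 5], "temp": [2, 2, 2, 2, 2]})
df["newc"] = rolling_two_columns(df, "cond", "temp", 4, weighted_sum)
```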

How to compute moving (or rolling, if you will) percentile/quantile for a 1d array in numpy?

Question: In pandas, we have pd.rolling_quantile(). And in numpy, we have np.percentile(), but I'm not sure how to do the rolling/moving version of it. To explain what I mean by moving/rolling percentile/quantile: given the array [1, 5, 7, 2, 4, 6, 9, 3, 8, 10], the moving quantile 0.5 (i.e. moving percentile 50%) with window size 3 is:

1
5 - 1 5 7 -> 0.5 quantile = 5
7 - 5 7 2 -> 5
2 - 7 2 4 -> 4
4 - 2 4 6 -> 4
6 - 4 6 9 -> 6
9 - 6 9 3 -> 6
3 - 9 3 8 -> 8
8 - 3 8 10 -> 8
10

So [5, 5, 4, 4, 6, 6, 8, 8]
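A minimal NumPy sketch of the moving quantile described above uses `sliding_window_view` (available in NumPy 1.20 and later) to expose every length-3 window and then takes the quantile along the window axis. The helper name `rolling_quantile` is hypothetical. In current pandas, `Series.rolling(window).quantile(q)` fills the role of the old `pd.rolling_quantile()`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_quantile(a, window, q):
    """Quantile q of every full window of length `window` in a 1-D array."""
    windows = sliding_window_view(np.asarray(a), window)  # shape (n - window + 1, window)
    return np.quantile(windows, q, axis=-1)

data = [1, 5, 7, 2, 4, 6, 9, 3, 8, 10]
medians = rolling_quantile(data, 3, 0.5)
# medians matches the sequence in the question: 5, 5, 4, 4, 6, 6, 8, 8
```

The view itself is not a copy, but `np.quantile` sorts each window independently, so for very large arrays and windows a dedicated moving-window algorithm may be faster.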

Pandas DataFrame: How to do Set Union Aggregation over a rolling window

Question: I have a DataFrame that contains sets of ids in one column and dates in another:

import pandas as pd
df = pd.DataFrame([['2018-01-01', {1, 2, 3}], ['2018-01-02', {3}], ['2018-01-03', {3, 4, 5}], ['2018-01-04', {5, 6}]], columns=['timestamp', 'ids'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.set_index('timestamp', inplace=True)

                  ids
timestamp
2018-01-01  {1, 2, 3}
2018-01-02        {3}
2018-01-03  {3, 4, 5}
2018-01-04     {5, 6}

What I am looking for is a function that can give me the ids for the
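The excerpt ends before stating the exact window, so the sketch below assumes the goal is the union of all id sets within a trailing time window ending at each row. `rolling.apply` only handles numeric results, so this version loops over the timestamps explicitly; the helper name `rolling_set_union` and the two-day window are assumptions.

```python
import pandas as pd

def rolling_set_union(ids, window):
    """Union of the sets whose timestamp falls in (t - window, t] for each t."""
    offset = pd.Timedelta(window)
    result = []
    for ts in ids.index:
        in_window = ids[(ids.index > ts - offset) & (ids.index <= ts)]
        result.append(set().union(*in_window))
    return pd.Series(result, index=ids.index)

df = pd.DataFrame(
    [["2018-01-01", {1, 2, 3}], ["2018-01-02", {3}],
     ["2018-01-03", {3, 4, 5}], ["2018-01-04", {5, 6}]],
    columns=["timestamp", "ids"],
)
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")

df["ids_2d"] = rolling_set_union(df["ids"], "2D")
```

The loop is quadratic in the number of rows, which is fine for small frames but would need a smarter structure for large ones.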

Applying lambda function to a pandas rolling window series

Question: I have a function that takes an array and a value, and returns a value. I would like to apply it to my Series s on a rolling basis, so that the array is always the rolling window. Here's a minimal example of what I've tried (unsuccessfully), using np.random.choice in place of my real function. I can find lots of examples for rolling means and other built-in functions, but I can't get it to work for my arbitrary lambda function. s = pd.Series([1,2,3,4,5,6,7,8,9]) rolling_window = s.rolling(3)
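A small sketch of the pattern, assuming the custom function reduces each window to a single number: `Rolling.apply` passes the window as the first argument, and the extra value can be bound with a lambda (or with the `args` parameter). `count_above` is a deterministic stand-in for the asker's function, not their actual code.

```python
import numpy as np
import pandas as pd

def count_above(window, threshold):
    # Stand-in for the real function (assumption): array and value in, number out.
    return float(np.sum(window > threshold))

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

# raw=True hands each window to the function as a NumPy array.
with_lambda = s.rolling(3).apply(lambda w: count_above(w, 4), raw=True)

# Equivalent form using args to pass the extra value.
with_args = s.rolling(3).apply(count_above, raw=True, args=(4,))
```

The function must return one numeric value per window; returning an array or a non-numeric object raises an error.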

Rolling Average to calculate rainfall intensity

Question: I have some real rainfall data recorded as the date and time, and the accumulated number of tips on a tipping-bucket rain gauge. Each tip of the bucket represents 0.5 mm of rainfall. I want to cycle through the file and determine the variation in intensity (rainfall/time), so I need a rolling average over multiple fixed time frames. I want to accumulate rainfall until 5 minutes of rain is accumulated and determine the intensity in mm/hour. So if 3 mm is recorded in 5 min, it is equal to 3/5*60 =
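The intensity arithmetic above can be sketched with a time-based rolling sum: convert the cumulative tip count to tips per record, scale by 0.5 mm per tip, sum over the trailing 5 minutes, and convert to mm/hour. The helper name `rainfall_intensity`, the made-up readings, and treating the first record as contributing no rain are assumptions.

```python
import pandas as pd

MM_PER_TIP = 0.5  # each tip is 0.5 mm of rainfall (from the question)

def rainfall_intensity(cum_tips, window_minutes=5):
    """Trailing-window rainfall intensity in mm/hour from cumulative tips."""
    # Tips since the previous record; the first record has no predecessor.
    tips = cum_tips.diff().fillna(0)
    depth_mm = tips * MM_PER_TIP
    # Rainfall depth accumulated over the trailing time window.
    rolled_mm = depth_mm.rolling(f"{window_minutes}min").sum()
    return rolled_mm / window_minutes * 60

times = pd.date_range("2020-01-01 00:00", periods=6, freq="min")
cum_tips = pd.Series([0, 1, 2, 4, 5, 6], index=times)

intensity = rainfall_intensity(cum_tips)
# At 00:05 the window holds 6 tips = 3 mm, i.e. 3 / 5 * 60 = 36 mm/hour.
```

Calling the helper with other values of `window_minutes` gives the other fixed time frames mentioned in the question.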

Rolling regression over multiple columns

Question: I am having trouble finding the most efficient way to calculate a rolling linear regression over an xts object with multiple columns. I have searched and read several previous questions here on Stack Overflow. This question and answer comes close, but not close enough in my opinion, as I want to calculate multiple regressions with the dependent variable unchanged in all of them. I have tried to reproduce an example with random data: require(xts) require(RcppArmadillo) # Load libraries data <-
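The question is about R and xts, and its code is cut off, so the following is only a pandas illustration of the underlying idea, not a translation of the asker's approach. For a simple regression with an intercept, the rolling slope of y on x equals the rolling covariance of x and y divided by the rolling variance of x, which avoids refitting a model in every window. The helper name `rolling_slopes` and the sample data are assumptions.

```python
import numpy as np
import pandas as pd

def rolling_slopes(y, X, window):
    """Rolling OLS slope of y on each column of X (one regressor at a time)."""
    slopes = {
        col: X[col].rolling(window).cov(y) / X[col].rolling(window).var()
        for col in X.columns
    }
    return pd.DataFrame(slopes)

x1 = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x1 + 1.0                      # dependent variable, fixed across regressions
X = pd.DataFrame({"x1": x1, "x2": -0.5 * y})

slopes = rolling_slopes(y, X, window=3)
# For this exact linear data the x1 slopes are 2 and the x2 slopes are -2.
```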