cumsum

R trailing cumsum per group

Submitted by 独自空忆成欢 on 2019-12-11 08:36:45
Question: I need to compute the running cumsum per group in R, but the window over which to cumsum must only be the last 3 observations. If, for example, I have a table with a person's name, a date and a score as follows:

   Name       Date Score
1  John 2017-01-01     4
2  John 2017-01-02     5
3  John 2017-01-03     3
4  John 2017-01-04     1
5  John 2017-01-05     4
6  John 2017-01-06     4
7   Ben 2017-01-01     4
8   Ben 2017-01-02     4
9   Ben 2017-01-03     5
10  Ben 2017-01-04     2
11  Ben 2017-01-05     3
12  Ben 2017-01-06     4
13  Ben 2017-01-07     4
14  Ben 2017-01…
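
The excerpt cuts off before any answer is shown. One common way to get a trailing 3-observation sum per group is zoo::rollapplyr() inside a grouped mutate(); a minimal sketch, assuming the table above is a data frame named df and that dplyr and zoo are acceptable dependencies:

library(dplyr)
library(zoo)

# Trailing sum over at most the last 3 scores, computed separately per Name.
# partial = TRUE lets the first one or two rows of each group use shorter windows.
df %>%
  group_by(Name) %>%
  mutate(Trailing3 = rollapplyr(Score, width = 3, FUN = sum, partial = TRUE)) %>%
  ungroup()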

Fast way to get the number of NaNs in a column counted from the last valid value in a DataFrame

Submitted by 两盒软妹~` on 2019-12-11 04:59:23
Question: Say I have a DataFrame like:

         A      B
0   0.1880  0.345
1   0.2510  0.585
2      NaN    NaN
3      NaN    NaN
4      NaN  1.150
5   0.2300  1.210
6   0.1670  1.290
7   0.0835  1.400
8   0.0418    NaN
9   0.0209    NaN
10     NaN    NaN
11     NaN    NaN
12     NaN    NaN

I want a new DataFrame of the same shape where each entry represents the number of NaNs counted up to its position, starting from the last valid value, as follows:

    A  B
0   0  0
1   0  0
2   1  1
3   2  2
4   3  0
5   0  0
6   0  0
7   0  0
8   0  1
9   0  2
10  1  3
11  2  4
12  3  5

I wonder if this can be done efficiently by utilizing…
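
The excerpt stops mid-sentence, but the counting logic itself is straightforward to sketch: group each column by the running count of valid (non-missing) values, then cumulatively sum the missing-value indicator within each group. Since the question is about pandas, the following only illustrates that logic in R (df is assumed to be a data frame with columns A and B):

# Number of NAs since the last non-NA value, computed per column.
count_na_run <- function(v) {
  isna <- is.na(v)
  # cumsum(!isna) gives each NA run the id of the valid value that precedes it
  ave(as.numeric(isna), cumsum(!isna), FUN = cumsum)
}

data.frame(lapply(df, count_na_run))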

Python Pandas Running Totals with Resets

Submitted by 人盡茶涼 on 2019-12-11 04:57:50
Question: I would like to perform the following task. Given 2 columns (good and bad), I would like to replace the rows of those two columns with a running total. Here is an example of the current dataframe along with the desired dataframe. EDIT: I should have added what my intentions are. I am trying to create an equally binned (in this case 20) variable using a continuous variable as the input. I know the pandas cut and qcut functions are available, however the returned results will have zeros for the…
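
The excerpt is cut off before the example frames appear, so the exact reset rule is not visible here. Reading the title literally, one guess is a per-column running total that starts over once it reaches some bin-size threshold; a rough R sketch of that guess (the threshold value and the column names good/bad are assumptions):

# Running total that resets to 0 once it reaches a threshold.
running_total_with_reset <- function(x, threshold) {
  out <- numeric(length(x))
  running <- 0
  for (i in seq_along(x)) {
    running <- running + x[i]
    out[i] <- running
    if (running >= threshold) running <- 0
  }
  out
}

df$good_running <- running_total_with_reset(df$good, threshold = 20)
df$bad_running  <- running_total_with_reset(df$bad,  threshold = 20)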

Breaking cumsum() function at some threshold in R

Submitted by 社会主义新天地 on 2019-12-11 02:43:00
Question: For example, I have the following code:

cumsum(1:100)

and I want to break it if the next element (i + 1) would be greater than 3000. How can I do that? So instead of this result:

  [1]    1    3    6   10   15   21   28   36   45   55   66   78   91  105  120  136  153  171  190  210  231  253  276  300
 [25]  325  351  378  406  435  465  496  528  561  595  630  666  703  741  780  820  861  903  946  990 1035 1081 1128 1176
 [49] 1225 1275 1326 1378 1431 1485 1540 1596 1653 1711 1770 1830 1891 1953 2016 2080 2145 2211 2278 2346 2415 2485 2556 2628
 [73] 2701 2775…
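
Since cumsum() of positive values is strictly increasing, one simple way to stop at the threshold is to compute the full cumulative sum and keep only the elements that do not exceed it; a short sketch:

cs <- cumsum(1:100)
cs[cs <= 3000]   # drops everything from the first partial sum above 3000 onwards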

Cumsum within group and reset on condition in pandas

Submitted by 风格不统一 on 2019-12-11 01:27:45
Question: I have a dataframe with two columns, ID and Activity. The activity is either 0 or 1. I want a new column containing an increasing number since the last time the activity was 1. However, the count should only be within one group (ID). If the activity is 1, the counting column should be reset to 0 and the count starts again. So, I have a dataframe containing the following: What I want is this: Can someone help me?

Answer 1: We use a new column 'G' here:

df['G'] = df.groupby('ID').Activity.apply(lambda x: (x…
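
The quoted answer is truncated, but its apparent idea is to group on a cumulative sum of Activity so that each 1 starts a new counting block. The same logic expressed with data.table in R (an analogue for illustration, not the original pandas answer; dt is assumed to hold the ID and Activity columns):

library(data.table)

# Every row with Activity == 1 starts a new block within its ID;
# the counter is the 0-based position of the row inside that block,
# so it is 0 on the Activity == 1 row itself and then 1, 2, 3, ...
dt[, block := cumsum(Activity), by = ID]
dt[, count := seq_len(.N) - 1L, by = .(ID, block)]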

Reset cumulative sum based on condition in Pandas

Submitted by 这一生的挚爱 on 2019-12-10 13:51:38
Question: I have a data frame like:

  customer  spend  hurdle
  A            20      50
  A            31      50
  A            20      50
  B            50     100
  B            51     100
  B            30     100

I want to calculate an additional Cumulative column that resets, per customer, once the cumulative sum becomes greater than or equal to the hurdle, like the following:

  customer  spend  hurdle  Cumulative
  A            20      50          20
  A            31      50          51
  A            20      50          20
  B            50     100          50
  B            51     100         101
  B            30     100          30

I used cumsum and groupby in pandas, but I do not know how to reset it based on the condition. Following is the code I am…
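
Because the reset depends on the running total itself, this is awkward to express with a plain vectorised cumsum; a small helper applied per customer is the straightforward route. The question is about pandas, so the following R sketch only illustrates the approach (df is assumed to hold the customer, spend and hurdle columns):

cumsum_with_hurdle <- function(spend, hurdle) {
  out <- numeric(length(spend))
  running <- 0
  for (i in seq_along(spend)) {
    running <- running + spend[i]
    out[i] <- running
    if (running >= hurdle[i]) running <- 0   # start over once the hurdle is reached
  }
  out
}

library(data.table)
setDT(df)[, Cumulative := cumsum_with_hurdle(spend, hurdle), by = customer]

On the sample above this reproduces the Cumulative column shown (20, 51, 20, 50, 101, 30).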

R: Calculate cumulative sums and counts since the last occurrence of a value

Submitted by 十年热恋 on 2019-12-10 10:16:48
Question: Given the simplified data:

set.seed(13)
user_id = rep(1:2, each = 10)
order_id = sample(1:20, replace = FALSE)
cost = round(runif(20, 1.5, 75), 1)
category = sample(c("apples", "pears", "chicken"), 20, replace = TRUE)
pit = rep(c(0, 0, 0, 0, 1), 4)
df = data.frame(cbind(user_id, order_id, cost, category, pit))

user_id order_id cost category pit
      1       15 11.6    pears   0
      1        5 41.7   apples   0
      1        8 51.3  chicken   0
      1        2 40.3    pears   0
      1       16  7.9    pears   1
      1        1 47.1  chicken   0
      1        9  3.8   apples   0
      1       10 35.4   apples   0
      1       11 25.8…
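
The excerpt ends before the question says exactly what should be accumulated, but the usual pattern for "since the last occurrence of pit == 1" is to let a within-user cumulative sum of pit define blocks and then sum and count inside each block. A sketch under that assumption (note that the cbind() in the question coerces every column to character, so cost and pit are converted back to numeric first):

library(data.table)
setDT(df)
df[, c("cost", "pit") := list(as.numeric(as.character(cost)), as.numeric(as.character(pit)))]

# Each pit == 1 row starts a new block within its user.
df[, block := cumsum(pit), by = user_id]
df[, `:=`(cost_since_pit   = cumsum(cost),     # running cost inside the block
          orders_since_pit = seq_len(.N)),     # running order count inside the block
   by = .(user_id, block)]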

“Cumulative difference” function in R

Submitted by 给你一囗甜甜゛ on 2019-12-08 14:42:25
Is there a pre-existing function to calculate the cumulative difference between consecutive values?

Context: this is to estimate the change in altitude that a person has to undergo in both directions on a journey generated by CycleStreet.net.

Reproducible example:

x <- c(27, 24, 24, 27, 28) # create the data

Method 1: for loop

for(i in 2:length(x)){ # for loop way
  if(i == 2) cum_change <- 0
  cum_change <- Mod(x[i] - x[i - 1]) + cum_change
  cum_change
}
## 7

Method 2: vectorised

diffs <- Mod(x[-1] - x[-length(x)]) # vectorised way
sum(diffs)
## 7

Both seem to work. I'm just wondering if there's…
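
For what it's worth, both methods in the excerpt compute the sum of absolute consecutive differences, which base R's diff() expresses directly; a one-line sketch:

x <- c(27, 24, 24, 27, 28)
sum(abs(diff(x)))   # 7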

Unelegant decorate-count-undecorate on data.table cumulative sum

Submitted by 隐身守侯 on 2019-12-08 06:46:31
Question: I wish to keep a counter by "description". Can the following code be a one-liner?

dt[, dummy := 1]
dt[, count := lapply(.SD, cumsum), by = "description", .SDcols = "dummy"]
dt[, dummy := NULL]

Answer 1: If I understand correctly, you just want:

dt[, count := rowid(description)]

Source: https://stackoverflow.com/questions/34730544/unelegant-decorate-count-undecorate-on-data-table-cumulative-sum
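
As a quick illustration of the accepted one-liner, rowid() returns a 1-based running count within each distinct description value, which is exactly what the cumsum-over-a-dummy version computed; a toy example (the table contents are made up):

library(data.table)
dt <- data.table(description = c("a", "a", "b", "a", "b"))
dt[, count := rowid(description)]
# count is 1 2 1 3 2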
