cumsum

In Python Pandas using cumsum with groupby and reset of cumsum when value is 0

Submitted by 谁说胖子不能爱 on 2019-12-07 18:20:19
Question: I'm rather new to Python. I'm trying to compute a cumulative sum for each client in order to see the consecutive months of inactivity (flag: 1 or 0). The cumulative sum of the 1's therefore needs to be reset when we hit a 0. The reset also needs to happen when we reach a new client. See the example below, where a is the column of clients and b holds the dates. After some research, I found the questions 'Cumsum reset at NaN' and 'In Python Pandas using cumsum with groupby'. I assume that I kind of need to put …
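A minimal pandas sketch of the combined idea, assuming a client-id column and a 0/1 inactivity flag (the names a and flag below are illustrative, not the poster's): within each client, start a new block every time the flag drops to 0, then take the cumulative sum inside each block.

```python
import pandas as pd

# Hypothetical example frame: 'a' = client, 'flag' = 1 for an inactive month, 0 otherwise
df = pd.DataFrame({
    'a':    ['c1', 'c1', 'c1', 'c1', 'c2', 'c2', 'c2'],
    'flag': [1,    1,    0,    1,    1,    1,    0],
})

# Within each client, every 0 starts a new block; then cumsum the flag inside
# each (client, block) pair so the streak restarts at 0 and at each new client.
block = df.groupby('a')['flag'].transform(lambda s: s.eq(0).cumsum()).rename('block')
df['streak'] = df.groupby(['a', block])['flag'].cumsum()
print(df)  # streak: 1, 2, 0, 1, 1, 2, 0
```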

R: Cumulatively count number of times column value appears in other column

Submitted by 北慕城南 on 2019-12-06 14:16:11
It is probably easier to describe what I want to do using an example... Say I have the following data frame:

id1 id2 var
1   2   a
2   3   b
2   1   a
3   2   a
2   3   a
4   2   a
3   1   b

which you can generate as follows:

df <- data.frame(id1 = c(1,2,2,3,2,4,3), id2 = c(2,3,1,2,3,2,1), var = c('a','b','a','a','a','a','b'))

I want a cumulative count of the number of times id2 has appeared in id1 with the same var, so I would end up with:

id1 id2 var count
1   2   a   0
2   3   b   0
2   1   a   1
3   2   a   1
2   3   a   1
4   2   a   2
3   1   b   0

So the count in row 3 is 1, since we see id1 = 1 with var = 'a' once before row 3 (in row 1); then in row 4 the …
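The question asks for R, but the counting logic is easy to sketch in Python (the language used by the pandas examples on this page): walk the rows once, tallying (id1, var) pairs as they appear, and read each row's count from the tally of its (id2, var) pair. This is an illustrative translation, not the original R answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': [1, 2, 2, 3, 2, 4, 3],
    'id2': [2, 3, 1, 2, 3, 2, 1],
    'var': ['a', 'b', 'a', 'a', 'a', 'a', 'b'],
})

# Running tally of (id1, var) pairs seen so far; each row's count is how often
# its own (id2, var) pair has already appeared as (id1, var) in earlier rows.
seen = {}
counts = []
for id1, id2, var in zip(df['id1'], df['id2'], df['var']):
    counts.append(seen.get((id2, var), 0))
    seen[(id1, var)] = seen.get((id1, var), 0) + 1
df['count'] = counts
print(df['count'].tolist())  # [0, 0, 1, 1, 1, 2, 0]
```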

vectorize cumsum by factor in R

Submitted by 為{幸葍}努か on 2019-12-06 03:54:08
Question: I am trying to create a column in a very large data frame (~2.2 million rows) that calculates the cumulative sum of 1's for each factor level and resets when a new factor level is reached. Below is some basic data that resembles my own:

itemcode <- c('a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a5', 'a6', 'a6', 'a6', 'a6')
goodp <- c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1)
df <- data.frame(itemcode, goodp)

I would like the output variable, cum.goodp, to look like this: …
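The longer copy of this question further down the page gives the expected output, which resets both at every 0 and at each new itemcode. A pandas sketch of that logic (an illustrative Python translation, not the poster's R):

```python
import pandas as pd

itemcode = ['a1','a1','a1','a1','a1','a2','a2','a3','a4','a4','a5','a6','a6','a6','a6']
goodp    = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1]
df = pd.DataFrame({'itemcode': itemcode, 'goodp': goodp})

# Within each itemcode, every 0 starts a new run; cumsum the 1's inside each run,
# so the total restarts at every 0 and at every new itemcode.
run = df.groupby('itemcode')['goodp'].transform(lambda s: s.eq(0).cumsum()).rename('run')
df['cum_goodp'] = df.groupby(['itemcode', run])['goodp'].cumsum()
print(df['cum_goodp'].tolist())  # [0, 1, 2, 0, 1, 1, 2, 0, 0, 1, 1, 1, 2, 0, 1]
```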

R: Calculate cumulative sums and counts since the last occurrence of a value

Submitted by 送分小仙女□ on 2019-12-05 23:32:07
Given the simplified data:

set.seed(13)
user_id = rep(1:2, each = 10)
order_id = sample(1:20, replace = FALSE)
cost = round(runif(20, 1.5, 75), 1)
category = sample(c("apples", "pears", "chicken"), 20, replace = TRUE)
pit = rep(c(0,0,0,0,1), 4)
df = data.frame(cbind(user_id, order_id, cost, category, pit))

user_id order_id cost category pit
1       15       11.6 pears    0
1       5        41.7 apples   0
1       8        51.3 chicken  0
1       2        40.3 pears    0
1       16       7.9  pears    1
1       1        47.1 chicken  0
1       9        3.8  apples   0
1       10       35.4 apples   0
1       11       25.8 chicken  0
1       20       48.1 chicken  1
2       7        32.6 pears    0
2       18       31.3 pears    0
2       14       69   apples   0
2       4        60.9 chicken  0
2       …
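The title asks, for each user, for running sums and counts measured since the last row where pit was 1. The question body is cut off, so the exact desired output is unknown, but the usual pattern can be sketched in pandas (again a Python translation of an R question; the column names and the handful of rows below come from the excerpt):

```python
import pandas as pd

# A few rows mirroring the excerpt's columns (shortened for illustration)
df = pd.DataFrame({
    'user_id': [1, 1, 1, 1, 1, 2, 2, 2],
    'cost':    [11.6, 41.7, 51.3, 40.3, 7.9, 32.6, 31.3, 69.0],
    'pit':     [0, 0, 0, 0, 1, 0, 0, 0],
})

# Within each user, start a new block on the row *after* every pit == 1, so the
# running totals measure activity since the last "pit" order.
block = (df.groupby('user_id')['pit']
           .transform(lambda s: s.shift(fill_value=0).cumsum())
           .rename('block'))
grp = df.groupby(['user_id', block])
df['cost_since_pit'] = grp['cost'].cumsum()
df['orders_since_pit'] = grp.cumcount() + 1
print(df)
```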

In Python Pandas using cumsum with groupby and reset of cumsum when value is 0

Submitted by 丶灬走出姿态 on 2019-12-05 19:49:33
I'm rather new to Python. I'm trying to compute a cumulative sum for each client in order to see the consecutive months of inactivity (flag: 1 or 0). The cumulative sum of the 1's therefore needs to be reset when we hit a 0. The reset also needs to happen when we reach a new client. See the example below, where a is the column of clients and b holds the dates. After some research, I found the questions 'Cumsum reset at NaN' and 'In Python Pandas using cumsum with groupby'. I assume that I kind of need to put them together. Adapting the code of 'Cumsum reset at NaN' to reset at 0 is successful: cumsum …
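This is the same question as the one listed above; the excerpt cuts off where the poster's adapted 'Cumsum reset at NaN' code begins. That adaptation typically looks something like the sketch below (a single series only, no per-client grouping yet; the flag values are made up for illustration):

```python
import pandas as pd

# Hypothetical single-client series of inactivity flags
v = pd.Series([1, 1, 0, 1, 1, 1, 0, 1])

# Reset-at-zero without grouping: remember the running total at the most
# recent 0 and subtract it from the overall running total.
cumsum = v.cumsum()
reset = cumsum.where(v == 0).ffill().fillna(0)
streak = cumsum - reset
print(streak.tolist())  # [1, 2, 0, 1, 2, 3, 0, 1]
```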

Rolling sums for groups with uneven time gaps

Submitted by 吃可爱长大的小学妹 on 2019-12-04 20:47:00
Question: Here's the tweak to my previously posted question. Here's my data:

set.seed(3737)
DF2 = data.frame(user_id = c(rep(27, 7), rep(11, 7)),
                 date = as.Date(rep(c('2016-01-01', '2016-01-03', '2016-01-05', '2016-01-07', '2016-01-10', '2016-01-14', '2016-01-16'), 2)),
                 value = round(rnorm(14, 15, 5), 1))

user_id date       value
27      2016-01-01 15.0
27      2016-01-03 22.4
27      2016-01-05 13.3
27      2016-01-07 21.9
27      2016-01-10 20.6
27      2016-01-14 18.6
27      2016-01-16 16.4
11      2016-01-01 6.8
11      2016-01-03 21.3
11      2016-01-…
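The title asks for rolling sums over a fixed time window per group when observations are irregularly spaced. The question itself is R, but in pandas a time-offset rolling window handles the uneven gaps directly; a sketch, assuming a 7-day window and using placeholder values where the excerpt is cut off:

```python
import pandas as pd

# user 11's values past the excerpt's cutoff are placeholders, not the poster's data
df = pd.DataFrame({
    'user_id': [27]*7 + [11]*7,
    'date': pd.to_datetime(['2016-01-01', '2016-01-03', '2016-01-05', '2016-01-07',
                            '2016-01-10', '2016-01-14', '2016-01-16'] * 2),
    'value': [15.0, 22.4, 13.3, 21.9, 20.6, 18.6, 16.4,
              6.8, 21.3, 14.1, 13.0, 19.7, 17.3, 14.9],
})

# For each user, sum 'value' over the 7 calendar days ending at each row's date;
# the time-offset window copes with the uneven spacing between observations.
df = df.sort_values(['user_id', 'date'])
df['roll_7d'] = (
    df.groupby('user_id', group_keys=False)
      .apply(lambda g: g.rolling('7D', on='date')['value'].sum())
)
print(df)
```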

vectorize cumsum by factor in R

Submitted by 会有一股神秘感。 on 2019-12-04 08:59:49
I am trying to create a column in a very large data frame (~2.2 million rows) that calculates the cumulative sum of 1's for each factor level and resets when a new factor level is reached. Below is some basic data that resembles my own:

itemcode <- c('a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a5', 'a6', 'a6', 'a6', 'a6')
goodp <- c(0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1)
df <- data.frame(itemcode, goodp)

I would like the output variable, cum.goodp, to look like this:

cum.goodp <- c(0, 1, 2, 0, 1, 1, 2, 0, 0, 1, 1, 1, 2, 0, 1)

I get that there is a lot out there using …
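For 2.2 million rows, the same reset-at-zero-within-group result can be computed without any per-group loop. A fully vectorized NumPy sketch (again Python rather than R; it reproduces the cum.goodp vector above by subtracting, at each row, the running total recorded at the most recent reset):

```python
import numpy as np

itemcode = np.array(['a1','a1','a1','a1','a1','a2','a2','a3','a4','a4','a5','a6','a6','a6','a6'])
goodp = np.array([0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1])

cs = np.cumsum(goodp)

# A reset happens at every 0 and at the first row of each itemcode.
group_start = np.r_[True, itemcode[1:] != itemcode[:-1]]

# At each reset row, record the cumsum "baseline" (the running total just before
# that row) and carry it forward; subtracting it restarts the count.
baseline = np.where(goodp == 0, cs, 0)
baseline = np.where(group_start, cs - goodp, baseline)
cum_goodp = cs - np.maximum.accumulate(baseline)

print(cum_goodp.tolist())  # [0, 1, 2, 0, 1, 1, 2, 0, 0, 1, 1, 1, 2, 0, 1]
```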

draw random element in numpy

Submitted by 大兔子大兔子 on 2019-12-04 04:30:15
I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]. The array sums to 1.0. Using plain Python or numpy, I want to draw elements proportional to their probability: the first element about 10% of the time, the second 20%, the third 50%, etc. The "draw" should return the index of the element drawn. I came up with this:

def draw(probs):
    cumsum = numpy.cumsum(probs / sum(probs))  # sum up to 1.0, just in case
    return len(numpy.where(numpy.random.rand() >= cumsum)[0])

It works, but it's too convoluted; there must be a better way. Thanks.

import numpy as np

def random_pick(choices, probs):
    …
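Two shorter alternatives for the draw: numpy can do weighted draws directly with np.random.choice, and the cumsum-plus-comparison trick collapses to a single np.searchsorted call.

```python
import numpy as np

probs = np.array([0.1, 0.2, 0.5, 0.2])

# Simplest option: numpy already does weighted draws and returns an index.
idx = np.random.choice(len(probs), p=probs)

# Equivalent cumsum-based version: find where a uniform draw falls
# among the cumulative probabilities.
cum = np.cumsum(probs)
idx2 = int(np.searchsorted(cum, np.random.rand(), side='right'))

print(idx, idx2)
```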

ggplot2 and cumsum()

Submitted by 落花浮王杯 on 2019-12-03 13:37:10
I have a set of UNIX timestamps and URIs, and I'm trying to plot the cumulative count of requests for each URI. I managed to do that for one URI at a time using a dummy column:

x.df$count <- apply(x.df, 1, function(row) 1)  # Create a dummy column for cumsum
x.df <- x.df[order(x.df$time, decreasing=FALSE),]  # Sort
ggplot(x.df, aes(x=time, y=cumsum(count))) + geom_line()

However, that would make roughly 30 plots in my case. ggplot2 does allow you to plot multiple lines in one plot (I copied this piece of code from here):

ggplot(data=test_data_long, aes(x=date, y=value, colour=variable)) + geom…
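The question is about ggplot2; purely for comparison with the pandas examples on this page, the same idea of one cumulative-request line per URI (without a dummy column per plot) can be sketched in Python. The column names time and uri and the toy data below are assumptions:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical request log with UNIX timestamps and URIs
df = pd.DataFrame({
    'time': [1, 2, 2, 3, 4, 5, 6, 6, 7],
    'uri':  ['/a', '/b', '/a', '/a', '/b', '/a', '/b', '/a', '/b'],
})

# Sort by time, then number the rows within each URI group to get a
# cumulative request count per URI.
df = df.sort_values('time')
df['cum_requests'] = df.groupby('uri').cumcount() + 1

# One set of axes, one line per URI.
for uri, grp in df.groupby('uri'):
    plt.plot(grp['time'], grp['cum_requests'], label=uri)
plt.xlabel('time')
plt.ylabel('cumulative requests')
plt.legend()
plt.show()
```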

Calculating cumulative returns with pandas dataframe

Submitted by 拜拜、爱过 on 2019-12-03 05:59:28
I have this dataframe:

    Poloniex_DOGE_BTC Poloniex_XMR_BTC Daily_rets perc_ret
172 0.006085          -0.000839        0.003309   0
173 0.006229           0.002111        0.005135   0
174 0.000000          -0.001651        0.004203   0
175 0.000000           0.007743        0.005313   0
176 0.000000          -0.001013       -0.003466   0
177 0.000000          -0.000550        0.000772   0
178 0.000000          -0.009864        0.001764   0

I'm trying to make a running total of Daily_rets in perc_ret; however, my code just copies the values from Daily_rets:

df['perc_ret'] = ( df['Daily_rets'] + df['perc_ret'].shift(1) )

    Poloniex_DOGE_BTC Poloniex_XMR_BTC Daily_rets perc_ret
172 0.006085          -0.000839        0.003309   NaN
173 0.006229           0…
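The shift-based line only ever adds a single previous value, because the right-hand side is evaluated against the old perc_ret column, so nothing accumulates. A sketch of the usual fix, assuming perc_ret is meant to be a running total of Daily_rets:

```python
import pandas as pd

df = pd.DataFrame({'Daily_rets': [0.003309, 0.005135, 0.004203, 0.005313,
                                  -0.003466, 0.000772, 0.001764]})

# Running total of the daily returns: cumsum accumulates down the column.
df['perc_ret'] = df['Daily_rets'].cumsum()

# If the returns should compound rather than add, use cumprod instead.
df['compounded'] = (1 + df['Daily_rets']).cumprod() - 1
print(df)
```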