cumsum

Replacing more than n consecutive values in Pandas DataFrame column

空扰寡人 submitted on 2021-01-27 12:48:16
Question: Suppose I have the following DataFrame df:

    df = pd.DataFrame({
        "a" : [1,2,2,2,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5],
        "b" : [3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,6,6,7,7],
        "c" : [4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,1,2,2,2,2,2,2,2,2,3,3]})

I wish to replace 4's that repeat more than 10 times in a row, in any column (there could be hundreds of columns), with ten 4's followed by 5's for the remainder. So for example, 12 consecutive 4's would be replaced
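
One possible way to approach the question above, offered as a minimal sketch and not as the accepted answer: label each run of consecutive equal values with a cumsum over the change points, count the position inside each run, and overwrite every position past the limit. The helper name cap_runs, its parameters, and the short demo frame are hypothetical and made up for illustration.

```python
import pandas as pd

def cap_runs(s, value=4, limit=10, replacement=5):
    """Hypothetical helper: keep the first `limit` items of each run of `value`,
    and overwrite the rest of that run with `replacement`."""
    is_val = s.eq(value)
    # A new run starts wherever the flag differs from the previous row.
    run_id = is_val.ne(is_val.shift()).cumsum()
    # 1-based position inside each run of `value` (0 on rows that are not `value`).
    pos = is_val.astype(int).groupby(run_id).cumsum()
    return s.mask(is_val & (pos > limit), replacement)

# Made-up demo data: a run of twelve 4's, a short run of three, and a run of eleven.
demo = pd.DataFrame({
    "a": [4] * 12 + [1] + [4] * 3,
    "b": [3] * 5 + [4] * 11,
})
capped = demo.apply(cap_runs)  # apply() runs the helper on every column
```

Because DataFrame.apply works column by column, the same call should cover a frame with hundreds of columns without any per-column code.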

Calculate cumsum from the end towards the beginning

谁说我不能喝 submitted on 2021-01-02 08:21:38
Question: I'm trying to calculate the cumsum starting from the last row towards the first for each group. Sample data:

    t1 <- data.frame(var = "a", val = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0))
    t2 <- data.frame(var = "b", val = c(0,0,0,0,1,0,0,1,0,0,0,0,0,0,0))
    ts <- rbind(t1, t2)

Desired format (grouped by var):

    ts <- data.frame(var = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"), val = c(2,2,2,2,2,1,1
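
The question above is about R; as a hedged illustration of the same idea, here is a pandas analogue (a swapped-in language, not the original poster's solution). It reverses the rows, accumulates within each group, and relies on index alignment to restore the original order. The column name rev_cumsum is an assumption introduced for the example.

```python
import pandas as pd

# Same sample data as in the question, rebuilt as a pandas DataFrame.
ts = pd.DataFrame({
    "var": ["a"] * 15 + ["b"] * 15,
    "val": [0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
         + [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
})

# Walk the rows bottom-up, take the cumulative sum inside each group,
# then assign: pandas aligns on the index, so rows land in their original place.
ts["rev_cumsum"] = ts.iloc[::-1].groupby("var")["val"].cumsum()
```

For group "a" this yields 2 for the first five rows, which agrees with the visible start of the desired output (2,2,2,2,2,1,1).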

Cumulative sum from a month ago until the current day for all the rows

送分小仙女□ submitted on 2020-12-30 03:57:35
Question: I have a data.table with ID, dates and values like the following one:

    DT <- setDT(data.frame(ContractID= c(1,1,1,2,2), Date = c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Value = c(10,20,30,10,20)))

       ContractID       Date Value
    1:          1 2018-02-01    10
    2:          1 2018-02-20    20
    3:          1 2018-03-12    30
    4:          2 2018-02-01    10
    5:          2 2018-02-12    20

I'd like to get a new column with the total cumulative sum per ID from a month ago until the current day for each row, like in the table below. NB: the
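
As a rough pandas sketch of a per-ID trailing-window sum (the question itself uses R's data.table, so this is an analogue and not the poster's solution): the expected table is cut off in this listing, so the example assumes "a month" means a trailing 30-day window that includes the current row. The column name Rolling30D is hypothetical.

```python
import pandas as pd

dt = pd.DataFrame({
    "ContractID": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(
        ["2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"]
    ),
    "Value": [10, 20, 30, 10, 20],
})

# Time-based rolling windows need dates in increasing order within each group.
dt = dt.sort_values(["ContractID", "Date"]).reset_index(drop=True)

# Assumption: a 30-day window, (t - 30 days, t], stands in for "a month ago".
rolled = (
    dt.set_index("Date")
      .groupby("ContractID")["Value"]
      .rolling("30D")
      .sum()
)

# The result is ordered by ContractID then Date, matching the sorted frame.
dt["Rolling30D"] = rolled.to_numpy()
```

If a calendar month is required instead of 30 days, the window bounds would have to be computed per row (for example with pd.DateOffset(months=1)), which this sketch does not attempt.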

Cumsum Reset based on a condition in Pandas

我怕爱的太早我们不能终老 submitted on 2020-07-20 10:32:40
Question: My question is very similar to "Cumsum within group and reset on condition in pandas" and "Pandas: cumsum per category based on additional condition", but they don't quite get me there due to my conditional requirements. I have a data frame that looks like this:

    TransactionId  Delta
    14             2
    14             3
    14             1
    14             2
    15             4
    15             2
    15             3

I want to create another column "Cumulative" that calculates the cumsum of Delta for each TransactionId. So the result would look like this:

    TransactionId  Delta  Cumulative
    14             2      2
    14             3
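
The part of the question that is visible here, a cumulative sum of Delta within each TransactionId, can be sketched with a grouped cumsum. The reset condition is cut off in this listing, so the example deliberately leaves it out; the variable name tx is an assumption.

```python
import pandas as pd

tx = pd.DataFrame({
    "TransactionId": [14, 14, 14, 14, 15, 15, 15],
    "Delta": [2, 3, 1, 2, 4, 2, 3],
})

# Running total of Delta that restarts whenever TransactionId changes group.
tx["Cumulative"] = tx.groupby("TransactionId")["Delta"].cumsum()
```

A conditional reset is usually layered on top by adding a second grouping key that increments each time the condition fires, but what that condition is cannot be recovered from the truncated text.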

Perform a reverse cumulative sum on a numpy array

人走茶凉 submitted on 2020-07-15 06:28:07
Question: Can anyone recommend a way to do a reverse cumulative sum on a numpy array? Where 'reverse cumulative sum' is defined as below (I welcome any corrections on the name for this procedure): if

    x = np.array([0,1,2,3,4])

then np.cumsum(x) gives array([0,1,3,6,10]). However, I would like to get array([10,10,9,7,4]). Can anyone suggest a way to do this?

Answer 1: This does it:

    np.cumsum(x[::-1])[::-1]

Answer 2: You can use np.flipud() for this as well, which is equivalent to [::-1] https://docs.scipy.org/doc/numpy
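
To make the two answers concrete, the short check below runs both on the array from the question. It also adds a third variant (total minus forward cumsum plus the element itself) that is not in the answers and is included only as an illustrative alternative; the variable names are assumptions.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])

# Answer 1: reverse, accumulate, reverse back.
rev_slice = np.cumsum(x[::-1])[::-1]

# Answer 2: the same idea written with np.flipud.
rev_flip = np.flipud(np.cumsum(np.flipud(x)))

# Illustrative alternative: suffix sum = total - prefix sum + current element.
rev_total = x.sum() - np.cumsum(x) + x

print(rev_slice)  # [10 10  9  7  4]
```

All three produce the array requested in the question.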

Count the number of NA values in a row - reset when 0

前提是你 submitted on 2020-06-29 03:39:15
Question: I encountered the question "Cumulative sum that resets when 0 is encountered" via https://stackoverflow.com/a/32502162/13269143, which partially, but not fully, answered my question. I first wanted to create a column that, row-wise, accumulates the values of each sequence in column b that is separated by a 0. This I achieved by using the code:

    setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]

as suggested in https://stackoverflow.com/a/32502162/13269143 (the other solutions
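
For readers more at home in pandas, the data.table line quoted above can be approximated as follows. This is an analogue in a different language, not the poster's code: rleid(b == 0L) is imitated with a cumsum over the points where the "is zero" flag changes. The input values are made up for illustration.

```python
import pandas as pd

# Made-up input; only the column name b comes from the question.
df = pd.DataFrame({"b": [1, 1, 0, 2, 3, 0, 0, 1]})

is_zero = df["b"].eq(0)
# Run id that increments whenever the flag flips (the role rleid plays in R).
run_id = is_zero.ne(is_zero.shift()).cumsum()

# Accumulate b inside each run, so the total restarts after every block of zeros.
df["whatiwant"] = df["b"].groupby(run_id).cumsum()
```

The NA-counting part named in the title is cut off in this listing, so the sketch covers only the reset-at-zero accumulation.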

R, dplyr: cumulative version of n_distinct

帅比萌擦擦* submitted on 2020-06-24 09:08:29
Question: I have a dataframe as follows. It is ordered by column time. Input:

    df = data.frame(time = 1:20, grp = sort(rep(1:5,4)), var1 = rep(c('A','B'),10))
    head(df,10)

       time grp var1
    1     1   1    A
    2     2   1    B
    3     3   1    A
    4     4   1    B
    5     5   2    A
    6     6   2    B
    7     7   2    A
    8     8   2    B
    9     9   3    A
    10   10   3    B

I want to create another variable var2 which computes the number of distinct var1 values so far, i.e. up to that point in time, for each group grp. This is a little different from what I'd get if I were to use n_distinct. Expected output - time
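
A running count of distinct values can be sketched in pandas (again an analogue, since the question is about dplyr): flag the first time each (grp, var1) pair appears, then take a cumulative sum of those flags within each group. The expected output is cut off in this listing, so this assumes var2 should be the number of distinct var1 values seen so far in the group, with rows already ordered by time.

```python
import pandas as pd

# Rebuild the question's data frame: 5 groups of 4 rows, var1 alternating A/B.
df = pd.DataFrame({
    "time": range(1, 21),
    "grp": [g for g in range(1, 6) for _ in range(4)],
    "var1": ["A", "B"] * 10,
})

# True on the first occurrence of each (grp, var1) pair, False on repeats.
first_seen = ~df.duplicated(subset=["grp", "var1"])

# Summing the first-occurrence flags within a group counts distinct values so far.
df["var2"] = first_seen.astype(int).groupby(df["grp"]).cumsum()
```

With this data every group goes 1, 2, 2, 2, because both A and B have appeared by the second row of each group.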