cumsum | 易学教程

Replacing more than n consecutive values in Pandas DataFrame column

Question: Suppose I have the following DataFrame df:

    df = pd.DataFrame({"a" : [1,2,2,2,2,2,2,2,2,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5],
                       "b" : [3,3,3,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,6,6,7,7],
                       "c" : [4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,1,2,2,2,2,2,2,2,2,3,3]})

I wish to replace the 4's that repeat more than 10 times in a row, in any column (there could be hundreds of columns), with 10 4's and the remainder 5's. So for example, 12 consecutive 4's would be replaced
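The excerpt is cut off before any answer, so the following is only one possible sketch of the stated goal, not the accepted solution. It labels each run of equal values with a cumsum over change points, counts the position inside each run, and replaces everything past the tenth position. The helper name `cap_runs` and its parameters are hypothetical, and the small sample series stands in for the question's data.

```python
import pandas as pd


def cap_runs(col: pd.Series, value: int = 4, limit: int = 10,
             replacement: int = 5) -> pd.Series:
    """Keep the first `limit` items of each run of `value`; replace the rest.

    Hypothetical helper, written only to illustrate the idea.
    """
    is_val = col.eq(value)
    # A new run starts wherever the True/False flag changes.
    run_id = is_val.ne(is_val.shift()).cumsum()
    # 1-based position inside each run of `value` (0 for other values).
    pos = is_val.astype(int).groupby(run_id).cumsum()
    return col.mask(is_val & (pos > limit), replacement)


# Small illustrative input: 12 fours, a one, then 3 fours.
sample = pd.Series([4] * 12 + [1] + [4] * 3)
capped = cap_runs(sample)
print(capped.tolist())
# [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 1, 4, 4, 4]
```

For a frame with many columns, `df.apply(cap_runs)` would run the same helper on each column in turn.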

Calculate cumsum from the end towards the beginning

Question: I'm trying to calculate the cumsum starting from the last row towards the first for each group. Sample data:

    t1 <- data.frame(var = "a", val = c(0,0,0,0,1,0,0,0,0,1,0,0,0,0,0))
    t2 <- data.frame(var = "b", val = c(0,0,0,0,1,0,0,1,0,0,0,0,0,0,0))
    ts <- rbind(t1, t2)

Desired format (grouped by var):

    ts <- data.frame(var = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"), val = c(2,2,2,2,2,1,1

Cumulative sum from a month ago until the current day for all the rows

Question: I have a data.table with ID, dates and values like the following one:

    DT <- setDT(data.frame(ContractID= c(1,1,1,2,2), Date = c("2018-02-01", "2018-02-20", "2018-03-12", "2018-02-01", "2018-02-12"), Value = c(10,20,30,10,20)))

       ContractID       Date Value
    1:          1 2018-02-01    10
    2:          1 2018-02-20    20
    3:          1 2018-03-12    30
    4:          2 2018-02-01    10
    5:          2 2018-02-12    20

I'd like to get a new column with the total cumulative sum per ID from a month ago until the current day for each row, like in the table below. NB: the

Why is dplyr::cummean(x) not equal to cumsum(x)/seq_along(x)?

Source: https://stackoverflow.com/questions/62356825/why-is-dplyrcummeanx-not-equal-to-cumsumx-seq-alongx

Cumsum Reset based on a condition in Pandas

Question: My question is very similar to "Cumsum within group and reset on condition in pandas" and "Pandas: cumsum per category based on additional condition", but they don't quite get me there due to my conditional requirements. I have a data frame that looks like this:

    TransactionId  Delta
    14             2
    14             3
    14             1
    14             2
    15             4
    15             2
    15             3

I want to create another column "Cumulative" that calculates the cumsum of Delta for each TransactionId. So the result would look like this:

    TransactionId  Delta  Cumulative
    14             2      2
    14             3
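The excerpt breaks off before the reset condition is stated, so the sketch below covers only the part that is visible: a running total of Delta within each TransactionId. It uses the sample values from the question, and the conditional reset itself is deliberately left out rather than guessed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TransactionId": [14, 14, 14, 14, 15, 15, 15],
    "Delta": [2, 3, 1, 2, 4, 2, 3],
})

# Running total of Delta, restarted for every TransactionId.
df["Cumulative"] = df.groupby("TransactionId")["Delta"].cumsum()

print(df["Cumulative"].tolist())
# [2, 5, 6, 8, 4, 6, 9]
```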

Perform a reverse cumulative sum on a numpy array

Question: Can anyone recommend a way to do a reverse cumulative sum on a numpy array? Here 'reverse cumulative sum' is defined as below (I welcome any corrections on the name for this procedure). If

    x = np.array([0,1,2,3,4])

then np.cumsum(x) gives

    array([0,1,3,6,10])

However, I would like to get

    array([10,10,9,7,4])

Can anyone suggest a way to do this?

Answer 1: This does it:

    np.cumsum(x[::-1])[::-1]

Answer 2: You can use np.flipud() for this as well, which is equivalent to [::-1]: https://docs.scipy.org/doc/numpy
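As a quick check of the two answers above, the following sketch runs both forms on the array from the question. The variable names `rev_slice` and `rev_flip` are illustrative only.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4])

# Answer 1: reverse the array, take the cumulative sum, reverse back.
rev_slice = np.cumsum(x[::-1])[::-1]

# Answer 2: np.flipud reverses along the first axis, which for a
# 1-D array gives the same result as x[::-1].
rev_flip = np.flipud(np.cumsum(np.flipud(x)))

print(rev_slice.tolist())  # [10, 10, 9, 7, 4]
print(rev_flip.tolist())   # [10, 10, 9, 7, 4]
```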

Count the number of NA values in a row - reset when 0

Question: I encountered the question "Cumulative sum that resets when 0 is encountered" via https://stackoverflow.com/a/32502162/13269143, which partially, but not fully, answered my question. I first wanted to create a column that, row-wise, accumulates the values of each sequence in column b that is separated by a 0. This I achieved by using the code:

    setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]

as suggested in https://stackoverflow.com/a/32502162/13269143 (the other solutions

R, dplyr: cumulative version of n_distinct

Question: I have a dataframe as follows. It is ordered by column time.

Input -

    df = data.frame(time = 1:20, grp = sort(rep(1:5,4)), var1 = rep(c('A','B'),10) )
    head(df,10)

       time grp var1
    1     1   1    A
    2     2   1    B
    3     3   1    A
    4     4   1    B
    5     5   2    A
    6     6   2    B
    7     7   2    A
    8     8   2    B
    9     9   3    A
    10   10   3    B

I want to create another variable var2 which computes the number of distinct var1 values so far, i.e. until that point in time, for each group grp. This is a little different from what I'd get if I were to use n_distinct.

Expected output - time