cumsum

Python pandas cumsum() reset after hitting max

Submitted by 隐身守侯 on 2019-12-17 14:51:19
Question: I have a pandas DataFrame with timedeltas and, in a separate column, a cumulative sum of those deltas expressed in milliseconds. An example is provided below:

Transaction_ID  Time          TimeDelta     CumSum[ms]
1               00:00:04.500  00:00:00.000  000
2               00:00:04.600  00:00:00.100  100
3               00:00:04.762  00:00:00.162  262
4               00:00:05.543  00:00:00.781  1043
5               00:00:09.567  00:00:04.024  5067
6               00:00:10.654  00:00:01.087  6154
7               00:00:14.300  00:00:03.646  9800
8               00:00:14.532  00:00:00.232  10032
9               00:00:16.500  00:00:01.968  12000
10
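Below is a minimal sketch of one way to cap and reset such a running total. It is an assumption about the asker's goal: the threshold MAX_MS and the per-row deltas are invented, and the reset is done with itertools.accumulate rather than a pure pandas one-liner, since each reset depends on the previous resets.

import pandas as pd
from itertools import accumulate

# Hypothetical per-transaction deltas in milliseconds (values invented)
deltas = pd.Series([0, 100, 162, 781, 4024, 1087, 3646, 232, 1968])

MAX_MS = 5000  # assumed reset threshold

# Custom accumulation step: restart the total from the current delta
# whenever adding it would push the running sum past MAX_MS.
def step(total, delta):
    new_total = total + delta
    return new_total if new_total <= MAX_MS else delta

running = pd.Series(list(accumulate(deltas, step)), index=deltas.index)
print(running.tolist())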

Pandas dataframe - running sum with reset

Submitted by 纵然是瞬间 on 2019-12-17 10:58:10
Question: I want to calculate the running sum in a given column (without using loops, of course). The caveat is that I have another column that specifies when to reset the running sum to the value present in that row. It is best explained by the following example:

   reset  val  desired_col
0      0    1            1
1      0    5            6
2      0    4           10
3      1    2            2
4      1   -1           -1
5      0    6            5
6      0    4            9
7      1    2            2

desired_col is the value I want to be calculated.

Answer 1: You can use cumsum() twice:

#    reset  val  desired_col
# 0      0    1            1
# 1      0    5            6
# 2      0    4           10
# 3      1    2            2
# 4      1   -1           -1
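Here is a minimal, self-contained sketch of the double-cumsum idea the truncated answer starts to describe; it is a reconstruction from the example above, not the answer's original code. Cumulatively summing the reset flag labels each stretch between resets, and a grouped cumsum of val restarts the total inside each stretch.

import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'reset': [0, 0, 0, 1, 1, 0, 0, 1],
                   'val':   [1, 5, 4, 2, -1, 6, 4, 2]})

# First cumsum: each 1 in 'reset' starts a new group label.
# Second cumsum: running sum of 'val' restarted within each group.
df['desired_col'] = df['val'].groupby(df['reset'].cumsum()).cumsum()
print(df)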

cumsum by group

Submitted by 南笙酒味 on 2019-12-17 07:48:51
Question: Suppose the data looks like

group1 group2 num
A      sg     1
A      sh     2
A      sg     4
B      at     3
B      al     7

a <- cumsum(data[,"num"])   # 1 3 7 10 17

I need something accumulated by groups. In reality, I have multiple columns as grouping indicators, and I want the accumulated sum within the subgroups I define. E.g. if I group by group1 only, then the output should be

group1 sum
A      1
A      3
A      7
B      3
B      10

If I group by the two variables group1 and group2, then the output is

group1 group2 sum
A      sg     1
A      sh     2
A      sg     5
B      at     3
B      al     7

Answer 1: library(data
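The accepted answer (cut off above) appears to begin with data.table; purely for comparison, here is a hypothetical pandas rendering of the same grouped running sums, with column names mirroring the R example.

import pandas as pd

df = pd.DataFrame({'group1': ['A', 'A', 'A', 'B', 'B'],
                   'group2': ['sg', 'sh', 'sg', 'at', 'al'],
                   'num':    [1, 2, 4, 3, 7]})

# Running sum within one grouping column
df['sum_by_group1'] = df.groupby('group1')['num'].cumsum()

# Running sum within the combination of two grouping columns
df['sum_by_group1_group2'] = df.groupby(['group1', 'group2'])['num'].cumsum()
print(df)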

Cumsum reset at NaN

Submitted by 孤街浪徒 on 2019-12-17 06:42:21
Question: If I have a pandas.core.series.Series named ts consisting of either 1's or NaN's, like this:

3382   NaN
3381   NaN
...
3369   NaN
3368   NaN
...
15       1
10     NaN
11       1
12       1
13       1
9      NaN
8      NaN
7      NaN
6      NaN
3      NaN
4        1
5        1
2      NaN
1      NaN
0      NaN

I would like to calculate the cumsum of this series, but it should be reset (set to zero) at the locations of the NaNs, like below:

3382   0
3381   0
...
3369   0
3368   0
...
15     1
10     0
11     1
12     2
13     3
9      0
8      0
7      0
6      0
3      0
4      1
5      2
2      0
1      0
0      0

Ideally I would like to have a vectorized solution! I ever see
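One vectorized way to get this behaviour (a sketch, not necessarily the answer given in the thread) is to label the stretches between NaNs with a cumulative count of the NaNs themselves, then take a grouped cumsum with the NaNs replaced by 0:

import pandas as pd
import numpy as np

# Small stand-in for the series in the question (values are 1 or NaN)
ts = pd.Series([np.nan, 1, np.nan, 1, 1, 1, np.nan, 1, 1, np.nan])

# Every NaN starts a new group; filling NaN with 0 keeps those rows at zero
groups = ts.isna().cumsum()
result = ts.fillna(0).groupby(groups).cumsum()
print(result.tolist())  # -> 0, 1, 0, 1, 2, 3, 0, 1, 2, 0 (float dtype)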

How to groupby consecutive values in pandas DataFrame

Submitted by 流过昼夜 on 2019-12-16 20:17:27
Question: I have a column in a DataFrame with values: [1, 1, -1, 1, -1, -1]. How can I group them like this? [1, 1] [-1] [1] [-1, -1]

Answer 1: You can use groupby with a custom Series:

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print (df)
   a
0  1
1  1
2 -1
3  1
4 -1
5 -1

print ((df.a != df.a.shift()).cumsum())
0    1
1    1
2    2
3    3
4    4
5    4
Name: a, dtype: int32

for i, g in df.groupby([(df.a != df.a.shift()).cumsum()]):
    print (i)
    print (g)
    print (g.a.tolist())

1
   a
0  1
1  1
[1, 1]
2
   a
2 -1
[-1]
3
   a
3  1
[1]
4
   a
4 -1
5 -1
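A compact way to collect the runs directly, as a small variation on the answer above (a list comprehension instead of printing each group):

import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

# A new run starts wherever the value differs from the previous row
run_id = (df.a != df.a.shift()).cumsum()
runs = [g.tolist() for _, g in df.a.groupby(run_id)]
print(runs)  # [[1, 1], [-1], [1], [-1, -1]]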

Pandas taking Cumulative Sum with Reset

Submitted by 余生颓废 on 2019-12-14 03:56:09
Question: Problem: I'm trying to keep a running total of consecutive timestamps (minute frequency). I currently have a way of taking a cumulative sum and resetting it on the condition that two columns do not match, but it's done with a for loop. I was wondering if there is a way to do this without the loop.

Code:

cb_arbitrage['shift'] = cb_arbitrage.index.shift(1, freq='T')

Returns:

                     cccccccc     bbbbbbbb     cb_spread  shift
timestamp
2017-07-07 18:23:00  2535.002000  2524.678462  10.323538  2017-07-07 18:24:00
2017
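The loop can usually be avoided by marking rows whose timestamp is not exactly one minute after the previous row and grouping on the cumulative sum of that mark. A hypothetical sketch (the index values and spread column are invented, and cumcount stands in for the running total of consecutive minutes):

import pandas as pd

# Invented minute-frequency data with a gap between 18:25 and 18:30
idx = pd.to_datetime(['2017-07-07 18:23', '2017-07-07 18:24',
                      '2017-07-07 18:25', '2017-07-07 18:30',
                      '2017-07-07 18:31'])
df = pd.DataFrame({'cb_spread': [10.32, 9.87, 10.11, 11.02, 10.74]}, index=idx)

# True wherever the gap to the previous timestamp is not one minute,
# i.e. wherever a new run of consecutive minutes begins.
new_run = df.index.to_series().diff() != pd.Timedelta(minutes=1)

# Running count within each run (1, 2, 3, ... then reset at the next gap)
df['consecutive_minutes'] = df.groupby(new_run.cumsum()).cumcount() + 1
print(df)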

R: filling up data gaps with NAs and applying cumsum function

Submitted by 馋奶兔 on 2019-12-14 03:14:43
Question: It was requested that I break down my question asked here (R: Applying cumulative sum function and filling data gaps with NA for plotting) a little and post a smaller sample. Here it is, and the sample data can be found here: https://dl.dropboxusercontent.com/u/16277659/inputdata.csv

NAME; ID; SURVEY_YEAR; REFERENCE_YEAR; VALUE
SAMPLE1; 253; 1883; 1883; 0
SAMPLE1; 253; 1884; 1883; NA
SAMPLE1; 253; 1885; 1884; 12
SAMPLE1; 253; 1890; 1889; 17
SAMPLE2; 261; 1991; 1991; 0
SAMPLE2; 261; 1992;
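For comparison only (the question and its answers are in R), here is a hypothetical pandas sketch of the same idea: reindex each sample over its full range of survey years so missing years become NaN, then take a cumulative sum that carries across the gaps. It reuses the SAMPLE1 rows from the excerpt above.

import pandas as pd
import numpy as np

# SAMPLE1 rows from the excerpt above (SAMPLE2 omitted for brevity)
df = pd.DataFrame({'NAME': ['SAMPLE1'] * 4,
                   'SURVEY_YEAR': [1883, 1884, 1885, 1890],
                   'VALUE': [0, np.nan, 12, 17]})

# Per sample: index by year, insert the missing years as NaN, then cumsum.
# pandas' cumsum skips NaN, so gap years stay NaN while the total continues.
out = (df.set_index('SURVEY_YEAR')
         .groupby('NAME')['VALUE']
         .apply(lambda s: s.reindex(range(s.index.min(), s.index.max() + 1))
                           .cumsum()))
print(out)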

R: Applying cumulative sum function and filling data gaps with NA for plotting

Submitted by 为君一笑 on 2019-12-13 21:30:22
Question: I have a dataframe that looks like this, and I am trying to calculate the cumulative sum of the VALUE column. The input file can also be found here: https://dl.dropboxusercontent.com/u/16277659/input.csv

df <- read.csv("input.csv", sep=";", header=TRUE)

NAME; ID; SURVEY_YEAR; REFERENCE_YEAR; VALUE
SAMPLE1; 253; 1880; 1879; 14
SAMPLE1; 253; 1881; 1880; -10
SAMPLE1; 253; 1882; 1881; 4
SAMPLE1; 253; 1883; 1882; 10
SAMPLE1; 253; 1884; 1883; 10
SAMPLE1; 253; 1885; 1884; 12
SAMPLE1; 253; 1889; 1888; 11

Group together two columns with ID, do the cumulative for two columns

Submitted by 寵の児 on 2019-12-13 09:29:48
Question: Edit: I wrote the question in a way that was too unstructured, let me try again. I want to add two new columns, winner_total_points and loser_total_points, to the dataset below.

winner <- c(1,2,3,4,1,2)
loser <- c(2,3,1,3,3,1)
winner_points <- c(5,4,12,2,1,6)
loser_points <- c(5,2,2,6,6,2)
test_data <- data.frame(winner, loser, winner_points, loser_points)

What I want those two columns to do is for winner_total_points to sum all the points the winner has gotten (excluding this match), as both the winner
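The question and its data are in R; purely as an illustration of the idea, here is a hypothetical pandas sketch: stack each match into one row per (match, player, points), take a per-player running sum minus the current match so it counts only earlier matches, and map the totals back onto the winner and loser columns.

import pandas as pd

# The test_data frame from the question, rebuilt in pandas
df = pd.DataFrame({'winner': [1, 2, 3, 4, 1, 2],
                   'loser':  [2, 3, 1, 3, 3, 1],
                   'winner_points': [5, 4, 12, 2, 1, 6],
                   'loser_points':  [5, 2, 2, 6, 6, 2]})
df['match'] = df.index

# One row per (match, player, points earned by that player in that match)
long = pd.concat([
    df[['match', 'winner', 'winner_points']]
      .rename(columns={'winner': 'player', 'winner_points': 'points'}),
    df[['match', 'loser', 'loser_points']]
      .rename(columns={'loser': 'player', 'loser_points': 'points'}),
]).sort_values('match')

# Running total per player minus the current match's points,
# i.e. the points accumulated *before* this match
long['before'] = long.groupby('player')['points'].cumsum() - long['points']

totals = long.set_index(['match', 'player'])['before']
df['winner_total_points'] = totals.reindex(pd.MultiIndex.from_arrays(
    [df['match'], df['winner']], names=['match', 'player'])).to_numpy()
df['loser_total_points'] = totals.reindex(pd.MultiIndex.from_arrays(
    [df['match'], df['loser']], names=['match', 'player'])).to_numpy()
print(df)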

Cumulative sum at intervals

Submitted by 早过忘川 on 2019-12-11 17:54:22
Question: Consider this dataframe:

dfgg
Out[305]:
                    Parts_needed  output
Year Month PartId
2018 1     L27849   72            72
     2     L27849   75            147
     3     L27849   101           248
     4     L27849   103           351
     5     L27849   77
     6     L27849   120
     7     L27849   59
     8     L27849   79
     9     L27849   28
     10    L27849   64
     11    L27849   511
     12    L27849   34
2019 1     L27849   49
     2     L27849   68
     3     L27849   75
     4     L27849   45
     5     L27849   84
     6     L27849   42
     7     L27849   40
     8     L27849   52
     9     L27849   106
     10    L27849   75
     11    L27849   176
     12    L27849   58            2193
2020 1     L27849   135           2328
     2     L27849   45            2301
     3     L27849   21            2247
     4     L27849   35
     5     L27849   17
     6     L27849