cumsum

Plotting in ggplot using cumsum

∥☆過路亽.° submitted on 2020-06-17 14:18:29
Question: I am trying to use ggplot2 to plot a date column against a numeric column. I have a dataframe that I want to split by country into China and not China, and I successfully created the dataframes linked below with:

    is_china <- confirmed_cases_worldwide %>%
      filter(country == "China", type=='confirmed') %>%
      group_by(country) %>%
      mutate(cumu_cases = cumsum(cases))

    is_not_china <- confirmed_cases_worldwide %>%
      filter(country != "China", type=='confirmed') %>%
      mutate(cumu_cases = cumsum

Counting consecutive 1's in NumPy array

*爱你&永不变心* submitted on 2020-06-12 06:26:32
Question: I have a NumPy array consisting of 0's and 1's like this:

    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

How can I add up all consecutive 1's, as shown here? Any time I encounter a 0, I reset.

    [1, 2, 3, 0, 0, 0, 1, 2, 0, 0]

I can do this using a for loop, but is there a vectorized solution using NumPy?

Answer 1: Here's a vectorized approach:

    def island_cumsum_vectorized(a):
        a_ext = np.concatenate(( [0], a, [0] ))
        idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
        a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
        return a_ext
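For comparison, here is a minimal sketch of a different vectorized route to the same result. It is not the answer's method (which is cut off above); the function name `consecutive_ones` is hypothetical. The idea is to take the plain running total and subtract the total that had accumulated at the most recent 0. It assumes the input contains only 0's and 1's.

```python
import numpy as np

def consecutive_ones(a):
    """Running count of consecutive 1's, resetting at every 0 (hypothetical helper)."""
    a = np.asarray(a)
    total = np.cumsum(a)  # running total that never resets
    # At each 0, remember the running total; carry that value forward with a
    # running maximum (valid because `total` never decreases for 0/1 input).
    total_at_last_zero = np.maximum.accumulate(np.where(a == 0, total, 0))
    return total - total_at_last_zero

result = consecutive_ones([1, 1, 1, 0, 0, 0, 1, 1, 0, 0])
print(result.tolist())  # [1, 2, 3, 0, 0, 0, 1, 2, 0, 0]
```

The sketch avoids index bookkeeping at the cost of two extra temporary arrays.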

Dataframe cell to be locked and used for a running balance calculation (follow up question)

天涯浪子 submitted on 2020-04-11 17:58:20
Question: (This is a follow-up to my previous question, which was answered correctly.) Say I have the following dataframe:

    import pandas as pd

    df = pd.DataFrame()
    df['E'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL','SHODL','SCLOSE_BUY','BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')
    df['F'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 2.00,10.00,10.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)
    df.loc[19, 'G'] = 100

Percentage of events before and after a sequence of zeros in pandas rows

廉价感情. submitted on 2020-02-24 00:38:14
Question: I have a dataframe like the following:

       ID   0   1   2   3   4   5   6   7   8  ...  81  82  83  84  85  86  87  88  89  90  total
    0   A   2  21   0  18   3   0   0   0   2  ...   0   0   0   0   0   0   0   0   0   0    156
    1   B   0  20  12   2   0   8  14  23   0  ...   0   0   0   0   0   0   0   0   0   0    231
    2   C   0  38  19   3   1   3   3   7   1  ...   0   0   0   0   0   0   0   0   0   0     78
    3   D   3   0   0   1   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0      5

and I want to know the % of events (the numbers in the cells) before and after the first

Create new group based on cumulative sum and group

吃可爱长大的小学妹 submitted on 2020-01-13 19:31:26
Question: I am looking to create a new group based on two conditions. I want all of the cases to be grouped together until the cumulative sum of Value reaches 10, and I want this done within each person. Using for loops and dplyr, I have managed to get each condition to work separately, but not both together, and I need both conditions applied. Below is what I would like the data to look like (I don't need a RunningSum_Value column, but I kept it in for clarification).
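The grouping rule described above can be sketched in plain Python (the question itself uses R and dplyr). This is one reading of the rule, stated as an assumption: within each person, rows stay in the same group until the running sum reaches the threshold, and the next row starts a new group with the sum reset to zero. The function name `group_by_running_sum` and the sample rows are hypothetical.

```python
def group_by_running_sum(rows, threshold=10):
    """Assign group numbers per person, closing a group once its running sum
    reaches `threshold`.

    rows: iterable of (person, value) pairs in the desired order.
    Returns a list of (person, value, running_sum, group) tuples.
    """
    state = {}  # person -> (running sum of the open group, current group number)
    out = []
    for person, value in rows:
        running, group = state.get(person, (0, 1))
        running += value
        out.append((person, value, running, group))
        if running >= threshold:
            # This row closes the group; the next row for this person starts fresh.
            running, group = 0, group + 1
        state[person] = (running, group)
    return out

# Hypothetical data, not taken from the question.
rows = [("A", 4), ("A", 7), ("A", 3), ("B", 10), ("B", 2)]
for row in group_by_running_sum(rows):
    print(row)
```

Keeping the state per person is what lets both conditions apply at once: the reset happens inside each person's rows and never carries over to the next person.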

draw random element in numpy

怎甘沉沦 submitted on 2020-01-12 19:00:09
Question: I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]. The array sums to 1.0. Using plain Python or numpy, I want to draw elements in proportion to their probability: the first element about 10% of the time, the second 20%, the third 50%, etc. The "draw" should return the index of the element drawn. I came up with this:

    def draw(probs):
        cumsum = numpy.cumsum(probs / sum(probs))  # sum up to 1.0, just in case
        return len(numpy.where(numpy.random.rand() >= cumsum)[0])

It works, but it's
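A minimal sketch of the same inverse-CDF idea, using `numpy.searchsorted` to locate the draw in the cumulative sum instead of counting matches with `numpy.where`. The name `draw_index` and the optional `rng` parameter are assumptions added for illustration; they are not part of the question.

```python
import numpy as np

def draw_index(probs, rng=None):
    """Return an index drawn in proportion to `probs` (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probs, dtype=float)
    cdf = np.cumsum(p / p.sum())  # normalise so the last entry is about 1.0
    # side="right" counts the cdf entries <= u, the same quantity the
    # question's len(numpy.where(...)) expression computes.
    idx = int(np.searchsorted(cdf, rng.random(), side="right"))
    # Guard against floating-point rounding leaving cdf[-1] slightly below 1.0.
    return min(idx, len(p) - 1)

rng = np.random.default_rng(0)
counts = np.bincount(
    [draw_index([0.1, 0.2, 0.5, 0.2], rng) for _ in range(20000)], minlength=4
)
print(counts / counts.sum())  # empirical frequencies, close to the probabilities
```

If a library call is acceptable, `rng.choice(len(probs), p=probs)` on a NumPy `Generator` draws an index with the given probabilities directly.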

ggplot2 and cumsum()

徘徊边缘 submitted on 2020-01-12 08:00:06
Question: I have a set of UNIX timestamps and URIs, and I'm trying to plot the cumulative count of requests for each URI. I managed to do that for one URI at a time using a dummy column:

    x.df$count <- apply(x.df,1,function(row) 1)  # Create a dummy column for cumsum
    x.df <- x.df[order(x.df$time, decreasing=FALSE),]  # Sort
    ggplot(x.df, aes(x=time, y=cumsum(count))) + geom_line()

However, that would make roughly 30 plots in my case. ggplot2 does allow you to plot multiple lines into one plot (I copied this

Can R do operations like cumsum in-place?

瘦欲@ submitted on 2020-01-04 16:57:23
Question: In Python I can do this:

    a = np.arange(100)
    print(id(a))  # shows some number
    a[:] = np.cumsum(a)
    print(id(a))  # shows the same number

What I did here was replace the contents of a with its cumsum. The address before and after is the same. Now let's try it in R:

    install.packages('pryr')
    library(pryr)
    a = 0:99
    print(address(a))  # shows some number
    a[1:length(a)] = cumsum(a)
    print(address(a))  # shows a different number!

The question is how can I overwrite already-allocated memory in R with the
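On the NumPy side of the comparison, a short sketch may help. `a[:] = np.cumsum(a)` keeps the same array object, but it first builds a full temporary array and then copies it over. Passing `out=a` asks NumPy to write the result into the existing array instead. The sketch checks the data pointer through `__array_interface__` rather than `id(a)`, since `id` identifies the Python object and not the underlying buffer. Whether NumPy uses an internal scratch buffer for the aliased case is not something this sketch asserts.

```python
import numpy as np

a = np.arange(100, dtype=np.int64)
buffer_before = a.__array_interface__["data"][0]  # address of the data buffer

# Write the cumulative sum into the array that already holds the input.
np.cumsum(a, out=a)

buffer_after = a.__array_interface__["data"][0]
print(buffer_before == buffer_after)  # True: the array still uses the same buffer
print(a[:5].tolist(), int(a[-1]))     # [0, 1, 3, 6, 10] 4950
```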

Filling gaps for cumulative sum with Pandas

十年热恋 submitted on 2020-01-04 06:19:42
Question: I'm trying to calculate the inventory of stocks from a table in monthly buckets in Pandas. This is the table:

    Goods | Incoming | Date
    ------+----------+-----------
    'a'   |       10 | 2014-01-10
    'a'   |       20 | 2014-02-01
    'b'   |       30 | 2014-01-02
    'b'   |       40 | 2014-05-13
    'a'   |       20 | 2014-06-30
    'c'   |       10 | 2014-02-10
    'c'   |       50 | 2014-05-10
    'b'   |       70 | 2014-03-10
    'a'   |       10 | 2014-02-10

This is my code so far:

    import pandas as pd

    df = pd.DataFrame({
        'goods': ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b', 'a'],
        'incoming':
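One way to approach the gap-filling, sketched under assumptions: the question's own code is cut off above, so the dataframe is rebuilt here from the table, and the column name `date` as well as the variable names `monthly`, `all_months`, and `inventory` are hypothetical. The idea is to sum incoming goods per month, reindex onto a complete month range so that months with no deliveries appear as 0, and only then take the cumulative sum.

```python
import pandas as pd

# Rebuilt from the table in the question; the 'date' column name is assumed.
df = pd.DataFrame({
    "goods": ["a", "a", "b", "b", "a", "c", "c", "b", "a"],
    "incoming": [10, 20, 30, 40, 20, 10, 50, 70, 10],
    "date": pd.to_datetime([
        "2014-01-10", "2014-02-01", "2014-01-02", "2014-05-13", "2014-06-30",
        "2014-02-10", "2014-05-10", "2014-03-10", "2014-02-10",
    ]),
})

# Total incoming per (month, goods), one column per good.
monthly = (
    df.groupby([df["date"].dt.to_period("M"), "goods"])["incoming"]
      .sum()
      .unstack("goods", fill_value=0)
)

# Insert months with no deliveries at all (April 2014 here) as rows of zeros,
# then accumulate so each row holds the inventory at the end of that month.
all_months = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
inventory = monthly.reindex(all_months, fill_value=0).cumsum()

print(inventory)
```

Reindexing before `cumsum` matters: filling after the cumulative sum would need a forward fill instead of zeros, because an empty month should carry the previous balance forward.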