cumsum | 易学教程

Plotting in ggplot using cumsum

Question: I am trying to use ggplot2 to plot a date column against a numeric column. I have a dataframe that I am trying to manipulate so that country is either China or not China, and I successfully created the dataframe linked below with:

is_china <- confirmed_cases_worldwide %>%
  filter(country == "China", type == 'confirmed') %>%
  group_by(country) %>%
  mutate(cumu_cases = cumsum(cases))
is_not_china <- confirmed_cases_worldwide %>%
  filter(country != "China", type == 'confirmed') %>%
  mutate(cumu_cases = cumsum
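
The question's code is R (dplyr). As a minimal Python/pandas analogue, not the asker's code: label each confirmed row as China or Not China, sum the cases per label per date, and only then take the running total within each label, so every date gets one cumulative value per line. The columns `country`, `type` and `cases` come from the question; the `date` column and the function name `cumulative_china_vs_rest` are assumptions.

```python
import pandas as pd


def cumulative_china_vs_rest(df: pd.DataFrame) -> pd.DataFrame:
    """Running total of confirmed cases for China vs. all other countries.

    Assumes columns: country, type, cases, date ("date" is a hypothetical name).
    """
    confirmed = df[df["type"] == "confirmed"]
    # Label every row as "China" or "Not China".
    region = confirmed["country"].eq("China").map({True: "China", False: "Not China"})
    # Collapse all non-China countries into a single daily total per date.
    daily = confirmed.groupby([region.rename("region"), "date"])["cases"].sum()
    # Running total within each region, in date order (groupby sorts its keys).
    cumu = daily.groupby(level="region").cumsum()
    return cumu.rename("cumu_cases").reset_index()
```

The result is in long format (one row per region and date), which is the shape a plotting call needs to draw one line per region.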

Counting consecutive 1's in NumPy array

Question: I have a NumPy array consisting of 0's and 1's, like this:

[1, 1, 1, 0, 0, 0, 1, 1, 0, 0]

How can I add up consecutive 1's so that the result looks like the array below? Any time I encounter a 0, the count resets.

[1, 2, 3, 0, 0, 0, 1, 2, 0, 0]

I can do this using a for loop, but is there a vectorized solution using NumPy?

Answer 1: Here's a vectorized approach:

def island_cumsum_vectorized(a):
    a_ext = np.concatenate(( [0], a, [0] ))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext
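
A different vectorized technique from the answer quoted above (not the answerer's code): take the plain cumulative sum, record its value at every 0, carry that value forward with `np.maximum.accumulate`, and subtract it. This relies on the input holding only 0's and 1's (more generally, non-negative values) so the cumulative sum never decreases. The function name is hypothetical.

```python
import numpy as np


def consecutive_ones_count(a):
    """Running count of consecutive 1's that resets to 0 at every 0."""
    a = np.asarray(a, dtype=np.int64)
    total = np.cumsum(a)
    # At each 0, remember the running total reached so far; elsewhere use 0.
    at_zeros = np.where(a == 0, total, 0)
    # Carry the most recent reset level forward (valid because total never decreases).
    offset = np.maximum.accumulate(at_zeros)
    return total - offset


print(consecutive_ones_count([1, 1, 1, 0, 0, 0, 1, 1, 0, 0]))  # -> [1 2 3 0 0 0 1 2 0 0]
```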

Dataframe cell to be locked and used for a running balance calculation (follow up question)

Question: (This is a follow-up to my previous question, which was answered correctly.) Say I have the following dataframe:

import pandas as pd
df = pd.DataFrame()
df['E'] = ('SIT', 'SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SCLOSE_BUY', 'BCLOSE_SELL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BUY', 'SIT', 'SIT')
df['F'] = (0.00, 1.00, 10.00, 5.00, 6.00, -6.00, 6.00, 2.00, 10.00, 10.00, -8.00, 33.00, -15.00, 6.00, -1.00, 5.00, 10.00, 0.00, 0.00, 0.00)
df.loc[19, 'G'] = 100

Percentage of events before and after a sequence of zeros in pandas rows

Question: I have a dataframe like the following:

   ID   0   1   2   3   4   5   6   7   8  ...  81  82  83  84  85  86  87  88  89  90  total
0  A    2  21   0  18   3   0   0   0   2  ...   0   0   0   0   0   0   0   0   0   0    156
1  B    0  20  12   2   0   8  14  23   0  ...   0   0   0   0   0   0   0   0   0   0    231
2  C    0  38  19   3   1   3   3   7   1  ...   0   0   0   0   0   0   0   0   0   0     78
3  D    3   0   0   1   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0      5

and I want to know the % of events (the numbers in the cells) before and after the first
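
The question is cut off before it pins down the split point. As a minimal pandas sketch, assuming "the first sequence of zeros" starts at the first column (in order) that holds a 0: a cumulative sum along each row of the `== 0` mask marks every cell from that point onward, so cells before it count as "before" and the rest as "after". The layout (an `ID` column plus numeric event columns) follows the table above; the function name is hypothetical, and the percentages use the row sum of the selected columns rather than the `total` column.

```python
import pandas as pd


def pct_before_after_first_zero(df: pd.DataFrame, event_cols) -> pd.DataFrame:
    """Share of each row's events before/after its first zero cell (assumed split rule)."""
    events = df[event_cols]
    # True from the first zero onward: running count of zeros seen so far is > 0.
    from_first_zero = events.eq(0).astype(int).cumsum(axis=1).gt(0)
    before = events.mask(from_first_zero, 0).sum(axis=1)
    row_total = events.sum(axis=1)
    after = row_total - before
    # Rows with no events at all would give NaN here (0 / 0).
    return pd.DataFrame({
        "ID": df["ID"],
        "pct_before": 100 * before / row_total,
        "pct_after": 100 * after / row_total,
    })
```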

Create new group based on cumulative sum and group

Question: I am looking to create a new group based on two conditions: I want all of the cases up to the point where the cumulative sum of Value reaches 10 to be grouped together, and I want this done within each person. Using for loops and dplyr, I have managed to get each condition working separately, but not both together, and I need both of them applied. Below is what I would like the data to look like (I don't need a RunningSum_Value column, but I kept it in for clarification).
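
Because the running total restarts every time it reaches the threshold, this is not a plain cumulative sum and is easiest to express as a single pass over the rows. A minimal pure-Python sketch, assuming rows are already sorted within each person and that the row which brings the running sum to 10 or more closes the current group (the question leaves that boundary open). The function name and the `threshold` parameter are hypothetical.

```python
def assign_groups(rows, threshold=10):
    """Return a group id for each (person, value) row.

    A new group starts when the person changes, or when the previous row
    pushed the running sum of value to `threshold` or more.
    """
    group_ids = []
    group_id = 0
    running = 0
    prev_person = None
    start_new = True
    for person, value in rows:
        if start_new or person != prev_person:
            group_id += 1
            running = 0
            start_new = False
        prev_person = person
        running += value
        group_ids.append(group_id)
        if running >= threshold:
            start_new = True  # the next row opens a fresh group
    return group_ids


rows = [("A", 4), ("A", 7), ("A", 3), ("B", 10), ("B", 2), ("B", 5)]
print(assign_groups(rows))  # -> [1, 1, 2, 3, 4, 4]
```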

draw random element in numpy

Question: I have an array of element probabilities, let's say [0.1, 0.2, 0.5, 0.2]. The array sums to 1.0. Using plain Python or NumPy, I want to draw elements in proportion to their probability: the first element about 10% of the time, the second 20%, the third 50%, and so on. The "draw" should return the index of the element drawn. I came up with this:

def draw(probs):
    cumsum = numpy.cumsum(probs / sum(probs))  # sum up to 1.0, just in case
    return len(numpy.where(numpy.random.rand() >= cumsum)[0])

It works, but it's
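
The asker's `len(numpy.where(u >= cumsum)[0])` counts how many cumulative probabilities are at or below the uniform draw `u`. `np.searchsorted(..., side="right")` computes the same index with a binary search instead of a full scan, and NumPy's `Generator.choice` also accepts a probability vector directly. A minimal sketch; the function names are hypothetical.

```python
import numpy as np


def draw(probs, rng=None):
    """Return index i with probability probs[i] / sum(probs) (inverse-CDF sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probs, dtype=float)
    cdf = np.cumsum(p / p.sum())
    cdf[-1] = 1.0  # guard: round-off could leave the last entry just below 1.0
    u = rng.random()  # uniform in [0, 1)
    # Number of cdf entries <= u, the same count as len(np.where(u >= cdf)[0]).
    return int(np.searchsorted(cdf, u, side="right"))


def draw_builtin(probs, rng=None):
    """Same distribution via NumPy's Generator.choice with a p vector."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probs, dtype=float)
    return int(rng.choice(len(p), p=p / p.sum()))


rng = np.random.default_rng(0)
samples = [draw([0.1, 0.2, 0.5, 0.2], rng) for _ in range(10_000)]
# Observed frequencies should come out approximately proportional to probs.
print(np.bincount(samples, minlength=4) / len(samples))
```

`draw_builtin` is the shorter option when one sample at a time is all that is needed; the explicit CDF version becomes useful when the same `cdf` is reused for many draws.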

ggplot2 and cumsum()

Question: I have a set of UNIX timestamps and URIs, and I'm trying to plot the cumulative count of requests for each URI. I managed to do that for one URI at a time using a dummy column:

x.df$count <- apply(x.df, 1, function(row) 1)  # Create a dummy column for cumsum
x.df <- x.df[order(x.df$time, decreasing=FALSE),]  # Sort
ggplot(x.df, aes(x=time, y=cumsum(count))) + geom_line()

However, that would make roughly 30 plots in my case. ggplot2 does allow you to plot multiple lines into one plot (I copied this
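
Instead of one plot per URI, one common approach is to compute the per-URI running count as its own column and then map the URI to colour, so all lines share a single plot. A minimal pandas sketch of that column, as an analogue of the R code rather than the asker's solution; `time` comes from the question, while the `uri` column name and the function name are assumptions. In R, base `ave()` with `FUN = cumsum` applied to the dummy column, grouped by the URI column, should build the same per-URI running count.

```python
import pandas as pd


def add_cumulative_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Add cum_requests: the running number of requests per URI, in time order."""
    out = df.sort_values("time", kind="stable").copy()
    # cumcount() numbers rows 0, 1, 2, ... within each URI group.
    out["cum_requests"] = out.groupby("uri").cumcount() + 1
    return out


requests_df = pd.DataFrame({
    "time": [1_400_000_030, 1_400_000_010, 1_400_000_020, 1_400_000_040],
    "uri": ["/b", "/a", "/b", "/a"],
})
print(add_cumulative_counts(requests_df))
```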

Can R do operations like cumsum in-place?

Question: In Python I can do this:

import numpy as np
a = np.arange(100)
print(id(a))  # shows some number
a[:] = np.cumsum(a)
print(id(a))  # shows the same number

What I did here was replace the contents of a with its cumsum. The address before and after is the same. Now let's try it in R:

install.packages('pryr')
library(pryr)
a = 0:99
print(address(a))  # shows some number
a[1:length(a)] = cumsum(a)
print(address(a))  # shows a different number!

The question is how can I overwrite already-allocated memory in R with the
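
On the NumPy side, `a[:] = np.cumsum(a)` first builds the cumulative sum in a new temporary array and then copies it into `a`. Passing `out=a` to `np.cumsum` asks NumPy to write the result straight into `a`'s existing buffer. A minimal sketch; the buffer address is read through the array's `__array_interface__`, and NumPy may still buffer internally if it decides that input and output overlap unsafely.

```python
import numpy as np

a = np.arange(100)
buffer_before = a.__array_interface__["data"][0]  # address of the data buffer
obj_before = id(a)

# Write the running total back into a's own buffer instead of a new array.
np.cumsum(a, out=a)

print(id(a) == obj_before)                                # True
print(a.__array_interface__["data"][0] == buffer_before)  # True
print(a[-1])                                              # 4950
```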

Filling gaps for cumulative sum with Pandas

Question: I'm trying to calculate the inventory of stocks from a table in monthly buckets in Pandas. This is the table:

Goods | Incoming | Date
------+----------+-----------
'a'   | 10       | 2014-01-10
'a'   | 20       | 2014-02-01
'b'   | 30       | 2014-01-02
'b'   | 40       | 2014-05-13
'a'   | 20       | 2014-06-30
'c'   | 10       | 2014-02-10
'c'   | 50       | 2014-05-10
'b'   | 70       | 2014-03-10
'a'   | 10       | 2014-02-10

This is my code so far:

import pandas as pd
df = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b', 'a'],
    'incoming':
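
A minimal sketch of one way to fill the gaps, using the data from the table above: total the incoming quantity per goods per calendar month, reindex onto a complete range of months so that months with no movement at all (April 2014 here) appear as 0, and then take the cumulative sum down each column. The lowercase `goods` and `incoming` names follow the asker's code; the `date` column name is an assumption because that code is cut off.

```python
import pandas as pd

df = pd.DataFrame({
    "goods": ["a", "a", "b", "b", "a", "c", "c", "b", "a"],
    "incoming": [10, 20, 30, 40, 20, 10, 50, 70, 10],
    "date": pd.to_datetime([
        "2014-01-10", "2014-02-01", "2014-01-02", "2014-05-13", "2014-06-30",
        "2014-02-10", "2014-05-10", "2014-03-10", "2014-02-10",
    ]),
})

# Incoming quantity per month (rows) and goods (columns); missing pairs become 0.
monthly = (
    df.groupby([df["date"].dt.to_period("M"), "goods"])["incoming"]
      .sum()
      .unstack(fill_value=0)
)

# Add months with no movement at all, then take the running total per goods.
all_months = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
inventory = monthly.reindex(all_months, fill_value=0).cumsum()
print(inventory)
```

With this data the June 2014 row works out to 60 for a, 140 for b and 60 for c.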