plyr

How to calculate percentage change from different rows over different spans

早过忘川 posted on 2020-01-21 10:02:49
Question: I am trying to calculate the percentage change in price for quarterly data of companies identified by a gvkey (1001, 1384, etc.) and the corresponding quarterly stock price, PRCCQ.

   gvkey  PRCCQ
1   1004 23.750
2   1004 13.875
3   1004 11.250
4   1004 10.375
5   1004 13.600
6   1004 14.000
7   1004 17.060
8   1004  8.150
9   1004  7.400
10  1004 11.440
11  1004  6.200
12  1004  5.500
13  1004  4.450
14  1004  4.500
15  1004  8.010

What I am trying to do is add 8 columns showing 1 quarter return, 2 quarter return, etc.
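One way to approach the question above is sketched here in base R, so that it runs without extra packages. It assumes a backward-looking return (price now versus price k quarters earlier, expressed in percent); the excerpt does not say which direction is wanted. The names `pct_change`, `add_returns`, and the `ret_kq` column names are hypothetical.

```r
# Sample data: the 15 PRCCQ values shown in the question, all for gvkey 1004.
prices <- data.frame(
  gvkey = 1004,
  PRCCQ = c(23.750, 13.875, 11.250, 10.375, 13.600, 14.000, 17.060, 8.150,
            7.400, 11.440, 6.200, 5.500, 4.450, 4.500, 8.010)
)

# Percentage change of x relative to its value k rows earlier.
# The first k entries have no earlier value and are NA.
pct_change <- function(x, k) {
  n <- length(x)
  if (k >= n) return(rep(NA_real_, n))
  c(rep(NA_real_, k), x[(k + 1):n] / x[1:(n - k)] - 1) * 100
}

# Add one column per span (hypothetical names ret_1q, ret_2q, ...).
add_returns <- function(d, spans = 1:8) {
  for (k in spans) {
    d[[paste0("ret_", k, "q")]] <- pct_change(d$PRCCQ, k)
  }
  d
}

# Apply per company so that a span never crosses a gvkey boundary.
returns <- do.call(rbind, lapply(split(prices, prices$gvkey), add_returns))
rownames(returns) <- NULL
```

With plyr loaded, the split/apply/combine step could presumably be written as `ddply(prices, .(gvkey), add_returns)` instead of the `split`/`lapply`/`rbind` line.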

How to calculate percentage change from different rows over different spans

拈花ヽ惹草 posted on 2020-01-21 10:02:25
Question: I am trying to calculate the percentage change in price for quarterly data of companies identified by a gvkey (1001, 1384, etc.) and the corresponding quarterly stock price, PRCCQ.

   gvkey  PRCCQ
1   1004 23.750
2   1004 13.875
3   1004 11.250
4   1004 10.375
5   1004 13.600
6   1004 14.000
7   1004 17.060
8   1004  8.150
9   1004  7.400
10  1004 11.440
11  1004  6.200
12  1004  5.500
13  1004  4.450
14  1004  4.500
15  1004  8.010

What I am trying to do is add 8 columns showing 1 quarter return, 2 quarter return, etc.

How to expand a large dataframe in R

落爺英雄遲暮 posted on 2020-01-19 15:05:48
Question: I have a dataframe

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4),
  date = c("1985-06-19", "1985-06-19", "1985-06-19", "1985-08-01", "1985-08-01",
           "1990-06-19", "1990-06-19", "1990-06-19", "1990-06-19", "2000-05-12"),
  spp = c("a", "b", "c", "c", "d", "b", "c", "d", "a", "b"),
  y = rpois(10, 5))

   id       date spp y
1   1 1985-06-19   a 6
2   1 1985-06-19   b 3
3   1 1985-06-19   c 7
4   2 1985-08-01   c 7
5   2 1985-08-01   d 6
6   3 1990-06-19   b 5
7   3 1990-06-19   c 4
8   3 1990-06-19   d 4
9   3 1990-06-19   a 6
10  4 2000

How can I calculate an inner product with an arbitrary number of columns using ddply?

这一生的挚爱 posted on 2020-01-16 09:04:04
Question: I want to compute the inner product of the first D columns of each row in a data frame with a given array, W. I am trying the following:

W = c(1, 2, 3);
ddply(df, .(id), transform, inner_product = c(col1, col2, col3) %*% W);

This works, but I may have an arbitrary number of columns. Can I generalize the above expression to handle that case?

Update: This is an updated example, as asked for in the comments:

library(kernlab);
data(spam);
W = array();
W[1:3] = seq(1, 3);
spamdf = head(spam);
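A possible generalization, sketched in base R under stated assumptions: the columns to use are picked by name to match the length of W, and the row-wise inner product is computed as a single matrix product rather than group by group. The data frame and its values are hypothetical; only `col1`, `col2`, `col3`, `id`, and `W` come from the question.

```r
W <- c(1, 2, 3)

# Hypothetical data: three numeric columns plus an unrelated one.
df <- data.frame(
  id    = 1:4,
  col1  = c(1, 0, 2, 1),
  col2  = c(0, 1, 2, 3),
  col3  = c(1, 1, 0, 2),
  other = c(9, 9, 9, 9)
)

# Build the column names from the length of W, so the same code
# works for any number of weights (col1 ... colD).
cols <- paste0("col", seq_along(W))

# One matrix product yields the inner product of every row with W.
df$inner_product <- as.vector(as.matrix(df[cols]) %*% W)
```

If the first D columns should be taken by position instead, `df[seq_along(W)]` could replace `df[cols]`; which of the two fits depends on how the real data frame is laid out.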

selecting specific rows etc. using ddply

☆樱花仙子☆ posted on 2020-01-15 05:20:06
Question: I have a three-part question based on a dataframe (df is example rows) of goals scored by soccer players in a season

Player            Season  Goals
Teddy Sheringham  1992/3     22
Les Ferdinand     1992/3     20
Dean Holdsworth   1992/3     19
Andy Cole         1993/4     34
Alan Shearer      1993/4     31
Chris Sutton      1993/4     25

If I want to obtain the top scorer each year I can use

ddply(df, "Season", summarise, maxGoals = max(Goals), Player = Player[which.max(Goals)])

Questions: 1) It does not apply in this case but does this suffice if there

How do I sub sample data by group using ddply?

丶灬走出姿态 posted on 2020-01-13 08:14:20
Question: I've got a data frame with far too many rows to be able to do a spatial correlogram. Instead, I want to grab 40 rows for each species and run my correlogram on that subset. I wrote a function to subset a data frame as follows:

samp <- function(dataf) {
  dataf[sample(1:dim(dataf)[1], size=40, replace=FALSE),]
}

Now I want to apply this function to each species in a larger data frame. When I try something like

culled_data = ddply(larger_data, .(species), subset, samp)

I get this error: Error in
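A minimal sketch of per-group subsampling in base R, assuming the goal is up to 40 random rows per species. The sample data is hypothetical, and capping the sample size at the group size is an addition not present in the question (without it, `sample` fails for groups with fewer than 40 rows).

```r
set.seed(1)

# Hypothetical data: three species with 100, 60, and 25 rows.
larger_data <- data.frame(
  species = rep(c("a", "b", "c"), times = c(100, 60, 25)),
  x = runif(185),
  y = runif(185)
)

# Draw up to `size` rows at random from one group's data frame.
samp <- function(dataf, size = 40) {
  n <- nrow(dataf)
  dataf[sample(seq_len(n), size = min(size, n), replace = FALSE), , drop = FALSE]
}

# Split by species, sample each piece, and bind the pieces back together.
culled_data <- do.call(rbind, lapply(split(larger_data, larger_data$species), samp))
rownames(culled_data) <- NULL
```

With plyr, the function would normally be passed to `ddply` directly, as in `ddply(larger_data, .(species), samp)`, rather than through `subset`, which expects a logical condition and not a function.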

ddply summarise proportional count

寵の児 posted on 2020-01-13 02:12:08
Question: I am having some trouble using the ddply function from the plyr package. I am trying to summarise the following data with counts and proportions within each group. Here's my data:

structure(list(X5employf = structure(c(1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L
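Counts and within-group proportions can be sketched in base R as follows. Because the data in the excerpt is truncated, the example uses a small hypothetical data frame: the grouping column `group` and the "yes"/"no" levels are assumptions, and only the column name `X5employf` comes from the question.

```r
# Hypothetical data: a grouping column and the factor of interest.
survey <- data.frame(
  group     = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  X5employf = c("yes", "no", "yes", "yes", "no", "no", "yes", "no", "no")
)

# Cross-tabulate, then flatten to one row per (group, level) pair.
counts <- as.data.frame(
  table(group = survey$group, X5employf = survey$X5employf),
  responseName = "n"
)

# Divide each count by its group's total to get within-group proportions.
counts$prop <- counts$n / ave(counts$n, counts$group, FUN = sum)
```

The proportions sum to 1 within each group, which is a quick check that the denominator is the group total and not the overall total.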

Reshape package masking preventing melt from naming columns

谁说胖子不能爱 posted on 2020-01-12 22:30:50
Question: I have a script which requires both the reshape and reshape2 libraries. I know this is poor practice, but I think plyr (or another library I am using, Vennerable) is loading reshape, and I have personally used reshape2 in a lot of places. The problem is that the masking of reshape2 by reshape is causing problems for the melt function.

# Example data frame
df <- data.frame(id=c(1:5), a=c(rnorm(5)), b=c(rnorm(5)))
# With just reshape2, variable and value columns are labelled correctly
library(reshape2

Conditional NA filling by group

蓝咒 posted on 2020-01-12 14:32:23
Question: Edit: The question was originally asked for data.table. A solution with any package would be interesting. I am a little stuck with a particular variation of a more general problem. I have panel data that I am using with data.table, and I would like to fill in some missing values using the group-by functionality of data.table. Unfortunately they are not numeric, so I can't simply interpolate, and they should only be filled in based on a condition. Is it possible to perform a kind of conditional

Conditional NA filling by group

ε祈祈猫儿з posted on 2020-01-12 14:30:09
Question: Edit: The question was originally asked for data.table. A solution with any package would be interesting. I am a little stuck with a particular variation of a more general problem. I have panel data that I am using with data.table, and I would like to fill in some missing values using the group-by functionality of data.table. Unfortunately they are not numeric, so I can't simply interpolate, and they should only be filled in based on a condition. Is it possible to perform a kind of conditional