plyr

R growth rate calculation week over week on daily timeseries data

两盒软妹~` submitted on 2020-04-30 10:56:38
Question: I'm trying to calculate w/w growth rates entirely in R. I could use Excel, or preprocess with Ruby, but that's not the point. data.frame example:

        date   gpv        type
1 2013-04-01 12900 back office
2 2013-04-02 16232 back office
3 2013-04-03 10035 back office

I want to do this factored by 'type', and I need to wrap up the Date type column into weeks and then calculate the week-over-week growth. I think I need to do ddply to group by week - with a custom function that determines if a date is in a
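
The steps the question describes (bucket dates into weeks, total gpv per type and week, then compare each week with the previous one) can be sketched in base R. This is only one possible approach, not the ddply solution the asker had in mind; the sample values and the names `week`, `weekly`, and `growth` are assumptions made for illustration.

```r
# Hypothetical daily data in the shape shown in the question (date, gpv, type).
df <- data.frame(
  date = rep(seq(as.Date("2013-04-01"), by = "day", length.out = 14), times = 2),
  gpv  = c(rep(100, 7), rep(150, 7), rep(200, 7), rep(100, 7)),
  type = rep(c("back office", "front office"), each = 14)
)

# Map every date to the Monday that starts its week ("%u" is 1 for Monday, 7 for Sunday).
df$week <- df$date - (as.integer(format(df$date, "%u")) - 1)

# Total gpv per type and week.
weekly <- aggregate(gpv ~ type + week, data = df, FUN = sum)
weekly <- weekly[order(weekly$type, weekly$week), ]

# Week-over-week growth within each type: this week / previous week - 1.
# The first week of each type has no predecessor, so its growth is NA.
weekly$growth <- ave(
  weekly$gpv, weekly$type,
  FUN = function(v) c(NA, v[-1] / v[-length(v)] - 1)
)

print(weekly)
```

Partial weeks at the start or end of the series would distort the first and last growth figures, so they may need to be dropped before computing the ratio.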

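How to sum a variable by group

ぃ、小莉子 submitted on 2020-03-15 01:21:03
Suppose I have two columns of data. The first contains categories such as "First", "Second", "Third", and so on. The second contains numbers representing how many times I saw "First". For example:

Category Frequency
First    10
First    15
First    5
Second   2
Third    14
Third    20
Second   3

I want to sort the data by category and sum the frequencies:

Category Frequency
First    30
Second   5
Third    34

How would I do this in R?
Answer #1: If x is the data frame containing the data, the following does what you want: require(reshape) recast(x, Category ~ ., fun.aggregate=sum)
Answer #2: library(plyr) ddply(tbl, .(Category), summarise, sum = sum(Frequency))
Answer #3: Just to add a third option: require(doBy) summaryBy(Frequency~Category, data=yourdataframe, FUN=sum) Edit: this is a very old answer. I would now recommend group_by and summarise from dplyr, as shown in @docendo's answer.
Answer #4: Using aggregate : aggregate(x$Frequency, by=list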
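
For readers who want to reproduce the expected totals without installing any package, here is a minimal base R sketch using the formula interface of `aggregate`. It is not a reconstruction of the truncated answer above; the object names `x` and `totals` are assumptions.

```r
# The example data from the question.
x <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Sum Frequency within each Category; the result is sorted by Category.
totals <- aggregate(Frequency ~ Category, data = x, FUN = sum)

print(totals)
```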

Aggregate rows by shared values in a variable

安稳与你 submitted on 2020-02-20 06:05:13
Question: I have a somewhat dumb R question. If I have a matrix (or data frame, whichever is easier to work with) like:

Year Match
2008 1808
2008 137088
2008 1
2008 56846
2007 2704
2007 169876
2007 75750
2006 2639
2006 193990
2006 2

and I wanted to sum the match counts for each year (so, e.g., the 2008 row would be 2008 195743), how would I go about doing this? I've got a few solutions in my head but they are all needlessly complicated and R tends to have some much easier solution tucked away
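
One short way to get the per-year totals is `tapply`, sketched below on the data from the question. The names `m`, `sums`, and `result` are assumptions, and other base R or plyr approaches would work equally well.

```r
# The example data from the question, as a data frame.
m <- data.frame(
  Year  = c(2008, 2008, 2008, 2008, 2007, 2007, 2007, 2006, 2006, 2006),
  Match = c(1808, 137088, 1, 56846, 2704, 169876, 75750, 2639, 193990, 2)
)

# tapply splits Match by Year and applies sum to each piece;
# the result is a named vector keyed by year.
sums <- tapply(m$Match, m$Year, sum)

# Turn the named vector back into a two-column data frame.
result <- data.frame(Year = as.integer(names(sums)), Match = as.vector(sums))

print(result)
```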

Replacing column values with maximum by group

风流意气都作罢 submitted on 2020-01-25 22:18:27
Question: Say I want to locate the maximum values in one column based on the value of another (i.e. max by group). I found a number of helpful threads on how to do this (ex1 ex2). For example, using the plyr package, ddply(data, .(x), summarise, max.score=max(y)) returns the maximum value of y for each x. However, what if I then wanted to replace all elements in x < max(y) with max(y) itself? (The specific application would be to recode all dates in a particular set with that set's end date.
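
A possible way to broadcast each group's maximum back onto every row is base R's `ave`, which returns a vector the same length as its input. The sketch below uses hypothetical numeric data and an assumed column name `y_max`; it is not taken from the thread's answers.

```r
# Hypothetical data: x is the grouping column, y the value column.
data <- data.frame(
  x = c("a", "a", "a", "b", "b"),
  y = c(1, 5, 3, 2, 7)
)

# ave() applies max within each group of x and repeats the result
# for every row of that group, so the output lines up with the input.
data$y_max <- ave(data$y, data$x, FUN = max)

print(data)
```

Assigning the result to `data$y` instead of `data$y_max` would overwrite the original values with the group maximum.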

loop_apply.o: file not recognized: File format not recognized

余生颓废 submitted on 2020-01-24 22:12:07
Question: I am trying to install R's plyr package. Here is the error message: * installing *source* package ‘plyr’ ... ** package ‘plyr’ successfully unpacked and MD5 sums checked ** libs clang++ -I/opt/R-3.4.1/include -DNDEBUG -I"/home/isomorphismes/R/i686-pc-linux-gnu-library/3.4/Rcpp/include" -I/usr/local/include -fpic -I/opt/boost_1_61_0/boost -c RcppExports.cpp -o RcppExports.o clang -I/opt/R-3.4.1/include -DNDEBUG -I"/home/cd/R/i686-pc-linux-gnu-library/3.4/Rcpp/include" -I/usr/local/include

Return value based on finding closest value between other two columns in df

房东的猫 submitted on 2020-01-24 12:24:57
Question: My question is almost identical to this one, except that instead of finding the closest value between a column value and a fixed number, e.g. "2", I want to find the closest value to the value in another column. Here's an example of data: df <- data.frame(site_no=c("01010500", "01010500", "01010500","02010500", "02010500", "02010500", "03010500", "03010500", "03010500"), OBS=c(423.9969, 423.9969, 423.9969, 123, 123, 123, 150,150,150), MOD=c(380,400,360,150,155,135,170,180,140), HT=c(14,12,15,3,8
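
Because the excerpt is cut off, the exact output the asker wants is not visible. The sketch below assumes the goal is, for each site_no, to return the HT of the row whose MOD is closest to OBS. It uses only the first five rows, whose values all appear in the excerpt, and the name `closest` is an assumption.

```r
# First five rows of the question's data (the rest of HT is truncated in the excerpt).
df <- data.frame(
  site_no = c("01010500", "01010500", "01010500", "02010500", "02010500"),
  OBS = c(423.9969, 423.9969, 423.9969, 123, 123),
  MOD = c(380, 400, 360, 150, 155),
  HT  = c(14, 12, 15, 3, 8)
)

# For each site, keep the single row where |MOD - OBS| is smallest.
# which.min() returns the first such row if there is a tie.
closest <- do.call(rbind, lapply(split(df, df$site_no), function(g) {
  g[which.min(abs(g$MOD - g$OBS)), ]
}))

print(closest[, c("site_no", "OBS", "MOD", "HT")])
```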

R plyr, data.table, apply certain columns of data.frame

柔情痞子 submitted on 2020-01-22 20:58:05
Question: I am looking for ways to speed up my code. I am looking into the apply / ply methods as well as data.table . Unfortunately, I am running into problems. Here is a small data sample: ids1 <- c(1, 1, 1, 1, 2, 2, 2, 2) ids2 <- c(1, 2, 3, 4, 1, 2, 3, 4) chars1 <- c("aa", " bb ", "__cc__", "dd ", "__ee", NA,NA, "n/a") chars2 <- c("vv", "_ ww_", " xx ", "yy__", " zz", NA, "n/a", "n/a") data <- data.frame(col1 = ids1, col2 = ids2, col3 = chars1, col4 = chars2, stringsAsFactors = FALSE) Here is a