plyr

Set column name ddply

十年热恋 posted on 2020-01-11 05:55:31
Question: How do I set the column name of the summarized data in
    library(plyr)
    ddply(data, .(col1, col2), nrow)
as in
    ddply(data, .(col1, col2), function(x) data.frame(number = nrow(x)))
Answer 1: Perhaps you are looking for summarize (or mutate or transform, depending on what you want to do). A small example:
    set.seed(1)
    data <- data.frame(col1 = c(1, 2, 2, 3, 3, 4), col2 = c(1, 2, 2, 1, 2, 1), z = rnorm(6))
    ddply(data, .(col1, col2), summarize, number = length(z), newcol = mean(z))
    #   col1 col2 number newcol
    # 1    1

Accessing grouped data in dplyr

时光总嘲笑我的痴心妄想 posted on 2020-01-10 19:06:08
Question: How can I access the grouped data after applying the group_by function from dplyr and using the %.% operator? For example, if I want the first row of each group, I can do this with the plyr package:
    ddply(iris, .(Species), function(df) {
      df[1, ]
    })
    # output
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
    # 1          5.1         3.5          1.4         0.2     setosa
    # 2          7.0         3.2          4.7         1.4 versicolor
    # 3          6.3         3.3          6.0         2.5  virginica
Answer 1: For your specific case, you can use row_number():
    library(dplyr)
    iris %.% group_by
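The answer excerpt is cut off, so what follows is only a sketch of one way to get the first row per group, not the answerer's own code. It assumes a current dplyr release, where the `%.%` operator used in the question was deprecated in favor of `%>%`. The name `first_rows` is made up for illustration.

```r
library(dplyr)

# Keep the first row of each Species group in the built-in iris data.
first_rows <- iris %>%
  group_by(Species) %>%
  filter(row_number() == 1) %>%
  ungroup()

print(first_rows)
```

`filter(row_number() == 1)` is evaluated within each group, so it selects one row per species; `ungroup()` drops the grouping so later steps work on the whole table.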

Efficient multiplication of columns in a data frame

可紊 posted on 2020-01-09 10:07:48
Question: I have a large data frame in which I am multiplying two columns together to get another column. At first I was running a for loop, like so:
    for (i in 1:nrow(df)) {
      df$new_column[i] <- df$column1[i] * df$column2[i]
    }
but this takes like 9 days. Another alternative was plyr, though I might actually be using the variables incorrectly:
    new_df <- ddply(df, .(column1, column2), transform, new_column = column1 * column2)
but this is taking forever.
Answer 1: As Blue Magister said in comments,
    df$new_column <-
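The answer text is cut off above. As a general illustration rather than a reconstruction of it: arithmetic operators in R are vectorized, so whole columns can be multiplied in one call with neither a loop nor ddply. The data below is hypothetical; only the column names come from the question.

```r
# Hypothetical data; column names follow the question.
set.seed(42)
df <- data.frame(column1 = runif(1e5), column2 = runif(1e5))

# `*` works element by element on whole vectors, so no loop is needed.
df$new_column <- df$column1 * df$column2

# An equivalent base R spelling that avoids repeating `df$`.
df <- transform(df, new_column2 = column1 * column2)

head(df)
```

The ddply call in the question is slow for a different reason: grouping by `column1` and `column2` creates one group per distinct pair of values, which for numeric data is often close to one group per row.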

Subsetting based on observations in a month

你说的曾经没有我的故事 posted on 2020-01-05 11:04:29
Question: I'm trying to subset some data and am stuck on the last part of cleaning. What I need to do is calculate the number of observations for each individual (indivID) in months (June, July, and August), return a percentage for each without missing data, and then keep those observations that are over 75%. I was able to create a nested for loop, but it took probably 6 hours to process today. I would like to be able to take advantage of parallel computing by using ddply, or another function, but an

How to include all levels in a ggplot legend when using d_plyr

北战南征 posted on 2020-01-05 03:58:04
Question: Goal: a legend which contains two levels of a factor, even if both levels are not represented on the figure. Minimum reproducible example:
    library(ggplot2)
    library(plyr)
    mre <- data.frame(plotfactor = factor(rep(c("response1", "response2"), c(2, 2))),
                      linefactor = factor(rep(c("line1", "line2"), 2)),
                      x1 = runif(n = 4), x2 = runif(n = 4),
                      y1 = runif(n = 4), y2 = runif(n = 4),
                      ltype = c("foo", "foo", "foo", "bar"))
    ## this looks great!
    ggplot(mre, aes(x = x1, xend = x2, y = y1, yend = y2, colour

Matching multiple date values in R

本小妞迷上赌 posted on 2020-01-03 17:04:04
Question: I have the following dataframe DF describing people that have worked on a project on certain dates:
    ID ProjectName StartDate
    1  Health      3/1/06 18:20
    2  Education   2/1/07 15:30
    1  Education   5/3/09 9:00
    3  Wellness    4/1/10 12:00
    2  Health      6/1/11 14:20
The goal is to find the first project corresponding to each ID. For example, the expected output would be as follows:
    ID ProjectName StartDate
    1  Health      3/1/06 18:20
    2  Education   2/1/07 15:30
    3  Wellness    4/1/10 12:00
So far I have done the following to get
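The asker's own attempt is cut off, so here is a minimal base R sketch of one possible approach: parse the dates, sort, and keep the first row per ID. It assumes the dates are month/day/two-digit-year, which the excerpt does not confirm. The names `parsed`, `sorted`, and `first_project` are made up for illustration.

```r
# Data copied from the question.
DF <- data.frame(
  ID = c(1, 2, 1, 3, 2),
  ProjectName = c("Health", "Education", "Education", "Wellness", "Health"),
  StartDate = c("3/1/06 18:20", "2/1/07 15:30", "5/3/09 9:00",
                "4/1/10 12:00", "6/1/11 14:20"),
  stringsAsFactors = FALSE
)

# Assumption: month/day/two-digit-year; change the format string if not.
parsed <- as.POSIXct(DF$StartDate, format = "%m/%d/%y %H:%M", tz = "UTC")

# Sort by ID, then by start time, and keep the first row seen for each ID.
sorted <- DF[order(DF$ID, as.numeric(parsed)), ]
first_project <- sorted[!duplicated(sorted$ID), ]
rownames(first_project) <- NULL

print(first_project)
```

`duplicated()` marks every occurrence of an ID after the first, so negating it on the sorted data leaves the earliest project for each person.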

Selecting specific rows based on values in 2 columns in R

丶灬走出姿态 posted on 2020-01-03 05:05:17
Question: I have a large data set of GPS collar locations that have a varying number of locations each day. I want to separate out only the days that have a single location collected and make a new data frame containing all their information.
    month day easting northing time  ID
    6     1   ####### ######## 0:00  ##
    6     2   ####### ######## 6:00  ##
    6     2   ####### ######## 0:00  ##
    6     3   ####### ######## 18:00 ##
    6     3   ####### ######## 12:00 ##
    6     4   ####### ######## 0:00  ##
    6     5   ####### ######## 6:00  ##
Currently I have hashed
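A minimal base R sketch of one way to keep only the single-location days, assuming a day is identified by its month and day columns. The data frame `gps` is a hypothetical stand-in, since the question masks the coordinates and IDs, and `per_day` and `single_days` are illustrative names.

```r
# Hypothetical stand-in data: month, day and time follow the question's table.
gps <- data.frame(
  month = c(6, 6, 6, 6, 6, 6, 6),
  day = c(1, 2, 2, 3, 3, 4, 5),
  time = c("0:00", "6:00", "0:00", "18:00", "12:00", "0:00", "6:00"),
  stringsAsFactors = FALSE
)

# For every row, count how many rows share its month/day combination.
per_day <- ave(seq_len(nrow(gps)), gps$month, gps$day, FUN = length)

# Keep only the rows from days with exactly one recorded location.
single_days <- gps[per_day == 1, ]

print(single_days)
```

If the data covers several collars, the ID column would presumably need to be added as a further grouping argument to `ave()` so that counts are taken per animal per day.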

How can I apply different aggregate functions to different columns in R?

匆匆过客 posted on 2020-01-03 02:27:31
Question: How can I apply different aggregate functions to different columns in R? The aggregate() function only offers one function argument to be passed:
    V1  V2       V3
    1   18.45022 62.24411694
    2   90.34637 20.86505214
    1   50.77358 27.30074987
    2   52.95872 30.26189013
    1   61.36935 26.90993530
    2   49.31730 70.60387016
    1   43.64142 87.64433517
    2   36.19730 83.47232907
    1   91.51753 0.03056485
    ... ...      ...
    > aggregate(sample, by = sample["V1"], FUN = sum)
      V1 V1       V2       V3
    1  1 10 578.5299 489.5307
    2  2 20 575.2294 527.2222
How can I apply a
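One possible way to get a different statistic per column, sketched in base R with split and lapply. It uses only the first four rows shown in the question, since the rest is elided there. The choice of sum for V2 and mean for V3 is arbitrary, and `sample_df`, `pieces`, and `result` are illustrative names.

```r
# First four rows of the data shown in the question.
sample_df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(18.45022, 90.34637, 50.77358, 52.95872),
  V3 = c(62.24411694, 20.86505214, 27.30074987, 30.26189013)
)

# Split by group, compute a different statistic per column, then bind the rows.
pieces <- lapply(split(sample_df, sample_df$V1), function(g) {
  data.frame(V1 = g$V1[1], V2_sum = sum(g$V2), V3_mean = mean(g$V3))
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL

print(result)
```

With plyr loaded, `ddply(sample_df, .(V1), summarise, V2_sum = sum(V2), V3_mean = mean(V3))` should give the same table in one call.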

Calculate days since last event, grouped per ID in R

感情迁移 posted on 2020-01-02 23:09:10
Question: Have a look at the MWE below:
    df <- data.frame(date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001", "17/03/2001", "23/05/2001", "26/08/2001"), "%d/%m/%Y"), event = c(0, 0, 1, 0, 1, 1, 0))
    id date       event
    01 2000-07-06 0
    01 2000-09-15 0
    01 2000-10-15 1
    01 2001-01-03 0
    02 2001-03-17 1
    02 2001-05-23 1
    02 2001-08-26 0
    02 2001-08-28 0
    03 2001-08-29 1
    03 2001-09-05 1
    03 2001-09-30 0
    03 2001-10-12 1
I want, grouped per ID, the number of days since the last event. My question is similar to this one
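The excerpt does not say exactly how "days since the last event" should be counted. The base R sketch below therefore assumes that, within each id, a row gets the number of days since the most recent earlier row with event == 1, and NA when there is none. The data is typed in from the printed table, not from the shorter data.frame() call, and the helper `days_since_last_event` is hypothetical.

```r
# Data typed in from the printed table in the question.
df <- data.frame(
  id = rep(c("01", "02", "03"), each = 4),
  date = as.Date(c("2000-07-06", "2000-09-15", "2000-10-15", "2001-01-03",
                   "2001-03-17", "2001-05-23", "2001-08-26", "2001-08-28",
                   "2001-08-29", "2001-09-05", "2001-09-30", "2001-10-12")),
  event = c(0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1)
)

# Hypothetical helper: days since the most recent earlier event, NA if none.
days_since_last_event <- function(date, event) {
  out <- rep(NA_real_, length(date))
  last <- NA  # day number of the most recent earlier event
  for (i in seq_along(date)) {
    if (!is.na(last)) out[i] <- as.numeric(date[i]) - last
    if (event[i] == 1) last <- as.numeric(date[i])
  }
  out
}

# Sort so that split() returns rows in the same order as the data frame.
df <- df[order(df$id, df$date), ]
df$days_since <- unlist(lapply(split(df, df$id), function(g) {
  days_since_last_event(g$date, g$event)
}), use.names = FALSE)

print(df)
```

The loop runs once per row within each id. If an event row should instead count as zero days since itself, swap the two `if` lines inside the loop.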