plyr | 易学教程

plyr split_indices function crashes for long vectors

Question: I am trying to run the acast function from the reshape2 package on a large data set, and R crashes. I was able to localize the problem:

    library(plyr)
    n <- 15784000
    g <- 1:n
    split_indices(g, n)
    # NOTE for copy/pasters:
    # this may result in an abort and R exit

I am getting the following error message:

    *** caught segfault ***
    address 0x7ffffc3c44f0, cause 'memory not mapped'

    Traceback:
     1: .Call("split_indices", group, as.integer(n))
     2: split_indices(g, n)

If I reduce the value of n:

How can I extract the rows from a large data set by common IDs and take the means of these rows and make a column having these IDs

Question: I know it is a very silly question, but I could not sort it out, which is why I am asking. How can I extract the rows from a large data set by common IDs, take the means of these rows, and make a column having these IDs as rownames? For example:

    IDs  Var2
    Ae4  2
    Ae4  4
    Ae4  6
    Bc3  3
    Bc3  5
    Ad2  8
    Ad2  7

Output:

         Var(x)
    Ae4  4
    Bc3  4
    Ad2  7.5

Answer 1: This kind of thing can easily be done using the plyr function ddply:

    dat = data.frame(ID = rep(LETTERS[1:5], each = 20), value = runif(100))
    > head(dat)
      ID value
    1 A 0
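The answer excerpt above points to ddply but is cut off. As a minimal sketch of the same grouping idea, the following uses base R's `tapply` instead of plyr (a swapped-in technique, chosen so it runs without extra packages) on the sample data from the question. The names `dat`, `means`, and `out` are illustrative assumptions.

```r
# Sample data copied from the question.
dat <- data.frame(
  IDs  = c("Ae4", "Ae4", "Ae4", "Bc3", "Bc3", "Ad2", "Ad2"),
  Var2 = c(2, 4, 6, 3, 5, 8, 7)
)

# Mean of Var2 within each ID; the result is a named 1-d array.
means <- tapply(dat$Var2, dat$IDs, mean)

# One column of means, with the IDs as row names.
# Rows come out in alphabetical order of ID, not input order.
out <- data.frame(Var = as.numeric(means), row.names = names(means))
print(out)
```

With plyr loaded, a call along the lines of `ddply(dat, .(IDs), summarise, Var = mean(Var2))` should give the same means, but with the IDs in a regular column rather than as row names.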

R - Group data but apply different functions to different columns

Question: I'd like to group this data but apply different functions to some columns when grouping.

    ID  type  isDesc  isImage
    1   1     1       0
    1   1     0       1
    1   1     0       1
    4   2     0       1
    4   2     1       0
    6   1     1       0
    6   1     0       1
    6   1     0       0

I want to group by ID. The columns isDesc and isImage can be summed, but I would like to get the value of type as it is; type will be the same through the whole dataset. The result should look like this:

    ID  type  isDesc  isImage
    1   1     1       2
    4   2     1       1
    6   1     1       1

Currently I am using

    library(plyr)
    summarized = ddply(data, .(ID),
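The ddply call in the question is truncated, so the sketch below does not try to complete it. It shows one possible base R approach instead: because type is constant within each ID, type can be added to the grouping variables of `aggregate` so that it is carried through unchanged while the other two columns are summed. This is an illustration under that assumption, not the accepted answer.

```r
# Sample data copied from the question.
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Sum isDesc and isImage per (ID, type) group.
# Assumes type never varies within an ID; otherwise an ID would yield several rows.
summarized <- aggregate(cbind(isDesc, isImage) ~ ID + type, data = data, FUN = sum)

# aggregate() orders rows by the grouping variables, so re-sort by ID.
summarized <- summarized[order(summarized$ID), ]
print(summarized)
```

With plyr, the same idea would usually be written with `summarise` inside ddply, taking `type[1]` for the constant column and `sum()` for the others.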

count of events per datetime R

Question: I have a dataset that contains 4 different event types (A, B, C, D) that happen many times daily. I have such a log for over a year. The "EventType" attribute is a factor. For example, my dataset looks like this:

    DateTime,EventType
    6/5/2013 9:35,B
    6/5/2013 9:35,A
    6/5/2013 9:35,B
    6/5/2013 9:36,D
    6/5/2013 9:39,A
    6/5/2013 9:40,B
    7/5/2013 9:35,B
    7/5/2013 9:35,A
    7/5/2013 9:35,B
    7/5/2013 9:36,D
    7/5/2013 9:39,A
    7/5/2013 9:40,B
    8/5/2013 9:35,A
    8/5/2013 9:35,A
    8/5/2013 9:35,B
    8/5/2013 9:36,B
    8/5
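The question is cut off before it states the desired output, so the following is only one plausible reading of the title: count how often each event type occurs at each DateTime value. It is a base R sketch using `table` on the first two days of the sample data. The timestamps are treated as plain labels, so no assumption is made about whether they are day/month or month/day. The names `log_df` and `counts` are illustrative.

```r
# First twelve rows of the sample log from the question.
log_df <- read.csv(text = "DateTime,EventType
6/5/2013 9:35,B
6/5/2013 9:35,A
6/5/2013 9:35,B
6/5/2013 9:36,D
6/5/2013 9:39,A
6/5/2013 9:40,B
7/5/2013 9:35,B
7/5/2013 9:35,A
7/5/2013 9:35,B
7/5/2013 9:36,D
7/5/2013 9:39,A
7/5/2013 9:40,B", stringsAsFactors = TRUE)

# Cross-tabulate: one row per distinct DateTime, one column per event type.
counts <- table(DateTime = log_df$DateTime, EventType = log_df$EventType)
print(counts)

# Long format (DateTime, EventType, Freq), if a data frame is preferred.
counts_long <- as.data.frame(counts)
```

Event type C does not occur in these rows, so it has no column here; with the full factor levels present it would appear with zero counts.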

dplyr equivalent to ddply in plyr diamonds example

Question: OK, I'm trying to wrap my head around dplyr, using it instead of plyr. In my short time with R I've grown somewhat accustomed to ddply. I'm using a "simple" example of how to use dplyr as opposed to ddply in plyr. Here goes. With the following:

    t1.table <- ddply(diamonds, c("clarity", "cut"), "nrow")

I receive a summary table of counts of diamonds by clarity and cut. In dplyr, the simplest example I can come up with is:

    diamonds %>% select(clarity, cut) %>% group_by(clarity, cut) %>% summarise

R ddply, applying if and ifelse functions

Question: I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have 3 questions about the results. Given:

    mydf <- data.frame(c(12,34,9,3,22,55), c(1,2,1,1,2,2), c(0,1,2,1,1,2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question #1: If I do:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return (mydf)
    }
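The rest of the question is not shown, so this sketch does not reproduce the ddply call being asked about. It only illustrates, outside ddply, how `ifelse` behaves on the question's data: it is vectorized and returns one value per element of the test vector.

```r
# Same data as in the question, with the column names set directly.
mydf <- data.frame(
  n  = c(12, 34, 9, 3, 22, 55),
  x  = c(1, 2, 1, 1, 2, 2),
  x1 = c(0, 1, 2, 1, 1, 2)
)

# Element-wise: where x is 1 the result is 0, otherwise it is that row's n.
mydf$z <- ifelse(mydf$x == 1, 0, mydf$n)
print(mydf)
```

Note that the function `k` in the question refers to the global `mydf` rather than to the piece it receives as `x`, which is a likely source of surprising results once it is called per group.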

Converting a data frame to a matrix with plyr daply

Question: I'm trying to use the daply function in the plyr package, but I cannot get it to output properly. Even though the variable that makes up the matrix is numeric, the elements of the matrix are lists, not the variable itself. Here is a small subset of the data as an example:

       Month Vehicle Samples
    1 Oct-10   31057     256
    2 Oct-10   31059     316
    3 Oct-10   31060     348
    4 Nov-10   31057     267
    5 Nov-10   31059     293
    6 Nov-10   31060     250
    7 Dec-10   31057     159
    8 Dec-10   31059     268
    9 Dec-10   31060     206

And I would like to be able
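The question is truncated before it states the target layout, so the sketch below assumes the goal suggested by the title: a plain numeric matrix of Samples with one row per Month and one column per Vehicle. It uses base R's `tapply` rather than daply, as an alternative that returns numeric cells when each group reduces to a single number.

```r
# Sample rows copied from the question.
df <- data.frame(
  Month   = rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
  Vehicle = rep(c(31057, 31059, 31060), times = 3),
  Samples = c(256, 316, 348, 267, 293, 250, 159, 268, 206)
)

# Month x Vehicle matrix; sum() is applied per cell (one value per cell here).
# Rows are ordered alphabetically by month label, not chronologically.
m <- tapply(df$Samples, list(df$Month, df$Vehicle), sum)
print(m)
```

To get chronological rows, Month could be turned into a factor with explicitly ordered levels before calling `tapply`.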

R - Replace specific value contents with NA [duplicate]

Question: This question already has answers here: Replacing character values with NA in a data frame (6 answers). Closed 4 months ago.

I have a fairly large data frame that has multiple "-" entries which represent missing data. The data frame was assembled from multiple Excel files, for which I could not use the "na.strings =" argument or an alternative function, so I had to import them with the "-" representation. How can I replace all "-" in the data frame with NA / missing values? The data frame consists of 200 columns of
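A minimal sketch of one common way to do this in base R is logical indexing on the whole data frame. The toy data frame `df` below is hypothetical (the real one has 200 columns and is not shown), and the sketch assumes the affected columns are character vectors rather than factors.

```r
# Hypothetical stand-in for the imported data; "-" marks missing values.
df <- data.frame(
  a = c("1", "-", "3"),
  b = c("-", "x", "y"),
  c = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# df == "-" yields a logical matrix; assigning NA replaces every matching cell.
df[df == "-"] <- NA
print(df)
```

Columns that held "-" remain character after the replacement, so numeric columns would still need an explicit conversion such as `as.numeric` afterwards.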

Using ifelse with transform in ddply

Question: I am trying to use ddply with transform to populate a new variable (summary_Date) in a data frame with variables ID and Date. The value of the variable is chosen with ifelse, based on the length of the piece being evaluated: if there are fewer than five observations for an ID in a given month, I want summary_Date to be calculated by rounding the date to the nearest month (using round_date from the lubridate package); if there are more than five observations for an ID in a given