plyr

plyr split_indices function crashes for long vectors

亡梦爱人, posted on 2020-01-02 07:52:10
Question: I am trying to run the acast function from the reshape2 package on a large data set, and the program crashes. I was able to localize the problem:

    library(plyr)
    n <- 15784000
    g <- 1:n
    split_indices(g, n)
    # NOTE for copy/pasters:
    # this may result in an abort and R exit

I get the following error message:

    *** caught segfault ***
    address 0x7ffffc3c44f0, cause 'memory not mapped'

    Traceback:
     1: .Call("split_indices", group, as.integer(n))
     2: split_indices(g, n)

If I reduce the value of n:
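For readers who only need the grouping result, here is a minimal base-R sketch of the same idea: for each group number from 1 to n, collect the positions that hold it. The helper name group_positions is hypothetical, and this is not plyr's implementation; it has not been checked against the crashing input size and is likely slower than compiled code.

```r
# Base-R sketch: for each group number 1..n, collect the positions that hold it.
# group_positions() is a hypothetical helper name, not part of plyr.
group_positions <- function(group, n) {
  # factor() with explicit levels keeps empty groups as zero-length entries
  unname(split(seq_along(group), factor(group, levels = seq_len(n))))
}

g <- c(2L, 1L, 2L, 3L)
idx <- group_positions(g, 4)
str(idx)
```

Because everything stays in base R, no .Call into package C code is involved.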

How can I extract the rows from a large data set by common IDs and take the means of these rows and make a column having these IDs

邮差的信, posted on 2020-01-02 06:56:09
Question: I know it is a very silly question, but I could not sort it out, which is why I am asking. How can I extract the rows from a large data set by common IDs, take the means of these rows, and make a column having these IDs as rownames? E.g.:

    IDs  Var2
    Ae4  2
    Ae4  4
    Ae4  6
    Bc3  3
    Bc3  5
    Ad2  8
    Ad2  7

Output:

         Var(x)
    Ae4  4
    Bc3  4
    Ad2  7.5

Answer 1: This kind of thing can easily be done using the plyr function ddply:

    dat = data.frame(ID = rep(LETTERS[1:5], each = 20), value = runif(100))
    > head(dat)
      ID value
    1  A 0
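Since the answer is cut off, here is a minimal sketch of the ddply approach applied to the question's own sample data. The output column name mean_Var2 is an assumption chosen for illustration, not something taken from the original answer.

```r
library(plyr)

dat <- data.frame(
  IDs  = c("Ae4", "Ae4", "Ae4", "Bc3", "Bc3", "Ad2", "Ad2"),
  Var2 = c(2, 4, 6, 3, 5, 8, 7)
)

# One output row per ID, holding the mean of Var2 within that ID
means <- ddply(dat, .(IDs), summarise, mean_Var2 = mean(Var2))

# Move the IDs into the row names, as the question asks
rownames(means) <- as.character(means$IDs)
means$IDs <- NULL
print(means)
```

Base R's aggregate(Var2 ~ IDs, data = dat, FUN = mean) should give the same means without plyr.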

R - Group data but apply different functions to different columns

若如初见., posted on 2020-01-02 05:47:06
Question: I'd like to group this data but apply different functions to some columns when grouping.

    ID type isDesc isImage
    1  1    1      0
    1  1    0      1
    1  1    0      1
    4  2    0      1
    4  2    1      0
    6  1    1      0
    6  1    0      1
    6  1    0      0

I want to group by ID; the columns isDesc and isImage can be summed, but I would like to keep the value of type as it is. type will be the same through the whole dataset. The result should look like this:

    ID type isDesc isImage
    1  1    1      2
    4  2    1      1
    6  1    1      1

Currently I am using

    library(plyr)
    summarized = ddply(data, .(ID),
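The asker's ddply call is cut off, so the following is only one possible way to get the stated result, assuming type is constant within each ID as it is in the sample data. Adding type to the grouping variables carries it through untouched while the other two columns are summed.

```r
library(plyr)

data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Grouping by ID and type carries type through unchanged,
# while the two flag columns are summed within each group.
summarized <- ddply(data, .(ID, type), summarise,
                    isDesc  = sum(isDesc),
                    isImage = sum(isImage))
print(summarized)
```

If type could vary within an ID, this would produce one row per (ID, type) pair rather than one row per ID.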

count of events per datetime R

你说的曾经没有我的故事, posted on 2020-01-02 05:32:32
Question: I have a dataset that contains 4 different event types (A, B, C, D) that happen many times daily. I have such a log for over a year. The "EventType" attribute is a 'factor'. For example, my dataset looks like this:

    DateTime,EventType
    6/5/2013 9:35,B
    6/5/2013 9:35,A
    6/5/2013 9:35,B
    6/5/2013 9:36,D
    6/5/2013 9:39,A
    6/5/2013 9:40,B
    7/5/2013 9:35,B
    7/5/2013 9:35,A
    7/5/2013 9:35,B
    7/5/2013 9:36,D
    7/5/2013 9:39,A
    7/5/2013 9:40,B
    8/5/2013 9:35,A
    8/5/2013 9:35,A
    8/5/2013 9:35,B
    8/5/2013 9:36,B
    8/5
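The question is cut off before the desired output, so this sketch takes the title literally and counts how often each event type occurs at each DateTime value, using plyr's count function on the first six sample rows. Treating DateTime as a plain string is an assumption made here to avoid guessing the date format.

```r
library(plyr)

events <- data.frame(
  DateTime  = c("6/5/2013 9:35", "6/5/2013 9:35", "6/5/2013 9:35",
                "6/5/2013 9:36", "6/5/2013 9:39", "6/5/2013 9:40"),
  EventType = c("B", "A", "B", "D", "A", "B")
)

# count() returns one row per distinct (DateTime, EventType) pair plus a freq column
per_minute <- count(events, vars = c("DateTime", "EventType"))
print(per_minute)
```

To count per day or per hour instead, the DateTime column would first need to be parsed and truncated to that resolution.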

dplyr equivalent to ddply in plyr diamonds example

蓝咒, posted on 2020-01-02 04:38:08
Question: OK, I'm trying to wrap my head around dplyr, using it instead of plyr. In my short time with R I've grown somewhat accustomed to ddply. I'm using a "simple" example of how to use dplyr as opposed to ddply in plyr. Here goes. With the following:

    t1.table <- ddply(diamonds, c("clarity", "cut"), "nrow")

I receive a summary table of counts of diamonds by clarity and cut. In dplyr, the simplest example I can come up with is:

    diamonds %>% select(clarity, cut) %>% group_by(clarity, cut) %>% summarise
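As a rough sketch of the dplyr side, the pipeline below counts rows per group in two ways. The small gems data frame is a hypothetical stand-in for diamonds so the example does not depend on ggplot2; the column name n is dplyr's default for count.

```r
library(dplyr)

# Hypothetical stand-in for the diamonds data (only the two grouping columns)
gems <- data.frame(
  clarity = c("SI1", "SI1", "VS2", "VS2", "VS2"),
  cut     = c("Ideal", "Ideal", "Ideal", "Good", "Good")
)

# Explicit form: group, then count rows per group with n()
by_hand <- gems %>%
  group_by(clarity, cut) %>%
  summarise(n = n()) %>%
  ungroup()

# Shorthand: count() does the grouping and tallying in one call
short <- gems %>% count(clarity, cut)
print(short)
```

The select step in the question's pipeline is not needed for counting, since only the grouping columns affect the result.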

R ddply, applying if and ifelse functions

本秂侑毒, posted on 2020-01-01 09:42:53
Question: I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have 3 questions about the results. Given:

    mydf <- data.frame(c(12,34,9,3,22,55), c(1,2,1,1,2,2), c(0,1,2,1,1,2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question #1: If I do:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return (mydf)
    }
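The snippet stops before showing how k is called, so the following is only a sketch of the usual pattern: ddply calls the function once per group and passes that group's rows as a data frame, so the function should work on its argument rather than on the global mydf. The function name add_z and the rule used for z are assumptions for illustration.

```r
library(plyr)

mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# ddply() calls the function once per group and passes that group's rows
# as a data frame, so the function uses its argument, not the global mydf.
add_z <- function(piece) {
  # Assumed rule for illustration: z is 0 where x1 == 1, otherwise n
  piece$z <- ifelse(piece$x1 == 1, 0, piece$n)
  piece
}

res <- ddply(mydf, .(x), add_z)
print(res)
```

Referring to mydf inside the function, as k does, would rebuild the whole data frame for every group instead of just that group's rows.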

R ddply, applying if and ifelse functions

女生的网名这么多〃, posted on 2020-01-01 09:41:28
Question: I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have 3 questions about the results. Given:

    mydf <- data.frame(c(12,34,9,3,22,55), c(1,2,1,1,2,2), c(0,1,2,1,1,2))
    colnames(mydf)[1] <- 'n'
    colnames(mydf)[2] <- 'x'
    colnames(mydf)[3] <- 'x1'

mydf looks like this:

       n x x1
    1 12 1  0
    2 34 2  1
    3  9 1  2
    4  3 1  1
    5 22 2  1
    6 55 2  2

Question #1: If I do:

    k <- function(x) {
      mydf$z <- ifelse(x == 1, 0, mydf$n)
      return (mydf)
    }

Converting a data frame to a matrix with plyr daply

风格不统一, posted on 2020-01-01 03:11:10
Question: I'm trying to use the daply function in the plyr package, but I cannot get it to output properly. Even though the variable that makes up the matrix is numeric, the elements of the matrix are lists, not the variable itself. Here is a small subset of the data as an example:

      Month  Vehicle Samples
    1 Oct-10   31057     256
    2 Oct-10   31059     316
    3 Oct-10   31060     348
    4 Nov-10   31057     267
    5 Nov-10   31059     293
    6 Nov-10   31060     250
    7 Dec-10   31057     159
    8 Dec-10   31059     268
    9 Dec-10   31060     206

And I would like to be able
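The failing call is not shown, so the cause can only be guessed at; one way to end up with list cells is for the function passed to daply to return a data frame or list for each piece. The sketch below, built from the sample rows, returns a bare number per (Month, Vehicle) cell instead, which should yield a numeric matrix.

```r
library(plyr)

dat <- data.frame(
  Month   = rep(c("Oct-10", "Nov-10", "Dec-10"), each = 3),
  Vehicle = rep(c(31057, 31059, 31060), times = 3),
  Samples = c(256, 316, 348, 267, 293, 250, 159, 268, 206)
)

# Returning a bare number per (Month, Vehicle) cell lets daply() build a numeric array
m <- daply(dat, .(Month, Vehicle), function(piece) piece$Samples)
print(m)
```

Individual cells can then be read by name, for example m["Oct-10", "31057"].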

R - Replace specific value contents with NA [duplicate]

佐手、, posted on 2019-12-31 07:41:44
Question: This question already has answers here: Replacing character values with NA in a data frame (6 answers). Closed 4 months ago.

I have a fairly large data frame that has multiple "-" entries, which represent missing data. The data frame was assembled from multiple Excel files, for which I could not use "na.strings =" or an alternative function, so I had to import them with the "-" representation. How can I replace all "-" in the data frame with NA / missing values? The data frame consists of 200 columns of
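One common base-R idiom for this is logical-matrix indexing, sketched below on a small hypothetical data frame standing in for the imported Excel data. It assumes the affected columns were read as character rather than factor.

```r
# Small hypothetical data frame standing in for the imported Excel data
df <- data.frame(
  a = c("1.5", "-", "2.0"),
  b = c("x", "y", "-"),
  stringsAsFactors = FALSE
)

# Logical-matrix indexing: every cell equal to "-" becomes NA
df[df == "-"] <- NA

# Columns that held numbers are still character; convert them explicitly
df$a <- as.numeric(df$a)
print(df)
```

Only cells that are exactly "-" are replaced, so negative numbers stored as text such as "-3" are left alone.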

Using ifelse with transform in ddply

核能气质少年, posted on 2019-12-30 18:59:43
Question: I am trying to use ddply with transform to populate a new variable (summary_Date) in a data frame with variables ID and Date. The value of the variable is chosen based on the length of the piece that is being evaluated, using ifelse: if there are fewer than five observations for an ID in a given month, I want summary_Date to be calculated by rounding the date to the nearest month (using round_date from package lubridate); if there are more than five observations for an ID in a given