data.table

Error when trying to store list in data.table of length 1

和自甴很熟 submitted on 2021-02-11 08:13:40
Question: When trying to store a vector in a list column of a data.table, this works only when the data.table has more than one row. Please find below a simplified version of the problem:

    library(data.table)

Working fine:

    dt <- data.table(a = c("a", "b"), l = list())
    dt$l[[1]] <- c(1:3)

Results in:

       a     l
    1: a 1,2,3
    2: b

Producing an error:

    dt <- data.table(a = c("a"), l = list())
    dt$l[[1]] <- c(1:3)

    Error in `[<-.data.table`(x, j = name, value = value) :
      Supplied 3 items to be assigned to 1 items of column 'l'. The RHS
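
One possible workaround, sketched here under the assumption that the goal is simply a list column holding a vector in a one-row table: assign by reference with `:=` and wrap the value in two lists, so the outer list is read as "one column" and the inner list as "one list cell". This is not taken from the truncated question or its answers.

```r
library(data.table)

dt <- data.table(a = "a")

# Outer list(): the set of columns being assigned (one column here).
# Inner list(): the list column itself, with a single cell holding 1:3.
dt[, l := list(list(1:3))]
```

After this call, `dt$l[[1]]` holds the integer vector `1:3` and the table still has a single row.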

Filtering observations by using grep the reverse way in R

梦想与她 submitted on 2021-02-11 07:43:10
Question: Shown as below:

    df <- data.frame(X1 = rep(letters[1:3], 3), X2 = 1:9, X3 = sample(1:50, 9))
    df
    ind <- grep("a|c", df$X1)
    library(data.table)
    df_ac <- df[ind, ]
    df_b <- df[!ind, ]

df_ac is created using the regular grep command. Suppose I want to use grep the reverse way, to select all observations with X1 == 'b'. I know I can do this with:

    ind2 <- grep("a|c", df$X1, invert = T)
    df_b <- df[ind2, ]

But, in my original script, why does the command df_b <- df[!ind, ] return a data frame with zero observation
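
A likely explanation, offered as a sketch rather than as the accepted answer: `grep()` returns integer positions, and applying `!` to non-zero integers coerces each of them to `TRUE` and then negates it, so `!ind` is a vector of `FALSE` values and selects no rows. The example below uses a reduced version of the data frame (the random `X3` column is left out) and shows three ways to invert the selection.

```r
# Reduced toy data mirroring the question
df <- data.frame(X1 = rep(letters[1:3], 3), X2 = 1:9)

ind <- grep("a|c", df$X1)   # integer positions of the matching rows
neg <- !ind                 # non-zero integers count as TRUE, so this is all FALSE

# Three ways to keep the rows that do NOT match
df_b1 <- df[-ind, ]                               # negative integer indexing
df_b2 <- df[!grepl("a|c", df$X1), ]               # grepl() returns logicals, safe to negate
df_b3 <- df[grep("a|c", df$X1, invert = TRUE), ]  # let grep() do the inversion
```

Note that `df[-ind, ]` returns zero rows when `ind` is empty, so the `grepl()` form is usually the more robust choice.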

New behavior in data.table? .N / something with `by` (calculate proportion)

南楼画角 submitted on 2021-02-11 06:43:16
Question: I updated to the latest version of data.table - 1.9.4, from a medium-recent prior version (I think 1.8.X), and now I'm getting some unexpected behavior.

    set.seed(12312014)
    # a vector of letters a:e, each repeated between 1 and 10 times
    type <- unlist(mapply(rep, letters[1:5], round(runif(5, 1, 10), 0)))
    # a random vector of 3 categories
    category <- sample(c('small', 'med', 'large'), length(type), replace=T)
    my_dt <- data.table(type, category)

Say I want the proportion of category by type. I
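
For reference, one way to compute the proportion of each category within each type is a two-step approach: count rows per (type, category) pair, then divide by the per-type total. This is a minimal sketch, not the code from the truncated question; the names `counts` and `prop` are illustrative.

```r
library(data.table)

set.seed(12312014)
type <- unlist(mapply(rep, letters[1:5], round(runif(5, 1, 10), 0)))
category <- sample(c("small", "med", "large"), length(type), replace = TRUE)
my_dt <- data.table(type, category)

# Step 1: number of rows for each (type, category) pair
counts <- my_dt[, .N, by = .(type, category)]

# Step 2: share of each category within its type
counts[, prop := N / sum(N), by = type]
```

Splitting the work into two steps avoids relying on how `.N` is evaluated inside a nested grouping expression.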

data.table calculate sums by two variables and add observations for “empty” groups

柔情痞子 submitted on 2021-02-11 06:36:28
Question: Sorry for the bad title. I am trying to achieve the following: I have a data.table dt with two categorical variables "a" and "b". As you can see, a has 5 unique values and b has three. Now e.g. the combination of categorical variables ("a = 1" and "b = 3") is not in the data.

    library(data.table)
    set.seed(1)
    a <- sample(1:5, 10, replace = TRUE)
    b <- sample(1:3, 10, replace = TRUE)
    y <- rnorm(10)
    dt <- data.table(a = a, b = b, y = y)
    dt[order(a, b), .N, by = c("a", "b")]
    #    a b N
    #1:  1 1 2
    #2:
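
A common pattern for this kind of task, sketched here on the assumption that the desired result has one row per combination of `a` and `b` with zeros for empty groups: build the full grid with `CJ()` and join the data onto it with `by = .EACHI`. The names `grid`, `counts`, and `sum_y` are illustrative and do not come from the question.

```r
library(data.table)

set.seed(1)
dt <- data.table(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:3, 10, replace = TRUE),
                 y = rnorm(10))

# All 5 x 3 combinations of the two categorical variables
grid <- CJ(a = 1:5, b = 1:3)

# Join the data to the grid; by = .EACHI evaluates j once per grid row,
# so combinations without matching rows get N = 0 and a sum of 0
counts <- dt[grid, on = .(a, b),
             .(N = .N, sum_y = sum(y, na.rm = TRUE)),
             by = .EACHI]
```

The result has 15 rows, one per grid combination, regardless of which combinations occur in `dt`.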

How to filter rows out of data.table where any column is NA without specifying columns individually

扶醉桌前 submitted on 2021-02-11 05:21:09
Question: Given a data.table

    DT <- data.table(a=c(1,2,NA,4,5), b=c(2,3,4,NA,5), c=c(1,2,3,4,5), d=c(2,3,4,5,6))

how can I do the equivalent of

    DT[!is.na(a) & !is.na(b) & !is.na(c) & !is.na(d)]

in a general form, without knowing any of the column names or typing out the !is.na() for each individual column? I could also do

    DT[apply(DT, 1, function(x) !any(is.na(x)))]

but I'm wondering if there's a better way still.

Answer 1: I think you are looking for complete.cases:

    > DT[complete.cases(DT),]
       a b c d
    1: 1 2 1 2
    2:
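
As a runnable illustration of the `complete.cases` answer, the sketch below also shows two alternatives that are not part of the excerpt: `na.omit()`, which has a data.table method, and a column-wise `Reduce()` over `is.na()`. All three should keep the same rows for this input.

```r
library(data.table)

DT <- data.table(a = c(1, 2, NA, 4, 5), b = c(2, 3, 4, NA, 5),
                 c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 5, 6))

# Keep rows with no NA in any column, without naming the columns
res_cc  <- DT[complete.cases(DT)]

# Alternative: na.omit() drops every row containing an NA
res_na  <- na.omit(DT)

# Alternative: OR together is.na() of each column, then negate
res_red <- DT[!Reduce(`|`, lapply(DT, is.na))]
```

Rows 3 and 4 contain an `NA`, so each result keeps rows 1, 2, and 5 of the original table.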

evaluate expression in data.table

流过昼夜 submitted on 2021-02-10 18:35:16
Question: I'm trying to evaluate a string as a formula. In dplyr it would look like this:

    dt = data.table(a = 1:10)
    expr = 'sum(a)'
    dt %>% mutate(b := !!parse_expr(expr))

However, when I try with data.table I'm getting an error:

    dt[, b := parse_expr(expr)]

    Error in `[.data.table`(dt, , `:=`(b, parse_expr(expr))) :
      RHS of assignment is not NULL, not an an atomic vector (see ?is.atomic) and not a list column.

Answer 1: Instead of parse_expr, eval(parse()) can be used:

    dt[, b := eval(parse(text = expr))]

Or wrap
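
The `eval(parse(text = ...))` approach from the answer can be tried end to end as follows. The second variant, which parses the string once and evaluates the resulting call in `j`, is an additional sketch and is not taken from the truncated part of the answer; the names `parsed` and `b2` are illustrative.

```r
library(data.table)

dt <- data.table(a = 1:10)
expr <- "sum(a)"

# Parse the string into an expression and evaluate it inside the
# data.table, where the column `a` is visible
dt[, b := eval(parse(text = expr))]

# Variant: parse once up front, then evaluate the stored call in j
parsed <- parse(text = expr)[[1]]
dt[, b2 := eval(parsed)]
```

Both columns hold the sum of `a`, recycled to every row. Evaluating parsed strings runs arbitrary code, so this pattern is best reserved for expressions from a trusted source.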