data.table | 易学教程

Error when trying to store list in data.table of length 1

Question: When I try to store a vector in a list column of a data.table, it only works when the data.table has more than one row. Below is a simplified version of the problem.

    library(data.table)

Working fine:

    dt <- data.table(a = c("a", "b"), l = list())
    dt$l[[1]] <- c(1:3)

Results in:

       a     l
    1: a 1,2,3
    2: b

Producing an error:

    dt <- data.table(a = c("a"), l = list())
    dt$l[[1]] <- c(1:3)

    Error in `[<-.data.table`(x, j = name, value = value) :
      Supplied 3 items to be assigned to 1 items of column 'l'. The RHS
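
A minimal sketch of one common workaround, assuming a reasonably recent data.table: create the list column with `:=` and wrap the vector in `list()` twice. The outer `list()` is the list of column values that `:=` expects; the inner `list()` is the length-1 list column whose single element is `1:3`. With only one level of wrapping, a length-1 list is plausibly read as the column values themselves, so its 3 elements are matched against 1 row, which is what the error reports.

```r
library(data.table)

# One-row table: the case that fails with dt$l[[1]] <- 1:3
dt <- data.table(a = "a")

# Outer list(): the RHS values for :=; inner list(): a list column of length 1
dt[, l := list(list(1:3))]
str(dt$l)  # a list of length 1 whose only element is 1:3

# The same pattern on a two-row table: one list element per row
dt2 <- data.table(a = c("a", "b"))
dt2[, l := list(list(1:3, 4:5))]
```

The double wrapping is the same regardless of the number of rows, so the one-row table is no longer a special case.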

Filtering observations by using grep the reverse way in R

Question: Shown below:

    df <- data.frame(X1 = rep(letters[1:3], 3), X2 = 1:9, X3 = sample(1:50, 9))
    df
    ind <- grep("a|c", df$X1)
    library(data.table)
    df_ac <- df[ind, ]
    df_b <- df[!ind, ]

df_ac is created with the regular grep command. I want to use grep the reverse way, to select all observations with X1 == 'b'. I know I can do this with:

    ind2 <- grep("a|c", df$X1, invert = T)
    df_b <- df[ind2, ]

But in my original script, why does the command df_b <- df[!ind, ] return a data frame with zero observations?
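
`grep()` returns integer positions, not a logical vector. Applying `!` to an integer vector coerces every non-zero position to `TRUE` and then negates it, so `!ind` is all `FALSE` and selects no rows. The sketch below uses a simplified `df` (the random `X3` column is dropped) to show this, together with two alternatives: negative indexing with `-ind`, and `!grepl()`, which negates a logical vector element-wise.

```r
df <- data.frame(X1 = rep(letters[1:3], 3), X2 = 1:9)

ind <- grep("a|c", df$X1)  # integer positions of matches: 1 3 4 6 7 9
print(!ind)                # all FALSE: non-zero integers become TRUE, then negated

df_none <- df[!ind, ]                  # zero rows, as in the question
df_b1   <- df[-ind, ]                  # negative indexing drops the matched rows
df_b2   <- df[!grepl("a|c", df$X1), ]  # logical vector, negated element-wise
```

`!grepl()` is arguably the safer choice: if nothing matches, `grep()` returns `integer(0)` and `df[-integer(0), ]` returns zero rows instead of all of them.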

New behavior in data.table? .N / something with `by` (calculate proportion)

Question: I updated to the latest version of data.table, 1.9.4, from a medium-recent prior version (I think 1.8.x), and now I'm getting some unexpected behavior.

    set.seed(12312014)
    # a vector of letters a:e, each repeated between 1 and 10 times
    type <- unlist(mapply(rep, letters[1:5], round(runif(5, 1, 10), 0)))
    # a random vector of 3 categories
    category <- sample(c('small', 'med', 'large'), length(type), replace = T)
    my_dt <- data.table(type, category)

Say I want the proportion of category by type. I
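
The excerpt is cut off before the unexpected output is shown, so the version-specific behaviour is not reproduced here. As a version-independent sketch, assuming the goal is the share of each category within each type: count rows per `(type, category)` with `.N`, then divide by the per-type total in a chained second step grouped by `type` only. `SIMPLIFY = FALSE` is an addition to the question's code so that `mapply()` always returns a list.

```r
library(data.table)

set.seed(12312014)
# SIMPLIFY = FALSE guarantees a list even if all repeat counts happen to be equal
type <- unlist(mapply(rep, letters[1:5], round(runif(5, 1, 10), 0), SIMPLIFY = FALSE))
category <- sample(c("small", "med", "large"), length(type), replace = TRUE)
my_dt <- data.table(type, category)

# Step 1: row counts per (type, category); .N is stored in a column named N
counts <- my_dt[, .N, by = .(type, category)]

# Step 2: within each type, divide each count by that type's total
counts[, prop := N / sum(N), by = type]
print(counts[order(type, category)])
```

Splitting the work into two grouped steps keeps each `by` simple: the first defines the cells, the second defines the denominator.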

data.table calculate sums by two variables and add observations for “empty” groups

Question: Sorry for the bad title. I am trying to achieve the following: I have a data.table dt with two categorical variables, "a" and "b". As you can see, a has 5 unique values and b has three. Some combinations are missing from the data; for example, the combination a = 1 and b = 3 does not occur.

    library(data.table)
    set.seed(1)
    a <- sample(1:5, 10, replace = TRUE)
    b <- sample(1:3, 10, replace = TRUE)
    y <- rnorm(10)
    dt <- data.table(a = a, b = b, y = y)
    dt[order(a, b), .N, by = c("a", "b")]
    #    a b N
    # 1: 1 1 2
    # 2:
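
A minimal sketch of one way to do this, assuming the full set of levels is `1:5` for `a` and `1:3` for `b` (use `sort(unique(dt$a))` and `sort(unique(dt$b))` instead if only observed levels should count): aggregate first, then join the result onto every combination built with `CJ()`, and replace the `NA`s produced for the missing groups with zeros. The column names `sum_y`, `sums` and `full` are illustrative.

```r
library(data.table)

set.seed(1)
dt <- data.table(a = sample(1:5, 10, replace = TRUE),
                 b = sample(1:3, 10, replace = TRUE),
                 y = rnorm(10))

# Sums and counts for the combinations that actually occur
sums <- dt[, .(sum_y = sum(y), N = .N), by = .(a, b)]

# X[Y, on = ...] keeps every row of Y, so all 5 x 3 combinations appear;
# combinations absent from sums get NA in sum_y and N
full <- sums[CJ(a = 1:5, b = 1:3), on = .(a, b)]

# "Empty" groups: sum 0, count 0
full[is.na(N), `:=`(sum_y = 0, N = 0L)]
print(full)
```

Aggregating before the join keeps the `NA` filling to a single, explicit step; joining first and summing afterwards would instead produce `NA` sums for the empty groups.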

How to filter rows out of data.table where any column is NA without specifying columns individually

Question: Given a data.table

    DT <- data.table(a = c(1, 2, NA, 4, 5), b = c(2, 3, 4, NA, 5), c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 5, 6))

how can I do the equivalent of

    DT[!is.na(a) & !is.na(b) & !is.na(c) & !is.na(d)]

in a general form, without knowing any of the column names or typing out !is.na() for each individual column? I could also do

    DT[apply(DT, 1, function(x) !any(is.na(x)))]

but I'm wondering if there's a better way.

Answer 1: I think you are looking for complete.cases:

    > DT[complete.cases(DT),]
       a b c d
    1: 1 2 1 2
    2:
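
For reference, a small sketch using the question's `DT`: `complete.cases()` (from base R's stats package) returns one logical per row, `TRUE` when no column is `NA`, so it can be passed directly as `i`. data.table also provides an `na.omit()` method for data.tables, which drops the same rows here.

```r
library(data.table)

DT <- data.table(a = c(1, 2, NA, 4, 5), b = c(2, 3, 4, NA, 5),
                 c = c(1, 2, 3, 4, 5), d = c(2, 3, 4, 5, 6))

# Logical row filter: TRUE only for rows with no NA in any column
keep <- complete.cases(DT)
res1 <- DT[keep]

# data.table's na.omit() method drops the same rows
res2 <- na.omit(DT)
print(res1)
```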

evaluate expression in data.table

Question: I'm trying to evaluate a string as a formula. In dplyr it would look like this:

    dt = data.table(a = 1:10)
    expr = 'sum(a)'
    dt %>% mutate(b := !!parse_expr(expr))

However, when I try it with data.table I get an error:

    dt[, b := parse_expr(expr)]
    Error in `[.data.table`(dt, , `:=`(b, parse_expr(expr))) :
      RHS of assignment is not NULL, not an an atomic vector (see ?is.atomic) and not a list column.

Answer 1: Instead of parse_expr, eval(parse( can be used:

    dt[, b := eval(parse(text = expr))]

Or wrap
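
A short sketch of why this works: `eval()` with no explicit `envir` evaluates in the environment it is called from, which inside `j` is the environment where data.table has made the columns (here `a`) visible. `parse_expr()` only returns an unevaluated expression, which is why `:=` rejects it. `str2lang()` (base R since 3.6.0) parses a single expression and is a slightly more direct alternative to `parse(text = )`; `expr2` and the column `d` below are hypothetical, for illustration only.

```r
library(data.table)

dt <- data.table(a = 1:10)
expr <- "sum(a)"

# parse(text = ) returns an expression vector; eval() runs it where column a is visible
dt[, b := eval(parse(text = expr))]

# str2lang() returns a single call, which eval() handles the same way
expr2 <- "a * 2"  # hypothetical second expression
dt[, d := eval(str2lang(expr2))]
print(dt)
```

The length-1 result of `sum(a)` is recycled to every row of `b`, while `a * 2` already has one value per row.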