I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:
df$x <- NULL
<
If you want remove the columns by reference and avoid the internal copying associated with data.frames
then you can use the data.table
package and the function :=
You can pass a character vector names to the left hand side of the :=
operator, and NULL
as the RHS.
library(data.table)
df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
DT <- data.table(df)
# or more simply DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10) #
DT[, c('a','b') := NULL]
If you want to predefine the names as as character vector outside the call to [
, wrap the name of the object in ()
or {}
to force the LHS to be evaluated in the calling scope not as a name within the scope of DT
.
del <- c('a','b')
DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
DT[, (del) := NULL]
DT <- <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
DT[, {del} := NULL]
# force or `c` would also work.
You can also use set
, which avoids the overhead of [.data.table
, and also works for data.frames
!
df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
DT <- data.table(df)
# drop `a` from df (no copying involved)
set(df, j = 'a', value = NULL)
# drop `b` from DT (no copying involved)
set(DT, j = 'b', value = NULL)