I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:
df$x <- NULL
within(df, rm(x)) is probably easiest, or for multiple variables:
within(df, rm(x, y))
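As a minimal runnable sketch of the within() approach (the sample data frame and the names df_one and df_two are made up for illustration), note that within() returns a modified copy, so the result has to be assigned:

```r
# Sample data, invented for this example.
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# within() evaluates rm() against the columns of df and returns a
# modified copy; df itself is left unchanged.
df_one <- within(df, rm(x))
df_two <- within(df, rm(x, y))

names(df_one)
names(df_two)
names(df)
```

If the original should be overwritten, assign back to it: df <- within(df, rm(x, y)).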
Or if you're dealing with data.tables (per How do you delete a column by name in data.table?):
dt[, x := NULL] # Deletes column x by reference instantly.
dt[, !"x"] # Selects all but x into a new data.table.
Or, for multiple variables:
dt[, c("x","y") := NULL]
dt[, !c("x", "y")]
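The difference between the two data.table forms can be seen in a small sketch like the one below. It assumes the data.table package is installed; the sample table and the name dt_new are made up for illustration.

```r
library(data.table)

# Sample data, invented for this example.
dt <- data.table(x = 1:3, y = 4:6, z = 7:9)

# Negated selection builds a new data.table; dt is not touched.
dt_new <- dt[, !c("x", "y")]
names_before <- names(dt)

# := NULL removes the columns from dt itself, by reference,
# so no assignment is needed.
dt[, c("x", "y") := NULL]
names_after <- names(dt)
```

Use the := form when the original table is no longer needed, and the negated selection when it should be kept.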
You could use %in% like this:
df[, !(colnames(df) %in% c("x","bar","foo"))]
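One way to package this is a small helper. This is only a sketch: drop_cols() is a hypothetical name, not a base R function, and the sample data is invented. It adds drop = FALSE so the result stays a data frame even when a single column is left.

```r
# drop_cols() is a hypothetical helper, not part of base R.
drop_cols <- function(df, drops) {
  # TRUE for every column whose name is NOT in drops.
  keep <- !(colnames(df) %in% drops)
  # drop = FALSE keeps a data frame even if only one column remains.
  df[, keep, drop = FALSE]
}

# Sample data, invented for this example.
df <- data.frame(x = 1:3, bar = 4:6, foo = 7:9, y = letters[1:3])

# Names that are not present in df ("not_there") are simply ignored.
res <- drop_cols(df, c("x", "bar", "foo", "not_there"))
names(res)
```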
There's a function called dropNamed() in Bernd Bischl's BBmisc package that does exactly this.
BBmisc::dropNamed(df, "x")
The advantage is that it avoids repeating the data frame argument and thus is suitable for piping in magrittr (just like the dplyr approaches):
df %>% BBmisc::dropNamed("x")
Here is a dplyr way to go about it:
#df[ -c(1,3:6, 12) ] # original
df.cut <- df %>% select(-col.to.drop.1, -col.to.drop.2, ..., -col.to.drop.6) # with dplyr::select()
I like this because it's intuitive to read and understand without annotation, and it is robust to columns changing position within the data frame. It also follows the vectorized idiom of using - to remove elements.
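To illustrate the point about column position, here is a small sketch. It assumes dplyr 1.0 or later is installed (for all_of()); the column names and the drops vector are made up for the example.

```r
library(dplyr)

# Sample data, invented for this example.
df <- data.frame(id = 1:3, tmp1 = 0, value = c(2.5, 3.1, 4.8), tmp2 = "a")

# Drop by bare column name.
df_cut <- df %>% select(-tmp1, -tmp2)

# Drop the columns named in a character vector.
drops <- c("tmp1", "tmp2")
df_cut2 <- df %>% select(-all_of(drops))

# The same call still removes the right columns after a reorder,
# which an index-based df[-c(2, 4)] would not.
df_reordered <- df[, c("tmp2", "value", "tmp1", "id")]
df_cut3 <- df_reordered %>% select(-all_of(drops))
```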
DF <- data.frame(
x=1:10,
y=10:1,
z=rep(5,10),
a=11:20
)
DF
Output:
    x  y z  a
1   1 10 5 11
2   2  9 5 12
3   3  8 5 13
4   4  7 5 14
5   5  6 5 15
6   6  5 5 16
7   7  4 5 17
8   8  3 5 18
9   9  2 5 19
10 10  1 5 20
DF[c("a","x")] <- list(NULL)
Output:
    y z
1  10 5
2   9 5
3   8 5
4   7 5
5   6 5
6   5 5
7   4 5
8   3 5
9   2 5
10  1 5
You can use a simple vector of names:
DF <- data.frame(
x=1:10,
y=10:1,
z=rep(5,10),
a=11:20
)
drops <- c("x","z")
DF[ , !(names(DF) %in% drops)]
Alternatively, you can make a vector of the columns to keep and refer to them by name:
keeps <- c("y", "a")
DF[keeps]
EDIT:
For those still not acquainted with the drop argument of the indexing function, if you want to keep one column as a data frame, you do:
keeps <- "y"
DF[ , keeps, drop = FALSE]
drop=TRUE (or not mentioning it) will drop unnecessary dimensions, and hence return a vector with the values of column y.
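The effect of drop can be checked directly with a short sketch along these lines (the variable names as_vector, as_frame and as_frame2 are made up for the example):

```r
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)

# Matrix-style indexing of one column: drop defaults to TRUE,
# so the result is a plain vector.
as_vector <- DF[, "y"]

# drop = FALSE keeps the one-column data frame.
as_frame <- DF[, "y", drop = FALSE]

# List-style indexing (no comma) also returns a data frame.
as_frame2 <- DF["y"]

is.data.frame(as_vector)
is.data.frame(as_frame)
is.data.frame(as_frame2)
```

This matters mostly when the set of columns is computed at run time and may end up with length one.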