I want to filter rows from a data.frame
based on a logical condition. Let's suppose that I have a data frame like
expr_value cell_type
1
To select rows according to one 'cell_type' (e.g. 'hesc'), use ==:
expr[expr$cell_type == "hesc", ]
To select rows matching two or more 'cell_type' values (e.g. either 'hesc' or 'bj fibroblast'), use %in%:
expr[expr$cell_type %in% c("hesc", "bj fibroblast"), ]
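To make the two forms above concrete, here is a minimal sketch on a small made-up data frame. The values in `expr` are hypothetical (the question's data is not reproduced here); only the column names `expr_value` and `cell_type` come from the question.

```r
# Hypothetical sample data; the values are invented for illustration only.
expr <- data.frame(
  expr_value = c(5.3, 7.1, 2.8, 9.4, 4.6),
  cell_type  = c("hesc", "bj fibroblast", "hesc", "bj fibroblast", "other"),
  stringsAsFactors = FALSE
)

# Rows for a single cell type: == compares element by element.
hesc_only <- expr[expr$cell_type == "hesc", ]

# Rows for any of several cell types: %in% tests set membership.
hesc_or_bj <- expr[expr$cell_type %in% c("hesc", "bj fibroblast"), ]

print(hesc_only)
print(hesc_or_bj)
```

Note the trailing comma inside the brackets: it selects rows and keeps all columns.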
We can also use the data.table package:
library(data.table)
expr <- data.table(expr)
expr[cell_type == "hesc"]
expr[cell_type %in% c("hesc", "bj fibroblast")]
or filter using the %like% operator for pattern matching:
expr[cell_type %like% "hesc"|cell_type %like% "fibroblast"]
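For comparison, the same kind of pattern filter can be sketched in base R with grepl(), which %like% wraps. The data frame below is hypothetical, and the pattern "hesc|fibroblast" is a regular expression, so it matches any value containing either substring (including "bj fibroblast").

```r
# Hypothetical sample data; the values are invented for illustration only.
expr <- data.frame(
  expr_value = c(5.3, 7.1, 2.8, 9.4),
  cell_type  = c("hesc", "bj fibroblast", "hesc", "other"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE for each element that matches the regular expression.
matched <- expr[grepl("hesc|fibroblast", expr$cell_type), ]

print(matched)
```

Because this is substring matching, it is looser than %in%: a value such as "hesc-derived" would also be kept.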
Use subset (for interactive use):
subset(expr, cell_type == "hesc")
subset(expr, cell_type %in% c("bj fibroblast", "hesc"))
or, better, dplyr::filter():
dplyr::filter(expr, cell_type %in% c("bj fibroblast", "hesc"))
You could use the dplyr package:
library(dplyr)
filter(expr, cell_type == "hesc")
filter(expr, cell_type == "hesc" | cell_type == "bj fibroblast")
No one seems to have included the which function. It can also prove useful for filtering.
expr[which(expr$cell_type == 'hesc'), ]
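This also handles NAs: which() returns only the positions where the comparison is TRUE, so rows where cell_type is NA are dropped from the resulting data frame instead of coming back as all-NA rows.
When I ran this 50,000 times on a 9840 by 24 data frame, the which method appeared to run about 60% faster than the %in% method.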
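The NA behaviour is easiest to see side by side. This is a small sketch on a made-up data frame (the values, including the NA, are hypothetical) comparing plain logical indexing, which(), and %in%.

```r
# Hypothetical sample data with one missing cell type.
expr <- data.frame(
  expr_value = c(5.3, 7.1, 2.8, 9.4),
  cell_type  = c("hesc", NA, "hesc", "bj fibroblast"),
  stringsAsFactors = FALSE
)

# NA == "hesc" is NA, and indexing by NA yields a row filled with NA.
with_logical <- expr[expr$cell_type == "hesc", ]

# which() keeps only the positions where the comparison is TRUE.
with_which <- expr[which(expr$cell_type == "hesc"), ]

# %in% never returns NA, so it also skips the missing value.
with_in <- expr[expr$cell_type %in% "hesc", ]

print(nrow(with_logical))
print(nrow(with_which))
print(nrow(with_in))
```

So which() and %in% agree on the result here; the difference from == only shows up when the column contains NA.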
Sometimes the column you want to filter on is not at a fixed position (such as column index 2), or its name is only known through a variable.
In that case, you can refer to the column by name, like this:
columnNameToFilter = "cell_type"
expr[expr[[columnNameToFilter]] == "hesc", ]
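If this pattern comes up often, it can be wrapped in a small helper. The function name `filter_by_column` and the sample data are hypothetical, introduced here only for illustration; this is one possible sketch, not an established API.

```r
# Hypothetical helper: keep rows whose `column` value is one of `values`.
filter_by_column <- function(df, column, values) {
  # Fail early if the column name is misspelled or missing.
  stopifnot(column %in% names(df))
  # [[ ]] looks the column up by the string held in `column`.
  df[df[[column]] %in% values, , drop = FALSE]
}

# Hypothetical sample data; the values are invented for illustration only.
expr <- data.frame(
  expr_value = c(5.3, 7.1, 2.8),
  cell_type  = c("hesc", "bj fibroblast", "hesc"),
  stringsAsFactors = FALSE
)

columnNameToFilter <- "cell_type"
result <- filter_by_column(expr, columnNameToFilter, "hesc")
print(result)
```

Using %in% inside the helper means it accepts one value or several, and it does not return NA rows when the column has missing values.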