Question
I'm working with RNA-seq count data that has ~60,000 columns named by gene and 24 rows named by sample. After some gene name conversions I was left with a number of columns that are named NA. I know that R handles NA differently from a typical column name, so my question is: how do I remove these columns? Here is an example of my data.
  "Gene1" "Gene2" "Gene3" NA "Gene4"
1      10      11      12 10      15
2      13      12      50 40      30
3      34      23      23 21      22
I would like it to end up like
  "Gene1" "Gene2" "Gene3" "Gene4"
1      10      11      12      15
2      13      12      50      30
3      34      23      23      22
I found some R code that worked for others but not for me:
df<-df[, grep("^(NA)", names(df), value = TRUE, invert = TRUE)]
Answer 1:
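It looks like you have an actual NA in your names, rather than the string "NA". The former represents a missing value; the latter is a character string that merely looks like the symbol for a missing value. A regular expression only matches text, so a pattern that searches for the characters NA does not pick up a missing name. Test for missingness directly instead: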
df <- df[!is.na(names(df))]
Illustrating with iris:
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> names(iris)[2:3] <- NA
> head(iris)
Sepal.Length NA NA Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> head(iris[!is.na(names(iris))])
Sepal.Length Petal.Width Species
1 5.1 0.2 setosa
2 4.9 0.2 setosa
3 4.7 0.2 setosa
4 4.6 0.2 setosa
5 5.0 0.2 setosa
6 5.4 0.4 setosa
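The same idea can be tried on the data from the question. This is only a sketch: the data frame below is rebuilt by hand from the values in the post, and the names df2, keep, cleaned and cleaned2 are made up for illustration. The second half assumes a case the question does not mention, where some names are truly missing and others are the literal string "NA".

```r
# Rebuild the small example from the question (values copied from the post).
df <- data.frame(
  c(10, 13, 34),
  c(11, 12, 23),
  c(12, 50, 23),
  c(10, 40, 21),
  c(15, 30, 22)
)
names(df) <- c("Gene1", "Gene2", "Gene3", NA, "Gene4")

# Keep only the columns whose name is not a missing value.
cleaned <- df[!is.na(names(df))]
names(cleaned)
# [1] "Gene1" "Gene2" "Gene3" "Gene4"

# Hypothetical variant: one name is a real NA, another is the string "NA".
df2 <- data.frame(1:3, 4:6, 7:9, 10:12)
names(df2) <- c("Gene1", NA, "NA", "Gene4")

# A column is kept only if its name is neither missing nor the text "NA".
# Where the name is missing, !is.na() is FALSE, so the NA produced by the
# comparison on the right cannot leak into the result.
keep <- !is.na(names(df2)) & names(df2) != "NA"
cleaned2 <- df2[keep]
names(cleaned2)
# [1] "Gene1" "Gene4"
```

If the counts are stored in a matrix rather than a data frame, the equivalent would presumably be mat[, !is.na(colnames(mat)), drop = FALSE], where mat is a placeholder name.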
Source: https://stackoverflow.com/questions/30743304/removing-columns-named-na