Question
I've got a data frame like this one:
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K
I want to remove all the columns that contain a single repeated value, i.e. K, so that my result looks like this:
1 1 1 1
2 1 2 1
3 8 3 1
4 8 2 1
1 1 1 1
2 1 2 1
I tried iterating over the columns in a for loop, but I didn't get anywhere. Any ideas?
Answer 1:
To select columns with more than one value regardless of type:
uniquelength <- sapply(d,function(x) length(unique(x)))
d <- subset(d, select=uniquelength>1)
?
(Oops, Roman's question is right -- this could knock out your column 5 as well)
Maybe (edit: thanks to comments!)
isfac <- sapply(d,inherits,"factor")
d <- subset(d,select=!isfac | uniquelength>1)
or
d <- d[, !isfac | uniquelength>1, drop=FALSE]
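One caveat with the factor test: whether the K columns are factors depends on how the data was read. Since R 4.0.0, read.table() defaults to stringsAsFactors = FALSE, so they come in as character columns and inherits(x, "factor") is FALSE for them. A minimal sketch of a variant that protects numeric columns instead, assuming that "constant non-numeric column" is what should be dropped (the helper name drop_constant_nonnumeric is made up for illustration):

```r
# Hypothetical helper, not from the original answer: keep every numeric
# column, and keep a non-numeric column only if it has more than one value.
drop_constant_nonnumeric <- function(d) {
  uniquelength <- vapply(d, function(x) length(unique(x)), integer(1))
  isnum <- vapply(d, is.numeric, logical(1))
  # drop = FALSE keeps the result a data frame even if one column survives
  d[, isnum | uniquelength > 1, drop = FALSE]
}

# The data from the question
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

result <- drop_constant_nonnumeric(d)
names(result)
```

With the question's data this should keep V1, V2, V3 and V5, whether the K columns were read as factors or as character vectors.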
Answer 2:
Here's a solution that'll work to remove any replicated columns (including, e.g., pairs of replicated character, numeric, or factor columns). That's how I read the OP's question, and even if it's a misreading, it seems like an interesting question as well.
df <- read.table(text="
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")
# duplicated() needs to be run in both directions, since it does **not**
# flag the first occurrence of a repeated column as a duplicate.
repdCols <- as.logical(duplicated(as.list(df), fromLast=FALSE) +
                       duplicated(as.list(df), fromLast=TRUE))
# [1] FALSE FALSE FALSE TRUE FALSE TRUE TRUE
df[!repdCols]
#   V1 V2 V3 V5
# 1  1  1  1  1
# 2  2  1  2  1
# 3  3  8  3  1
# 4  4  8  2  1
# 5  1  1  1  1
# 6  2  1  2  1
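Note that this approach treats numeric columns the same way as any other: two identical numeric columns are both removed, which is different from dropping constant columns. A small sketch on made-up toy data (the names toy, cols and repd are illustrative only) shows the difference:

```r
# Toy data invented for illustration: columns a and b are identical,
# but neither of them is constant.
toy <- data.frame(a = 1:3, b = 1:3, c = c("x", "y", "z"))

cols <- as.list(toy)
# Flag every column that has an exact copy elsewhere in the data frame
repd <- duplicated(cols) | duplicated(cols, fromLast = TRUE)

kept <- toy[!repd]
names(kept)
```

Here both a and b should be dropped and only c kept, so this is the right tool only when "replicated columns" is really what you want to remove.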
Answer 3:
Another way to do this is to use the higher-order function Filter. Here is the code:
to_keep <- function(x) any(is.numeric(x), length(unique(x)) > 1)
Filter(to_keep, d)
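For reference, here is a self-contained run of the same idea on the question's data, assuming d is built with read.table() as in the other answers. Filter() applies the predicate to each column of the data frame and keeps the columns for which it returns TRUE:

```r
# The data from the question
d <- read.table(text = "
1 1 1 K 1 K K
2 1 2 K 1 K K
3 8 3 K 1 K K
4 8 2 K 1 K K
1 1 1 K 1 K K
2 1 2 K 1 K K")

# Keep a column if it is numeric or has more than one distinct value
to_keep <- function(x) any(is.numeric(x), length(unique(x)) > 1)

kept <- Filter(to_keep, d)
names(kept)
```

Because the predicate checks is.numeric() rather than is.factor(), it should behave the same whether the K columns were read as factors or as character vectors.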
Answer 4:
A one-liner solution:
df2 <- df[sapply(df, function(x) !is.factor(x) | length(unique(x))>1 )]
Source: https://stackoverflow.com/questions/8388417/remove-columns-with-same-value-from-a-dataframe