问题
I am trying to extract the colnames of a data frame, based on the values in the cells. My data is a series of a couple hundred categories, with a simple binary 0 or 1 in the cells to indicate which column name I want in my new df.
To illustrate my point:
year cat1 cat2 cat3 ... catN
2000 0 0 1 0
2001 1 0 0 0
2002 0 0 0 1
....
2018 0 1 0 0
I am trying to get a df like:
year category
2000 cat3
2001 cat1
2002 catN
....
2018 cat2
My code:
newdf <- as.data.frame(colnames(mydf)[which(mydf == "1", arr.ind = TRUE)[2]])
But alas this only returns one category name!
Any help would be greatly appreciated!
回答1:
A base R solution:
Using sapply
to find which are the ones and get the names.
out <- data.frame(year = df1$year, category = names(sapply(df1[, -1], function(x) which(x == 1))))
out
year category
1 2000 cat1
2 2001 cat2
3 2002 cat3
4 2018 catN
data:
df1 <- structure(list(year = c(2000L, 2001L, 2002L, 2018L), cat1 = c(0L,
1L, 0L, 0L), cat2 = c(0L, 0L, 0L, 1L), cat3 = c(1L, 0L, 0L, 0L
), catN = c(0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-4L))
回答2:
A possible solution is this:
library(tidyverse)
df = data.frame(year = 2000:2002,
cat1 = c(0,0,1),
cat2 = c(1,0,0),
cat3 = c(0,1,0))
df %>%
gather(category, value, -year) %>% # reshape data
filter(value == 1) %>% # keep rows with 1s
select(-value) %>% # remove that column
arrange(year) # order that column (if needed)
# year category
# 1 2000 cat2
# 2 2001 cat3
# 3 2002 cat1
回答3:
Another option using max.col
from base R
data.frame(year = dat$year, category = names(dat[-1])[max.col(dat[-1])])
# year category
#1 2000 cat3
#2 2001 cat1
#3 2002 catN
#4 2018 cat2
max.col
finds the positions of the 1
s in your data (first column excluded) and returns a numeric vector of column positions that we use to subset the column names, again first column excluded.
Data as supplied by Rui Barradas.
回答4:
This is basicaly a Reshaping data.frame from wide to long format trick. To do it I will use package reshape2
.
Then select the rows with value == 1
.
result <- reshape2::melt(dat, id.vars = "year")
result <- result[result$value == 1, 1:2]
result <- result[order(result[[1]]), ]
names(result)[2] <- "category"
row.names(result) <- NULL
result
# year category
#1 2000 cat3
#2 2001 cat1
#3 2002 catN
#4 2018 cat2
DATA.
dat <- read.table(text = "
year cat1 cat2 cat3 catN
2000 0 0 1 0
2001 1 0 0 0
2002 0 0 0 1
2018 0 1 0 0
", header = TRUE)
回答5:
A simple solution:
# your data
data <-data.frame(year =c(2016,2017,2018),
cat.1=c(0,1,0),
cat.2=c(0,0,1),
cat.3 =c(1,0,0))
# a nice library
library(reshape2)
# from wide to long
filtered <- melt(data, id = 'year') %>% filter(value>0)
# remove the useless column
filtered <- filtered[,1:2]
> filtered
year variable
1 2017 cat.1
2 2018 cat.2
3 2016 cat.3
回答6:
Yet another base R option using stack
subset(data.frame(year = df$year, stack(df, select = -year)), values == 1)[-2]
# year ind
#2 2001 cat1
#8 2018 cat2
#9 2000 cat3
#15 2002 catN
来源:https://stackoverflow.com/questions/51613634/r-how-to-retrieve-a-column-name-of-a-data-frame