I have a roster of employees, and I need to know at what department they are in most often. It is trivial to tabulate employee ID against department name, but it is trickier
One solution could be to reshape the date from wide to long putting all the departments in one column and counts in another, group by the employer id (in this case, the row number), and then filter to the department(s) with the max value. There are a couple of options for handling ties with this approach too.
library(tidyverse)
# sample data frame with a tie
df <- data_frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,5))
# If you aren't worried about ties:
df %>%
rownames_to_column('id') %>% # creates an ID number
gather(dept, cnt, V1:V3) %>%
group_by(id) %>%
slice(which.max(cnt))
# A tibble: 3 x 3
# Groups: id [3]
id dept cnt
<chr> <chr> <dbl>
1 1 V3 9.
2 2 V1 8.
3 3 V2 5.
# If you're worried about keeping ties:
df %>%
rownames_to_column('id') %>%
gather(dept, cnt, V1:V3) %>%
group_by(id) %>%
filter(cnt == max(cnt)) %>% # top_n(cnt, n = 1) also works
arrange(id)
# A tibble: 4 x 3
# Groups: id [3]
id dept cnt
<chr> <chr> <dbl>
1 1 V3 9.
2 2 V1 8.
3 3 V2 5.
4 3 V3 5.
# If you're worried about ties, but only want a certain department, you could use rank() and choose 'first' or 'last'
df %>%
rownames_to_column('id') %>%
gather(dept, cnt, V1:V3) %>%
group_by(id) %>%
mutate(dept_rank = rank(-cnt, ties.method = "first")) %>% # or 'last'
filter(dept_rank == 1) %>%
select(-dept_rank)
# A tibble: 3 x 3
# Groups: id [3]
id dept cnt
<chr> <chr> <dbl>
1 2 V1 8.
2 3 V2 5.
3 1 V3 9.
# if you wanted to keep the original wide data frame
df %>%
rownames_to_column('id') %>%
left_join(
df %>%
rownames_to_column('id') %>%
gather(max_dept, max_cnt, V1:V3) %>%
group_by(id) %>%
slice(which.max(max_cnt)),
by = 'id'
)
# A tibble: 3 x 6
id V1 V2 V3 max_dept max_cnt
<chr> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 2. 7. 9. V3 9.
2 2 8. 3. 6. V1 8.
3 3 1. 5. 5. V2 5.
A simple for
loop can also be handy:
> df<-data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
> df
V1 V2 V3
1 2 7 9
2 8 3 6
3 1 5 4
> df2<-data.frame()
> for (i in 1:nrow(df)){
+ df2[i,1]<-colnames(df[which.max(df[i,])])
+ }
> df2
V1
1 V3
2 V1
3 V2