Separate columns with constant numbers and condense them to one row in R data.frame

后端未结

关注

 4  495

I have a data.frame called d. In this data.frame, some columns consist of constant numbers across the rows of the first column: study.name (see bel

相关标签:

4条回答

孤独总比滥情好

2020-12-11 14:03

You could try something like this, though it feels a bit clumsy. Basically, check which columns have constant values by group for all groups, keep only those columns, and then keep only unique values (since now they are constant by group).

d <- read.table(header = TRUE,
text = "study.name ESL prof scope type ESL.1 prof.1 scope.1 type.1
Shin.Ellis   1    2     1    1     1      2       1      1
Shin.Ellis   1    2     1    1     1      2       1      1
Shin.Ellis   1    2     1    2     1      2       1      1
Shin.Ellis   1    2     1    2     1      2       1      1
Shin.Ellis   1    2    NA   NA     1      2      NA     NA
Shin.Ellis   1    2    NA   NA     1      2      NA     NA
  Trus.Hsu   2    2     2    1     2      2       1      1
  Trus.Hsu   2    2    NA   NA     2      2      NA     NA")

is_constant <- function(x) length(unique(x)) == 1L

keep_constants <- function(df, group_col) {
  data_cols <- colnames(df)[setdiff(1:ncol(df), group_col)]
  check_df <- aggregate(df, by = list(df[[group_col]]), FUN = is_constant)
  cols_to_keep <- sapply(check_df[, -1], all)
  unique(df[, cols_to_keep])
}

keep_constants(d, 1)
#>   study.name ESL prof ESL.1 prof.1
#> 1 Shin.Ellis   1    2     1      2
#> 7   Trus.Hsu   2    2     2      2

^{Created on 2019-10-09 by the reprex package (v0.3.0)}

0 讨论(0)

情书的邮戳

2020-12-11 14:03

If you just want to remove repeated values across all columns unique() is base R

unique(d)

EDIT - Thanks for the clarification @CalumYou - I think this is what OP is looking for in base R.

is_constant = lapply(split(d, d$study.name), function(data){
  unlist(lapply(data,function(col){
    length(unique(col)) == 1
  }))
})
is_constant = as.data.frame(do.call(rbind, is_constant))
all_constant = d[,unlist(lapply(is_constant,all))]
all_constant = unique(all_constant)

0 讨论(0)

孤城傲影

2020-12-11 14:16
May be we need
```
library(dplyr)
d %>%
   group_by(study.name) %>%
   slice(1)
```
Or in base R after grouping by 'study.name', get the first row while specifying the na.action = NULL as the default option is na.omit which can omit any row having NA in any of the columns
```
aggregate(.~ study.name, d, head, 1, na.action = NULL)
```
If we want to subset the columns
```
nm1 <- names(which(!colSums(!do.call(rbind, by(d[-1], d$study.name,
     FUN = function(x) lengths(sapply(x, unique)) == 1)))))
unique(d[c("study.name", nm1)])
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

无人及你

2020-12-11 14:25

d_list <- lapply(split(d,d$study.name), 
                 #Find columns with similar values using sapply and length(unique(cols)) 
                 #then get the 1st row
                 function(x) x[1, sapply(x,function(y) length(unique(y))==1)])
do.call('rbind.data.frame',d_list)

           study.name ESL prof ESL.1 prof.1
Bit.KnoA     Bit.KnoA   1    3     1      3
Bit.KnoB     Bit.KnoB   1    2     1      2
ChandlerA   ChandlerA   1    2     1      2
Mubarak       Mubarak   2   NA     2     NA
SheenA         SheenA   1    2     1      2
Shin.Ellis Shin.Ellis   1    2     1      2
Sun               Sun   2    2     2      2
Trus.Hsu     Trus.Hsu   2    2     2      2

0 讨论(0)