Separate columns with constant numbers and condense them to one row in R data.frame

后端 未结 4 470
醉酒成梦
醉酒成梦 2020-12-11 13:46

I have a data.frame called d. In this data.frame, some columns consist of constant numbers across the rows of the first column: study.name (see bel

相关标签:
4条回答
  • 2020-12-11 14:03

    You could try something like this, though it feels a bit clumsy. Basically, check which columns have constant values by group for all groups, keep only those columns, and then keep only unique values (since now they are constant by group).

    d <- read.table(header = TRUE,
    text = "study.name ESL prof scope type ESL.1 prof.1 scope.1 type.1
    Shin.Ellis   1    2     1    1     1      2       1      1
    Shin.Ellis   1    2     1    1     1      2       1      1
    Shin.Ellis   1    2     1    2     1      2       1      1
    Shin.Ellis   1    2     1    2     1      2       1      1
    Shin.Ellis   1    2    NA   NA     1      2      NA     NA
    Shin.Ellis   1    2    NA   NA     1      2      NA     NA
      Trus.Hsu   2    2     2    1     2      2       1      1
      Trus.Hsu   2    2    NA   NA     2      2      NA     NA")
    
    is_constant <- function(x) length(unique(x)) == 1L
    
    keep_constants <- function(df, group_col) {
      data_cols <- colnames(df)[setdiff(1:ncol(df), group_col)]
      check_df <- aggregate(df, by = list(df[[group_col]]), FUN = is_constant)
      cols_to_keep <- sapply(check_df[, -1], all)
      unique(df[, cols_to_keep])
    }
    
    keep_constants(d, 1)
    #>   study.name ESL prof ESL.1 prof.1
    #> 1 Shin.Ellis   1    2     1      2
    #> 7   Trus.Hsu   2    2     2      2
    

    Created on 2019-10-09 by the reprex package (v0.3.0)

    0 讨论(0)
  • 2020-12-11 14:03

    If you just want to remove repeated values across all columns unique() is base R

    unique(d)
    

    EDIT - Thanks for the clarification @CalumYou - I think this is what OP is looking for in base R.

    is_constant = lapply(split(d, d$study.name), function(data){
      unlist(lapply(data,function(col){
        length(unique(col)) == 1
      }))
    })
    is_constant = as.data.frame(do.call(rbind, is_constant))
    all_constant = d[,unlist(lapply(is_constant,all))]
    all_constant = unique(all_constant)
    
    0 讨论(0)
  • 2020-12-11 14:16

    May be we need

    library(dplyr)
    d %>%
       group_by(study.name) %>%
       slice(1)
    

    Or in base R after grouping by 'study.name', get the first row while specifying the na.action = NULL as the default option is na.omit which can omit any row having NA in any of the columns

    aggregate(.~ study.name, d, head, 1, na.action = NULL)
    

    If we want to subset the columns

    nm1 <- names(which(!colSums(!do.call(rbind, by(d[-1], d$study.name,
         FUN = function(x) lengths(sapply(x, unique)) == 1)))))
    unique(d[c("study.name", nm1)])
    
    0 讨论(0)
  • 2020-12-11 14:25
    d_list <- lapply(split(d,d$study.name), 
                     #Find columns with similar values using sapply and length(unique(cols)) 
                     #then get the 1st row
                     function(x) x[1, sapply(x,function(y) length(unique(y))==1)])
    do.call('rbind.data.frame',d_list)
    
               study.name ESL prof ESL.1 prof.1
    Bit.KnoA     Bit.KnoA   1    3     1      3
    Bit.KnoB     Bit.KnoB   1    2     1      2
    ChandlerA   ChandlerA   1    2     1      2
    Mubarak       Mubarak   2   NA     2     NA
    SheenA         SheenA   1    2     1      2
    Shin.Ellis Shin.Ellis   1    2     1      2
    Sun               Sun   2    2     2      2
    Trus.Hsu     Trus.Hsu   2    2     2      2
    
    0 讨论(0)
提交回复
热议问题