Remove rows with all or some NAs (missing values) in data.frame

日久生厌 2020-11-21 05:49

I'd like to remove the lines in this data frame that:

a) contain NAs across all columns. Below is my example data frame.



        
16 answers
  • 2020-11-21 06:00

    We can also use the subset function for this:

    # Column names can be used directly inside subset()
    finalData <- subset(data, !(is.na(mmul) | is.na(rnor)))
    

    This keeps only the rows in which neither mmul nor rnor is NA.

  • 2020-11-21 06:03

    tidyr has a new function drop_na:

    library(tidyr)
    df %>% drop_na()
    #              gene hsap mmul mmus rnor cfam
    # 2 ENSG00000199674    0    2    2    2    2
    # 6 ENSG00000221312    0    1    2    3    2
    df %>% drop_na(rnor, cfam)
    #              gene hsap mmul mmus rnor cfam
    # 2 ENSG00000199674    0    2    2    2    2
    # 4 ENSG00000207604    0   NA   NA    1    2
    # 6 ENSG00000221312    0    1    2    3    2
    
  • 2020-11-21 06:05

    Try na.omit(your.data.frame). As for the second question, try posting it as another question (for clarity).

  • 2020-11-21 06:06

    I prefer the following way to check whether rows contain any NAs:

    row.has.na <- apply(final, 1, function(x){any(is.na(x))})
    

    This returns a logical vector whose values indicate whether each row contains any NA. You can use it to see how many rows you'll have to drop:

    sum(row.has.na)
    

    and then drop them:

    final.filtered <- final[!row.has.na,]
    

    Filtering on NAs in only some of the columns is a little trickier (for example, you can pass final[, 5:6] to apply instead of final). In general, Joris Meys' solution seems more elegant.
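
    As a rough sketch of that column-subset variant: the toy data frame below reuses the three rows visible in the outputs elsewhere in this thread, plus one made-up row (hypothetical_gene) that exists only for illustration. Columns 5 and 6 are assumed to be rnor and cfam.

    ```r
    # Toy data: the ENSG rows come from outputs shown in this thread;
    # "hypothetical_gene" is an invented row, not part of the original data.
    final <- data.frame(
      gene = c("hypothetical_gene", "ENSG00000199674",
               "ENSG00000207604", "ENSG00000221312"),
      hsap = c(0, 0, 0, 0),
      mmul = c(NA, 2, NA, 1),
      mmus = c(NA, 2, NA, 2),
      rnor = c(NA, 2, 1, 3),
      cfam = c(NA, 2, 2, 2),
      stringsAsFactors = FALSE
    )

    # Look for NAs only in columns 5 and 6 (rnor, cfam), row by row
    row.has.na <- apply(final[, 5:6], 1, function(x) any(is.na(x)))

    # Keep the rows where neither of those two columns is NA;
    # NAs in mmul or mmus do not cause a row to be dropped here
    final.filtered <- final[!row.has.na, ]
    ```

    Only the invented row is dropped, because it is the only one with an NA in rnor or cfam.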

  • 2020-11-21 06:06

    Using the dplyr package, we can filter out the rows that have NA in a given column as follows:

    dplyr::filter(df,  !is.na(columnname))
    
  • 2020-11-21 06:07

    Also check complete.cases:

    > final[complete.cases(final), ]
                 gene hsap mmul mmus rnor cfam
    2 ENSG00000199674    0    2    2    2    2
    6 ENSG00000221312    0    1    2    3    2
    

    na.omit is nicer when you just want to drop every row that contains an NA. complete.cases allows partial selection, by passing it only certain columns of the data frame:

    > final[complete.cases(final[ , 5:6]),]
                 gene hsap mmul mmus rnor cfam
    2 ENSG00000199674    0    2    2    2    2
    4 ENSG00000207604    0   NA   NA    1    2
    6 ENSG00000221312    0    1    2    3    2
    

    Your solution can't work. If you insist on using is.na, then you have to do something like:

    > final[rowSums(is.na(final[ , 5:6])) == 0, ]
                 gene hsap mmul mmus rnor cfam
    2 ENSG00000199674    0    2    2    2    2
    4 ENSG00000207604    0   NA   NA    1    2
    6 ENSG00000221312    0    1    2    3    2
    

    but using complete.cases is much clearer, and faster.
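
    For the other case in the title (rows that are NA across all of the columns of interest), the same rowSums idea can be turned around. This is only a sketch on toy data: the ENSG rows are taken from the outputs above, the hypothetical_gene row is invented for illustration, and columns 3 to 6 are assumed to be the ones that matter.

    ```r
    # Toy data: the ENSG rows come from the outputs above;
    # "hypothetical_gene" is an invented row used only for illustration.
    final <- data.frame(
      gene = c("hypothetical_gene", "ENSG00000199674",
               "ENSG00000207604", "ENSG00000221312"),
      hsap = c(0, 0, 0, 0),
      mmul = c(NA, 2, NA, 1),
      mmus = c(NA, 2, NA, 2),
      rnor = c(NA, 2, 1, 3),
      cfam = c(NA, 2, 2, 2),
      stringsAsFactors = FALSE
    )

    cols <- 3:6  # assumed columns of interest: mmul, mmus, rnor, cfam

    # A row is "all NA" when its NA count equals the number of columns checked
    all.na <- rowSums(is.na(final[, cols])) == length(cols)

    # Drop only those rows; rows with some (but not all) NAs are kept
    kept <- final[!all.na, ]
    ```

    Here ENSG00000207604 survives even though mmul and mmus are NA, because rnor and cfam are present.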
