“NAs introduced by coercion” during Cluster Analysis in R

悲&欢浪女 2020-12-19 02:21

Guys, I'm new to this language. I'm running a cluster analysis on a data frame, but when I calculate the distance I get this warning: "NAs introduced by coercion". What does

1 answer
  • 2020-12-19 02:39

    It's that first column that creates the issue:

    > a <- c("1", "2",letters[1:5], "3")
    > as.numeric(a)
    [1]  1  2 NA NA NA NA NA  3
    Warning message:
    NAs introduced by coercion 
    

    Inside dist there must be a coercion to numeric, which generates the NA as above.
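
    To see which columns would trigger that coercion before calling dist, something like the following might help. It is only a sketch: the data frame, its column names (id, var1, var2) and its values are made up to resemble the one in the question, not taken from it.

    ```r
    # Hypothetical data frame: a label column plus two numeric columns
    # (names and values are assumptions for illustration only).
    df <- data.frame(
      id   = c("A", "B", "C", "D", "E"),
      var1 = c(-0.63, 0.18, -0.84, 1.60, 0.33),
      var2 = c(-0.82, 0.49, 0.74, 0.58, -0.31),
      stringsAsFactors = FALSE
    )

    # Names of the columns that are not numeric
    non_numeric <- names(df)[!vapply(df, is.numeric, logical(1))]
    print(non_numeric)

    # A data frame with a character column becomes a character matrix
    m <- as.matrix(df)
    print(typeof(m))

    # Converting the labels to numbers gives NA for every entry
    # (the warning is suppressed here on purpose)
    id_as_number <- suppressWarnings(as.numeric(df$id))
    print(id_as_number)
    ```

    Any column listed in non_numeric is a candidate for being dropped or moved to rownames before computing distances.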

    I'd suggest applying dist without the first column or, better, moving that column to rownames if possible, because the result will be different:

    > dist(df)
              1         2         3         4
    2 1.8842186                              
    3 1.9262360 1.2856110                    
    4 3.2137871 1.7322788 2.9838920          
    5 1.3299455 0.9872963 1.9158079 1.8889050
    Warning message:
    In dist(df) : NAs introduced by coercion
    > dist(df[-1])
             1        2        3        4
    2 1.538458                           
    3 1.572765 1.049697                  
    4 2.624046 1.414400 2.436338         
    5 1.085896 0.806124 1.564251 1.542284
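
    Why the two results differ: according to the dist documentation, columns with missing values are excluded from the computation for the affected rows, and the sum is scaled up in proportion to the number of columns actually used. With three columns of which one is all NA, the Euclidean distances are therefore inflated by sqrt(3/2), which matches the outputs above (1.8842186 / 1.538458 is about 1.2247). A small sketch with made-up values to check this:

    ```r
    # Hypothetical numeric data (values are assumptions for illustration)
    x <- cbind(var1 = c(-0.63, 0.18, -0.84, 1.60, 0.33),
               var2 = c(-0.82, 0.49, 0.74, 0.58, -0.31))

    # Same data plus an all-NA column, mimicking a label column after coercion
    x_na <- cbind(id = NA_real_, x)

    d_clean <- dist(x)     # both columns usable
    d_na    <- dist(x_na)  # three columns, one of them entirely NA

    # Every pairwise distance is inflated by the same factor
    ratio <- as.vector(d_na) / as.vector(d_clean)
    print(ratio)
    print(isTRUE(all.equal(ratio, rep(sqrt(3 / 2), length(ratio)))))
    ```

    Since the factor is constant here, the ordering of the distances is unchanged, but the absolute values are not the ones you want to report.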
    

    btw: you don't need as.matrix when calling dist. It'll do that anyway internally.
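
    A quick way to convince yourself of this, assuming an all-numeric data frame (df_num and its values are hypothetical):

    ```r
    # Hypothetical all-numeric data frame
    df_num <- data.frame(var1 = c(-0.63, 0.18, -0.84, 1.60, 0.33),
                         var2 = c(-0.82, 0.49, 0.74, 0.58, -0.31))

    d_direct <- dist(df_num)             # data frame passed directly
    d_matrix <- dist(as.matrix(df_num))  # explicit conversion first

    # Compare the distance values only; the stored call attribute differs
    same_values <- isTRUE(all.equal(as.vector(d_direct), as.vector(d_matrix)))
    print(same_values)
    ```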

    EDIT: using rownames

    rownames(df) <- df$id
    
    > df
      id       var1       var2
    A  A -0.6264538 -0.8204684
    B  B  0.1836433  0.4874291
    C  C -0.8356286  0.7383247
    D  D  1.5952808  0.5757814
    E  E  0.3295078 -0.3053884
    
    > dist(df[-1]) # you could also remove the 1st column entirely, using df$id <- NULL.
             A        B        C        D
    B 1.538458                           
    C 1.572765 1.049697                  
    D 2.624046 1.414400 2.436338         
    E 1.085896 0.806124 1.564251 1.542284
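
    Since the goal is cluster analysis, here is a sketch of how the rownames carry through to the clustering step. The data values are made up, and hclust with its default settings is just one possible choice, not necessarily what the question used:

    ```r
    # Hypothetical data frame with a label column (values are assumptions)
    df <- data.frame(
      id   = c("A", "B", "C", "D", "E"),
      var1 = c(-0.63, 0.18, -0.84, 1.60, 0.33),
      var2 = c(-0.82, 0.49, 0.74, 0.58, -0.31),
      stringsAsFactors = FALSE
    )

    # Move the labels to rownames, then drop the column
    rownames(df) <- df$id
    df$id <- NULL

    # dist() now sees only numeric columns and keeps the rownames as labels
    d <- dist(df)
    print(attr(d, "Labels"))

    # The labels are passed on to the clustering result
    hc <- hclust(d)
    print(hc$labels)
    ```

    With the labels stored this way, plot(hc) would show A to E on the dendrogram instead of row numbers.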
    