Guys I\'m new to this language ,I\'m running cluster analysis on a data frame but when I calculate the distance I get this warning \"NAs introduced by coercion\". What does
It's that first column that creates the issue:
> a <- c("1", "2",letters[1:5], "3")
> as.numeric(a)
[1] 1 2 NA NA NA NA NA 3
Warning message:
NAs introduced by coercion
Inside dist
there must be a coercion to numeric, which generates the NA as above.
I'd suggestion to apply dist
without the first column or better move that to rownames
if possible, because the result will be different:
> dist(df)
1 2 3 4
2 1.8842186
3 1.9262360 1.2856110
4 3.2137871 1.7322788 2.9838920
5 1.3299455 0.9872963 1.9158079 1.8889050
Warning message:
In dist(df) : NAs introduced by coercion
> dist(df[-1])
1 2 3 4
2 1.538458
3 1.572765 1.049697
4 2.624046 1.414400 2.436338
5 1.085896 0.806124 1.564251 1.542284
btw: you don't need as.matrix
when calling dist
. It'll do that anyway internally.
EDIT: using rownames
rownames(df) <- df$id
> df
id var1 var2
A A -0.6264538 -0.8204684
B B 0.1836433 0.4874291
C C -0.8356286 0.7383247
D D 1.5952808 0.5757814
E E 0.3295078 -0.3053884
> dist(df[-1]) # you colud also remove the 1st col at all, using df$id <- NULL.
A B C D
B 1.538458
C 1.572765 1.049697
D 2.624046 1.414400 2.436338
E 1.085896 0.806124 1.564251 1.542284