Question
I am looking to use R's expand.grid to comprehensively enumerate and investigate options for hierarchical clustering analysis. I have a final function acc which will take a matrix and analyse it for performance measures like accuracy, precision, F1 etc., returning a named list (with accuracy, F1, etc.). The ultimate output I'm looking for is a table where all the hyperparameter combinations are listed and, in columns next to them, the different performance measures (accuracy, F1, ...).
The table of combinations can be set up for example with
hyperparams = expand.grid(meths=c("ward.D","ward.D2","single","complete","average","mcquitty","median","centroid"), dists=c("euclidean", "maximum", "manhattan", "canberra", "binary","minkowski"))
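For reference, a minimal sketch of what this grid looks like. The stringsAsFactors = FALSE argument is my own addition (an assumption, not part of the original call): it keeps both columns as plain character vectors, which are a little easier to pass on to dist and hclust later.

```r
# Same grid as above, but with character columns instead of factors
hyperparams <- expand.grid(
  meths = c("ward.D", "ward.D2", "single", "complete",
            "average", "mcquitty", "median", "centroid"),
  dists = c("euclidean", "maximum", "manhattan",
            "canberra", "binary", "minkowski"),
  stringsAsFactors = FALSE  # assumption: not in the original call
)

nrow(hyperparams)     # 8 linkage methods x 6 distances = 48 rows
head(hyperparams, 3)  # first few combinations; meths varies fastest
```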
Next we would compare to known labels and get the accuracy, wrapping in a number of functions, some of which (like cutree) I've omitted for brevity:
t1 = table(df$Group, hclust(dist(df[-1],method="euclidean"), method="complete"))
Res1 = acc(t1)
The goal is to vary the method argument for dist across those listed in my dists, and the method argument for hclust across those listed in my meths.
In the final line, recall that I've written acc, which will take a matrix and output a named list of accuracy, precision, F1, ... I'd like each of these in its own column of a final table, whose rows are the hyperparameter combinations in hyperparams.
Now, my first issue is that I'm not sure how to use unlist in a way that will cover all the options above. I'm pretty sure it's the right function, but I'm just not sure how to do it. I also want to create the table without a for-loop, i.e. using apply or something like that (I guess applying along the rows of hyperparams?), since I know such solutions are generally better in R.
As suggested, the final desired output would be, effectively, hyperparams, but as a data frame with additional columns: the third column containing accuracy, the fourth containing precision, etc. (the measures listed out in my function acc). Can anyone tell me how to get there?
If you want something to play with for acc, we could use
acc = function(x) {
  first = sum(x)
  second = sum(x^2)
  return(list(First=first,Second=second))
}
and the final output table would be the two hyperparameter columns followed by a column for First (sum of elements in the final confusion matrix, for the hyperparameter combo corresponding to that row) and Second (sum of elements^2 in the final confusion matrix). Just a hypothetical example in case you like to work with given functions.
I'd really prefer solutions in base R! (Or dplyr if absolutely necessary)
Edit: OK, many people are asking for a df. Let's use iris, but of course if we want output we can't avoid some of the intermediate functions, like cutree.
Now with iris, you could run
contingtab1 = table(iris$Species, cutree(hclust(dist(iris[,1:4],method="euclidean"),method="complete"),3))
That gives a contingency table. Passing this into acc would give one row of the desired output (the row corresponding to euclidean and complete). The desired output would then look like hyperparams, with the two current columns followed by (say) two more columns, one for each of my two performance measures in acc.
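To make the target concrete, here is a rough sketch of that single row, using the toy acc described earlier (First and Second) rather than the real performance function. The names tree and row1 are only for illustration.

```r
# Toy stand-in for acc(), as described in the question
acc <- function(x) {
  list(First = sum(x), Second = sum(x^2))
}

# One hyperparameter combination: euclidean distance, complete linkage
tree <- hclust(dist(iris[, 1:4], method = "euclidean"), method = "complete")
contingtab1 <- table(iris$Species, cutree(tree, k = 3))

# One row of the desired table: the two hyperparameters plus the measures.
# data.frame() spreads the named list returned by acc() into columns.
row1 <- data.frame(meths = "complete", dists = "euclidean", acc(contingtab1))
row1
```

First is simply the number of observations in iris (150), since the contingency table counts every row exactly once.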
Answer 1:
We can use Map in base R:
# x = linkage method (meths), y = distance measure (dists)
Map(function(x, y) acc(hclust(dist(df[-1], method = y), method = x)),
hyperparams[[1]], hyperparams[[2]])
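A fuller sketch of the same idea on iris, taken all the way to the table the question asks for. This is one possible way under some assumptions, not a definitive implementation: acc is the toy version from the question, run_one is a hypothetical helper name, k = 3 clusters comes from the iris example, and stringsAsFactors = FALSE is my own addition.

```r
# Toy stand-in for acc(); swap in the real function returning a named list
acc <- function(x) list(First = sum(x), Second = sum(x^2))

hyperparams <- expand.grid(
  meths = c("ward.D", "ward.D2", "single", "complete",
            "average", "mcquitty", "median", "centroid"),
  dists = c("euclidean", "maximum", "manhattan",
            "canberra", "binary", "minkowski"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cluster with one combination, return acc()'s list
run_one <- function(meth, dst, k = 3) {
  tree <- hclust(dist(iris[, 1:4], method = dst), method = meth)
  acc(table(iris$Species, cutree(tree, k = k)))
}

# One named list of measures per row of hyperparams (names dropped)
res <- unname(Map(run_one, hyperparams$meths, hyperparams$dists))

# Each list becomes a one-row data frame; stack them and attach the grid
measures <- do.call(rbind, lapply(res, as.data.frame))
result <- cbind(hyperparams, measures)
head(result)
```

No unlist is needed here: as.data.frame turns each named list into a one-row data frame, so the measure names become column names after rbind.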
Answer 2:
One approach might be map2 from purrr:
library(purrr)
# .x = linkage method (meths), .y = distance measure (dists)
map2(hyperparams$meths, hyperparams$dists,
~ acc(hclust(dist(df[-1], method = .y), method = .x)))
Source: https://stackoverflow.com/questions/61687341/expand-grid-in-r-with-unlist-and-apply