I have a spark.ml pipeline in Spark 1.5.1 which consists of a series of transformers followed by a k-means estimator. I want to be able to access the KMeansModel.clusterCenters
Answering my own question: I finally stumbled on an example deep in the spark.ml docs that shows how to do this using the stages member of the PipelineModel class. So for the example I posted above, in order to access the k-means cluster centers, do:
val centers = fitKmeans.stages(2).asInstanceOf[KMeansModel].clusterCenters
where fitKmeans is a PipelineModel and 2 is the index of the k-means model in the array of pipeline stages.
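If you would rather not hard-code the stage index, one possible variation is to search the stages array for the first KMeansModel by type. This is only a sketch: findKMeans is a hypothetical helper name of my own, not part of the Spark API, and it assumes the pipeline has already been fitted.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.clustering.KMeansModel

// Hypothetical helper (not a Spark API): returns the first fitted
// k-means stage in the pipeline, or None if there is no such stage.
def findKMeans(model: PipelineModel): Option[KMeansModel] =
  model.stages.collectFirst { case m: KMeansModel => m }

// Usage, assuming fitKmeans is the fitted PipelineModel from above:
// prints each cluster center, or nothing if no KMeansModel was found.
findKMeans(fitKmeans).foreach(m => m.clusterCenters.foreach(println))
```

Returning an Option avoids the ClassCastException that asInstanceOf would throw if the stage order ever changes and the index no longer points at the k-means model.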
Reference: the last line of most of the examples on this page.