Apache Spark MLlib algorithms (e.g., Decision Trees) save the model in a location (e.g., myModelPath) where it creates two directories, viz. myModelPath/d
Spark >= 2.4
Since version 2.4, Spark provides format-agnostic writer interfaces, and selected models already implement them. For example, LinearRegressionModel:
val lrm: org.apache.spark.ml.regression.LinearRegressionModel = ???
val path: String = ???
lrm.write.format("pmml").save(path)
This creates a directory with a single file containing the PMML representation.
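To make the snippet above concrete, here is a minimal sketch that fits a tiny model and exports it. Only write.format("pmml").save comes from the answer itself; the toy training data, the temp-directory path, the local SparkSession, and the names PmmlExportSketch and exportPmml are assumptions for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object PmmlExportSketch {
  // Fits a toy model and writes it as PMML; returns the output directory.
  def exportPmml(spark: SparkSession): String = {
    import spark.implicits._

    // Toy data (assumption): label = 2 * feature.
    val training = Seq(
      (2.0, Vectors.dense(1.0)),
      (4.0, Vectors.dense(2.0)),
      (6.0, Vectors.dense(3.0))
    ).toDF("label", "features")

    val lrm = new LinearRegression().fit(training)

    // save() refuses to write to an existing path, so use a fresh child path.
    val path = Files.createTempDirectory("lrm-pmml").resolve("model").toString
    lrm.write.format("pmml").save(path)
    path
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pmml-export-sketch")
      .getOrCreate()
    val path = exportPmml(spark)
    // Print the exported PMML document line by line.
    spark.read.textFile(path).collect().foreach(println)
    spark.stop()
  }
}
```

This requires Spark 2.4 or later on the classpath. Adding .overwrite() before .format("pmml") should let you reuse an existing path.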
Spark < 2.4
What is the format of these files?
data/*.parquet files are in Apache Parquet columnar storage format
metadata/part-* looks like JSON
Which file/files contain actual model?
data/*.parquet
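If you want to check this yourself, a rough sketch like the one below saves a toy decision tree and reads both parts back. It assumes the spark.ml DecisionTreeClassifier and subdirectories named data and metadata under the model path; the object and method names, the toy data, and the temp path are all made up for illustration.

```scala
import java.nio.file.Files

import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}

object InspectSavedModelSketch {
  // Trains a toy tree and saves it with the default writer; returns the model path.
  def saveToyTree(spark: SparkSession): String = {
    import spark.implicits._

    // Toy data (assumption): one feature that separates the two classes.
    val training = Seq(
      (0.0, Vectors.dense(0.0)),
      (0.0, Vectors.dense(1.0)),
      (1.0, Vectors.dense(2.0)),
      (1.0, Vectors.dense(3.0))
    ).toDF("label", "features")

    val model = new DecisionTreeClassifier().fit(training)
    val path = Files.createTempDirectory("dt-model").resolve("model").toString
    model.write.save(path)
    path
  }

  // The model contents (tree nodes here), stored as Parquet.
  def readData(spark: SparkSession, modelPath: String): DataFrame =
    spark.read.parquet(s"$modelPath/data")

  // The metadata (class name, params, ...), stored as JSON text.
  def readMetadata(spark: SparkSession, modelPath: String): DataFrame =
    spark.read.json(s"$modelPath/metadata")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("inspect-saved-model-sketch")
      .getOrCreate()
    val path = saveToyTree(spark)
    readData(spark, path).printSchema()
    readMetadata(spark, path).show(truncate = false)
    spark.stop()
  }
}
```

The printed schema shows what the Parquet part holds for this particular model type; other model classes will store different columns.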
Can I save the model to somewhere else, for example in a DB?
I am not aware of any direct method, but you can load the model data as a DataFrame and store it in a database afterwards:
val modelDf = spark.read.parquet("/path/to/data/")
modelDf.write.jdbc(...)
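A slightly fuller sketch of the same idea follows. The saved data may contain nested columns (structs or arrays) that a JDBC dialect cannot map to a column type, so this version first serializes each row to a JSON string. That extra step is my assumption, not something the approach strictly requires. The names ModelToJdbcSketch, asJsonRows, storeModel, and row_json are hypothetical, and the JDBC URL, table name, and connection properties are left for the caller to supply.

```scala
import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object ModelToJdbcSketch {
  // Turns every row into one JSON string so nested columns fit in a text column.
  def asJsonRows(df: DataFrame): DataFrame =
    df.toJSON.toDF("row_json")

  // Reads the saved Parquet data and writes it to a table over JDBC.
  // jdbcUrl, table and props are supplied by the caller; a matching JDBC
  // driver must be on the classpath.
  def storeModel(
      spark: SparkSession,
      dataPath: String,
      jdbcUrl: String,
      table: String,
      props: Properties): Unit = {
    val modelDf = spark.read.parquet(dataPath)
    asJsonRows(modelDf).write.jdbc(jdbcUrl, table, props)
  }
}
```

To get the model data back, you would read the table with spark.read.jdbc and parse the row_json column again. Note that this stores only the Parquet part; the metadata directory would need to be kept as well if you want to rebuild a directory that the model's load method accepts.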