Apache Spark MLlib Model File Format

盖世英雄少女心 2020-12-10 17:40

Apache Spark MLlib algorithms (e.g., Decision Trees) save the model in a location (e.g., myModelPath) where it creates two directories, viz. myModelPath/d

1 Answer
  • 2020-12-10 18:29

    Spark >= 2.4

    Spark 2.4 and later provide format-agnostic writer interfaces, and selected models already implement them. For example, LinearRegressionModel:

    val lrm: org.apache.spark.ml.regression.LinearRegressionModel = ???
    val path: String = ???
    
    lrm.write.format("pmml").save(path)
    

    This creates a directory with a single file containing the PMML representation of the model.
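For context, here is a minimal end-to-end sketch of the PMML export above. The toy data, the local master setting, the object name PmmlExportSketch, and the output path /tmp/lr-pmml-sketch are all assumptions made for illustration; only the write.format("pmml").save(path) call comes from the answer itself.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

object PmmlExportSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would get its master from spark-submit.
    val spark = SparkSession.builder()
      .appName("pmml-export-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data following y = 2x + 1.
    val raw = Seq((1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)).toDF("x", "label")

    // Spark ML estimators expect the inputs packed into a single vector column.
    val training = new VectorAssembler()
      .setInputCols(Array("x"))
      .setOutputCol("features")
      .transform(raw)

    val lrm = new LinearRegression().fit(training)

    // Hypothetical output location; overwrite() lets the sketch be rerun.
    val path = "/tmp/lr-pmml-sketch"
    lrm.write.overwrite().format("pmml").save(path)

    spark.stop()
  }
}
```

After the run, the output directory should hold a part file with the PMML XML next to Spark's usual _SUCCESS marker. Not every model supports format("pmml"), so this is worth checking per model class.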

    Spark < 2.4

    What is the format of these files?

    • data/*.parquet files are in Apache Parquet columnar storage format
    • metadata/part-* looks like JSON

    Which file/files contain actual model?

    • data/*.parquet
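One way to check these formats for yourself is to read both directories back with the ordinary DataFrame readers. This is a sketch only: the object name InspectSavedModel and the helper readSavedModel are hypothetical, and the model path is expected as the first command-line argument.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InspectSavedModel {
  // Hypothetical helper: returns (metadata, data) for a saved model directory.
  def readSavedModel(spark: SparkSession, modelPath: String): (DataFrame, DataFrame) = {
    // metadata/part-* holds JSON text, so the JSON reader can parse it.
    val metadata = spark.read.json(s"$modelPath/metadata")
    // data/*.parquet holds the model contents in Parquet format.
    val data = spark.read.parquet(s"$modelPath/data")
    (metadata, data)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("inspect-saved-model")
      .master("local[*]")
      .getOrCreate()

    val (metadata, data) = readSavedModel(spark, args(0))

    // Print schemas and a few rows to see what each directory contains.
    metadata.printSchema()
    metadata.show(false)
    data.printSchema()
    data.show(5, false)

    spark.stop()
  }
}
```

The schema of the data directory differs between model types, so printSchema() is the quickest way to see what a particular model stores.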

    Can I save the model to somewhere else, for example in a DB?

    I am not aware of any direct method, but you can load the model as a DataFrame and store it in a database afterwards:

    val modelDf = spark.read.parquet("/path/to/data/")
    modelDf.write.jdbc(...)
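A slightly fuller sketch of the same idea follows. The JDBC URL, table name, credentials, and driver class are placeholders, not values from the answer. The step that packs each row into a single JSON string is also an assumption: model data often contains nested columns, which many JDBC dialects cannot store directly, and whether to_json handles every column type in a given model should be verified.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, struct, to_json}

object ModelToJdbcSketch {
  // Hypothetical helper: packs every column of a row into one JSON string column.
  def toJsonRows(df: DataFrame): DataFrame =
    df.select(to_json(struct(df.columns.map(col): _*)).as("row_json"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("model-to-jdbc-sketch")
      .master("local[*]")
      .getOrCreate()

    val modelDf = spark.read.parquet("/path/to/data/")

    // Placeholder connection settings; replace them with real ones.
    val props = new Properties()
    props.setProperty("user", "spark_user")
    props.setProperty("password", "change_me")
    props.setProperty("driver", "org.postgresql.Driver")

    toJsonRows(modelDf)
      .write
      .mode(SaveMode.Append)
      .jdbc("jdbc:postgresql://localhost:5432/models", "saved_model_rows", props)

    spark.stop()
  }
}
```

Loading the model back would need the reverse trip: read the table, rebuild the original columns, write them out as Parquet next to the metadata directory, and then call the model's load method on that path.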
    