What is the right way to save/load models in Spark/PySpark

天命终不由人 2021-02-04 08:02

I'm working with Spark 1.3.0 using PySpark and MLlib, and I need to save and load my models. I use code like this (taken from the official documentation):

from p         


        
4 answers
  • 2021-02-04 08:16

    I ran into this too; it looks like a bug. I have reported it to the Spark JIRA.

  • 2021-02-04 08:31

    Use a Pipeline from the ML package (pyspark.ml) to train the model, then use MLWriter and MLReader to save the fitted model and read it back.

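    from pyspark.ml import Pipeline, PipelineModel

    # pipeTrain is assumed to be a fitted PipelineModel (the result of Pipeline.fit);
    # outpath is the directory to write the model to.
    pipeTrain.write().overwrite().save(outpath)
    model_in = PipelineModel.load(outpath)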
    
  • 2021-02-04 08:35

    One way to save a model (in Scala; it is probably similar in Python):

    // persist model to HDFS
    sc.parallelize(Seq(model), 1).saveAsObjectFile("linReg.model")
    

    The saved model can then be loaded with:

    val linRegModel = sc.objectFile[LinearRegressionModel]("linReg.model").first()
    

    See also related question

    For more details see (ref)

  • 2021-02-04 08:38

    As of this pull request, merged on Mar 28, 2015 (a day after your question was last edited), this issue has been resolved.

    You just need to clone/fetch the latest version from GitHub (git clone git://github.com/apache/spark.git -b branch-1.3) then build it (following the instructions in spark/README.md) with $ mvn -DskipTests clean package.

    Note: I ran into trouble building Spark because Maven was being wonky. I resolved that issue by using $ update-alternatives --config mvn and selecting the 'path' that had Priority: 150, whatever that means. Explanation here.
