Tuning parameters for implicit pyspark.ml ALS matrix factorization model through pyspark.ml CrossValidator

臣服心动 2020-11-30 01:37

I'm trying to tune the parameters of an ALS matrix factorization model that uses implicit data. For this, I'm trying to use pyspark.ml.tuning.CrossValidator to run through

3 answers
  •  有刺的猬
    2020-11-30 01:57

    Very late to the party here, but I'll post in case anyone stumbles upon this question like I did.

    I was getting a similar error when trying to use CrossValidator with an ALS model. I resolved it by setting the coldStartStrategy parameter in ALS to "drop". That is:

    from pyspark.ml.recommendation import ALS
    alsImplicit = ALS(implicitPrefs=True, coldStartStrategy="drop")
    

    and keep the rest of the code the same.
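
    For anyone who wants to see the pieces together, here is a minimal sketch of how the tuning setup might look. The function name, the column names ("user", "item", "rating"), the grid values and the RMSE evaluator are all my own assumptions for illustration, not taken from OP's code, and RMSE is only a placeholder metric for implicit data. It needs an active SparkSession when called.

```python
def build_als_cross_validator(num_folds=3):
    """Return a CrossValidator wrapping an implicit-feedback ALS model.

    Hypothetical helper: column names and grid values are illustrative
    assumptions. An active SparkSession is required when this is called.
    """
    # Imported inside the function so this file can be loaded even on a
    # machine where pyspark is not installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.recommendation import ALS
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    als = ALS(
        userCol="user",
        itemCol="item",
        ratingCol="rating",
        implicitPrefs=True,
        # Drop rows with NaN predictions (unseen users/items) before scoring.
        coldStartStrategy="drop",
    )

    # Small example grid over the usual implicit-ALS hyperparameters.
    grid = (
        ParamGridBuilder()
        .addGrid(als.rank, [10, 20])
        .addGrid(als.regParam, [0.01, 0.1])
        .addGrid(als.alpha, [1.0, 40.0])
        .build()
    )

    # Placeholder metric; swap in whatever evaluator the original code used.
    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )

    return CrossValidator(
        estimator=als,
        estimatorParamMaps=grid,
        evaluator=evaluator,
        numFolds=num_folds,
    )
```

    With a DataFrame df that has those three columns, build_als_cross_validator().fit(df) should return the fitted CrossValidatorModel, and its bestModel attribute holds the winning ALS model.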

    I suspect that in my case the cross-validation splits left some items in the validation set that never appeared in the training set. ALS cannot produce predictions for those unseen items, so it returns NaN for them, and the evaluation metric then becomes NaN as well. The recommended fix is to drop the NaN rows before evaluating, as described in the documentation, which is what coldStartStrategy="drop" does.
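
    The cold-start effect is easy to reproduce without Spark. The toy sketch below uses made-up (user, item) pairs and a deterministic fold assignment (row index modulo the number of folds) purely for illustration; CrossValidator splits randomly, so this is not its actual logic, but the consequence is the same: some folds hold out an item the training rows never saw.

```python
def cold_items_per_fold(interactions, num_folds):
    """For each fold, return the validation items that are absent from training."""
    cold = []
    for fold in range(num_folds):
        # Row i is held out for validation when i % num_folds == fold.
        train_items = {
            item for i, (_, item) in enumerate(interactions) if i % num_folds != fold
        }
        valid_items = {
            item for i, (_, item) in enumerate(interactions) if i % num_folds == fold
        }
        # Items only seen in validation would get NaN predictions from ALS.
        cold.append(valid_items - train_items)
    return cold


# Hypothetical toy data: item "b" appears in exactly one interaction.
interactions = [
    ("u1", "a"), ("u2", "a"), ("u3", "b"),
    ("u1", "c"), ("u2", "c"), ("u3", "a"),
]
cold = cold_items_per_fold(interactions, num_folds=3)
print(cold)
```

    Here the third fold holds out the only interaction with item "b", so a model trained on the other two folds has no factors for it. Rare items in a real dataset end up in the same situation.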

    I don't know whether we were getting the same error, so I can't guarantee this will solve OP's problem, but it's good practice to set coldStartStrategy="drop" for cross-validation anyway.

    Note: my error message was "Params must be either a param map or a list/tuple of param maps", which didn't seem to point to the coldStartStrategy parameter or to NaN values, but this change resolved it.
