hyperparameters

PySpark - Get all parameters of models created with ParamGridBuilder

北城余情 submitted on 2019-11-30 14:57:08
I'm using PySpark 2.0 for a Kaggle competition. I'd like to know how a model (RandomForest) behaves depending on different parameters. ParamGridBuilder() lets you specify different values for a single parameter, and then (I guess) performs a Cartesian product over the entire set of parameters. Assuming my DataFrame is already defined:

rdc = RandomForestClassifier()
pipeline = Pipeline(stages=STAGES + [rdc])
paramGrid = ParamGridBuilder().addGrid(rdc.maxDepth, [3, 10, 20]) \
    .addGrid(rdc.minInfoGain, [0.01, 0.001]) \
    .addGrid(rdc.numTrees, [5, 10, 20, 30]) \
    .build()
evaluator =
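
A minimal sketch of how one might inspect that grid, not the asker's full code: build() returns the Cartesian product as a plain Python list of {Param: value} maps (here 3 x 2 x 4 = 24 combinations), and after fitting a CrossValidator each map can be paired with a score. The names pipeline, evaluator, and train are assumptions carried over from the excerpt:

from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.tuning import ParamGridBuilder

rdc = RandomForestClassifier()
paramGrid = (ParamGridBuilder()
             .addGrid(rdc.maxDepth, [3, 10, 20])
             .addGrid(rdc.minInfoGain, [0.01, 0.001])
             .addGrid(rdc.numTrees, [5, 10, 20, 30])
             .build())

# build() returns a list of {Param: value} dicts: the full
# Cartesian product, 3 * 2 * 4 = 24 combinations here.
for paramMap in paramGrid:
    print({param.name: value for param, value in paramMap.items()})

# After fitting a CrossValidator, each param map lines up with its
# average metric (train is an assumed, already-defined DataFrame):
# cvModel = CrossValidator(estimator=pipeline,
#                          estimatorParamMaps=paramGrid,
#                          evaluator=evaluator, numFolds=3).fit(train)
# for paramMap, score in zip(paramGrid, cvModel.avgMetrics):
#     print({p.name: v for p, v in paramMap.items()}, score)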

What is a good range of values for the svm.SVC() hyperparameters to be explored via GridSearchCV()?

纵然是瞬间 submitted on 2019-11-29 02:29:27
Question: I am running into the problem that the hyperparameter ranges for my svm.SVC() are so wide that GridSearchCV() never completes! One idea is to use RandomizedSearchCV() instead. But again, my dataset is relatively big, so 500 iterations take about 1 hour! My question is: what is a good set-up (in terms of the range of values for each hyperparameter) for GridSearchCV (or RandomizedSearchCV) so as to stop wasting resources? In other words, how do I decide whether or not e.g. C
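
A minimal sketch of one common way to bound that search, not the asker's code: sample C and gamma log-uniformly with RandomizedSearchCV, so the total number of fits is capped by n_iter rather than by the grid size. The ranges (1e-2 to 1e3 for C, 1e-4 to 1e1 for gamma) are conventional starting points, and the names X_train, y_train are assumptions:

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# SVC is most sensitive to C and gamma, and both span orders of
# magnitude, so sample them on a log scale rather than a linear one.
param_distributions = {
    "C": loguniform(1e-2, 1e3),      # regularization strength
    "gamma": loguniform(1e-4, 1e1),  # RBF kernel width
    "kernel": ["rbf"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=50,       # caps total fits, unlike an exhaustive grid
    cv=3,
    n_jobs=-1,       # parallelize candidates across cores
    random_state=0,
)
# search.fit(X_train, y_train)   # assumed training data
# print(search.best_params_, search.best_score_)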