hyperparameters

PySpark - Get all parameters of models created with ParamGridBuilder

北城余情 submitted on 2019-11-30 14:57:08
I'm using PySpark 2.0 for a Kaggle competition. I'd like to know how a model (RandomForest) behaves depending on different parameters. ParamGridBuilder() lets you specify different values for a single parameter, and then (I guess) performs a Cartesian product over the entire set of parameters. Assuming my DataFrame is already defined:

rdc = RandomForestClassifier()
pipeline = Pipeline(stages=STAGES + [rdc])
paramGrid = ParamGridBuilder().addGrid(rdc.maxDepth, [3, 10, 20]) \
    .addGrid(rdc.minInfoGain, [0.01, 0.001]) \
    .addGrid(rdc.numTrees, [5, 10, 20, 30]) \
    .build()
evaluator =
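
A minimal sketch of how one might inspect that grid, not the asker's full code: build() returns the Cartesian product as a plain Python list of {Param: value} maps (here 3 x 2 x 4 = 24 combinations), and after fitting a CrossValidator each map can be paired with a score. The names pipeline, evaluator, and train are assumptions carried over from the excerpt:

from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.tuning import ParamGridBuilder

rdc = RandomForestClassifier()
paramGrid = (ParamGridBuilder()
             .addGrid(rdc.maxDepth, [3, 10, 20])
             .addGrid(rdc.minInfoGain, [0.01, 0.001])
             .addGrid(rdc.numTrees, [5, 10, 20, 30])
             .build())

# build() returns a list of {Param: value} dicts: the full
# Cartesian product, 3 * 2 * 4 = 24 combinations here.
for paramMap in paramGrid:
    print({param.name: value for param, value in paramMap.items()})

# After fitting a CrossValidator, each param map lines up with its
# average metric (train is an assumed, already-defined DataFrame):
# cvModel = CrossValidator(estimator=pipeline,
#                          estimatorParamMaps=paramGrid,
#                          evaluator=evaluator, numFolds=3).fit(train)
# for paramMap, score in zip(paramGrid, cvModel.avgMetrics):
#     print({p.name: v for p, v in paramMap.items()}, score)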

What is a good range of values for the svm.SVC() hyperparameters to be explored via GridSearchCV()?

纵然是瞬间 submitted on 2019-11-29 02:29:27
Question: I am running into the problem that the hyperparameter ranges for my svm.SVC() are so wide that GridSearchCV() never completes! One idea is to use RandomizedSearchCV() instead. But again, my dataset is relatively big, so 500 iterations take about 1 hour! My question is: what is a good set-up (in terms of the range of values for each hyperparameter) for GridSearchCV (or RandomizedSearchCV) so as to stop wasting resources? In other words, how do I decide whether or not e.g. C
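
A minimal sketch of one common way to bound that search, not the asker's code: sample C and gamma log-uniformly with RandomizedSearchCV, so the total number of fits is capped by n_iter rather than by the grid size. The ranges (1e-2 to 1e3 for C, 1e-4 to 1e1 for gamma) are conventional starting points, and the names X_train, y_train are assumptions:

from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# SVC is most sensitive to C and gamma, and both span orders of
# magnitude, so sample them on a log scale rather than a linear one.
param_distributions = {
    "C": loguniform(1e-2, 1e3),      # regularization strength
    "gamma": loguniform(1e-4, 1e1),  # RBF kernel width
    "kernel": ["rbf"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=50,       # caps total fits, unlike an exhaustive grid
    cv=3,
    n_jobs=-1,       # parallelize candidates across cores
    random_state=0,
)
# search.fit(X_train, y_train)   # assumed training data
# print(search.best_params_, search.best_score_)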