random-forest

All probability values are less than 0.5 on unseen data

泄露秘密 Submitted on 2021-01-28 23:25:13
Question: I have 15 features with a binary response variable, and I am interested in predicting probabilities rather than 0 or 1 class labels. When I trained and tested the RF model with 500 trees, CV, balanced class weights, and balanced samples in the data frame, I achieved good accuracy and a good Brier score. As you can see in the image, the predicted probability values of class 1 on the test data are between 0 and 1. Here is the histogram of predicted probabilities on the test data: with …
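A minimal sklearn sketch of the setup described above, on synthetic data with illustrative parameter values only (the asker's exact features, CV scheme, and balancing are not reproduced here):

```python
# Minimal sketch: fit a class-weighted random forest on synthetic data and
# inspect the spread of predicted probabilities and the Brier score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

rf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                            random_state=0)
rf.fit(X_train, y_train)

proba = rf.predict_proba(X_test)[:, 1]   # predicted probability of class 1
print("min/max predicted probability:", proba.min(), proba.max())
print("Brier score:", brier_score_loss(y_test, proba))
```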

Why do RandomForestClassifier on CPU (using SKLearn) and on GPU (using RAPIDs) get very different scores?

拟墨画扇 Submitted on 2021-01-28 18:42:18
Question: I am using RandomForestClassifier on CPU with SKLearn and on GPU using RAPIDs. I am benchmarking these two libraries for speed-up and scoring on the Iris dataset (this is a first try; in the future I will change the dataset for better benchmarking, but I am starting with these two libraries). The problem is that when I measure the score on CPU I always get a value of 1.0, but when I measure the score on GPU I get a value that varies between 0.2 and 1.0, and I do not understand why that could be.
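A hedged sketch of the kind of side-by-side run described, assuming a RAPIDS/cuML environment with a GPU is available; hyperparameter values are illustrative, and the two libraries use different internal defaults, so exact score agreement should not be expected:

```python
# CPU (sklearn) vs. GPU (cuML) comparison on Iris; the cuML part requires a
# RAPIDS install with a GPU. Parameter values are illustrative only.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier as skRF
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

cpu_rf = skRF(n_estimators=100, random_state=0).fit(X_train, y_train)
print("sklearn accuracy:", cpu_rf.score(X_test, y_test))

# cuML expects float32 features and integer labels, and its split-finding
# defaults differ from sklearn's, which can move the score around.
from cuml.ensemble import RandomForestClassifier as cuRF

gpu_rf = cuRF(n_estimators=100)
gpu_rf.fit(X_train.astype(np.float32), y_train.astype(np.int32))
print("cuML accuracy:", gpu_rf.score(X_test.astype(np.float32),
                                     y_test.astype(np.int32)))
```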

R: variable exclusion from formula not working in presence of missing data

狂风中的少年 Submitted on 2021-01-28 14:12:49
Question: I'm building a model in R while excluding the 'office' column from the formula (it sometimes contains hints of the class I predict). I'm training on 'train' and predicting on 'test': > model <- randomForest::randomForest(tc ~ . - office, data=train, importance=TRUE, proximity=TRUE) > prediction <- predict(model, test, type = "class") The prediction resulted in all NAs: > head(prediction) [1] <NA> <NA> <NA> <NA> <NA> <NA> Levels: 2668 2752 2921 3005 The reason is that test$office contains NAs: > …
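For comparison, a rough pandas/sklearn analogue (not the R randomForest formula interface used above) of excluding the leaking column before modelling, so that missing values in it cannot affect predictions; the tiny data frame is invented for illustration:

```python
# Pandas/sklearn analogue: drop the leaking 'office' column from both splits so
# that NAs in it can never turn predictions into missing values.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.DataFrame({"office": ["a", None, "b", "a"],
                      "x1": [1.0, 2.0, 3.0, 4.0],
                      "tc": [0, 1, 0, 1]})
test = pd.DataFrame({"office": [None, "b"], "x1": [1.5, 3.5]})

features = [c for c in train.columns if c not in ("tc", "office")]
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[features], train["tc"])
print(model.predict(test[features]))   # 'office' (and its NAs) never enters the model
```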

sklearn: use RandomizedSearchCV with custom metrics and catch exceptions

為{幸葍}努か Submitted on 2021-01-28 12:35:42
Question: I am using the RandomizedSearchCV function in sklearn with a Random Forest Classifier. To see different metrics, I am using custom scoring: from sklearn.metrics import make_scorer, roc_auc_score, recall_score, matthews_corrcoef, balanced_accuracy_score, accuracy_score acc = make_scorer(accuracy_score) auc_score = make_scorer(roc_auc_score) recall = make_scorer(recall_score) mcc = make_scorer(matthews_corrcoef) bal_acc = make_scorer(balanced_accuracy_score) scoring = {"roc_auc_score": auc …
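A hedged sketch of a multi-metric RandomizedSearchCV of the kind described; the parameter grid, the refit choice, and the error_score handling are illustrative assumptions, not the asker's exact configuration:

```python
# Multi-metric randomized search over a random forest; error_score makes
# failing parameter combinations score NaN instead of raising.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef, balanced_accuracy_score
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_features=15, random_state=0)

scoring = {
    "roc_auc_score": "roc_auc",                      # built-in scorer string
    "mcc": make_scorer(matthews_corrcoef),           # custom scorers via make_scorer
    "bal_acc": make_scorer(balanced_accuracy_score),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [100, 300, 500],
                         "max_depth": [None, 5, 10]},
    scoring=scoring,
    refit="mcc",            # with multiple metrics, refit must name one of them
    n_iter=5,
    error_score=np.nan,     # failed fits are recorded as NaN, not exceptions
    random_state=0,
)
search.fit(X, y)
print(search.best_params_,
      search.cv_results_["mean_test_mcc"][search.best_index_])
```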

Random forest tree growing algorithm

为君一笑 Submitted on 2021-01-28 04:05:05
Question: I'm doing a Random Forest implementation (for classification), and I have some questions regarding the tree-growing algorithm mentioned in the literature. When training a decision tree, there are two criteria for stopping tree growth: a. Stop when there are no more features left to split a node on. b. Stop when all samples in a node belong to the same class. Based on that: 1. Consider growing one tree in the forest. When splitting a node of the tree, I randomly select m of the M total …
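A minimal sketch of the per-node feature sub-sampling step referred to above (choosing m of the M features at random for each split); the helper name and the default m = sqrt(M) are assumptions for illustration:

```python
# At every node split, only a random subset of m out of M features is
# considered; a fresh subset is drawn for each node.
import numpy as np

def candidate_features(n_total_features, m=None, rng=None):
    """Pick the m feature indices evaluated at one node split (m defaults to sqrt(M))."""
    rng = np.random.default_rng() if rng is None else rng
    if m is None:
        m = max(1, int(np.sqrt(n_total_features)))
    return rng.choice(n_total_features, size=m, replace=False)

rng = np.random.default_rng(0)
M = 15
print(candidate_features(M, rng=rng))   # e.g. 3 feature indices for this node
print(candidate_features(M, rng=rng))   # a different draw for the next node
```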

cforest party unbalanced classes

十年热恋 Submitted on 2021-01-27 08:59:27
Question: I want to measure feature importance with the cforest function from the party library. My output variable has roughly 2000 samples in class 0 and 100 samples in class 1. I think a good way to avoid bias due to the class imbalance is to train each tree of the forest on a subsample in which the number of elements of class 1 equals the number of elements of class 0. Is there any way to do that? I am thinking of an option like n_samples = c(20, 20). EDIT: An example of code > …
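A Python/sklearn analogue of the balanced-subsample idea, shown instead of party::cforest for consistency with the other sketches; class_weight="balanced_subsample" reweights classes inside every bootstrap sample, which approximates training each tree on balanced data:

```python
# Imbalanced synthetic data (~95% class 0, ~5% class 1) with per-tree
# balanced class weights; feature importances come out as usual.
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2100, weights=[0.95, 0.05], random_state=0)
print("class counts:", Counter(y))

rf = RandomForestClassifier(n_estimators=300,
                            class_weight="balanced_subsample",
                            random_state=0)
rf.fit(X, y)
print("feature importances:", rf.feature_importances_[:5])
```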

How to run a random classifier in the following case

感情迁移 Submitted on 2021-01-05 08:54:33
Question: I am experimenting with a sentiment analysis case and I am trying to run a random classifier on the following:

|Topic             |value|label|
|Apples are great  |-0.99|0    |
|Balloon is red    |-0.98|1    |
|cars are running  |-0.93|0    |
|dear diary        |0.8  |1    |
|elephant is huge  |0.91 |1    |
|facebook is great |0.97 |0    |

After splitting it into train and test with the sklearn library, I am doing the following to the Topic column so the count vectorizer can work on it: x = train.iloc[:,0:2] #except for alphabets …
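One hedged way to sketch this in sklearn, combining the text column and the numeric column with a ColumnTransformer before a random forest; this is my own assumption about the intended pipeline, not necessarily the asker's code:

```python
# Combine a raw text column (via CountVectorizer) with a numeric column
# before fitting a random forest on the data shown above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "Topic": ["Apples are great", "Balloon is red", "cars are running",
              "dear diary", "elephant is huge", "facebook is great"],
    "value": [-0.99, -0.98, -0.93, 0.8, 0.91, 0.97],
    "label": [0, 1, 0, 1, 1, 0],
})

pre = ColumnTransformer([
    ("text", CountVectorizer(), "Topic"),   # vectorize the raw text column
    ("num", "passthrough", ["value"]),      # keep the numeric score as-is
])
clf = Pipeline([("pre", pre),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))])
clf.fit(df[["Topic", "value"]], df["label"])
print(clf.predict(df[["Topic", "value"]]))
```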

How can I get the OOB samples used for each tree in a random forest model in R?

不羁岁月 Submitted on 2021-01-04 05:55:46
Question: Is it possible to get the OOB samples used by the random forest algorithm for each tree? I'm using the R language. I know that the random forest algorithm uses roughly 66% of the data (selected randomly) to grow each tree and the remaining ~34% as OOB samples to measure the OOB error, but I don't know how to get those OOB samples for each tree. Any idea? Answer 1: Assuming you are using the randomForest package, you just need to set the keep.inbag argument to TRUE. library(randomForest) set.seed(1) rf <- …
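A rough Python analogue, shown instead of the R keep.inbag solution quoted above: sklearn's BaggingClassifier exposes estimators_samples_, from which each tree's OOB indices can be derived.

```python
# Per-tree OOB samples from a bagged ensemble: everything not drawn into a
# tree's bootstrap sample is out-of-bag for that tree.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        bootstrap=True, random_state=0).fit(X, y)

all_idx = np.arange(len(X))
for i, in_bag in enumerate(bag.estimators_samples_):
    oob_idx = np.setdiff1d(all_idx, in_bag)   # indices never drawn for tree i
    print(f"tree {i}: {len(oob_idx)} OOB samples")
```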
