Is it possible to add a covariate (control for a variable of no interest) to an SVM model?

Question

I'm very new to machine learning and Python, and I'm trying to build a model to predict patients (N=200) vs controls (N=200) from structural neuroimaging data. After the initial preprocessing, where I reshaped the neuroimaging data into a 2D array, I built the following model:

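import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# img: 2D array (n_subjects, n_features); labels: 1D array of class labels
svc = SVC(C=1.0, kernel='linear')

# Candidate values for the regularization parameter C (steps of 0.1 starting at 0.1)
k_range = np.arange(0.1, 10, 0.1)
param_grid = dict(C=k_range)

# 10-fold cross-validated grid search, scored by accuracy
grid = GridSearchCV(svc, param_grid, cv=10, scoring='accuracy')
grid.fit(img, labels)
print(grid.cv_results_['mean_test_score'])
print(grid.best_score_)
print(grid.best_params_)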

This gives me a decent result, but I'd like to control for the fact that different images were acquired with different scanners (e.g. subjects 1 through 100 were scanned with scanner 1, subjects 101 through 300 were scanned with scanner 2 and subjects 301 through 400 were scanned with scanner 3). Is there any way this could be added to the model above?

I read that performing feature selection beforehand might help. However, I don't want to simply extract meaningful features when those features might be related to the scanner. In fact, I want to classify patients and controls NOT based on the scanner (i.e. controlling for scanner).

Any thoughts on this would be appreciated. Thank you.

Answer 1:

For diagnostics, you could take a look at how your data is distributed per scanner to see whether the direction you're pursuing is promising. Normalization (e.g., of the mean and variance per scanner) is one option, as someone already suggested. Another option is to add 3 additional dimensions to your feature set as a one-hot encoding of the scanner used (i.e., for each example, you have a 1 in the position of the corresponding scanner and 0 in the others).
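
Both options can be sketched in a few lines of NumPy. This is only a minimal illustration, not a definitive implementation: the helper names `standardize_per_scanner` and `add_scanner_one_hot`, the `scanner` array, and the synthetic data (including the group sizes) are all assumptions made up for the example, standing in for your real `img` array and scanner assignment.

```python
import numpy as np

def standardize_per_scanner(X, scanner_ids):
    """Z-score every feature separately within each scanner group (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    scanner_ids = np.asarray(scanner_ids)
    X_out = np.empty_like(X)
    for s in np.unique(scanner_ids):
        mask = scanner_ids == s
        mean = X[mask].mean(axis=0)
        std = X[mask].std(axis=0)
        std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
        X_out[mask] = (X[mask] - mean) / std
    return X_out

def add_scanner_one_hot(X, scanner_ids):
    """Append one 0/1 indicator column per scanner to the features (hypothetical helper)."""
    scanner_ids = np.asarray(scanner_ids)
    scanners = np.unique(scanner_ids)
    one_hot = (scanner_ids[:, None] == scanners[None, :]).astype(float)
    return np.hstack([np.asarray(X, dtype=float), one_hot])

# Synthetic stand-in for the real data: 400 subjects, 5 features, 3 scanners,
# with an artificial scanner-dependent offset added to every feature.
rng = np.random.default_rng(0)
scanner = np.repeat([1, 2, 3], [100, 200, 100])
img = rng.normal(size=(400, 5)) + scanner[:, None]

img_norm = standardize_per_scanner(img, scanner)  # option 1: per-scanner normalization
img_aug = add_scanner_one_hot(img, scanner)       # option 2: one-hot scanner columns
```

Either `img_norm` or `img_aug` could then be passed to `grid.fit` in place of `img`. Note that computing the per-scanner statistics on the full dataset lets the test folds influence the normalization. For a stricter evaluation, they would be estimated on the training folds only. Also, if patients and controls are unevenly distributed across scanners, per-scanner normalization can remove part of the group difference along with the scanner effect.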

Answer 2:

To add it to your model, you can use your normalization parameter for each scanner as a feature and include it in your model.
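
One possible reading of this suggestion, sketched under assumptions: take the mean and standard deviation of all values acquired on a scanner as that scanner's "normalization parameters" and append them as two extra feature columns. The helper name `add_scanner_stats`, the `scanner` array, and the synthetic data are hypothetical and only serve the illustration.

```python
import numpy as np

def add_scanner_stats(X, scanner_ids):
    """Append each subject's scanner-level mean and std as two extra columns (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    scanner_ids = np.asarray(scanner_ids)
    stats = np.zeros((X.shape[0], 2))
    for s in np.unique(scanner_ids):
        mask = scanner_ids == s
        stats[mask, 0] = X[mask].mean()  # mean over all subjects and features of scanner s
        stats[mask, 1] = X[mask].std()   # std over all subjects and features of scanner s
    return np.hstack([X, stats])

# Synthetic stand-in for the real data: 400 subjects, 5 features, 3 scanners.
rng = np.random.default_rng(0)
scanner = np.repeat([1, 2, 3], [100, 200, 100])
img = rng.normal(size=(400, 5)) + scanner[:, None]

img_stats = add_scanner_stats(img, scanner)  # shape (400, 7)
```

Because the two added columns are constant within a scanner, they carry the same kind of information as a scanner indicator, just encoded as numeric values rather than as 0/1 flags.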

Source: https://stackoverflow.com/questions/37277647/is-it-possible-to-add-a-covariate-control-for-a-variable-of-no-interest-to-an

Tags

python

machine-learning

scikit-learn

svm