Question
I'm very new to machine learning and Python, and I'm trying to build a model to predict patients (N=200) vs. controls (N=200) from structural neuroimaging data. After the initial preprocessing, where I reshaped the neuroimaging data into a 2D array, I built the following model:
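import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Linear SVM; C is tuned by the grid search below
svc = SVC(C=1.0, kernel='linear')
# Candidate values for C: 0.1, 0.2, ..., 9.9
k_range = np.arange(0.1, 10, 0.1)
param_grid = dict(C=k_range)
# 10-fold cross-validated grid search scored by accuracy
grid = GridSearchCV(svc, param_grid, cv=10, scoring='accuracy')
grid.fit(img, labels)
# Per-candidate results (replaces the removed grid_scores_ attribute)
grid.cv_results_
print(grid.best_score_)
print(grid.best_params_)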
This gives me a decent result, but I'd like to control for the fact that different images were acquired with different scanners (e.g. subjects 1 through 150 were scanned with scanner 1, subjects 101 through 300 were scanned with scanner 2 and subjects 301 through 400 were scanned with scanner 3). Is there any way this could be added to the model above?
I read that performing feature selection beforehand might help. However, I don't want to simply extract meaningful features when those features might be related to the scanner. In fact, I want to classify patients and controls NOT based on the scanner (i.e. controlling for scanner).
Any thoughts on this would be appreciated, thank you.
Answer 1:
For diagnostics, you could take a look at how your data is distributed per scanner to see whether the direction you're pursuing is promising. Normalization (e.g., of the mean and variance per scanner) is one option, as someone already suggested. Another option is adding 3 additional dimensions to your feature set as a one-hot encoding of the scanner used (i.e., for each example, you have a 1 in the position of the appropriate scanner and 0 for the others).
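The two options in this answer can be sketched roughly as follows. This is only an illustration under assumptions: the `scanner` array (one scanner ID per subject) is not in the question, the helper names `standardize_per_scanner` and `append_scanner_onehot` are made up for this example, and the data is synthetic.

```python
import numpy as np

def standardize_per_scanner(X, scanner):
    """Z-score every feature separately within each scanner group."""
    X = np.asarray(X, dtype=float)
    scanner = np.asarray(scanner)
    X_out = np.empty_like(X)
    for s in np.unique(scanner):
        mask = scanner == s
        mean = X[mask].mean(axis=0)
        std = X[mask].std(axis=0)
        std[std == 0] = 1.0  # leave constant features unscaled
        X_out[mask] = (X[mask] - mean) / std
    return X_out

def append_scanner_onehot(X, scanner):
    """Append one 0/1 indicator column per scanner to the feature matrix."""
    scanner = np.asarray(scanner)
    onehot = (scanner[:, None] == np.unique(scanner)[None, :]).astype(float)
    return np.hstack([np.asarray(X, dtype=float), onehot])

# Synthetic stand-in for the real data (assumption: 12 subjects, 6 features)
rng = np.random.default_rng(0)
scanner = np.repeat([1, 2, 3], [4, 5, 3])        # hypothetical scanner IDs
X = rng.normal(size=(12, 6)) + scanner[:, None]  # scanner-dependent offset

X_norm = standardize_per_scanner(X, scanner)  # option 1: per-scanner normalization
X_aug = append_scanner_onehot(X, scanner)     # option 2: one-hot scanner columns
```

Either `X_norm` or `X_aug` could then be passed to `grid.fit` in place of `img`. Note that computing the per-scanner statistics on the full dataset lets information from the test folds leak into training; a stricter setup would estimate them on the training folds only.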
Answer 2:
To add it to your model, you can use your normalization parameter for each scanner as a feature and include it in your model.
Source: https://stackoverflow.com/questions/37277647/is-it-possible-to-add-a-covariate-control-for-a-variable-of-no-interest-to-an