Question
I want to confirm that this is not a problem in my code but something that needs to be fixed in the relevant Python package. (Also, is this something I can patch manually before the vendor ships an update?) I was using scikit-learn-0.15b1, which made these calls. Thanks!
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
    '__parents_main__', file, path_name, etc
  File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
    gs.fit(X_train, y_train)
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
    self._pool = MemmapingPool(n_jobs, **poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError:
    Attempt to start a new process before the current process
    has finished its bootstrapping phase.
    This probably means that you are on Windows and you have
    forgotten to use the proper idiom in the main module:
        if __name__ == '__main__':
            freeze_support()
            ...
    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce a Windows executable.
UPDATE: Here is my edited script, but it still produces exactly the same error after spawning the processes for GridSearchCV. In fact, it fails quite some time after reporting how many folds and fits it will run, but beyond that I don't know when it crashes. Should I put freeze_support somewhere else?
import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp
if __name__=='__main__':
    mp.freeze_support()
print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]
print("Data loaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)
print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train) # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test) # apply same transformation to test data
print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)
print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)
print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)
Answer 1:
This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
    freeze_support()
Answer 2:
The multiprocessing documentation covers this in section 16.6.2.3.
A working example would be:
from multiprocessing import Process, freeze_support

def f():
    print('hello world!')

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()
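
The same layout applies to the script in the question. The traceback shows the child process re-importing scikit_notreat_predictors.py and reaching gs.fit(...) again, which suggests that the call which starts the workers sits at module level rather than under the guard. On Windows, calling freeze_support() alone is not enough: everything that launches worker processes has to run only when the file is the main module. Below is a minimal sketch of that structure using only the standard library. The names square and main are hypothetical, and the Pool is only a stand-in for the worker pool that GridSearchCV creates when n_jobs is greater than 1; it is not the scikit-learn code itself.

```python
import multiprocessing as mp


def square(x):
    # Worker functions are defined at module level so that child
    # processes can find them when they re-import this file.
    return x * x


def main():
    # Everything that starts worker processes lives in here, so it
    # does not run again when a child process imports the module.
    pool = mp.Pool(processes=2)
    try:
        return pool.map(square, range(5))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # freeze_support() only matters for frozen Windows executables;
    # the guard itself is what prevents the re-spawn.
    mp.freeze_support()
    print(main())
```

For the script in the question, this would mean moving the data loading, scaling and the gs.fit(X_train, y_train) call into a function like main() and invoking it under the guard, leaving only imports and definitions at module level.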
Source: https://stackoverflow.com/questions/24363300/freeze-support-bug-in-using-scikit-learn-in-the-anaconda-python-distro