freeze_support bug in using scikit-learn in the Anaconda python distro?

倾然丶 夕夏残阳落幕 提交于 2020-01-25 18:18:26

问题


I just want to be sure this is not about my code but it needs to be fixed in the relevant Python package. (By the way, does this look like something I can manually patch even before the vendor ships an update?) I was using scikit-learn-0.15b1 which called these. Thanks!

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
    '__parents_main__', file, path_name, etc
  File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
    gs.fit(X_train, y_train)
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
    self._pool = MemmapingPool(n_jobs, **poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.

UPDATE: Here is my edited script, but it still leads to the exact same error after it spawned the processes for GridSearchCV. Actually, quite some after the command reported how many folds and fits it will do, but other than that I don't know when it crashes. Shall I put freeze_support somewhere else?

import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp

if __name__=='__main__':
    mp.freeze_support()

print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]

print("Data lodaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)  # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # apply same transformation to test data

print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)

print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)

print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)

回答1:


This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module:

if __name__ == '__main__':
    freeze_support()



回答2:


You can find information related to multiprocessing here in point 16.6.2.3.

So, a working example would be:

from multiprocessing import Process, freeze_support

def f():
    print 'hello world!'

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()


来源:https://stackoverflow.com/questions/24363300/freeze-support-bug-in-using-scikit-learn-in-the-anaconda-python-distro

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!