Parameter search using dask

Submitted by 折月煮酒 on 2019-12-11 05:29:46

Question


How can I optimally search a parameter space using Dask? (No cross-validation.)

Here is the code (no Dask yet):

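import numpy as np
from operator import itemgetter

def build(ntries, param, niter, func, score, train, test):
    # One evaluation per try: draw niter samples from the distribution
    # `param` (seeded with the try index) and pass them to func
    res = []
    for i in range(ntries):
        cparam = param.rvs(size=niter, random_state=i)
        res.append(func(cparam, train, test, score))
    return res

def score(test, correct):
    # Euclidean distance between the result and the reference array
    return np.linalg.norm(test - correct)

def compute_optimal(res):
    # Sort (parameters, score) pairs by score, lowest (best) first.
    # Python 3 requires key= as a keyword argument.
    return sorted(res, key=itemgetter(1))

def func(c, train, test, score):
    # Subtract each sampled value times dt from train, then score it
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))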
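For the toy data in this question, `func` has a simple closed form. Because `dt = 1.0/len(c)`, the loop subtracts `sum(c)/len(c)`, i.e. the mean of `c`, from `train`. With the question's arrays (`train = np.arange(1000) + 0.5`, `test = np.arange(1000)`), every element of the difference is `0.5 - mean(c)`, so the score is `sqrt(1000) * |0.5 - mean(c)|`. The sketch below checks this numerically; `func_vectorized` is a hypothetical helper, and `c` is drawn with NumPy's generator instead of the question's `scipy.stats.uniform`.

```python
import numpy as np

def score(test, correct):
    return np.linalg.norm(test - correct)

def func(c, train, test, score):
    # Original loop: subtract each sample times dt = 1/len(c)
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))

def func_vectorized(c, train, test, score):
    # Hypothetical equivalent: the sum of cc * dt over c is c.mean(),
    # so the whole loop collapses to a single subtraction
    return (c, score(train - np.mean(c), test))

rng = np.random.default_rng(0)
c = rng.uniform(size=500)
train = np.arange(1000) + 0.5
test = np.arange(1000)

_, s_loop = func(c, train, test, score)
_, s_vec = func_vectorized(c, train, test, score)
print(np.isclose(s_loop, s_vec))  # True
```

The two versions differ only by floating-point rounding. For this data, the best try is simply the one whose sample mean is closest to 0.5.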

Here is how I use it:

from dask import delayed
from distributed import LocalCluster, Client
cluster=LocalCluster(n_workers=4, threads_per_worker=1)
cli=Client(cluster)

from scipy.stats import uniform
import numpy as np

niter=500
loc=1.0e-09
scale=1.0
ntries=1000
sched=uniform(loc=loc,scale=scale)
train=np.arange(1000)+0.5
test=np.arange(1000)

# HERE IS THE DASK: build() receives delayed(func), so it returns a list of lazy Delayed objects
graph=build(ntries,sched,niter,delayed(func),score,train,test)

# THE QUESTION SECTION
# I use these steps to bring all the values back to the client so that I can search for the score-wise optimal pair (parameter, score)
res=[cli.compute(g) for g in graph]
results=[r.result() for r in res]
# Actual search for the optimal pair
optimal=compute_optimal(results)
best,worst=optimal[0],optimal[-1]
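The loop above calls `cli.compute` once per `Delayed` object and then blocks on the futures one at a time. `Client.compute` also accepts a list and returns a list of futures, and `Client.gather` collects them in one round trip (`cli.gather(cli.compute(graph))`); `dask.compute(*graph)` does both steps in one call. The sketch below uses `dask.compute` with the single-threaded `"sync"` scheduler so it runs without a cluster; with a `Client` active you would drop the `scheduler` argument. `build`, `func`, and `score` are condensed copies of the question's functions, and the sizes are smaller than the question's to keep it quick.

```python
import dask
import numpy as np
from dask import delayed
from scipy.stats import uniform

def score(test, correct):
    return np.linalg.norm(test - correct)

def func(c, train, test, score):
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))

def build(ntries, param, niter, func, score, train, test):
    return [func(param.rvs(size=niter, random_state=i), train, test, score)
            for i in range(ntries)]

train = np.arange(1000) + 0.5
test = np.arange(1000)
# Smaller sizes than in the question, just for a quick run
graph = build(20, uniform(loc=1.0e-09, scale=1.0), 50, delayed(func), score, train, test)

# One call computes every Delayed object and returns a tuple of results.
# "sync" runs everything in this process; with a Client active, omit it
# or use: results = cli.gather(cli.compute(graph))
results = dask.compute(*graph, scheduler="sync")
print(len(results))  # 20
```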

The questions are:

  1. Am I using Dask correctly here?
  2. Am I fetching data back to the client correctly? Are there more efficient ways to do this?
  3. Is there a way to search for the optimal pair on the workers?
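One possible approach to the third question, sketched under the question's setup: make the selection itself a delayed task, so only the winning pairs travel back to the client. Wrapping `compute_optimal` in `delayed` would also run on a worker, but it would still ship the whole sorted list back, so the sketch uses `min`/`max` by score instead. `best_pair`, `worst_pair`, and `tree_best` are hypothetical helpers; `tree_best` reduces in groups so that no single task has to receive every result at once. As before, it runs on the `"sync"` scheduler; on a cluster the same graph runs on the workers.

```python
import dask
import numpy as np
from dask import delayed
from operator import itemgetter
from scipy.stats import uniform

def score(test, correct):
    return np.linalg.norm(test - correct)

def func(c, train, test, score):
    dt = 1.0 / len(c)
    for cc in c:
        train = train - cc * dt
    return (c, score(train, test))

def build(ntries, param, niter, func, score, train, test):
    return [func(param.rvs(size=niter, random_state=i), train, test, score)
            for i in range(ntries)]

def best_pair(pairs):
    # Lowest score wins (same ordering as compute_optimal)
    return min(pairs, key=itemgetter(1))

def worst_pair(pairs):
    return max(pairs, key=itemgetter(1))

def tree_best(parts, pick, fanin=8):
    # Hypothetical helper: apply `pick` to groups of `fanin` delayed pairs,
    # level by level, until a single delayed pair remains
    while len(parts) > 1:
        parts = [delayed(pick)(parts[i:i + fanin])
                 for i in range(0, len(parts), fanin)]
    return parts[0]

train = np.arange(1000) + 0.5
test = np.arange(1000)
graph = build(50, uniform(loc=1.0e-09, scale=1.0), 50, delayed(func), score, train, test)

# Simplest form: one task receives every pair; only its result returns
best_simple = delayed(best_pair)(graph)

# Tree form: each task receives at most `fanin` pairs
best = tree_best(graph, best_pair)
worst = tree_best(graph, worst_pair)

# The func tasks are shared by all three reductions, so they run once,
# and only three pairs come back. With a Client active, drop
# scheduler="sync" (or submit with cli.compute and gather the futures).
best_simple, best, worst = dask.compute(best_simple, best, worst, scheduler="sync")
print(best[1] <= worst[1])  # True
```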

P.S. I recently posted a related question about a different issue (thread.lock during custom parameter search class using Dask distributed). I've solved it, will post an answer shortly, and will then close that question.

Source: https://stackoverflow.com/questions/44991053/parameter-search-using-dask
