How do I set many elements in parallel in Theano?

Posted by 笑着哭i on 2020-01-06 18:31:09

Question


Let's say I create a Theano function. How do I run element-wise operations in parallel on Theano tensors, such as matrices?

import numpy as np

# This runs inside a Theano function. Instead of the for loop, I'd like to run it in parallel.
c = np.zeros(shape=(2, 200))
for n in range(0, 20):
    # The loop body is an arbitrary example; the exact values don't matter.
    c[0][n] = n % 20
    c[1][n] = n / 20
# In CUDA, we would normally use an if statement:
# if (threadIdx.x == some_index) { c[0][n] = some_value; }

The question could be rephrased: how do I perform parallel operations in a Theano function? I've looked at http://deeplearning.net/software/theano/tutorial/multi_cores.html#parallel-element-wise-ops-with-openmp, which only talks about adding a setting and does not explain how element-wise operations are parallelized.


Answer 1:


To an extent, Theano expects you to focus more on what you want computed rather than on how you want it computed. The idea is that the Theano optimizing compiler will automatically parallelize as much as possible (either on GPU or on CPU using OpenMP).

The following is an example based on the original post's example. The difference is that the computation is declared symbolically and, crucially, without any loops. Here one is telling Theano that the results should be a stack of tensors where the first tensor is the values in a range modulo the range size and the second tensor is the elements of the same range divided by the range size. We don't say that a loop should occur but clearly at least one will be required. Theano compiles this down to executable code and will parallelize it if it makes sense.

import theano
import theano.tensor as tt


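def symbolic_range_div_mod(size):
    # Build the graph symbolically: no explicit loop over elements.
    r = tt.arange(size)
    # Recent Theano versions take a list of tensors here.
    return tt.stack([r % size, r / size])


def main():
    size = tt.dscalar()
    range_div_mod = theano.function(inputs=[size], outputs=symbolic_range_div_mod(size))
    print(range_div_mod(20))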


main()
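
As a sanity check on what the symbolic version computes, here is a minimal NumPy sketch that compares an explicit loop with a loop-free formulation. It is only an analogue of the Theano graph, not the code Theano generates; the function names `loop_div_mod` and `vectorized_div_mod` are made up for this illustration, and it assumes true (floating-point) division, as `r / size` gives with a float scalar.

```python
import numpy as np


def loop_div_mod(size):
    # Explicit loop, in the style of the question's code.
    c = np.zeros((2, size))
    for n in range(size):
        c[0][n] = n % size
        c[1][n] = n / size
    return c


def vectorized_div_mod(size):
    # Loop-free version: whole-array operations, similar in spirit
    # to declaring the computation symbolically.
    r = np.arange(size, dtype=np.float64)
    return np.stack([r % size, r / size])


if __name__ == "__main__":
    # Both formulations should produce the same (2, size) array.
    print(np.allclose(loop_div_mod(20), vectorized_div_mod(20)))
```

The loop-free form leaves the iteration order unspecified, which is what gives a compiler or library the freedom to parallelize it.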

You need to be able to specify your computation in terms of Theano operations. If those operations can be parallelized on the GPU, they should be parallelized automatically.
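
The CUDA-style pattern from the question ("if this thread's index matches, write a value") can usually be restated as an element-wise select over a boolean mask. The sketch below shows the idea in NumPy with `np.where`; in Theano the comparable symbolic operations are `tt.switch` for a masked select and `tt.set_subtensor` for writing into a slice. The helper name `set_where` and the example mask are assumptions made for illustration only.

```python
import numpy as np


def set_where(c, mask, values):
    # Element-wise select: take `values` where mask is True, keep `c` elsewhere.
    return np.where(mask, values, c)


c = np.zeros((2, 5))
cols = np.arange(5)
# Row 0 is written at column 2 only; row 1 is written at columns 3 and 4.
mask = np.stack([cols == 2, cols >= 3])
result = set_where(c, mask, 7.0)
print(result)
```

Every output element depends only on the inputs at the same position, so no element has to wait for another, and that is the property that makes the operation parallelizable.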



Source: https://stackoverflow.com/questions/31777135/how-do-i-set-many-elements-in-parallel-in-theano
