python-multiprocessing

Python MemoryError when reading large files; need ideas for applying multiprocessing in the case below

守給你的承諾、 posted on 2020-07-16 04:22:46
Question: I have a file which stores data in the format below:

    TIME[04.26_12:30:30:853664]ID[ROLL:201987623]MARKS[PHY:100|MATH:200|CHEM:400]
    TIME[03.27_12:29:30.553669]ID[ROLL:201987623]MARKS[PHY:100|MATH:1200|CHEM:900]
    TIME[03.26_12:28:30.753664]ID[ROLL:2341987623]MARKS[PHY:100|MATH:200|CHEM:400]
    TIME[03.26_12:29:30.853664]ID[ROLL:201978623]MARKS[PHY:0|MATH:0|CHEM:40]
    TIME[04.27_12:29:30.553664]ID[ROLL:2034287623]MARKS[PHY:100|MATH:200|CHEM:400]

Below is the method I found to fulfill the need, given in …
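
One common way to attack both the memory error and the parallelism at once (a minimal sketch, not the poster's eventual solution) is to stream the file lazily and feed lines to a worker pool with imap_unordered, so the whole file never sits in memory. The file name data.txt and the parse_line regex are assumptions based on the sample records above:

    import multiprocessing as mp
    import re

    # Assumed record layout, inferred from the sample lines above.
    RECORD = re.compile(r"TIME\[(.*?)\]ID\[ROLL:(\d+)\]MARKS\[(.*?)\]")

    def parse_line(line):
        # Parse one record; return None for lines that do not match.
        m = RECORD.search(line)
        if m is None:
            return None
        time_str, roll, marks = m.groups()
        scores = dict(kv.split(":") for kv in marks.split("|"))
        return roll, time_str, scores

    if __name__ == "__main__":
        with mp.Pool(4) as pool, open("data.txt") as fh:
            # imap_unordered pulls lines from the file lazily, in chunks,
            # so memory use stays flat no matter how large the file is.
            for record in pool.imap_unordered(parse_line, fh, chunksize=1000):
                if record is not None:
                    print(record)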

multiprocessing pool: determine process names (unable to terminate its processes)

拜拜、爱过 posted on 2020-07-09 05:29:48
Question: I have some code in which I attempt to create 4 processes within a Pool. Once I get any exception (e.g., the database it is trying to connect to is down), I want to kill the pool, sleep for 10 seconds, and then create a new pool with 4 processes. However, it seems that the Pool is never killed, because the process names keep getting incremented each time. Does the pool have a cache where it keeps a name count?

    def connect_db():
        pass

    while True:
        p = Pool(4)
        for process in multiprocessing.active_children …
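
The incrementing names by themselves do not prove the old workers are still alive: multiprocessing numbers processes with a global counter, so a new pool's workers simply continue from where the previous pool stopped. A minimal sketch of the kill-sleep-recreate loop described above (the work function and its failure mode are placeholders for the poster's database code):

    import time
    from multiprocessing import Pool

    def work(_):
        # Placeholder for the real job, e.g. a database query.
        return True

    if __name__ == "__main__":
        while True:
            pool = Pool(4)
            try:
                pool.map(work, range(8))
                pool.close()
                pool.join()
                break
            except Exception:
                # terminate() stops the workers immediately and join()
                # reaps them, so nothing from this pool survives into
                # the next iteration, whatever the new names suggest.
                pool.terminate()
                pool.join()
                time.sleep(10)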

Python multiprocessing shared memory issues with C objects involved

拥有回忆 posted on 2020-07-09 04:54:14
Question: I am working on a program that uses an external C library to parse data from external sources and a Python library to run an optimisation problem on it. The optimisation is very time-consuming, so using several CPUs would be a significant plus. Basically, I wrapped the C(++) structures with Cython as follows:

    cdef class CObject(object):
        cdef long p_sthg
        cdef OBJECT* sthg

        def __cinit__(self, sthg):
            self.p_sthg = sthg
            self.sthg = <OBJECT*> self.p_sthg

        def __reduce__(self):
            return (rebuildObject …
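
The root problem is that a raw C pointer is only meaningful inside the process that allocated it, so it can never usefully cross a process boundary, whether via shared memory or pickle. The usual pattern, which the truncated __reduce__ hints at, is to pickle a serialisable payload together with a module-level rebuild function and recreate the C object in the child. A pure-Python sketch of that pattern (CObject and rebuild_cobject here are illustrative stand-ins, not the poster's Cython code):

    import pickle

    def rebuild_cobject(raw_bytes):
        # Module-level rebuild hook: reconstructs the object in the
        # receiving process instead of shipping a C pointer across.
        return CObject(raw_bytes)

    class CObject:
        """Stand-in for the Cython wrapper: carries data, not a pointer."""
        def __init__(self, raw_bytes):
            self.raw_bytes = raw_bytes  # imagine the C parser runs here

        def __reduce__(self):
            # Pickle the payload plus the rebuild function; the pointer
            # itself is recreated on the other side, never serialised.
            return (rebuild_cobject, (self.raw_bytes,))

    clone = pickle.loads(pickle.dumps(CObject(b"payload")))
    assert clone.raw_bytes == b"payload"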

multiprocessing ignores “__setstate__”

耗尽温柔 posted on 2020-06-28 14:31:34
Question: I assumed that the multiprocessing package used pickle to send things between processes. However, pickle pays attention to the __getstate__ and __setstate__ methods of an object, and multiprocessing seems to ignore them. Is this correct? Am I confused? To replicate, install Docker and type into the command line:

    $ docker run python:3.4 python -c "
    import pickle
    import multiprocessing
    import os

    class Tricky:
        def __init__(self, x):
            self.data = x
        def __setstate__(self, d):
            self.data = 10
        def __getstate__(self) …
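
The usual explanation: on Linux the default start method is fork, so an object created before Process.start() reaches the child through the copied address space and is never pickled at all; __getstate__ and __setstate__ only run when the object actually goes through pickle, e.g. under the spawn start method or through a Queue. A minimal sketch showing the difference (the prints are just to make the hooks visible):

    import multiprocessing as mp

    class Tricky:
        def __init__(self, x):
            self.data = x
        def __getstate__(self):
            print("__getstate__ called")
            return self.__dict__
        def __setstate__(self, d):
            print("__setstate__ called")
            self.__dict__ = d

    def show(t):
        print("child sees:", t.data)

    if __name__ == "__main__":
        # fork (the Linux default) copies memory and skips pickling;
        # spawn forces the argument through pickle, so both hooks fire.
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=show, args=(Tricky(5),))
        p.start()
        p.join()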

Parallel for loop over numpy matrix

你说的曾经没有我的故事 posted on 2020-06-27 19:41:07
Question: I am looking at the joblib examples, but I can't figure out how to do a parallel for loop over a matrix. I am computing a pairwise distance metric between the rows of a matrix, so I was doing:

    N, _ = data.shape
    upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
    dist_mat = np.zeros((N, N))
    for (i, j) in upper_triangle:
        dist_mat[i, j] = dist_fun(data[i], data[j])
        dist_mat[j, i] = dist_mat[i, j]

where dist_fun takes two vectors and computes a distance. How can I make this loop …
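
A direct joblib translation of that loop looks like the sketch below; dist_fun here is a stand-in Euclidean metric, since the poster's function is not shown. With a cheap metric the per-call dispatch overhead can swamp the gain, so batching pairs (or scipy.spatial.distance.pdist) is often the faster route:

    import numpy as np
    from joblib import Parallel, delayed

    def dist_fun(u, v):
        # Stand-in metric; replace with the real distance function.
        return np.linalg.norm(u - v)

    data = np.random.rand(100, 8)
    N, _ = data.shape
    upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]

    # Evaluate the upper triangle in parallel, then scatter the results
    # back into a symmetric matrix in the parent process.
    distances = Parallel(n_jobs=-1)(
        delayed(dist_fun)(data[i], data[j]) for (i, j) in upper_triangle
    )
    dist_mat = np.zeros((N, N))
    for (i, j), d in zip(upper_triangle, distances):
        dist_mat[i, j] = dist_mat[j, i] = d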

Why does ProcessPoolExecutor on Windows need a __main__ guard when submitting a function from another module?

大憨熊 posted on 2020-06-27 17:05:40
Question: Let's say I have a program:

    import othermodule, concurrent.futures
    pool = concurrent.futures.ProcessPoolExecutor()

and then I want to say:

    fut = pool.submit(othermodule.foo, 5)
    print(fut.result())

The official docs say I need to guard these latter two statements with if __name__ == '__main__'. It's not hard to do; I would just like to know why. foo lives in othermodule, and it knows that (foo.__module__ == 'othermodule'). And 5 is a literal int. Both can be pickled and unpickled without any …
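
The short answer, sketched below: on Windows there is no fork, so every worker is a fresh interpreter that re-imports the main module before it can run othermodule.foo. Without the guard, that re-import would create another executor and submit again, recursing indefinitely; the guard, not the picklability of foo or 5, is what breaks the cycle. A guarded version (othermodule.foo is assumed to exist at module level, as in the question):

    # main.py
    import concurrent.futures
    import othermodule

    if __name__ == '__main__':
        # Workers re-import this module on start-up; the guard keeps the
        # re-import from building a new pool and resubmitting forever.
        with concurrent.futures.ProcessPoolExecutor() as pool:
            fut = pool.submit(othermodule.foo, 5)
            print(fut.result())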

instance methods with multiprocessing.Pool

浪尽此生 posted on 2020-06-27 06:14:09
Question: I've been playing around with a Pool object while using an instance method as the func argument. It has been a bit surprising with regard to instance state: the instance seems to get reset on every chunk. E.g.:

    import multiprocessing as mp
    import logging

    class Worker(object):
        def __init__(self):
            self.consumed = set()

        def consume(self, i):
            if i not in self.consumed:
                logging.info(i)
                self.consumed.add(i)

    if __name__ == '__main__':
        n = 1
        logging.basicConfig(level='INFO', format='%(process)d: …
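
What is happening: a bound method pickles the instance it is bound to, so every chunk ships a fresh copy of the Worker into the worker process, and nothing mutated there ever travels back. A common workaround (a sketch, not the poster's code) is to build one long-lived object per worker process with the pool's initializer; note each process still has its own set, so duplicates split across processes are both seen as new:

    import multiprocessing as mp

    worker = None  # one per worker process, created by init_worker

    class Worker(object):
        def __init__(self):
            self.consumed = set()

        def consume(self, i):
            first_time = i not in self.consumed
            self.consumed.add(i)
            return first_time

    def init_worker():
        # Runs once in each worker process, so state persists across
        # every chunk that process handles.
        global worker
        worker = Worker()

    def consume(i):
        return worker.consume(i)

    if __name__ == '__main__':
        with mp.Pool(2, initializer=init_worker) as pool:
            print(pool.map(consume, [1, 1, 2, 2, 3, 3]))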

Replace pickle in Python multiprocessing lib

走远了吗. posted on 2020-06-25 09:44:25
Question: I need to execute the code below (a simplified version of my real code base, in Python 3.5):

    import multiprocessing

    def forever(do_something=None):
        while True:
            do_something()

    p = multiprocessing.Process(target=forever,
                                args=(lambda: print("do something"),))
    p.start()

In order to create the new process, Python needs to pickle the target function and the lambda passed to it. Unfortunately pickle cannot serialize lambdas, and the output is like this:

    _pickle.PicklingError: Can't pickle <function <lambda> …
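
One well-known replacement is the third-party multiprocess package (the pathos project's fork of the standard library module), which serialises with dill instead of pickle and therefore handles lambdas; the simpler fix is to pass a named module-level function instead. A sketch of the first option, assuming pip install multiprocess:

    import time
    import multiprocess  # drop-in fork of multiprocessing, uses dill

    def forever(do_something=None):
        while True:
            do_something()

    if __name__ == '__main__':
        # dill can serialise the lambda, so this no longer raises
        # _pickle.PicklingError the way the standard module does.
        p = multiprocess.Process(target=forever,
                                 args=(lambda: print("do something"),))
        p.start()
        time.sleep(1)   # let it print a few times, then end the demo
        p.terminate()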