Question
I have Python code that uses a Java library by means of JPype. Currently, each run of my function checks whether the JVM exists, and creates it if that is not the case:

import jpype as jp

def myfunc(i):
    if not jp.isJVMStarted():
        jp.startJVM(jp.getDefaultJVMPath(), '-ea',
                    '-Djava.class.path=' + jar_location)
    do_something_hard(i)
Further, I want to parallelize my code using the Python multiprocessing library. Each worker (supposedly) runs independently, calculating the value of my function with different parameters. For example:

import numpy as np
import pathos

pool = pathos.multiprocessing.ProcessingPool(8)
params = np.arange(100)
result = pool.map(myfunc, params)
This construction works fine, except that it has dramatic memory leaks when using more than one core in the pool. I notice that all memory is freed up when Python is closed, but memory still accumulates over time while pool.map is running, which is undesirable. The JPype documentation is incredibly brief, suggesting to synchronize threads by wrapping Python threads with jp.attachThreadToJVM and jp.detachThreadFromJVM. However, I cannot find a single example online of how to actually do it. I have tried wrapping the call to do_something_hard inside myfunc with these statements, but it had no effect on the leak. I have also attempted to explicitly shut down the JVM at the end of myfunc using jp.shutdownJVM. However, in this case the JVM seems to crash as soon as I use more than one core, leading me to believe that there is a race condition.
Please help:
- What is going on? Why would there be a race condition? Is it not the case that each worker gets its own JVM?
- What is the correct way to free up memory in my scenario?
Answer 1:
The problem is with the nature of multiprocessing. Python can either fork or spawn a new process. The fork option appears to have significant problems with the JVM; fork is the default on Linux.
Using the spawn context (multiprocessing.get_context("spawn")) to create a spawned copy of Python allows a fresh JVM to be created in each process. Each spawned copy is completely independent. There are examples in subrun.py in the test directory on GitHub, as that is what is used to test different JVM options for JPype.
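A minimal sketch of the spawn approach, assuming your own jar_location and workload (the JPype calls are left as comments since they require a JVM and jar; squaring the input stands in for do_something_hard):

```python
import multiprocessing as mp

def square(i):
    # In a real run, each spawned worker would start its own fresh JVM here:
    # import jpype as jp
    # if not jp.isJVMStarted():
    #     jp.startJVM(jp.getDefaultJVMPath(), '-ea',
    #                 '-Djava.class.path=' + jar_location)
    return i * i  # stand-in for do_something_hard(i)

def run_spawned(params, workers=4):
    # Force the spawn start method even on Linux, where fork is the default.
    ctx = mp.get_context("spawn")
    with ctx.Pool(workers) as pool:
        return pool.map(square, params)

if __name__ == "__main__":
    # The __main__ guard is required with spawn: children re-import this module.
    print(run_spawned(range(8)))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because spawn starts each worker from a clean interpreter, the `isJVMStarted()` check in the worker is always false the first time, so every process gets its own independent JVM rather than a broken forked copy.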
The fork version creates a copy of the original process, including the previously running JVM. At least from my testing, the forked JVM does not work as expected. Older versions of JPype (0.6.x) would allow the forked copy to call startJVM, which would create a big memory leak. The current version, 0.7.1, raises an exception saying that the JVM cannot be restarted.
If you are using threads (rather than processes), all threads share the same JVM and do not need to start it independently. There is further documentation on the use of multiprocessing with JPype in the latest documentation on GitHub, under the "limitations" section.
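The threaded variant mentioned above can be sketched like this (again the JPype calls are left as comments, and squaring stands in for do_something_hard): the JVM is process-wide, so it is started once up front and every thread then shares it.

```python
import threading

# With threads, start the single shared JVM once in the main thread:
# import jpype as jp
# jp.startJVM(jp.getDefaultJVMPath(), '-ea', '-Djava.class.path=' + jar_location)

results = {}
results_lock = threading.Lock()

def worker(i):
    value = i * i  # stand-in for do_something_hard(i), which would call into Java
    with results_lock:       # protect the shared dict from concurrent writes
        results[i] = value

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))
```

Note that because of the GIL this only helps if the heavy work happens on the Java side (JPype releases the GIL while Java code runs); pure-Python workloads will not speed up with threads.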
Source: https://stackoverflow.com/questions/58695140/memory-leaks-in-jpype-with-multiprocessing