How to call a java function from python/numpy?

后端未结

关注

 3  411

陌清茗

it is clear to me how to extend Python with C++, but what if I want to write a function in Java to be used with numpy?

Here is a simple scenario: I want to compute

Overview and assessment of tools to call Java from Python

http://pypi.python.org/pypi/JCC: because of no proper documentation this tool is useless.
Py4J: requires to start the Java process before using python. As remarked by others this is a possible point of failure. Moreover, not many examples of use are documented.
JPype: although development seems to be death, it works well and there are many examples on it on the web (e.g. see http://kogs-www.informatik.uni-hamburg.de/~meine/weka-python/ for using data mining libraries written in Java) . Therefore I decided to focus on this tool.

Installing JPype on Fedora 16

I am using Fedora 16, since there are some issues when installing JPype on Linux, I describe my approach. Download JPype, then modify setup.py script by providing the JDK path, in line 48:

self.javaHome = '/usr/java/default'

then run:

sudo python setup.py install

Afters successful installation, check this file:

/usr/lib64/python2.7/site-packages/jpype/_linux.py

and remove or rename the method getDefaultJVMPath() into getDefaultJVMPath_old(), then add the following method:

def getDefaultJVMPath():
    return "/usr/java/default/jre/lib/amd64/server/libjvm.so"

Alternative approach: do not make any change in the above file _linux.py, but never use the method getDefaultJVMPath() (or methods which call this method). At the place of using getDefaultJVMPath() provide directly the path to the JVM. Note that there are several paths, for example in my system I also have the following paths, referring to different versions of the JVM (it is not clear to me whether the client or server JVM is better suited):

/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre/lib/x86_64/client/libjvm.so
/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre/lib/x86_64/server/libjvm.so
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server/libjvm.so

Finally, add the following line to ~/.bashrc (or run it each time before opening a python interpreter):

export JAVA_HOME='/usr/java/default'

(The above directory is in reality just a symbolic link to my last version of JDK, which is located at /usr/java/jdk1.7.0_04).

Note that all the tests in the directory where JPype has been downloaded, i.e. JPype-0.5.4.2/test/testsuite.py will fail (so do not care about them).

To see if it works, test this script in python:

import jpype 
jvmPath = jpype.getDefaultJVMPath() 
jpype.startJVM(jvmPath)
# print a random text using a Java class
jpype.java.lang.System.out.println ('Berlusconi likes women') 
jpype.shutdownJVM()

Calling Java classes from Java also using Numpy

Let's start implementing a Java class containing some functions which I want to apply to numpy arrays. Since there is no concept of state, I use static functions so that I do not need to create any Java object (creating Java objects would not change anything).

/**
 * Cookbook to pass numpy arrays to Java via Jpype
 * @author Mannaggia
 */

package test.java;

public class Average2 {

public static double compute_average(double[] the_array){
    // compute the average
    double result=0;
    int i;
    for (i=0;i<the_array.length;i++){
        result=result+the_array[i];
    }
    return result/the_array.length;
}
// multiplies array by a scalar
public static double[] multiply(double[] the_array, double factor) {

    int i;
    double[] the_result= new double[the_array.length];
    for (i=0;i<the_array.length;i++) {
        the_result[i]=the_array[i]*factor;
    }
    return the_result;
}

/**
 * Matrix multiplication. 
 */
public static double[][] mult_mat(double[][] mat1, double[][] mat2){
    // find sizes
    int n1=mat1.length;
    int n2=mat2.length;
    int m1=mat1[0].length;
    int m2=mat2[0].length;
    // check that we can multiply
    if (n2 !=m1) {
        //System.err.println("Error: The number of columns of the first argument must equal the number of rows of the second");
        //return null;
        throw new IllegalArgumentException("Error: The number of columns of the first argument must equal the number of rows of the second");
    }
    // if we can, then multiply
    double[][] the_results=new double[n1][m2];
    int i,j,k;
    for (i=0;i<n1;i++){
        for (j=0;j<m2;j++){
            // initialize
            the_results[i][j]=0;
            for (k=0;k<m1;k++) {
                the_results[i][j]=the_results[i][j]+mat1[i][k]*mat2[k][j];
            }
        }
    }
    return the_results;
}

/**
 * @param args
 */
public static void main(String[] args) {
    // test case
    double an_array[]={1.0, 2.0,3.0,4.0};
    double res=Average2.compute_average(an_array);
    System.out.println("Average is =" + res);
}
}

The name of the class is a bit misleading, as we do not only aim at computing the average of a numpy vector (using the method compute_average), but also multiply a numpy vector by a scalar (method multiply), and finally, the matrix multiplication (method mult_mat).

After compiling the above Java class we can now run the following Python script:

import numpy as np
import jpype

jvmPath = jpype.getDefaultJVMPath() 
# we to specify the classpath used by the JVM
classpath='/home/mannaggia/workspace/TestJava/bin'
jpype.startJVM(jvmPath,'-Djava.class.path=%s' % classpath)

# numpy array
the_array=np.array([1.1, 2.3, 4, 6,7])
# build a JArray, not that we need to specify the Java double type using the jpype.JDouble wrapper
the_jarray2=jpype.JArray(jpype.JDouble, the_array.ndim)(the_array.tolist())
Class_average2=testPkg.Average2 
res2=Class_average2.compute_average(the_jarray2)
np.abs(np.average(the_array)-res2) # ok perfect match! 

# now try to multiply an array
res3=Class_average2.multiply(the_jarray2,jpype.JDouble(3))
# convert to numpy array
res4=np.array(res3) #ok

# matrix multiplication
the_mat1=np.array([[1,2,3], [4,5,6], [7,8,9]],dtype=float)
#the_mat2=np.array([[1,0,0], [0,1,0], [0,0,1]],dtype=float)
the_mat2=np.array([[1], [1], [1]],dtype=float)
the_mat3=np.array([[1, 2, 3]],dtype=float)

the_jmat1=jpype.JArray(jpype.JDouble, the_mat1.ndim)(the_mat1.tolist())
the_jmat2=jpype.JArray(jpype.JDouble, the_mat2.ndim)(the_mat2.tolist())
res5=Class_average2.mult_mat(the_jmat1,the_jmat2)
res6=np.array(res5) #ok

# other test
the_jmat3=jpype.JArray(jpype.JDouble, the_mat3.ndim)(the_mat3.tolist())
res7=Class_average2.mult_mat(the_jmat3,the_jmat2)
res8=np.array(res7)
res9=Class_average2.mult_mat(the_jmat2,the_jmat3)
res10=np.array(res9)

# test error due to invalid matrix multiplication
the_mat4=np.array([[1], [2]],dtype=float)
the_jmat4=jpype.JArray(jpype.JDouble, the_mat4.ndim)(the_mat4.tolist())
res11=Class_average2.mult_mat(the_jmat1,the_jmat4)

jpype.java.lang.System.out.println ('Goodbye!') 
jpype.shutdownJVM()

0 讨论(0)