Two processes (Java and Python) need to communicate in my application. I noticed that the socket communication takes 93% of the run time. Why is communication so slow? Should I
You have a number of options. Since you are using Linux you could use UNIX domain sockets. Or, you could serialise the data as ASCII or JSon or some other format and feed it through a pipe, SHM (shared memory segment), message queue, DBUS or similar. It's worth thinking about what sort of data you have, as these IPC mechanisms have different performance characteristics. There's a draft USENIX paper with a good analysis of the various trade-offs that is worth reading.
Since you say (in the comments to this answer) that you prefer to use SHM, here are some code samples to start you off. Using the Python posix_ipc library:
import posix_ipc # POSIX-specific IPC
import mmap # From Python stdlib
class SharedMemory(object):
"""Python interface to shared memory.
The create argument tells the object to create a new SHM object,
rather than attaching to an existing one.
"""
def __init__(self, name, size=posix_ipc.PAGE_SIZE, create=True):
self.name = name
self.size = size
if create:
memory = posix_ipc.SharedMemory(self.name, posix_ipc.O_CREX,
size=self.size)
else:
memory = posix_ipc.SharedMemory(self.name)
self.mapfile = mmap.mmap(memory.fd, memory.size)
os.close(memory.fd)
return
def put(self, item):
"""Put item in shared memory.
"""
# TODO: Deal with the case where len(item) > size(self.mapfile)
# TODO: Guard this method with a named semaphore
self.mapfile.seek(0)
pickle.dump(item, self.mapfile, protocol=2)
return
def get(self):
"""Get a Python object from shared memory.
"""
# TODO: Deal with the case where len(item) > size(self.mapfile)
# TODO: Guard this method with a named semaphore
self.mapfile.seek(0)
return pickle.load(self.mapfile)
def __del__(self):
try:
self.mapfile.close()
memory = posix_ipc.SharedMemory(self.name)
memory.unlink()
except:
pass
return
For the Java side you want to create the same class, despite what I said in the comments JTux seems to provide the equivalent functionality and the API you need is in UPosixIPC class.
The code below is an outline of the sort of thing you need to implement. However, there are several things missing -- exception handling is the obvious one, also some flags (find them in UConstant), and you'll want to add in a semaphore to guard the put
/ get
methods. However, this should set you on the right track. Remember that an mmap
or memory-mapped file is a file-like interface to a segment of RAM. So, you can use its file descriptor as if it were the fd
of a normal file.
import jtux.*;
class SHM {
private String name;
private int size;
private long semaphore;
private long mapfile; // File descriptor for mmap file
/* Lookup flags and perms in your system docs */
public SHM(String name, int size, boolean create, int flags, int perms) {
this.name = name;
this.size = size;
int shm;
if (create) {
flags = flags | UConstant.O_CREAT;
shm = UPosixIPC.shm_open(name, flags, UConstant.O_RDWR);
} else {
shm = UPosixIPC.shm_open(name, flags, UConstant.O_RDWR);
}
this.mapfile = UPosixIPC.mmap(..., this.size, ..., flags, shm, 0);
return;
}
public void put(String item) {
UFile.lseek(this.mapfile(this.mapfile, 0, 0));
UFile.write(item.getBytes(), this.mapfile);
return;
}
public String get() {
UFile.lseek(this.mapfile(this.mapfile, 0, 0));
byte[] buffer = new byte[this.size];
UFile.read(this.mapfile, buffer, buffer.length);
return new String(buffer);
}
public void finalize() {
UPosix.shm_unlink(this.name);
UPosix.munmap(this.mapfile, this.size);
}
}
Some thoughts
An odd combination, but is there any reason one cannot call the other sending via stdin, stdout?
Any call to the OS is going to be relatively slow (latency wise). Using shared memory can by pass the kernel. If throughput is you issue, I have found you can reach 1-2 GB/s using sockets if latency isn't such an issue for you.
Making shared memory ideal.
Not sure why this is the case. Why not build a single structure/buffer and write it once. I would use a direct buffer is NIO to minimise latency. Using character translation is pretty expensive, esp if you only need ASCII.
Should be easy to optimise.
I use shared memory via memory mapped files. This is because I need to record every message for auditing purposes. I get an average latency of around 180 ns round trip sustained for millions of messages, and about 490 ns in a real application.
One advantage of this approach is that if there are short delays, the reader can catch up very quickly with the writer. It also support re-start and replication easily.
This is only implemented in Java, but the principle is simple enough and I am sure it would work in python as well.
https://github.com/peter-lawrey/Java-Chronicle