fastest (low latency) method for Inter Process Communication between Java and C/C++

隐瞒了意图╮ 2020-11-29 14:45

I have a Java app, connecting through TCP socket to a "server" developed in C/C++.

Both app & server are running on the same machine, a Solaris box (but we're

10 Answers
  • 2020-11-29 15:07

    Have you considered keeping the sockets open, so the connections can be reused?
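
    A minimal sketch of what connection reuse could look like on the Java side, assuming a simple one-byte request/reply exchange. The in-process echo thread merely stands in for the C/C++ server, and the class and method names are hypothetical. TCP_NODELAY is set on the assumption that small messages should not wait for Nagle's algorithm:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: the echo thread stands in for the C/C++ server, and the
// class and method names are made up for this example.
final class PersistentSocketDemo {

    // Sends `requests` one-byte messages over ONE connection and returns
    // how many were echoed back correctly.
    static int echoOverOneConnection(int requests) {
        try (ServerSocket listener =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> {
                try (Socket peer = listener.accept()) {
                    peer.setTcpNoDelay(true);
                    InputStream in = peer.getInputStream();
                    OutputStream out = peer.getOutputStream();
                    int b;
                    while ((b = in.read()) != -1) { // until the client closes
                        out.write(b);
                        out.flush();
                    }
                } catch (IOException ignored) {
                    // connection torn down; nothing to do in a sketch
                }
            });
            echo.setDaemon(true);
            echo.start();

            int ok = 0;
            // Connect once, then reuse the same socket for every request.
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                            listener.getLocalPort())) {
                client.setTcpNoDelay(true); // do not batch small writes
                InputStream in = client.getInputStream();
                OutputStream out = client.getOutputStream();
                for (int i = 0; i < requests; i++) {
                    int sent = i & 0x7F;
                    out.write(sent);
                    out.flush();
                    if (in.read() == sent) ok++;
                }
            }
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int n = 10_000;
        long start = System.nanoTime();
        int ok = echoOverOneConnection(n);
        long avgNanos = (System.nanoTime() - start) / n;
        System.out.println(ok + " round trips, about " + avgNanos + " ns each");
    }
}
```

    The point is only that the connect/accept cost is paid once; the per-message cost is then a write plus a read on an already established socket.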

  • 2020-11-29 15:09

    DMA is a method by which hardware devices can access physical RAM without interrupting the CPU. A common example is a hard disk controller, which can copy bytes straight from disk to RAM. As such, it's not applicable to IPC.

    Shared memory and pipes are both supported directly by modern OSes, so they're quite fast. Queues are typically abstractions, e.g. implemented on top of sockets, pipes and/or shared memory. This may look like a slower mechanism, but the alternative is to build such an abstraction yourself.

  • 2020-11-29 15:13

    If you ever consider using native access (since both your application and the "server" are on the same machine), consider JNA; it has less boilerplate code for you to deal with than JNI.

  • 2020-11-29 15:22

    Just tested latency from Java on my Core i5 2.8GHz: only a single byte sent/received, 2 Java processes just spawned, without assigning specific CPU cores with taskset:

    TCP         - 25 microseconds
    Named pipes - 15 microseconds
    

    Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:

    TCP, same cores:                      30 microseconds
    TCP, explicit different cores:        22 microseconds
    Named pipes, same core:               4-5 microseconds !!!!
    Named pipes, taskset different cores: 7-8 microseconds !!!!
    

    so

    TCP overhead is visible
    scheduling overhead (or core caches?) is also the culprit
    

    At the same time, Thread.sleep(0) (which, as strace shows, causes a single sched_yield() Linux kernel call to be executed) takes 0.3 microseconds, so named pipes scheduled to a single core still have a lot of overhead.

    Some shared memory measurement: September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport. http://solacesystems.com/news/fastest-ipc-messaging/

    P.S. - tried shared memory the next day in the form of memory-mapped files. If busy waiting is acceptable, we can reduce latency to 0.3 microseconds for passing a single byte with code like this:

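    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    // server side; the client maps the same file and writes 5 to byte 0
    MappedByteBuffer mem =
      new RandomAccessFile("/tmp/mapped.txt", "rw").getChannel()
      .map(FileChannel.MapMode.READ_WRITE, 0, 1);

    while (true) {
      while (mem.get(0) != 5) Thread.sleep(0); // waiting for client request
      mem.put(0, (byte) 10); // sending the reply
    }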
    

    Notes: Thread.sleep(0) is needed so the 2 processes can see each other's changes (I don't know of another way yet). If the 2 processes are forced onto the same core with taskset, the latency becomes 1.5 microseconds - that's the context switch delay.
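
    One possible alternative to relying on Thread.sleep(0) for visibility, sketched under assumptions: Java 9+ offers MethodHandles.byteBufferViewVarHandle, which allows volatile reads and writes on a (suitably aligned) direct buffer. The sketch uses a 4-byte int instead of a single byte, runs both sides as threads in one JVM purely to stay self-contained, and uses hypothetical class and method names. It was not part of the measurements above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: both sides are threads here; in the real setup each side
// would be a separate process mapping the same file.
final class MappedPingPong {
    static final int REQUEST = 5, REPLY = 10; // same marker values as above

    // Volatile int access to a ByteBuffer (Java 9+).
    static final VarHandle INT_VIEW =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    static ByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, Integer.BYTES);
        }
    }

    // Busy-waits (burning a core) until the shared int equals `expected`.
    static void awaitValue(ByteBuffer mem, int expected) {
        while ((int) INT_VIEW.getVolatile(mem, 0) != expected) {
            Thread.onSpinWait();
        }
    }

    // Runs `rounds` request/reply exchanges; returns the completed count.
    static int pingPong(int rounds) {
        try {
            Path file = Files.createTempFile("mapped", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[Integer.BYTES]); // start from zero
            ByteBuffer serverMem = map(file);
            ByteBuffer clientMem = map(file); // second mapping, same file

            Thread server = new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    awaitValue(serverMem, REQUEST);            // wait for request
                    INT_VIEW.setVolatile(serverMem, 0, REPLY); // send reply
                }
            });
            server.setDaemon(true);
            server.start();

            int done = 0;
            for (int i = 0; i < rounds; i++) {
                INT_VIEW.setVolatile(clientMem, 0, REQUEST);
                awaitValue(clientMem, REPLY);
                done++;
            }
            server.join();
            return done;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int n = 100_000;
        long start = System.nanoTime();
        pingPong(n);
        System.out.println("avg round trip ns: " + (System.nanoTime() - start) / n);
    }
}
```

    As with the busy-waiting loop above, this only makes sense when each side has a core to itself.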

    P.P.S - and 0.3 microseconds is a good number! The following code takes exactly 0.1 microseconds, while doing only a primitive string concatenation:

    int j=123456789;
    String ret = "my-record-key-" + j  + "-in-db";
    

    P.P.P.S - hope this is not too far off-topic, but finally I tried replacing Thread.sleep(0) with incrementing a static volatile int variable (the volatile access acts as a memory barrier, which apparently is what makes the other process's writes visible) and obtained - record! - 72 nanoseconds latency for java-to-java process communication!

    When forced onto the same CPU core, however, the volatile-incrementing JVMs never yield control to each other, thus producing exactly 10 milliseconds of latency - the Linux time quantum seems to be 5 ms... So this should be used only if there is a spare core - otherwise sleep(0) is safer.
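
    A middle ground between the two, as a rough sketch: spin for a bounded number of iterations, then fall back to yielding so that a peer sharing the core still gets scheduled. The AtomicInteger stands in for the shared-memory flag, and the names and the spin limit are assumptions for illustration, not something I measured:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: names and the spin limit are illustrative assumptions.
final class SpinThenYield {

    // Spins up to spinLimit times, then falls back to Thread.yield() so a
    // peer on the same core can run. Returns how many times it yielded.
    static long await(AtomicInteger flag, int expected, int spinLimit) {
        long yields = 0;
        int spins = 0;
        while (flag.get() != expected) { // volatile read
            if (spins < spinLimit) {
                spins++;
                Thread.onSpinWait();     // cheap busy-wait hint (Java 9+)
            } else {
                Thread.yield();          // give up the core
                yields++;
            }
        }
        return yields;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger flag = new AtomicInteger(0);
        Thread setter = new Thread(() -> flag.set(1));
        setter.start();
        long yields = await(flag, 1, 10_000);
        setter.join();
        System.out.println("observed the flag after " + yields + " yields");
    }
}
```

    With a spare core the loop normally finishes while still spinning; without one, the yield path keeps the wait from lasting a whole time quantum.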

  • 2020-11-29 15:23

    A late arrival, but wanted to point out an open source project dedicated to measuring ping latency using Java NIO.

    Further explored/explained in this blog post. The results are (RTT in nanos):

    Implementation, Min,   50%,   90%,   99%,   99.9%, 99.99%,Max
    IPC busy-spin,  89,    127,   168,   3326,  6501,  11555, 25131
    UDP busy-spin,  4597,  5224,  5391,  5958,  8466,  10918, 18396
    TCP busy-spin,  6244,  6784,  7475,  8697,  11070, 16791, 27265
    TCP select-now, 8858,  9617,  9845,  12173, 13845, 19417, 26171
    TCP block,      10696, 13103, 13299, 14428, 15629, 20373, 32149
    TCP select,     13425, 15426, 15743, 18035, 20719, 24793, 37877
    

    This is along the lines of the accepted answer. The System.nanoTime() error (estimated by measuring nothing) is around 40 nanos, so for the IPC the actual result might be lower. Enjoy.
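
    For reference, "measuring nothing" can be sketched roughly like this: time back-to-back System.nanoTime() calls and look at the distribution. This is not the project's actual harness; the class name, sample counts and the nearest-rank percentile are my own assumptions:

```java
import java.util.Arrays;

// Sketch only: estimates the cost of the timer itself.
final class NanoTimeOverhead {

    // Times `samples` back-to-back System.nanoTime() calls and returns
    // the deltas sorted in ascending order.
    static long[] sortedDeltas(int samples) {
        long[] deltas = new long[samples];
        for (int i = 0; i < samples; i++) {
            long t0 = System.nanoTime();
            long t1 = System.nanoTime();
            deltas[i] = t1 - t0;
        }
        Arrays.sort(deltas);
        return deltas;
    }

    // Nearest-rank percentile of an ascending array, p in (0, 100].
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        sortedDeltas(100_000); // warm-up so the JIT compiles the loop
        long[] d = sortedDeltas(1_000_000);
        System.out.printf("min=%d 50%%=%d 99%%=%d max=%d (ns)%n",
            d[0], percentile(d, 50), percentile(d, 99), d[d.length - 1]);
    }
}
```

    Whatever the median comes out at on a given machine is roughly the amount by which each measured round trip is overstated.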

  • 2020-11-29 15:23

    In my former company we used to work with this project, http://remotetea.sourceforge.net/. It is very easy to understand and integrate.
