why it is so slow with 100,000 records when using pipeline in redis?

问题

It is said that pipeline is a better way when many set/get is required in redis, so this is my test code:

public class TestPipeline {

    /**
     * @param args
     */
    public static void main(String[] args) {
        // TODO Auto-generated method stub

        JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
        List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
        list.add(si);
        ShardedJedis jedis = new ShardedJedis(list);
        long startTime = System.currentTimeMillis();
        ShardedJedisPipeline pipeline = jedis.pipelined();
        for (int i = 0; i < 100000; i++) {
            Map<String, String> map = new HashMap<String, String>();
            map.put("id", "" + i);
            map.put("name", "lyj" + i);
            pipeline.hmset("m" + i, map);
        }
        pipeline.sync();
        long endTime = System.currentTimeMillis();
        System.out.println(endTime - startTime);
    }
}

When I ran it, there is no response with this program for a while, but when I don't work with pipe, it takes only 20073 ms, so I am confused why it is even better without pipeline and how a wide gap!

Thanks for answer me, a few questions, how do you calculate 6MB data? When I send 10K data, pipeline is always faster than normal mode, but with 100k, pipeline would no response.I think 100-1000 operations is a advisable choice as below said.Is there anyting with JIT since I don't understand it?

回答1:

There are a few points you need to consider before writing such a benchmark (and especially a benchmark using the JVM):

on most (physical) machines, Redis is able to process more than 100K ops/s when pipelining is used. Your benchmark only deals with 100K item, so it does not last long enough to produce meaningful results. Furthermore, there is no time for the successive stages of the JIT to kick in.
the absolute time is not a very relevant metric. Displaying the throughput (i.e. the number of operation per second) while keeping the benchmark running for at least 10 seconds would be a better and more stable metric.
your inner loop generates a lot of garbage. If you plan to benchmark Jedis+Redis, then you need to keep the overhead of your own program low.
because you have defined everything into the main function, your loop will not be compiled by the JIT (depending on the JVM you use). Only the inner method calls may be. If you want the JIT to be efficient, make sure to encapsulate your code into methods that can be compiled by the JIT.
optionally, you may want to add a warm-up phase before performing the actual measurement to avoid accounting the overhead of running the first iterations with the bare-bone interpreter, and the cost of the JIT itself.

Now, regarding Redis pipelining, your pipeline is way too long. 100K commands in the pipeline means Jedis has to build a 6MB buffer before sending anything to Redis. It means the socket buffers (on client side, and perhaps server-side) will be saturated, and that Redis will have to deal with 6 MB communication buffers as well.

Furthermore, your benchmark is still synchronous (using a pipeline does not magically make it asynchronous). In other words, Jedis will not start reading replies until the last query of your pipeline has been sent to Redis. When the pipeline is too long, it has the potential to block things.

Consider limiting the size of the pipeline to 100-1000 operations. Of course, it will generate more roundtrips, but the pressure on the communication stack will be reduced to an acceptable level. For instance, consider the following program:

import redis.clients.jedis.*;
import java.util.*;

public class TestPipeline {

    /**
     * @param args
     */

    int i = 0; 
    Map<String, String> map = new HashMap<String, String>();
    ShardedJedis jedis;  

    // Number of iterations
    // Use 1000 to test with the pipeline, 100 otherwise
    static final int N = 1000;

    public TestPipeline() {
      JedisShardInfo si = new JedisShardInfo("127.0.0.1", 6379);
      List<JedisShardInfo> list = new ArrayList<JedisShardInfo>();
      list.add(si);
      jedis = new ShardedJedis(list);
    } 

    public void push( int n ) {
     ShardedJedisPipeline pipeline = jedis.pipelined();
     for ( int k = 0; k < n; k++) {
      map.put("id", "" + i);
      map.put("name", "lyj" + i);
      pipeline.hmset("m" + i, map);
      ++i;
     }
     pipeline.sync(); 
    }

    public void push2( int n ) {
     for ( int k = 0; k < n; k++) {
      map.put("id", "" + i);
      map.put("name", "lyj" + i);
      jedis.hmset("m" + i, map);
      ++i;
     }
    }

    public static void main(String[] args) {
      TestPipeline obj = new TestPipeline();
      long startTime = System.currentTimeMillis();
      for ( int j=0; j<N; j++ ) {
       // Use push2 instead to test without pipeline
       obj.push(1000); 
       // Uncomment to see the acceleration
       //System.out.println(obj.i);
     }
     long endTime = System.currentTimeMillis();
     double d = 1000.0 * obj.i;
     d /= (double)(endTime - startTime);
     System.out.println("Throughput: "+d);
   }
 }

With this program, you can test with or without pipelining. Be sure to increase the number of iterations (N parameter) when pipelining is used, so that it runs for at least 10 seconds. If you uncomment the println in the loop, you will realize that the program is slow at the begining and will get quicker as the JIT starts to optimize things (that's why the program should run at least several seconds to give a meaningful result).

On my hardware (an old Athlon box), I can get 8-9 times more throughput when the pipeline is used. The program could be further improved by optimizing key/value formatting in the inner loop and adding a warm-up phase.

来源：https://stackoverflow.com/questions/16697389/why-it-is-so-slow-with-100-000-records-when-using-pipeline-in-redis

标签

java

Redis

pipeline