Performance issues using neo4j rest http client

Posted by 别来无恙 on 2019-12-25 03:57:20

Question


I've been struggling with this since replacing the neo4j-jdbc client with the Apache HTTP client.

It seems we still have issues when running only 1k concurrent users that execute our query.

This is how we are using the client: https://gist.github.com/IdanFridman/1989b600a0a032329a5e

This is how we execute the query using that REST client:

https://gist.github.com/IdanFridman/22637f95ba696f498b6c

After profiling, we see bad performance results, with an average latency of 3 seconds per request.

Should we ditch Neo4j? We are getting desperate with the performance results.

Thanks.


Answer 1:


So, you want to handle more concurrent requests? Let's explore what we can do here.

Queries

First of all - check that the query itself performs well enough. Copy-paste it into the Neo4j Browser, prepend PROFILE, and explore the output.

It might be that your query is doing a lot more work than you expect, which results in long wait times because Neo4j is still executing the query.

Client

HttpClient configuration

You are using PoolingHttpClientConnectionManager. From the documentation:

PoolingHttpClientConnectionManager maintains a maximum limit of connections on a per route basis and in total. Per default this implementation will create no more than 2 concurrent connections per given route and no more 20 connections in total.

So, we should increase our limits. Example:

PoolingHttpClientConnectionManager cnnMgr = new PoolingHttpClientConnectionManager();
cnnMgr.setMaxTotal(500);
cnnMgr.setDefaultMaxPerRoute(100);
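
To get a feel for why the default per-route limit hurts, here is a rough back-of-the-envelope sketch. It assumes every request holds its connection for the same amount of time; the 50 ms figure and the class and method names are made up for illustration and are not measurements from your setup.

```java
// Rough estimate only; the 50 ms per-request time is an assumed figure.
final class PoolDrainEstimate {

    // Time to serve `requests` when at most `maxConnections` run at once
    // and each one occupies its connection for `perRequestMillis`.
    static long drainTimeMillis(int requests, int maxConnections, long perRequestMillis) {
        long rounds = (requests + maxConnections - 1) / maxConnections; // ceiling division
        return rounds * perRequestMillis;
    }

    public static void main(String[] args) {
        System.out.println(drainTimeMillis(1000, 2, 50));   // default per-route limit: 25000
        System.out.println(drainTimeMillis(1000, 100, 50)); // raised per-route limit: 500
    }
}
```

Under these assumptions, 1k simultaneous requests against a single route would take about 25 seconds to drain with 2 connections, versus about half a second with 100, before the server itself becomes the bottleneck.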

HttpRequest

Try adding a keep-alive header to the request. Example:

request.setHeader("Connection", "keep-alive");

Then, you should always close your response as soon as possible. You shouldn't rely on the connection being released just because you have exhausted the stream content. Code:

try(CloseableHttpResponse response = httpClient.execute(request)) {
    // do stuff with response here
    // close response when try-with-resource block ends
}

Remember - the content that you receive from the server's transaction endpoint is streamed back to the client.

return createResultSet(new JsonObject(IOUtils.toString(response.getEntity().getContent())));

So, in this code sample, we wait until the full response has been retrieved, and only then start deserialization, all while the connection is still held.

In your case you are looking for something like this:

String rawJsonResult = null;
try (CloseableHttpResponse response = httpClient.execute(request)) {
    // Read the whole body while the connection is held
    rawJsonResult = IOUtils.toString(response.getEntity().getContent());
} catch (IOException e) {
    throw new RuntimeException(e);
}
// The response is closed at this point, so parsing no longer ties up a connection
return createResultSet(new JsonObject(rawJsonResult));

By doing this, we ensure that we retrieve the result and close the connection before any deserialization occurs. This frees up resources for other concurrent connections.
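
If you want to convince yourself of the ordering, here is a minimal sketch using only the JDK (Java 9+ for readAllBytes). TrackedStream is a hypothetical stand-in for the response body stream, not part of HttpClient; it only records whether close() has been called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReadThenParse {

    // Hypothetical stand-in for a response body; remembers when it was closed.
    static final class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackedStream(String body) {
            super(body.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads the whole body and releases the stream before returning the text.
    static String readFully(InputStream body) {
        try (InputStream in = body) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackedStream stream = new TrackedStream("{\"results\":[]}");
        String raw = readFully(stream);
        // The stream is already closed here; parsing can take as long as it needs.
        System.out.println(stream.closed + " " + raw);
    }
}
```

The same shape applies to the real client: everything that needs the connection happens inside the try-with-resources block, and everything else happens after it.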

Server

Neo4j uses Jetty as its web server. Jetty is backed by a BlockingQueue. This means that there is a limit, x, on the number of concurrent HTTP requests that can be processed. This x is the queue size. If there are more than x concurrent requests, the extra requests wait for a free spot in the queue.

Fortunately, you can configure how large the queue is. You are interested in this property:

org.neo4j.server.webserver.maxthreads=200

Note: there is no magic here. By default, Neo4j uses cpuCount * 4 web server threads. Increasing this number allows a higher number of concurrent requests, but each individual request can slow down.
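
The effect of a fixed number of worker threads can be illustrated with a plain JDK thread pool. This is only an analogy, not Jetty's actual implementation; the class name, the 20 ms sleep, and the task counts are assumptions for the demo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedWorkers {

    // Runs `tasks` short jobs on `threads` workers and returns the highest
    // number of jobs that were in progress at the same moment.
    static int peakConcurrency(int threads, int tasks) {
        // A fixed pool keeps waiting jobs in an internal blocking queue.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // pretend to handle a request
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // However many jobs are submitted, the peak never exceeds the thread count.
        System.out.println(peakConcurrency(2, 8));
    }
}
```

Submitting more work than there are threads does not fail; the extra jobs simply wait their turn, which shows up on the client side as latency.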

Linux

You should check the limit on open file descriptors (ulimit -n). Each TCP connection uses a separate file descriptor. The default limit on most Linux distributions is 1024. You need to increase it; you can try 40000.

Remember - this applies not only to the server, but to the client as well. You not only need to accept connections, you also need to open them.
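
On the client side you can also read the open-file limit the JVM actually got, which is handy when the process is started by a service manager rather than from your shell. A small sketch, assuming a HotSpot/OpenJDK runtime where the com.sun.management extension bean is available; the class name is made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FdLimit {

    // Returns the open-file limit of this JVM process,
    // or -1 when the Unix-specific bean is not available on this platform.
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("max open files: " + maxFileDescriptors());
    }
}
```

If this prints something around 1024 while you are trying to keep hundreds of connections open, the limit is worth raising before tuning anything else.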

General Notes

You shouldn't trust the profiling results too much. It's totally OK that time is spent waiting on HTTP requests. Overall, this is the most expensive part of the communication.

Also, you should ensure that your client and server are located on the same local network. Making requests over a public network can significantly degrade performance.

And the last one - there is an upper limit on concurrent HTTP connections. Exceeding this limit can make the database almost unresponsive (as with any other web application). You might need to consider horizontal scaling (a Neo4j cluster) to be able to handle more concurrent requests.


Good luck!



Source: https://stackoverflow.com/questions/35598783/performance-issues-using-neo4j-rest-http-client
