问题
I have a 4 node elasticsearch cluster. I have a .net console application that is designed to fill the cluster with data which comes from sql. Everything works fine as long as I keep the rate of records being added (or deleted) fairly low. If I increase the number of threads eventually I will see timeout errors from my console app. The cluster has a total of 48 cores and the average time it takes to index a record is about .1 seconds.
I have been able to get it to do about 7000 records (documents) per second. I never see any exceptions thrown from elasticsearch.net that indicate low resources. I never see any of the indexing queues overloaded. The servers never peak to more than about 10% cpu. It looks like the issue is not the cluster or it's configuration but something in the nest connection. Here is my code for the connection:
//set up the es client
Uri node = new Uri(ConfigurationManager.AppSettings["ESConnectionString"]);
var connectionPool = new SniffingConnectionPool(new[] { node });
ConnectionSettings settings = new ConnectionSettings(connectionPool);
settings.SetDefaultPropertyNameInferrer(p => p); //ditch the camelcase
settings.SniffOnConnectionFault(true);
settings.SniffOnStartup(true);
settings.SniffLifeSpan(TimeSpan.FromMinutes(1));
settings.SetPingTimeout(3000);
settings.SetTimeout(5000);
settings.MaximumRetries(5);
//settings.SetMaximumAsyncConnections(20);
settings.SetDefaultIndex("dummyindex");
settings.SetBasicAuthentication(ConfigurationManager.AppSettings["ESUser"], ConfigurationManager.AppSettings["ESPass"]);
ElasticClient client = new ElasticClient(settings);
I have the cluster set up with http.basic authentication, but I have tried with it turned on and off and there is no difference. Here are some of the pertinent settings from the ES nodes:
discovery.zen.minimum_master_nodes: 2
discovery.zen.fd.ping_timeout: 30s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["CACHE01","CACHE02","CACHE03","CACHE04"]
cluster.routing.allocation.node_concurrent_recoveries: 5
indices.recovery.max_bytes_per_sec: 50mb
http.basic.enabled: true
http.basic.user: "admin"
http.basic.password: "XXXXXXX"
At this point I can't seem to figure out if it's the .Net client that is the issue or the servers? Everything points to the client but I'm at a loss for what to try next. I don't think I can use the BulkAPI because I'm essentially just replicating changes from a SQL server and in order to keep them in sync I execute the change as soon as it's received. It seems when I'm inserting new documents I can go at a much faster pace then when updating. I have read the updating docs and it almost reads like partial updates are better than full updates, but the there is the whole get-update-delete-reindex things that seems to happen with every update.
According to the es docs I'm not supposed to tweak the thread pools or the performance settings. I don't think I'm hitting any of those limits anyway. The ES error logs don't indicate any issue either.
Anyone have advice on what I can do to track down the connection errors?
UPDATE: This is the actual error:
Error: Unexpected result (SaveToES). Elasticsearch.Net.Exceptions.MaxRetryException: Sniffing known nodes in the cluster caused a maxretry exception of its own ---> Elasticsearch.Net.Exceptions.SniffException: Sniffing known nodes in the cluster caused a maxretry exception of its own ---> Elasticsearch.Net.Exceptions.MaxRetryException: Retry timeout 00:00:05 was hit after retrying 1 times: 'GET _nodes/_all/clear?timeout=3000'. InnerException: WebException, InnerMessage: The operation has timed out, InnerStackTrace: at System.Net.HttpWebRequest.GetResponse() at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) InnerException: WebException, InnerMessage: The operation has timed out, InnerStackTrace: at System.Net.HttpWebRequest.GetResponse() at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) ---> System.AggregateException: One or more errors occurred. ---> System.Net.WebException: The operation has timed out at System.Net.HttpWebRequest.GetResponse() at Elasticsearch.Net.Connection.HttpConnection.DoSynchronousRequest(HttpWebRequest request, Byte[] data, IRequestConfiguration requestSpecificConfig) --- End of inner exception stack trace --- --- End of inner exception stack trace --- at Elasticsearch.Net.Connection.RequestHandlers.RequestHandlerBase.ThrowMaxRetryExceptionWhenNeeded[T](TransportRequestState
1 requestState, Int32 maxRetries) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.RetryRequest[T](TransportRequestState
1 requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.DoRequest[T](TransportRequestState1 requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.RetryRequest[T](TransportRequestState
1 requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.DoRequest[T](TransportRequestState1 requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.Request[T](TransportRequestState
1 requestState, Object data) at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.Sniff(ITransportRequestState ownerState) --- End of inner exception stack trace --- --- End of inner exception stack trace --- at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.Sniff(ITransportRequestState ownerState) at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.SniffClusterState(ITransportRequestState requestState) at Elasticsearch.Net.Connection.Transport.Elasticsearch.Net.Connection.ITransportDelegator.SniffOnConnectionFailure(ITransportRequestState requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.RetryRequest[T](TransportRequestState1 requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.DoRequest[T](TransportRequestState
1 requestState) at Elasticsearch.Net.Connection.RequestHandlers.RequestHandler.Request[T](TransportRequestState1 requestState, Object data) at Elasticsearch.Net.Connection.Transport.DoRequest[T](String method, String path, Object data, IRequestParameters requestParameters) at Elasticsearch.Net.ElasticsearchClient.DoRequest[T](String method, String path, Object data, IRequestParameters requestParameters) at Elasticsearch.Net.ElasticsearchClient.IndicesCreatePost[T](String index, Object body, Func
2 requestParameters) at Nest.RawDispatch.IndicesCreateDispatch[T](ElasticsearchPathInfo1 pathInfo, Object body) at Nest.ElasticClient.<CreateIndex>b__281_0(ElasticsearchPathInfo
1 p, ICreateIndexRequest d) at Nest.ElasticClient.Nest.IHighLevelToLowLevelDispatcher.Dispatch[D,Q,R](D descriptor, Func3 dispatch) at Nest.ElasticClient.CreateIndex(Func
2 createIndexSelector) at DCSCache.esvRepository.CreateIndex(String IndexName, String IndexVersion) at DCSCache.esvRepository.Save(esv ItemToSave, String IndexName, String IndexVersion)
来源:https://stackoverflow.com/questions/33838067/elasticsearch-net-and-timeouts