I am using HikariCP with a Spring Boot app which has more than 1000 concurrent users. I have set the max pool size:
spring.datasource.hikari.maximum-pool-size=300
Yes, it's intended. Quoting the documentation:
This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections. Basically this value will determine the maximum number of actual connections to the database backend. A reasonable value for this is best determined by your execution environment. When the pool reaches this size, and no idle connections are available, calls to getConnection() will block for up to connectionTimeout milliseconds before timing out. Please read about pool sizing. Default: 10
So basically, when all 300 connections are in use and you are trying to make your 301st connection, Hikari won't create a new one (as maximumPoolSize is the absolute maximum), but it will rather wait (by default 30 seconds) until a connection is available again.
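For illustration, here is a minimal sketch of what that looks like from the calling code's side (the class and method names are made up); when no connection frees up within connectionTimeout, HikariCP's getConnection() typically gives up with an SQLTransientConnectionException instead of opening an extra connection:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLTransientConnectionException;
import javax.sql.DataSource;

public class PoolExhaustionExample {
    private final DataSource dataSource; // the Hikari-backed DataSource auto-configured by Spring Boot

    public PoolExhaustionExample(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void doWork() throws SQLException {
        // Blocks for up to connectionTimeout (30 seconds by default) when all connections are in use.
        try (Connection connection = dataSource.getConnection()) {
            // ... run the query on this connection ...
        } catch (SQLTransientConnectionException e) {
            // HikariCP fails with this exception once the timeout elapses without a free connection.
        }
    }
}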
This also explains why you get the exception you mentioned, because the default (when not configuring a maximumPoolSize) is 10 connections, which you'll probably immediately reach.
To solve this issue, you have to find out why these connections are blocked for more than 30 seconds. Even in a situation with 1000 concurrent users, there should be no problem if your query takes a few milliseconds or a few seconds at most.
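One way to narrow that down is HikariCP's leak detection, which logs a warning with a stack trace whenever a connection is held longer than the configured threshold. The value below is only an example; tune it to your expected query times:

spring.datasource.hikari.leak-detection-threshold=10000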
If you are invoking really complex queries that take a long time, there are a few possibilities. The first one is to increase the pool size. This, however, is not recommended, as the suggested formula for calculating the maximum pool size is:
connections = ((core_count * 2) + effective_spindle_count)
Quoting the About Pool Sizing article:
A formula which has held up pretty well across a lot of benchmarks for years is that for optimal throughput the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). Core count should not include HT threads, even if hyperthreading is enabled. Effective spindle count is zero if the active data set is fully cached, and approaches the actual number of spindles as the cache hit rate falls. ... There hasn't been any analysis so far regarding how well the formula works with SSDs.
As described within the same article, that means that a 4 core server with 1 hard disk should only have about 10 connections. Even though you might have more cores, I'm assuming that you don't have enough cores to warrant the 300 connections you're making, let alone increasing it even further.
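As a rough example (the hardware figures here are hypothetical), an 8-core database server whose active data set is fully cached would come out at about (8 * 2) + 0 = 16 connections, which you could configure like this:

spring.datasource.hikari.maximum-pool-size=16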
Another possibility is to increase the connection timeout. As mentioned before, when all connections are in use, it will wait for 30 seconds by default, which is the connection timeout.
You can increase this value so that the application will wait longer before timing out. If your complex query takes 20 seconds, and you have a connection pool of 300 and 1000 concurrent users, you should theoretically configure your connection timeout to be at least 20 * 1000 / 300 = 67 seconds.
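In Spring Boot that would look something like this (the value is taken from the calculation above and is in milliseconds):

spring.datasource.hikari.connection-timeout=67000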
Be aware though, that means that your application might take a long time before showing a response to the user. If you have a 67 second connection timeout and an additional 20 seconds before your complex query completes, your user might have to wait up to a minute and a half.
As mentioned before, your primary goal would be to find out why your queries are taking so long. With a connection pool of 300, a connection timeout of 30 seconds and 1000 concurrent users, it means that your queries are taking at least 9 seconds before completing (each connection has to serve about 1000 / 300 ≈ 3.3 users within the 30-second timeout, so roughly 30 / 3.3 ≈ 9 seconds per query), which is a lot.
Try to improve the execution time by: