I'm using LettuceConnection with a connection pool to connect my application to a Redis server. However, during load testing, I encountered a significant number of command timeout errors. Initially, I suspected that the Redis server might be struggling to handle the high traffic, but after checking, I found that there were still plenty of available resources — CPU usage was around 5%, and memory usage was only about 10%.
This made me wonder: when exactly does the command timeout start? Does it include the time spent waiting for a connection from the pool to become available (in which case increasing the connection pool size might help), or does it begin only after the request is sent to the Redis server? I reviewed the Lettuce documentation, but it's unclear how the command timeout is measured. Please help me understand.
Here's the configuration I used:

```java
import java.time.Duration;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettucePoolingClientConfiguration;

public LettuceConnectionFactory redisConnectionFactory() {
    RedisStandaloneConfiguration configuration = new RedisStandaloneConfiguration("host");

    GenericObjectPoolConfig<?> poolConfig = new GenericObjectPoolConfig<>();
    poolConfig.setMaxWaitMillis(50); // max time to wait for a pooled connection
    poolConfig.setMinIdle(2);
    poolConfig.setMaxIdle(2);
    poolConfig.setMaxTotal(4);

    LettuceClientConfiguration client = LettucePoolingClientConfiguration.builder()
            .poolConfig(poolConfig)
            .commandTimeout(Duration.ofMillis(50))
            .build();

    return new LettuceConnectionFactory(configuration, client);
}
```
1 Answer
The topic of command timeouts is, by far, the most complicated one when it comes to the Lettuce driver specifically, but also in general.
TL;DR
Typically*1 the command timeout starts from the moment a given command is written, by the Lettuce driver logic, to the Netty channel, and it is stopped either when the timeout is reached (causing an ExecutionException with a RedisCommandTimeoutException as its cause) or when the command completes (successfully or not). Notably, this means the time spent waiting to borrow a connection from the pool is not counted against the command timeout; exceeding maxWaitMillis surfaces as a pool error instead.

*1 assuming no other command timeout conditions are met, and depending on how the driver is used
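To make the distinction concrete, here is a minimal sketch (the class and method names are made up; it assumes the Spring Data Redis setup from your question, where Lettuce's RedisCommandTimeoutException is translated into Spring's QueryTimeoutException and a failed pool borrow into PoolException) showing the two different failure modes a caller can observe:

```java
import org.springframework.dao.QueryTimeoutException;
import org.springframework.data.redis.connection.PoolException;
import org.springframework.data.redis.connection.RedisConnection;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;

public class TimeoutProbe {

    // Hypothetical probe: issues a single PING and reports which of the
    // two 50 ms budgets from the question's configuration was exceeded.
    void probe(LettuceConnectionFactory factory) {
        RedisConnection connection = null;
        try {
            connection = factory.getConnection();
            connection.ping();
        } catch (QueryTimeoutException e) {
            // Spring's translation of RedisCommandTimeoutException: the command
            // was written to the Netty channel but did not complete within
            // commandTimeout (50 ms here).
            System.err.println("Command timeout: " + e.getMessage());
        } catch (PoolException e) {
            // Borrowing a pooled connection took longer than maxWaitMillis
            // (50 ms here); the command timeout clock never even started.
            System.err.println("Pool wait exceeded: " + e.getMessage());
        } finally {
            if (connection != null) {
                connection.close();
            }
        }
    }
}
```

If your load test shows the second kind of error, a bigger pool (or a longer maxWaitMillis) is the lever to pull; if it shows the first, the pool size is not the bottleneck.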
Long and boring story
In reality, there are many different reasons that could cause a command to time out:
- if the channel (socket) is disconnected and the driver is configured with the at-least-once reliability policy, the command could time out in the disconnected buffer, before ever being written to the actual socket
- the async API relies on the CompletableFuture contract underneath, so if you specify a timeout when waiting on the future, it follows the rules of the underlying Executor rather than the driver's command timeout (see the sketch after this list)
- when working with clusters, the actual writing of the command might be preceded by other events, such as initializing the node connection (connections to all nodes are initialized on demand) or following up a MOVED/ASK reply from the server
- in scenarios where a different ReadFrom policy is set, the logic to calculate the correct node to route the command to is also included in the command timeout
- ... and other cases
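For the async point specifically, here is a small sketch (the host and key are placeholders) contrasting a caller-side Future timeout with the driver's own command timeout:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisFuture;
import io.lettuce.core.api.StatefulRedisConnection;

public class AsyncTimeoutExample {

    public static void main(String[] args) throws Exception {
        RedisClient client = RedisClient.create("redis://host:6379"); // placeholder URI
        StatefulRedisConnection<String, String> connection = client.connect();
        try {
            RedisFuture<String> future = connection.async().get("some-key");
            // This 50 ms budget is enforced by the Future contract and is
            // measured from this get() call, independently of the driver's own
            // command timeout, which starts once the command is written to
            // the channel.
            String value = future.get(50, TimeUnit.MILLISECONDS);
            System.out.println("value = " + value);
        } catch (TimeoutException e) {
            // The caller-side wait expired; the command itself may still
            // complete (or hit the driver-side timeout) later.
            System.err.println("Caller-side wait expired");
        } finally {
            connection.close();
            client.shutdown();
        }
    }
}
```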
So the full answer is: one needs a much more complete picture of how the driver is used to know exactly how the timeout is calculated.
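If you want the driver-side behaviour to be explicit rather than inferred from usage, you can also set Lettuce's TimeoutOptions through the client options. A minimal sketch (the class and method names are illustrative; TimeoutOptions.enabled and the builder calls are the standard Lettuce / Spring Data Redis APIs):

```java
import java.time.Duration;

import io.lettuce.core.ClientOptions;
import io.lettuce.core.TimeoutOptions;
import org.springframework.data.redis.connection.lettuce.LettuceClientConfiguration;

public class TimeoutOptionsConfig {

    // Sketch: the same 50 ms budget expressed through Lettuce's TimeoutOptions,
    // which has the driver enforce the timeout per command, starting from the
    // point described in the TL;DR above.
    static LettuceClientConfiguration clientConfiguration() {
        return LettuceClientConfiguration.builder()
                .clientOptions(ClientOptions.builder()
                        .timeoutOptions(TimeoutOptions.enabled(Duration.ofMillis(50)))
                        .build())
                .commandTimeout(Duration.ofMillis(50))
                .build();
    }
}
```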