LDAP connection timeout leads to authentication error
I’m using an LDAP backend in my readonlyrest config. Every once in a while users are unable to log in to kibana. The logs indicate that readonlyrest cannot use its connection to the LDAP backend because of a timeout. After several attempts the plugin re-establishes the connection to LDAP and the user is granted access. This frustrates users, because the situation repeats itself every day.
I assembled a test environment to replicate this behaviour. It appears that the readonlyrest plugin establishes its connections during initialization: after elasticsearch starts, or when the plugin configuration is changed. Then, after the connections have gone unused for a while (I tested a 1h period), they time out. When a user tries to authenticate, the plugin in elasticsearch at first tries to use a timed-out connection and gets an exception. After a while it tries to reconnect, but in the meantime the plugin in kibana is told that access is forbidden.
I’ve tried to fiddle with the available timeout and cache parameters in the readonlyrest config to remediate this behaviour, but without success.
I’ve included a k8s yaml file with the test deployment and the log files of all pods (elasticsearch, kibana and openldap).
Expected behaviour
The Elasticsearch plugin should try to reconnect to the auth backend before returning “access denied” to the client (the kibana plugin).
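To make the expected behaviour concrete, here is a minimal sketch of "reconnect and retry before denying". It is not the plugin's code: `FakeConnection`, `Pool`, `StaleConnectionError` and `authorize` are hypothetical names, and the dead connection is simulated with a flag rather than a real LDAP server.

```python
class StaleConnectionError(Exception):
    """Raised when a pooled connection no longer reaches the server."""


class FakeConnection:
    """Stand-in for an LDAP connection; `alive` is switched off to mimic an idle drop."""

    def __init__(self):
        self.alive = True

    def search(self, user):
        if not self.alive:
            raise StaleConnectionError("no response from server")
        return f"groups-of-{user}"


class Pool:
    """Hypothetical single-connection pool that can replace a dead connection."""

    def __init__(self):
        self.conn = FakeConnection()
        self.reconnects = 0

    def replace(self):
        self.conn = FakeConnection()
        self.reconnects += 1
        return self.conn


def authorize(pool, user, max_retries=1):
    """Run the lookup; on a stale connection, reconnect and retry before denying."""
    conn = pool.conn
    for attempt in range(max_retries + 1):
        try:
            return ("ALLOWED", conn.search(user))
        except StaleConnectionError:
            if attempt == max_retries:
                break
            conn = pool.replace()
    return ("FORBIDDEN", None)


pool = Pool()
pool.conn.alive = False  # simulate a connection dropped while idle
result = authorize(pool, "alice")
print(result, "reconnects:", pool.reconnects)
```

With a retry like this, the first login after an idle period would cost one extra round trip instead of surfacing as a forbidden response in kibana.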
Unfortunately, the problem still persists. Logs from elasticsearch:
[2025-12-18T12:59:59,517][ERROR][t.b.r.a.b.Block ] [elastic-test-1] [1d450c0c-afb4-4455-926d-8ea507636244-1446298784#159411] ldap_group: kibana: ldap_auth rule matching got an error LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10001 milliseconds for a response to arrive.'
tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult: LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10001 milliseconds for a response to arrive.'
at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult$.apply(UnboundidLdapUsersService.scala:112) ~[?:?]
at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom$$anonfun$2(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:86) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:45) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:75) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:45) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapUsersService.fetchLdapUser(UnboundidLdapUsersService.scala:66) ~[?:?]
at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.doFetchGroupsOf(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:63) ~[?:?]
at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
at map @ tech.beshu.ror.utils.TaskOps$.andThen$extension(TaskOps.scala:30) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.auth.base.BaseAuthorizationRule.authorizeLoggedUser(BaseAuthorizationRule.scala:101) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.auth.base.BaseAuthorizationRule.authorizeLoggedUser(BaseAuthorizationRule.scala:102) ~[?:?]
at map @ tech.beshu.ror.utils.TaskOps$.measure$extension$$anonfun$1$$anonfun$2(TaskOps.scala:60) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapAuthenticationService.ldapAuthenticate(UnboundidLdapAuthenticationService.scala:62) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapAuthenticationService.ldapAuthenticate(UnboundidLdapAuthenticationService.scala:62) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
[2025-12-18T13:00:09,683][ERROR][t.b.r.a.b.d.l.i.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering] [elastic-test-1] [1d450c0c-afb4-4455-926d-8ea507636244-1446298784#159411] LDAP getting user groups returned error: [code=85 (timeout), cause=result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10012 milliseconds for a response to arrive.']
[2025-12-18T13:00:09,689][ERROR][t.b.r.a.b.Block ] [elastic-test-1] [1d450c0c-afb4-4455-926d-8ea507636244-1446298784#159411] ldap_group: kibana-user: ldap_auth rule matching got an error LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10012 milliseconds for a response to arrive.'
tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult: LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10012 milliseconds for a response to arrive.'
at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult$.apply(UnboundidLdapUsersService.scala:112) ~[?:?]
at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom$$anonfun$2(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:86) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:45) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:75) ~[?:?]
at liftF @ tech.beshu.ror.accesscontrol.blocks.Block$Lifter.apply(Block.scala:200) ~[?:?]
at mapBoth @ tech.beshu.ror.accesscontrol.blocks.Block.execute(Block.scala:67) ~[?:?]
at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
at parMap2 @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:61) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.$anonfun$3(BaseHostsRule.scala:64) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.$anonfun$3(BaseHostsRule.scala:66) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.$anonfun$3(BaseHostsRule.scala:66) ~[?:?]
at parMap2 @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:61) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:63) ~[?:?]
at flatMap @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:67) ~[?:?]
at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:68) ~[?:?]
at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
at foldMap @ tech.beshu.ror.boot.ReadonlyRest.runStartingFailureProgram(ReadonlyRest.scala:107) ~[?:?]
The first and second attempts to log in to kibana failed; only the third one succeeded.
Now, the request to LDAP failed due to a request timeout (default 10 seconds).
You should consider changing the request timeout (see docs) or, even better, adding a cache (see docs).
Is it really? This error occurred with two different databases: AD from Azure and openldap served on-premise. If it is a matter of timeout/cache values, the defaults don’t work with the two most popular LDAP solutions.
Changing the request timeout only changes how long you have to wait for the error to occur. With request_timeout_in_sec set to 20s I get the same behaviour:
[2025-12-22T06:14:26,754][ERROR][t.b.r.a.b.Block ] [elastic-test-1] [012c0c8d-c3bb-4968-9465-00dfdf5d225e-978310386#4564088] ldap_group: kibana: ldap_auth rule matching got an error LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 20001 milliseconds for a response to arrive.'
I have tried several values here - with the same result.
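A small simulation of why a larger timeout does not help, assuming the response never arrives because the peer is already gone. A `queue.Queue` that nobody writes to stands in for the dead connection, and 0.1 s / 0.2 s stand in for the 10 s / 20 s values; `wait_for_response` is a hypothetical helper, not a readonlyrest function.

```python
import queue
import time


def wait_for_response(responses, timeout_s):
    """Wait up to timeout_s for a reply; return (outcome, seconds waited)."""
    start = time.monotonic()
    try:
        responses.get(timeout=timeout_s)
        outcome = "ok"
    except queue.Empty:
        outcome = "timeout"
    return outcome, time.monotonic() - start


# No reply is ever put on the queue, mimicking a connection whose peer is gone.
dead_channel = queue.Queue()
short = wait_for_response(dead_channel, 0.1)
longer = wait_for_response(dead_channel, 0.2)
print(short[0], longer[0])
```

Both calls end in a timeout; the second one just takes longer to get there, which matches the 10001 ms and 20001 ms messages in the logs.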
Caching responses does not help here either. When users actively use kibana, the problem does not occur: connections to LDAP are kept alive by regular queries. The problem occurs only after a long period of inactivity, usually overnight.
After that time the connections to LDAP are closed on the database side, which triggers an error for the first user trying to use kibana in the morning. Keeping a cache for that long (besides not being effective for user queries) is not good from a security perspective.
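One way to picture a fix for the overnight case is to probe a connection before reuse when it has been idle for too long. The sketch below is only an illustration under assumptions: `PooledConnection`, `checkout`, the 5-minute idle limit and the `dropped_by_peer` flag are all hypothetical, and the probe is a placeholder for a cheap LDAP round trip (for example a root DSE read).

```python
class PooledConnection:
    """Stand-in for an LDAP connection with a last-used timestamp."""

    def __init__(self, now):
        self.last_used = now
        self.dropped_by_peer = False

    def probe(self):
        """Placeholder for a cheap round trip; False if the peer is gone."""
        return not self.dropped_by_peer


def checkout(conn, now, max_idle_s, connect):
    """Return a usable connection, probing first if it sat idle longer than max_idle_s."""
    if now - conn.last_used > max_idle_s and not conn.probe():
        conn = connect(now)
    conn.last_used = now
    return conn


# Evening: the connection is created and used at t=0.
evening = PooledConnection(now=0)
# Overnight the server or a proxy drops it without telling the client.
evening.dropped_by_peer = True
# Morning: the first login arrives 8 hours later; the idle limit is 5 minutes (assumed).
morning = checkout(evening, now=8 * 3600, max_idle_s=300, connect=PooledConnection)
# A second login 10 seconds later reuses the fresh connection without probing.
again = checkout(morning, now=8 * 3600 + 10, max_idle_s=300, connect=PooledConnection)
print(morning is evening, again is morning)
```

Unlike a long cache TTL, this keeps the authentication decision fresh and only adds a probe on the first request after a quiet period.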
I was referring to the “LDAP returned code: timeout [85]”. It means client timeout.
I understand that you are sure (because of the two separate LDAP servers) that the LDAP server was not so busy that it wasn’t able to handle the request within a given time.
Ok, I will try to reproduce it on my side. I will get back to you when I find something.
I have one more pre-build to check.
I haven’t reproduced the problem yet, but I noticed one thing that could be improved (in the context of the k8s-based test env you showed us).
Ok, great. I’ve used a different health check to determine connection degradation. It seems that in environments with a proxy (like the k8s one), the previous health check may not be reliable.