LDAP connection timeout leads to authentication error

I’m using an LDAP backend in my readonlyrest config. Once in a while users are unable to log in to Kibana. The logs indicate that readonlyrest cannot get a connection to the LDAP backend due to a timeout. After several attempts the plugin re-establishes the connection to LDAP and the user is granted access. This frustrates users, because the situation repeats itself every day.
I assembled a testing environment to replicate that behaviour. It appears that the readonlyrest plugin establishes connections during initialization (after Elasticsearch starts, or when the plugin configuration is changed). Then, after a while during which the connections are not used (I tested a 1-hour period), they time out. When a user then tries to authenticate, the plugin in Elasticsearch first tries to use the timed-out connection and gets an exception. After a while it tries to reconnect, but in the meantime the plugin in Kibana is told that access is forbidden.
I’ve tried to fiddle with the available timeout and cache parameters in the readonlyrest config to remediate this behaviour, but without success.

I’ve included a k8s YAML file with the testing deployment and the log files of all pods (elasticsearch, kibana and openldap).

Expected behaviour

The Elasticsearch plugin should try to reconnect to the auth backend before returning an “access denied” response to the client (the Kibana plugin).

Technical details

ROR Version: 1.67.3

Elasticsearch Version: 7.17.1

Logs and config files

Screenshots

{"customer_id": "9fdfc5d6-ebc4-4311-a12b-4b0f1e9d130e", "subscription_id": "e031e519-f2e9-4a01-b815-5d15b49d0665"}

Hi @hrr,

Thanks for reporting this.

In ROR, we use an LDAP connection pool, which is why you see long-lived connections on the LDAP server side.

We’ve improved the health checking of the pool’s connection, so you should not experience the issue anymore. Please, test this pre-build:

ROR 1.68.0-pre11 for 7.17.1

and let us know if the problem is gone.

Unfortunately, the problem still persists. Logs from Elasticsearch:

[2025-12-18T12:59:59,517][ERROR][t.b.r.a.b.Block          ] [elastic-test-1] [1d450c0c-afb4-4455-926d-8ea507636244-1446298784#159411] ldap_group: kibana: ldap_auth rule matching got an error LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10001 milliseconds for a response to arrive.'
tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult: LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10001 milliseconds for a response to arrive.'
        at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult$.apply(UnboundidLdapUsersService.scala:112) ~[?:?]
        at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom$$anonfun$2(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:86) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:45) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:75) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:45) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapUsersService.fetchLdapUser(UnboundidLdapUsersService.scala:66) ~[?:?]
        at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.doFetchGroupsOf(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:63) ~[?:?]
        at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
        at map @ tech.beshu.ror.utils.TaskOps$.andThen$extension(TaskOps.scala:30) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.auth.base.BaseAuthorizationRule.authorizeLoggedUser(BaseAuthorizationRule.scala:101) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.auth.base.BaseAuthorizationRule.authorizeLoggedUser(BaseAuthorizationRule.scala:102) ~[?:?]
        at map @ tech.beshu.ror.utils.TaskOps$.measure$extension$$anonfun$1$$anonfun$2(TaskOps.scala:60) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapAuthenticationService.ldapAuthenticate(UnboundidLdapAuthenticationService.scala:62) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapAuthenticationService.ldapAuthenticate(UnboundidLdapAuthenticationService.scala:62) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
[2025-12-18T13:00:09,683][ERROR][t.b.r.a.b.d.l.i.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering] [elastic-test-1] [1d450c0c-afb4-4455-926d-8ea507636244-1446298784#159411] LDAP getting user groups returned error: [code=85 (timeout), cause=result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10012 milliseconds for a response to arrive.']
[2025-12-18T13:00:09,689][ERROR][t.b.r.a.b.Block          ] [elastic-test-1] [1d450c0c-afb4-4455-926d-8ea507636244-1446298784#159411] ldap_group: kibana-user: ldap_auth rule matching got an error LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10012 milliseconds for a response to arrive.'
tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult: LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 10012 milliseconds for a response to arrive.'
        at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.LdapUnexpectedResult$.apply(UnboundidLdapUsersService.scala:112) ~[?:?]
        at tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom$$anonfun$2(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:86) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:44) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapConnectionPool.process(UnboundidLdapConnectionPool.scala:45) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.definitions.ldap.implementations.UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.groupsFrom(UnboundidLdapDefaultGroupSearchAuthorizationServiceWithServerSideGroupsFiltering.scala:75) ~[?:?]
        at liftF @ tech.beshu.ror.accesscontrol.blocks.Block$Lifter.apply(Block.scala:200) ~[?:?]
        at mapBoth @ tech.beshu.ror.accesscontrol.blocks.Block.execute(Block.scala:67) ~[?:?]
        at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
        at parMap2 @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:61) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.$anonfun$3(BaseHostsRule.scala:64) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.$anonfun$3(BaseHostsRule.scala:66) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.$anonfun$3(BaseHostsRule.scala:66) ~[?:?]
        at parMap2 @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:61) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:63) ~[?:?]
        at flatMap @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:67) ~[?:?]
        at map @ tech.beshu.ror.accesscontrol.blocks.rules.tranport.BaseHostsRule.ipMatchesAddress(BaseHostsRule.scala:68) ~[?:?]
        at runAsync @ tech.beshu.ror.es.IndexLevelActionFilter.handleRequest(IndexLevelActionFilter.scala:205) ~[?:?]
        at foldMap @ tech.beshu.ror.boot.ReadonlyRest.runStartingFailureProgram(ReadonlyRest.scala:107) ~[?:?]

The first and second attempts to log in to Kibana failed; only the third one succeeded.

This is a different case.

This time, the request to LDAP failed due to a request timeout (default 10 seconds).
You should consider increasing the request timeout (see docs) or, even better, adding a cache (see docs).
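For readers following along, these are the knobs being discussed. A minimal sketch of the `ldaps` section of `readonlyrest.yml`; the host, DNs and values are placeholders, and the parameter names should be double-checked against the ROR LDAP connector docs for your version:

```yaml
readonlyrest:
  ldaps:
    - name: ldap1
      host: "ldap.example.com"            # placeholder host
      port: 389
      bind_dn: "cn=admin,dc=example,dc=com"
      bind_password: "***"
      search_user_base_DN: "ou=People,dc=example,dc=com"
      search_groups_base_DN: "ou=Groups,dc=example,dc=com"
      connection_pool_size: 10
      connection_timeout_in_sec: 10
      request_timeout_in_sec: 20          # the client-side timeout seen in the logs above
      cache_ttl_in_sec: 60                # caches successful authentications for 60 s
```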

Is it really? This error occurred with two different backends: AD from Azure and OpenLDAP served on-premises. If it is a matter of timeout/cache values, then the defaults don’t work with the two most popular LDAP solutions.

Changing the request timeout only changes how long you have to wait for the error to occur. With request_timeout_in_sec set to 20 s I get the same behaviour:

[2025-12-22T06:14:26,754][ERROR][t.b.r.a.b.Block          ] [elastic-test-1] [012c0c8d-c3bb-4968-9465-00dfdf5d225e-978310386#4564088] ldap_group: kibana: ldap_auth rule matching got an error LDAP returned code: timeout [85], cause: result code='85 (timeout)' diagnostic message='The asynchronous operation encountered a client-side timeout after waiting 20001 milliseconds for a response to arrive.'

I have tried several values here, with the same result.

Caching responses does not help here either. When users actively use Kibana, the problem does not occur: the connections to LDAP are kept alive by regular queries. The problem occurs only after a long period of inactivity, usually overnight.
After that time the connections to LDAP are closed on the server side, which triggers the error for the first user who tries to use Kibana in the morning. Keeping the cache for that long (besides not being effective for user queries) is not good from a security perspective.

I was referring to the “LDAP returned code: timeout [85]”. It means a client-side timeout.
I understand that you are sure (because of the two separate LDAP servers) that the LDAP server was not so busy that it couldn’t handle the request within the given time.

Ok, I will try to reproduce it on my side. I will get back to you when I find something.

Nevertheless, please confirm that after installing the pre-build I sent, you see a different LDAP error.

I see in your logs that it was “LDAP error [81]” before; now it is “LDAP error [85]”. Is that correct?

I have one more pre-build to check.
I haven’t reproduced the problem yet, but I noticed one thing that could be improved (in the context of the k8s-based test env you showed us).

Please, test it on your side:
ROR 1.68.0-pre13 for 7.17.1

Yes, it is. But I can also find error 85 in the logs of an Elasticsearch running readonlyrest version 1.65.1.

Version pre13 works well so far: I’ve managed to log in to Kibana without errors after the night. I’ll do more tests later today, but it looks promising :slight_smile:

Ok, great. I’ve used a different health check to determine connection degradation. It seems that in environments with a proxy (like the k8s one), the previous health check may not be reliable.
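For anyone curious about the general idea behind such a fix: a proxy can keep the TCP session open while the backend has already dropped it, so a passive liveness check lies. The common remedy is a “test on borrow” pool that actively probes each connection with a cheap application-level request before handing it out. This is only a generic sketch of that pattern (all names are hypothetical), not ROR’s actual implementation:

```python
import time

class PooledConnection:
    """Stand-in for an LDAP connection held in a pool."""
    def __init__(self):
        self.created_at = time.monotonic()
        self.alive = True  # simulates whether the backend still honours this session

    def ping(self) -> bool:
        # Stand-in for a real probe, e.g. a cheap LDAP search against the root DSE.
        return self.alive

class Pool:
    """Minimal connection pool that health-checks connections on borrow."""
    def __init__(self, factory):
        self._factory = factory
        self._idle = []

    def borrow(self):
        while self._idle:
            conn = self._idle.pop()
            if conn.ping():          # active health check before handing it out
                return conn
            # Stale connection: silently discard it and try the next one.
        return self._factory()       # pool exhausted: open a fresh connection

    def give_back(self, conn):
        self._idle.append(conn)

pool = Pool(PooledConnection)
c1 = pool.borrow()
pool.give_back(c1)
c1.alive = False                     # simulate the server silently dropping the session
c2 = pool.borrow()                   # the stale connection is discarded, a fresh one is made
assert c2 is not c1
```

The key point is that the client never surfaces the stale connection to the caller; the first login after a quiet night pays only the cost of one probe plus one reconnect, instead of failing outright.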

More tests confirmed that this solution works, at least for me :wink:. Thank you.

Great! It will be released with ROR 1.68.0 (we are going to do the release later this year).