Good day. We have been running the ReadonlyREST Kibana PRO plug-in on a set of 3 servers side by side, with no load balancer (session-sticky or otherwise) between the user’s browser and the Kibana service: just a plain 3-answer “A” record in DNS, relying on the excellent retry / failover behavior of the typical Chrome browser and the underlying TCP sockets. This has been working beautifully for months because the plug-in has the “readonlyrest_kbn.cookiePass” parameter. With it, each of the three servers can construct a cryptographically protected session cookie that the other servers instantly recognize as valid, with no coordination necessary between the Kibana instances as long as they all share the same cookiePass value.
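To make clear what behavior I am relying on, here is a minimal sketch of the general technique: a session cookie signed with a shared secret, so any host holding that secret can validate it without talking to the others. This is only my illustration and not the plug-in's actual cookie format. The function names, the JSON payload, and the choice of HMAC-SHA256 are all assumptions.

```python
import base64
import hashlib
import hmac
import json


def sign_session(payload: dict, cookie_pass: str) -> str:
    """Serialize the session and append an HMAC-SHA256 tag keyed by the shared secret.

    Hypothetical format: <base64 body>.<hex tag>. Not the plug-in's real layout.
    """
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    tag = hmac.new(cookie_pass.encode(), body, hashlib.sha256).hexdigest()
    return body.decode() + "." + tag


def verify_session(cookie: str, cookie_pass: str):
    """Return the session payload if the tag matches, otherwise None."""
    body, _, tag = cookie.rpartition(".")
    expected = hmac.new(cookie_pass.encode(), body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes, so a malformed tag cannot raise or leak timing.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

In this model, a cookie issued by one host with `sign_session(..., secret)` is accepted by `verify_session(..., secret)` on any other host with the same secret, and rejected as soon as the secrets differ. That is the property I would expect three hosts with an identical cookiePass to have.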
This is great, but it seems to have stopped working in a recent version. Our production cluster runs version 1.18.0 for ES 6.7.1, and it’s still working fine there. But a separate “advance testing” cluster runs the latest 1.18.7 for ES 7.3.2, and the shared cookie seems to be broken. My recollection is that it may have broken in 1.18.5 or 1.18.6, but the problem only happened to me sporadically, so I wasn’t paying enough attention to diagnose it with precision until now.
I can demonstrate on my machine that if I restrict myself to connecting to only one Kibana host, the problem doesn’t occur: I can log in, get a cookie, and then execute Dev Tools queries over and over without issue. If my browser is allowed to talk to two or more hosts under the same DNS name, everything works as long as it happens to stay on one host, but as soon as a request lands on a different host, it gets “403 FORBIDDEN” errors back, visible in the Kibana logs as well as to me in Dev Tools. If I happen to be using other Kibana functions (not Dev Tools, but Dashboard or Monitoring, etc.), then the Kibana client’s request for session_probe.txt figures out that I don’t have a valid cookie and forces me back to a green login screen.
This is all true regardless of how I restrict the browser to a single host behind my multi-answer DNS name. That is, I’m always connecting to
http://qim-elastic666-kibana.qim.com:5601
and that name is a DNS CNAME to an A record with 3 answers.
dig qim-elastic666-kibana.qim.com
;; ANSWER SECTION:
qim-elastic666-kibana.qim.com. 3600 IN CNAME qim-elastic666-query.qim.com.
qim-elastic666-query.qim.com. 3600 IN A 192.168.48.237
qim-elastic666-query.qim.com. 3600 IN A 192.168.48.236
qim-elastic666-query.qim.com. 3600 IN A 192.168.48.235
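For anyone who wants to confirm the address set from a script rather than with dig, here is a small sketch using Python's standard resolver call. The helper name `resolve_ipv4` and its injectable `getaddrinfo` argument are my own (hypothetical) additions so the function can be exercised without real DNS.

```python
import socket


def resolve_ipv4(name: str, port: int, getaddrinfo=socket.getaddrinfo) -> list:
    """Return the sorted, de-duplicated IPv4 addresses that `name` resolves to."""
    infos = getaddrinfo(name, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_ipv4("qim-elastic666-kibana.qim.com", 5601)` from a machine inside our network should list the same three addresses shown in the ANSWER SECTION above, since the resolver follows the CNAME automatically.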
So under normal circumstances, my Chrome browser will resolve the name to 3 addresses, and then, by its own random-but-helpful behavior, it can connect to any of the three with any combination of keepalive HTTP sockets at any time. If I leave it this way and keep running queries while tailing the Kibana logs on all three hosts, the invalid-cookie problem happens as soon as I see my browser send a query to more than one host.
If I change nothing on my workstation and simply stop the Kibana service on two of the three hosts, then Chrome is only ever able to connect to one host, and the problem doesn’t happen. Similarly, if I leave the Kibana service running but install an “iptables” filter on two of the three hosts to drop incoming packets to :5601, then Chrome pauses for a moment while trying to reach them, but eventually it settles on the only accessible host of the three, and again the problem never happens. The symptom is the same whether I set server.ssl.enabled to true or false, i.e., whether the browser-to-Kibana session is HTTPS or not; it was simply harder for me to troubleshoot when I couldn’t see the content in plaintext.
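If it helps anyone reproduce this without depending on Chrome's host selection, the test can be sketched as a script that replays one session cookie against each backend IP directly. This is a rough sketch for the plain-HTTP case only. The helper names `probe` and `rejected_by` are hypothetical, the cookie header has to be copied from a logged-in browser session, and `path` should be whichever request showed up as 403 in the Kibana logs (I am not assuming a particular URL here).

```python
import http.client


def probe(ip: str, path: str, cookie_header: str, host_header: str, port: int = 5601) -> int:
    """Send one GET straight to `ip`, bypassing DNS, and return the HTTP status code."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        # Keep the original Host header so every backend sees the same request
        # the browser would have sent; only the TCP destination changes.
        conn.request("GET", path, headers={"Host": host_header, "Cookie": cookie_header})
        return conn.getresponse().status
    finally:
        conn.close()


def rejected_by(statuses: dict) -> list:
    """Given {ip: status}, list the hosts that answered 403 for the replayed cookie."""
    return sorted(ip for ip, status in statuses.items() if status == 403)
```

The idea is to log in through one host, then build `{ip: probe(ip, path, cookie, "qim-elastic666-kibana.qim.com:5601")}` for the three addresses and pass it to `rejected_by`. If the shared cookie works as it did in 1.18.0, I would expect an empty list; on 1.18.7 I would expect the two hosts that did not issue the cookie to appear.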
So it appears that I do have a workaround if I want to upgrade my clusters: I can leave Kibana stopped on two of the three hosts, and if I notice a problem with the one that is running, I can stop it and start another. So this is not a work-stopping problem for me, just seemingly a regression from previously desirable behavior. Let me know if there are any other tests I should perform.
Thanks!
– JeffS