I tried to upgrade two different clusters, with 3 and 5 nodes respectively. Both are running Elasticsearch 8.2.0, and I tried upgrading to several versions: 8.11.4, 8.5.3 and 8.4.3. All attempts failed with the same result.
After upgrading the first node, the SSL handshake between that node and the cluster fails. This is from the upgraded node’s log:
[2024-01-24T14:39:58,753][WARN ][o.e.t.TcpTransport ] [elastic-server-tst2] exception caught on transport layer [Netty4TcpChannel{localAddress=/x.y.z.155:54904, remoteAddress=elastic-server-tst1.my.domain/x.y.z.154:9300, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Hostname or IP address is undefined.
This is from the log of one of the cluster’s other nodes:
[2024-01-24T14:48:48,817][WARN ][o.e.t.TcpTransport ] [elastic-server-tst1] exception caught on transport layer [Netty4TcpChannel{localAddress=/x.y.z.154:9300, remoteAddress=/x.y.z.155:52112, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown
I had to change the FQDNs and IPs in the logs.
The OS is Ubuntu 20.04, and we use the ReadonlyREST plugin.
Can someone explain to me what is going on? Previous upgrades, including the major upgrade 7.17.3 → 8.2.0, worked out of the box.
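For anyone comparing the two log excerpts above, it can help to reduce each one to the node name, which side of the connection it was on, and the SSL reason. The following is only a minimal Python sketch: it assumes the log format is exactly as quoted and that 9300 is the transport port, and the names `summarize` and `port_of` are made up for illustration.

```python
import re

# Matches the node name and the Netty channel description in the quoted WARN lines.
CHANNEL_RE = re.compile(
    r"\[(?P<node>[^\]\s]+)\] exception caught on transport layer "
    r"\[Netty4TcpChannel\{localAddress=(?P<local>[^,]+), "
    r"remoteAddress=(?P<remote>[^,]+), "
)
# Matches the text after the SSLHandshakeException class name.
SSL_RE = re.compile(r"SSLHandshakeException: (?P<reason>.+)$")

TRANSPORT_PORT = 9300  # assumption: transport port as seen in the logs above


def port_of(address: str) -> int:
    """Return the port from 'host/ip:port' or '/ip:port'."""
    return int(address.rsplit(":", 1)[1])


def summarize(header: str, exception: str) -> dict:
    """Reduce one WARN line plus its exception line to the key facts."""
    channel = CHANNEL_RE.search(header)
    ssl_error = SSL_RE.search(exception)
    if channel is None or ssl_error is None:
        raise ValueError("lines do not look like a transport SSL failure")
    local_port = port_of(channel["local"])
    return {
        "node": channel["node"],
        # A local port equal to the transport port means this node accepted
        # the connection; otherwise it opened it from an ephemeral port.
        "role": "server" if local_port == TRANSPORT_PORT else "client",
        "peer": channel["remote"],
        "reason": ssl_error["reason"].strip(),
    }
```

Applied to the two excerpts, this marks elastic-server-tst2 as the client with the reason "Hostname or IP address is undefined." and elastic-server-tst1 as the server with "Received fatal alert: certificate_unknown". That suggests the upgraded node is the side rejecting the certificate, while the old node only receives the resulting alert.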
With certificate_verification: false, the node can connect to the cluster, although I was spammed with this kind of log message for several minutes before the log finally went quiet:
[2024-01-26T18:22:56,365][ERROR][o.e.x.c.t.IndexTemplateRegistry] [elastic-server-tst2] error adding lifecycle policy [.fleet-file-tohost-data-ilm-policy] for [fleet]
org.elasticsearch.transport.RemoteTransportException: [elastic-server-tst3][10.0.2.156:9300][cluster:admin/ilm/put]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: unchecked_i_o_exception: failed to read authentication with key [_xpack_security_authentication]
While following your suggestion, I also played around a little with the keystore. We have different passwords for the keystore and the key itself, and I noticed that the bundled keytool won’t allow changing the private key’s password. I created a new keystore in which the keystore and the private key have the same password, but it still behaves the same way.
I’d be very glad if you come up with another idea!
The next idea, a bit out of the box, would be to use the xpack SSL instead of the ROR SSL and to set xpack.security.enabled: true in elasticsearch.yml. You can try it if you wish.
Use keystores where the keystore password and the key password are the same.
Add the passwords for the keystore and the truststore to elasticsearch-keystore:
# /usr/share/elasticsearch/bin/elasticsearch-keystore list
warning: ignoring JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64/; using bundled JDK
keystore.seed
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
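On a cluster with several nodes, the same listing has to be right on every node. A minimal Python sketch for checking it follows, assuming the output of `elasticsearch-keystore list` looks exactly like the listing above; the function name `missing_secure_settings` is hypothetical.

```python
# Secure settings named in the listing above.
REQUIRED_SETTINGS = {
    "xpack.security.transport.ssl.keystore.secure_password",
    "xpack.security.transport.ssl.truststore.secure_password",
}


def missing_secure_settings(list_output: str) -> set:
    """Return the required settings that do not appear in the listing."""
    entries = {
        line.strip()
        for line in list_output.splitlines()
        # Skip blank lines and the JAVA_HOME warning printed by the tool.
        if line.strip() and not line.lower().startswith("warning:")
    }
    return REQUIRED_SETTINGS - entries


# Hypothetical usage on a node (path taken from the command above):
# import subprocess
# out = subprocess.run(
#     ["/usr/share/elasticsearch/bin/elasticsearch-keystore", "list"],
#     capture_output=True, text=True,
# ).stdout
# print(missing_secure_settings(out) or "all required settings present")
```

An empty result means both passwords are present; it says nothing about whether the stored passwords are correct.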
After restarting all nodes, I can see from the logs that a cluster of three nodes has formed, and the cluster status is green. But at the moment I cannot use Kibana or the curl requests I am used to for communicating with the cluster. I guess some more config changes would be required to move away from ReadonlyREST, but obviously that is not what I want. I am not sure what you expected to see from this test, although I think it showed that ReadonlyREST changed in some way in recent versions and that I have to adjust my config. More help is appreciated. Thanks in advance!
I have the same problem as Ljapunov. I noticed that the problem occurs with specific ReadonlyREST versions. I have a cluster of Elasticsearch nodes on version 8.9.2. With ROR plugin version 1.51.0 everything works fine, but when I upgrade just the ROR plugin to version 1.52.0 or above, keeping ES on version 8.9.2, the problem described by Ljapunov appears.
Thank you, Peter, for your input. I can confirm that ES 8.9.2 and ROR plugin 1.51.0 work like a charm. Updating one node and reconnecting that node to the cluster worked without any config changes!
I think I know where the regression could have been introduced. We fixed CVE-2023-4586 in ROR 1.52.0, and the problem seems to be related to that fix. We will fix it, obviously.
@Ljapunov as for the xpack SSL configuration, I advise checking the official Elastic docs.