Upgrading Elasticsearch 8.2 to 8.x leads to SSL problems

Hi everyone,

I tried to upgrade two different clusters, with 3 and 5 nodes. Both run Elasticsearch 8.2.0, and I tried upgrading to several versions: 8.11.4, 8.5.3 and 8.4.3. All attempts failed with the same result.

After upgrading the first node, the SSL handshake between that node and the rest of the cluster fails. This is from the upgraded node’s log:

[2024-01-24T14:39:58,753][WARN ][o.e.t.TcpTransport       ] [elastic-server-tst2] exception caught on transport layer [Netty4TcpChannel{localAddress=/x.y.z.155:54904, remoteAddress=elastic-server-tst1.my.domain/x.y.z.154:9300, profile=default}], closing connection 
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Hostname or IP address is undefined.

This is from the log of one of the other nodes in the cluster:

[2024-01-24T14:48:48,817][WARN ][o.e.t.TcpTransport       ] [elastic-server-tst1] exception caught on transport layer [Netty4TcpChannel{localAddress=/x.y.z.154:9300, remoteAddress=/x.y.z.155:52112, profile=default}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown

I had to anonymize the FQDNs and IPs.

The OS is Ubuntu 20.04 and we use the ReadonlyREST plugin.

Can someone explain to me what is going on? Prior updates, including the major update 7.17.3 → 8.2.0, worked out of the box.

/etc/elasticsearch/elasticsearch.yml:

action:
  destructive_requires_name: true
cluster:
  initial_master_nodes:
  - elastic-server-tst1.my.domain
  - elastic-server-tst2.my.domain
  - elastic-server-tst3.my.domain
  name: my-tst-cluster
discovery:
  seed_hosts:
  - elastic-server-tst1.my.domain
  - elastic-server-tst2.my.domain
  - elastic-server-tst3.my.domain
http:
  compression: true
  cors:
    allow-credentials: true
    allow-origin: "/.*/"
    enabled: true
  type: ssl_netty4
network:
  host: x.y.z.155
node:
  attr:
    dc: virtuell
path:
  repo:
  - "/path1"
  - "/path2"
  - "/path3"
path.data: "/elastic/elasticsearch-data"
path.logs: "/var/log/elasticsearch"
transport:
  type: ror_ssl_internode
xpack:
  security:
    enabled: false
    http:
      ssl:
        enabled: false
    transport:
      ssl:
        enabled: false

Hi, please show us your ROR internode SSL settings.

Thanks for your reply, coutoPL! This is the ssl_internode part of readonlyrest.yaml:

ssl_internode:
    enable: true
    truststore_file: "keystore.jks"
    truststore_pass: "somesecret"
    keystore_file: "keystore.jks"
    keystore_pass: "somesecret"
    key_pass: "somesecret"
    key_alias: clientcert
    certificate_verification: true

And this is my keystore.jks:

root@elastic-server-tst2:/etc/elasticsearch# /usr/share/elasticsearch/jdk/bin/keytool -list -keystore keystore.jks
Keystore type: JKS
Keystore provider: SUN

Your keystore contains 2 entries

clientcert, Feb 10, 2023, PrivateKeyEntry,
Certificate fingerprint (SHA-256): xxx
ca, Feb 10, 2023, trustedCertEntry,
Certificate fingerprint (SHA-256): xxx

These settings worked in ES 8.2 with the corresponding ROR plugin.
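
Since the first error mentions a hostname or IP address, one way to rule out the certificate itself is to look at the names it carries. The following is only a sketch, assuming the bundled keytool path used in the listing above; the function name show_cert_san and the KEYTOOL override variable are hypothetical and exist only for this example.

```shell
# Hypothetical helper: print the owner and the SubjectAlternativeName block
# of one keystore entry. keytool asks for the keystore password.
show_cert_san() {
  # -J-Duser.language=en asks the JVM for English labels so the sed patterns match.
  "${KEYTOOL:-/usr/share/elasticsearch/jdk/bin/keytool}" -J-Duser.language=en \
    -list -v -keystore "$1" -alias "$2" |
    sed -n -e '/^Owner:/p' -e '/SubjectAlternativeName \[/,/^]/p'
}

# Usage (not run here):
#   show_cert_san /etc/elasticsearch/keystore.jks clientcert
```

If the certificate has no SubjectAlternativeName extension, only the Owner line is printed.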

Could you please change it to:

certificate_verification: false

and tell me if it’s up and running after this change?

With certificate_verification: false the node can connect to the cluster, although I was spammed with this kind of log message for several minutes before the log finally went quiet:

[2024-01-26T18:22:56,365][ERROR][o.e.x.c.t.IndexTemplateRegistry] [elastic-server-tst2] error adding lifecycle policy [.fleet-file-tohost-data-ilm-policy] for [fleet]
org.elasticsearch.transport.RemoteTransportException: [elastic-server-tst3][10.0.2.156:9300][cluster:admin/ilm/put]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: unchecked_i_o_exception: failed to read authentication with key [_xpack_security_authentication]

While following your suggestion I also experimented a little with the keystore. We have different passwords for the keystore and for the key itself, and I noticed that the bundled keytool won’t allow changing the private key’s password. I created a new keystore in which the keystore and the private key share the same password, but it still behaves the same way.

I’d be very glad if you come up with another idea!

The next idea, out of the box, would be to use xpack SSL instead of ROR SSL, setting xpack.security.enabled: true in elasticsearch.yml. You can try it if you wish.

Thanks Mateusz, here is what I tried.
On all three nodes I:

  • disabled ReadonlyREST for internode communication (readonlyrest.yaml):
  ssl_internode:
    enable: false
  • used Elasticsearch’s default values for http.type and transport.type and enabled xpack (elasticsearch.yml):
xpack:
  security:
    enabled: true
    http:
      ssl:
        enabled: false
    transport:
      ssl:
        enabled: true
        keystore:
          path: "/etc/elasticsearch/keystore.jks"
        truststore:
          path: "/etc/elasticsearch/keystore.jks"
  • used the keystores in which the keystore password and the key password are the same
  • added the passwords for the keystore and truststore to the elasticsearch-keystore:
# /usr/share/elasticsearch/bin/elasticsearch-keystore list
warning: ignoring JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64/; using bundled JDK
keystore.seed
xpack.security.transport.ssl.keystore.secure_password
xpack.security.transport.ssl.truststore.secure_password
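
For reference, the two password entries listed above can be created with the elasticsearch-keystore add subcommand, which prompts for each secret value. This is a sketch rather than the exact commands used in this thread; the function name add_transport_ssl_passwords and the ES_KEYSTORE override variable are hypothetical.

```shell
# Path of the tool as shown in the listing above; override ES_KEYSTORE if it differs.
ES_KEYSTORE="${ES_KEYSTORE:-/usr/share/elasticsearch/bin/elasticsearch-keystore}"

# Hypothetical helper: store the transport keystore and truststore passwords.
# Each "add" call prompts interactively for the value of that setting.
add_transport_ssl_passwords() {
  for setting in \
    xpack.security.transport.ssl.keystore.secure_password \
    xpack.security.transport.ssl.truststore.secure_password
  do
    "$ES_KEYSTORE" add "$setting" || return 1
  done
}

# Usage (not run here, repeat on every node):
#   add_transport_ssl_passwords
```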

After restarting all nodes, I can see from the logs that a cluster of three nodes is formed, and the cluster status is green. However, at the moment I cannot use Kibana or my usual curl requests to communicate with the cluster. I guess more config changes would be required to move away from ReadonlyREST, but obviously that is not what I want. I am not sure what you expected to see from this test, although I think it shows that ReadonlyREST changed in some way in recent versions and that I have to adjust my config. More help appreciated. Thanks in advance!

Hello,

I have the same problem as Ljapunov. I noticed that the problem occurs with specific ReadonlyREST versions. I have a cluster of Elasticsearch nodes on version 8.9.2. When I use ROR plugin version 1.51.0, everything works, but when I upgrade just the ROR plugin to version 1.52.0 or above while keeping ES on version 8.9.2, the problem described by Ljapunov appears.

Hope this info helps.

Thank you, Peter, for your input. I can confirm that ES 8.9.2 with ROR plugin 1.51.0 works like a charm. Updating one node and reconnecting it to the cluster worked without any config changes!

I think I know where the regression was introduced. We fixed CVE-2023-4586 in ROR 1.52.0, and the problem seems to be related to that change. We will fix it, of course.

@Ljapunov regarding the xpack SSL configuration, I’d advise checking the official Elastic docs.

The fix will be released with 1.56.0 (probably this weekend).
If anyone needs a pre-build, let me know.

Thank you very much for your support, Mateusz! I just installed the ROR 1.56.0 pre-build with ES 8.12.2 and it works as expected.

ROR 1.56.0 with this fix has been released.
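
To summarize the version findings from this thread: 1.51.0 was reported as working, 1.52.0 and above as broken, and 1.56.0 as fixed. A small sketch of that range check follows, assuming GNU sort with the -V option (as shipped on Ubuntu) and assuming every release between 1.52.0 and 1.56.0 is affected; the function names version_ge and ror_regression_status are hypothetical.

```shell
# Succeeds (exit 0) when version $1 >= version $2, compared with GNU sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints "affected" for 1.52.0 <= version < 1.56.0, otherwise "not affected".
ror_regression_status() {
  if version_ge "$1" 1.52.0 && ! version_ge "$1" 1.56.0; then
    echo "affected"
  else
    echo "not affected"
  fi
}

# Usage (pass the plain ROR version, without the _es suffix):
#   ror_regression_status 1.52.0
```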