Elasticsearch stuck in reboot loop

bhcopeland · January 11, 2019, 5:14pm

https://gist.githubusercontent.com/bhcopeland/7a5fa2b35fb28dbd314762e2eb500a77/raw/d593a0b2a58fd07a74244f8d617fe0e2448baf6d/gistfile1.txt

I am running a Let's Encrypt cert. These are the commands I used to create a JKS keystore:

openssl pkcs12 -export -in certs/fullchain1.pem -inkey certs/privkey1.pem -out keystore.p12 -password pass:${KEY_PASS}

keytool -importkeystore -deststorepass ${KEYSTORE_PASS} -destkeypass ${KEY_PASS} -destkeystore keystore.jks -srckeystore pkcs.p12 -srcstoretype PKCS12 -srcstorepass ${KEYSTORE_PASS}

keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.jks -deststoretype pkcs12

My Dockerfile:

RUN yes | CONF_DIR=/etc/elasticsearch gosu elasticsearch bin/elasticsearch-plugin \
    install -b file:///tmp/readonlyrest-1.16.32_es6.5.1.zip && \
    rm -r /tmp/*.zip && \
    echo 'xpack.security.enabled: false \n\
http.type: ssl_netty4 \n\
discovery.zen.minimum_master_nodes: 1' >> /etc/elasticsearch/elasticsearch.yml

readonlyrest:
    ssl:
      enable: true
      keystore_file: "/etc/elasticsearch/keystore.jks"
      key_pass: "xx"
      keystore_pass: "xx"
      key_alias: "1"

sscarduzio · January 12, 2019, 9:27am

Hi @bhcopeland,

Can you try a curl command like the one below, to see whether the SSL connection checks out?

curl -k -vvv "https://$ES_HOST:9200"

Normally, that error shows up when a health check or some other client is trying to connect over HTTP (as opposed to HTTPS).
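
To illustrate why a plain-HTTP client trips up an SSL-enabled port (the kind of failure Netty reports as NotSslRecordException): a TLS connection opens with a record whose first byte is a content type (22, i.e. 0x16, for a handshake) followed by a major version byte of 3, whereas an HTTP request opens with an ASCII method such as GET. Below is a minimal Python sketch of that distinction. The function name looks_like_tls is made up for illustration, and this is only a rough approximation, not the actual check Netty or ReadonlyREST performs.

```python
def looks_like_tls(first_bytes: bytes) -> bool:
    """Rough heuristic: do these bytes look like the start of a TLS record?"""
    if len(first_bytes) < 3:
        return False
    content_type, major_version = first_bytes[0], first_bytes[1]
    # TLS record content types: 20 change_cipher_spec, 21 alert,
    # 22 handshake, 23 application_data. SSL 3.0 and TLS 1.x use major version 3.
    return content_type in (20, 21, 22, 23) and major_version == 3


# A plain HTTP request starts with ASCII text ("G" is byte 0x47).
print(looks_like_tls(b"GET / HTTP/1.1\r\n"))                   # False
# A TLS handshake record header starts with 0x16 0x03.
print(looks_like_tls(bytes([0x16, 0x03, 0x01, 0x02, 0x00])))  # True
```

So a client that speaks http:// to a port configured for https:// sends bytes the server cannot interpret as a TLS record, and the connection is rejected before any HTTP handling takes place.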

bhcopeland · January 16, 2019, 12:18pm

SSL seems to check out okay. Seems odd!

* Rebuilt URL to: https://localhost:9200/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9200 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 608 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification SKIPPED
*        server certificate status verification SKIPPED
*        common name: elk.xxxx.org (does not match 'localhost')
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=elk.xxxx.org
*        start date: Sat, 17 Nov 2018 09:24:35 GMT
*        expire date: Fri, 15 Feb 2019 09:24:35 GMT
*        issuer: C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3
*        compression: NULL
* ALPN, server did not agree to a protocol
> GET / HTTP/1.1
> Host: localhost:9200
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 493
< 
{
  "name" : "8IJkFOL",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "tZLRd7yVTLWM3IsLii9AyA",
  "version" : {
    "number" : "6.5.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "8c58350",
    "build_date" : "2018-11-16T02:22:42.182257Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
* Connection #0 to host localhost left intact

sscarduzio · January 16, 2019, 2:09pm

Did you use the -k flag in curl to get this? Try removing it.

bhcopeland · January 16, 2019, 3:51pm

* Rebuilt URL to: https://elk.xxxx.org:9200/
*   Trying 2a01:4f8:173:1a1d::2...
* Connected to elk.xxxx.org (2a01:4f8:173:1a1d::2) port 9200 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 608 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: elk.xxxx.org (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=elk.xxxx.org
*        start date: Sat, 17 Nov 2018 09:24:35 GMT
*        expire date: Fri, 15 Feb 2019 09:24:35 GMT
*        issuer: C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3
*        compression: NULL
* ALPN, server did not agree to a protocol
> GET / HTTP/1.1
> Host: elk.xxxx.org:9200
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 493
< 
{
  "name" : "8IJkFOL",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "tZLRd7yVTLWM3IsLii9AyA",
  "version" : {
    "number" : "6.5.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "8c58350",
    "build_date" : "2018-11-16T02:22:42.182257Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
* Connection #0 to host elk.xxxx.org left intact

sscarduzio · January 16, 2019, 9:27pm

What are the logs from ES when it goes into the “reboot loop”? Also, what do you mean by reboot loop?

bhcopeland · January 17, 2019, 3:48pm

Elasticsearch keeps restarting.

I have made progress but still can’t get Elasticsearch to start.

[2019-01-17T15:42:30,155][INFO ][t.b.r.e.IndexLevelActionFilter] [QTyJmP3] Settings observer refreshing...
[2019-01-17T15:42:30,157][INFO ][t.b.r.e.IndexLevelActionFilter] [QTyJmP3] Configuration reloaded - ReadonlyREST disabled
[2019-01-17T15:42:30,157][INFO ][t.b.r.e.IndexLevelActionFilter] [QTyJmP3] Readonly REST plugin was loaded...
[2019-01-17T15:42:30,426][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] creating SSL transport
[2019-01-17T15:42:30,428][INFO ][o.e.d.DiscoveryModule    ] [QTyJmP3] using discovery type [zen] and host providers [settings]
[2019-01-17T15:42:30,886][INFO ][o.e.n.Node               ] [QTyJmP3] initialized
[2019-01-17T15:42:30,886][INFO ][o.e.n.Node               ] [QTyJmP3] starting ...
[2019-01-17T15:42:31,008][INFO ][o.e.t.TransportService   ] [QTyJmP3] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2019-01-17T15:42:34,051][INFO ][o.e.c.s.MasterService    ] [QTyJmP3] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {QTyJmP3}{QTyJmP3oSpCJS89HjhLbKg}{lq_nZWBQQW-xeUWaJlJ6AA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=67381219328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[2019-01-17T15:42:34,054][INFO ][o.e.c.s.ClusterApplierService] [QTyJmP3] new_master {QTyJmP3}{QTyJmP3oSpCJS89HjhLbKg}{lq_nZWBQQW-xeUWaJlJ6AA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=67381219328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {QTyJmP3}{QTyJmP3oSpCJS89HjhLbKg}{lq_nZWBQQW-xeUWaJlJ6AA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=67381219328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2019-01-17T15:42:34,066][INFO ][t.b.r.c.s.SettingsPoller ] [QTyJmP3] [CLUSTERWIDE SETTINGS] Cluster not ready...
[2019-01-17T15:42:34,115][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: attempting with JKS keystore..
[2019-01-17T15:42:34,264][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: ssl.key_alias not configured, took first alias in keystore: 1
[2019-01-17T15:42:34,326][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Discovered key from JKS
[2019-01-17T15:42:34,329][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Discovered cert chain from JKS
[2019-01-17T15:42:34,366][INFO ][o.e.l.LicenseService     ] [QTyJmP3] license [a2247f66-555d-48ea-b19f-70c8d278ea2f] mode [basic] - valid
[2019-01-17T15:42:34,389][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Using SSL provider: JDK
[2019-01-17T15:42:34,392][INFO ][o.e.g.GatewayService     ] [QTyJmP3] recovered [0] indices into cluster_state
[2019-01-17T15:42:34,522][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Available ciphers: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA
[2019-01-17T15:42:34,523][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Available SSL protocols: TLSv1.2,TLSv1.1,TLSv1
[2019-01-17T15:42:34,664][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2019-01-17T15:42:34,664][INFO ][o.e.n.Node               ] [QTyJmP3] started
[2019-01-17T15:42:35,070][INFO ][t.b.r.e.SettingsObservableImpl] [QTyJmP3] [CLUSTERWIDE SETTINGS] index settings not found. Will keep on using the local YAML file. Learn more about clusterwide settings at https://readonlyrest.com/pro.html 


readonlyrest:
  ssl:
    keystore_file: "keystore.jks"
    key_pass: "xxxx"
    keystore_pass: "xxx"

  access_control_rules:

    - name: Accept all requests from localhost
      hosts: [127.0.0.1]

I am saving keystore.jks in /usr/share/elasticsearch/config/ (using the official Docker Elasticsearch image).

sscarduzio · January 17, 2019, 6:52pm

The cert is discovered and used. I think it’s a Docker networking problem.

bhcopeland · January 17, 2019, 7:46pm

How do you mean? What should I look into changing?

My Dockerfile:

ADD readonlyrest.yml /usr/share/elasticsearch/config/
ADD readonlyrest-1.16.32_es6.5.1.zip /tmp
ADD keystore.jks /usr/share/elasticsearch/config/
ADD elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

RUN /usr/share/elasticsearch/bin/elasticsearch-plugin \
    install -b file:///tmp/readonlyrest-1.16.32_es6.5.1.zip && \
    rm -r /tmp/*.zip

sscarduzio · January 17, 2019, 8:17pm

I mean that when you launch two Docker containers with an exposed port, you can connect to them from the host computer, but generally they can’t connect to each other (on my Mac at least). So I had to use docker-compose.

bhcopeland · January 17, 2019, 10:39pm

I am exposing port 9200. When I remove readonlyrest, I am able to connect to the cluster okay, so the exposed port is correct.

What does this mean?

[2019-01-17T15:42:35,070][INFO ][t.b.r.e.SettingsObservableImpl] [QTyJmP3] [CLUSTERWIDE SETTINGS] index settings not found. Will keep on using the local YAML file. Learn more about clusterwide settings at https://readonlyrest.com/pro.html

sscarduzio · January 19, 2019, 12:53pm

OK, then I was wrong: it’s not a Docker network issue.

That log line means that ROR is reading its settings from the readonlyrest.yml file, rather than from the .readonlyrest index. The latter happens when you have our PRO/Enterprise Kibana plugin installed and you use the GUI to save the settings.

sscarduzio · January 19, 2019, 12:55pm

This I don’t understand: what do you mean ES can’t start? If it gets all the way to binding to ports 9300 and 9200, and you even have time to test with curl, I guess there’s no startup problem. It stays up, right?

bhcopeland · January 22, 2019, 8:19pm

Yup, it stays up, but I just cannot access Elasticsearch.

What am I missing? Do you need me to show you anything?

sscarduzio · January 23, 2019, 1:13pm

So I had a look at your logs in the gist. The instance is receiving requests, but they are in HTTP, not HTTPS.

[2019-01-11T16:34:43,806][INFO ][t.b.r.c.s.SettingsPoller ] [8IJkFOL] [CLUSTERWIDE SETTINGS] Cluster not ready...
[2019-01-11T16:34:44,649][WARN ][t.b.r.e.SSLTransportNetty4] [8IJkFOL] io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a206375726c2f372e34372e300d0a4163636570743a202a2f2a0d0a0d0a
[2019-01-11T16:34:44,659][WARN ][t.b.r.e.SSLTransportNetty4] [8IJkFOL] io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record:

Make sure that whatever is trying to connect to it is doing so over HTTPS.

What is the client? Kibana? If so, have you changed this in kibana.yml:

elasticsearch.url: "http://<es_host>:9200"

into this?

elasticsearch.url: "https://<es_host>:9200"
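
The hex string in the first NotSslRecordException line quoted from the gist is the raw data the client sent, so it can be decoded to see who was connecting. A small Python sketch, assuming the payload is copied exactly from that log line (the variable names are just for illustration):

```python
# Hex payload copied from the NotSslRecordException log line, split for readability.
hex_payload = (
    "474554202f20485454502f312e310d0a"
    "486f73743a206c6f63616c686f73743a393230300d0a"
    "557365722d4167656e743a206375726c2f372e34372e300d0a"
    "4163636570743a202a2f2a0d0a0d0a"
)

# Turn the hex digits back into bytes, then into text (the payload is plain ASCII).
request = bytes.fromhex(hex_payload).decode("ascii")

# Print one header per line, skipping the empty line that ends the request.
for line in request.splitlines():
    if line:
        print(line)
```

This prints a plain-text request: GET / HTTP/1.1, Host: localhost:9200, User-Agent: curl/7.47.0 and Accept: */*. For this particular log line, at least, the client looks like a curl call (or a curl-based health check) hitting localhost:9200 without TLS, rather than Kibana.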