Elasticsearch stuck in reboot loop


(Benjamin Copeland) #1

https://gist.githubusercontent.com/bhcopeland/7a5fa2b35fb28dbd314762e2eb500a77/raw/d593a0b2a58fd07a74244f8d617fe0e2448baf6d/gistfile1.txt

I am using a Let's Encrypt certificate. These are the commands I used to create the JKS keystore:

openssl pkcs12 -export -in certs/fullchain1.pem -inkey certs/privkey1.pem -out keystore.p12 -password pass:${KEY_PASS}

keytool -importkeystore -deststorepass ${KEYSTORE_PASS} -destkeypass ${KEY_PASS} -destkeystore keystore.jks -srckeystore pkcs.p12 -srcstoretype PKCS12 -srcstorepass ${KEYSTORE_PASS}

keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.jks -deststoretype pkcs12

My Dockerfile:

RUN yes | CONF_DIR=/etc/elasticsearch gosu elasticsearch bin/elasticsearch-plugin \
    install -b file:///tmp/readonlyrest-1.16.32_es6.5.1.zip && \
    rm -r /tmp/*.zip && \
    echo 'xpack.security.enabled: false \n\
http.type: ssl_netty4 \n\
discovery.zen.minimum_master_nodes: 1' >> /etc/elasticsearch/elasticsearch.yml

readonlyrest:
    ssl:
      enable: true
      keystore_file: "/etc/elasticsearch/keystore.jks"
      key_pass: "xx"
      keystore_pass: "xx"
      key_alias: "1"

(Simone Scarduzio) #2

Hi @bhcopeland,

Can you try a curl command like the following, to see whether the SSL connection checks out?

curl -k -vvv "https://$ES_HOST:9200"

Normally, that error shows up when a health check or some other client is trying to connect over HTTP (as opposed to HTTPS).
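
As a rough illustration of why plaintext trips an SSL-enabled port: a TLS record starts with a content-type byte (20 to 23) followed by a major version byte of 3, while an HTTP request starts with ASCII text such as "GET". The Python sketch below applies that check. The function name is made up for illustration, legacy SSLv2 hellos are ignored, and this is not claimed to be how ReadonlyREST or Netty implements it.

```python
# Core TLS record content types:
# change_cipher_spec (20), alert (21), handshake (22), application_data (23).
TLS_CONTENT_TYPES = {20, 21, 22, 23}


def looks_like_tls_record(data: bytes) -> bool:
    """Heuristic: first byte is a TLS content type and the major version byte is 3."""
    return len(data) >= 3 and data[0] in TLS_CONTENT_TYPES and data[1] == 3


# What a client sends when it speaks plain HTTP to the port.
plain_http = b"GET / HTTP/1.1\r\nHost: localhost:9200\r\n\r\n"
# First bytes of a typical ClientHello record: handshake (22), version 3.1, length 512.
client_hello_start = bytes([22, 3, 1, 2, 0])

print(looks_like_tls_record(plain_http))          # False
print(looks_like_tls_record(client_hello_start))  # True
```

A server that only accepts TLS on the port has no way to answer the first kind of request, so it logs an error and drops the connection.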


(Benjamin Copeland) #3

SSL seems to check out okay. Seems odd!

* Rebuilt URL to: https://localhost:9200/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9200 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 608 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification SKIPPED
*        server certificate status verification SKIPPED
*        common name: elk.xxxx.org (does not match 'localhost')
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=elk.xxxx.org
*        start date: Sat, 17 Nov 2018 09:24:35 GMT
*        expire date: Fri, 15 Feb 2019 09:24:35 GMT
*        issuer: C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3
*        compression: NULL
* ALPN, server did not agree to a protocol
> GET / HTTP/1.1
> Host: localhost:9200
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 493
< 
{
  "name" : "8IJkFOL",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "tZLRd7yVTLWM3IsLii9AyA",
  "version" : {
    "number" : "6.5.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "8c58350",
    "build_date" : "2018-11-16T02:22:42.182257Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
* Connection #0 to host localhost left intact

(Simone Scarduzio) #4

Did you use the -k flag in curl to get this? Try removing it.
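
For context, -k tells curl to skip certificate verification, which is why the output above reports verification as SKIPPED. A rough Python equivalent of the two modes, using only the standard ssl module (make_context is a name made up for this sketch):

```python
import ssl


def make_context(verify: bool) -> ssl.SSLContext:
    """Return a client-side TLS context; verify=False roughly mimics curl -k."""
    ctx = ssl.create_default_context()
    if not verify:
        # Order matters: hostname checking must be disabled before CERT_NONE is set.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


insecure = make_context(verify=False)  # accepts any certificate, like curl -k
strict = make_context(verify=True)     # validates the chain and the hostname

print(insecure.verify_mode, insecure.check_hostname)
print(strict.verify_mode, strict.check_hostname)
```

With the strict context, connecting to localhost would fail for a certificate issued to another name, so the test has to use the hostname the certificate was issued for.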


(Benjamin Copeland) #5

* Rebuilt URL to: https://elk.xxxx.org:9200/
*   Trying 2a01:4f8:173:1a1d::2...
* Connected to elk.xxxx.org (2a01:4f8:173:1a1d::2) port 9200 (#0)
* found 148 certificates in /etc/ssl/certs/ca-certificates.crt
* found 608 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_128_GCM_SHA256
*        server certificate verification OK
*        server certificate status verification SKIPPED
*        common name: elk.xxxx.org (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=elk.xxxx.org
*        start date: Sat, 17 Nov 2018 09:24:35 GMT
*        expire date: Fri, 15 Feb 2019 09:24:35 GMT
*        issuer: C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3
*        compression: NULL
* ALPN, server did not agree to a protocol
> GET / HTTP/1.1
> Host: elk.xxxx.org:9200
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< content-type: application/json; charset=UTF-8
< content-length: 493
< 
{
  "name" : "8IJkFOL",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "tZLRd7yVTLWM3IsLii9AyA",
  "version" : {
    "number" : "6.5.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "8c58350",
    "build_date" : "2018-11-16T02:22:42.182257Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
* Connection #0 to host elk.xxxx.org left intact

(Simone Scarduzio) #6

What do the ES logs show when it goes into the “reboot loop”? Also, what do you mean by a reboot loop?


(Benjamin Copeland) #7

Elasticsearch keeps restarting.

I have made progress but still can’t get Elasticsearch to start.

[2019-01-17T15:42:30,155][INFO ][t.b.r.e.IndexLevelActionFilter] [QTyJmP3] Settings observer refreshing...
[2019-01-17T15:42:30,157][INFO ][t.b.r.e.IndexLevelActionFilter] [QTyJmP3] Configuration reloaded - ReadonlyREST disabled
[2019-01-17T15:42:30,157][INFO ][t.b.r.e.IndexLevelActionFilter] [QTyJmP3] Readonly REST plugin was loaded...
[2019-01-17T15:42:30,426][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] creating SSL transport
[2019-01-17T15:42:30,428][INFO ][o.e.d.DiscoveryModule    ] [QTyJmP3] using discovery type [zen] and host providers [settings]
[2019-01-17T15:42:30,886][INFO ][o.e.n.Node               ] [QTyJmP3] initialized
[2019-01-17T15:42:30,886][INFO ][o.e.n.Node               ] [QTyJmP3] starting ...
[2019-01-17T15:42:31,008][INFO ][o.e.t.TransportService   ] [QTyJmP3] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2019-01-17T15:42:34,051][INFO ][o.e.c.s.MasterService    ] [QTyJmP3] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {QTyJmP3}{QTyJmP3oSpCJS89HjhLbKg}{lq_nZWBQQW-xeUWaJlJ6AA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=67381219328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[2019-01-17T15:42:34,054][INFO ][o.e.c.s.ClusterApplierService] [QTyJmP3] new_master {QTyJmP3}{QTyJmP3oSpCJS89HjhLbKg}{lq_nZWBQQW-xeUWaJlJ6AA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=67381219328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {QTyJmP3}{QTyJmP3oSpCJS89HjhLbKg}{lq_nZWBQQW-xeUWaJlJ6AA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=67381219328, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2019-01-17T15:42:34,066][INFO ][t.b.r.c.s.SettingsPoller ] [QTyJmP3] [CLUSTERWIDE SETTINGS] Cluster not ready...
[2019-01-17T15:42:34,115][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: attempting with JKS keystore..
[2019-01-17T15:42:34,264][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: ssl.key_alias not configured, took first alias in keystore: 1
[2019-01-17T15:42:34,326][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Discovered key from JKS
[2019-01-17T15:42:34,329][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Discovered cert chain from JKS
[2019-01-17T15:42:34,366][INFO ][o.e.l.LicenseService     ] [QTyJmP3] license [a2247f66-555d-48ea-b19f-70c8d278ea2f] mode [basic] - valid
[2019-01-17T15:42:34,389][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Using SSL provider: JDK
[2019-01-17T15:42:34,392][INFO ][o.e.g.GatewayService     ] [QTyJmP3] recovered [0] indices into cluster_state
[2019-01-17T15:42:34,522][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Available ciphers: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA
[2019-01-17T15:42:34,523][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] ROR SSL: Available SSL protocols: TLSv1.2,TLSv1.1,TLSv1
[2019-01-17T15:42:34,664][INFO ][t.b.r.e.SSLTransportNetty4] [QTyJmP3] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2019-01-17T15:42:34,664][INFO ][o.e.n.Node               ] [QTyJmP3] started
[2019-01-17T15:42:35,070][INFO ][t.b.r.e.SettingsObservableImpl] [QTyJmP3] [CLUSTERWIDE SETTINGS] index settings not found. Will keep on using the local YAML file. Learn more about clusterwide settings at https://readonlyrest.com/pro.html 


readonlyrest:
  ssl:
    keystore_file: "keystore.jks"
    key_pass: "xxxx"
    keystore_pass: "xxx"

  access_control_rules:

    - name: Accept all requests from localhost
      hosts: [127.0.0.1]

I am saving keystore.jks in /usr/share/elasticsearch/config/ (using the official Docker Elasticsearch image).


(Simone Scarduzio) #8

The cert is discovered and used. I think it’s a Docker networking problem.


(Benjamin Copeland) #9

How do you mean? What should I look into changing?

My Dockerfile:

ADD readonlyrest.yml /usr/share/elasticsearch/config/
ADD readonlyrest-1.16.32_es6.5.1.zip /tmp
ADD keystore.jks /usr/share/elasticsearch/config/
ADD elasticsearch.yml /usr/share/elasticsearch/config/elasticsearch.yml

RUN /usr/share/elasticsearch/bin/elasticsearch-plugin \
    install -b file:///tmp/readonlyrest-1.16.32_es6.5.1.zip && \
    rm -r /tmp/*.zip

(Simone Scarduzio) #10

I mean that when you launch two Docker containers with exposed ports, you can connect to them from the host computer, but generally they can’t connect to each other (on my Mac at least). So I had to use docker-compose.


(Benjamin Copeland) #11

I am exposing port 9200. When I remove ReadonlyREST, I can connect to the cluster fine, so the exposed port is correct.

What does this mean?

[2019-01-17T15:42:35,070][INFO ][t.b.r.e.SettingsObservableImpl] [QTyJmP3] [CLUSTERWIDE SETTINGS] index settings not found. Will keep on using the local YAML file. Learn more about clusterwide settings at https://readonlyrest.com/pro.html


(Simone Scarduzio) #12

OK then I was wrong, it’s not a docker network issue.

This means that ROR is reading the settings from the readonlyrest.yml file, rather than from the .readonlyrest index. The latter happens when you have our PRO/Enterprise Kibana plugin installed and you use the GUI to save the settings.


(Simone Scarduzio) #13

This I don’t understand: what do you mean by ES can’t start? If it gets all the way to binding to ports 9300 and 9200, and you even have time to test with curl, I guess there’s no startup problem. It stays up, right?


(Benjamin Copeland) #14

Yup, it stays up, but I just cannot access Elasticsearch.

What am I missing? Do you need me to show you anything?


(Simone Scarduzio) #15

So I had a look at your logs in the gist. The instance is receiving requests, but they are plain HTTP, not HTTPS.

[2019-01-11T16:34:43,806][INFO ][t.b.r.c.s.SettingsPoller ] [8IJkFOL] [CLUSTERWIDE SETTINGS] Cluster not ready...
[2019-01-11T16:34:44,649][WARN ][t.b.r.e.SSLTransportNetty4] [8IJkFOL] io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f20485454502f312e310d0a486f73743a206c6f63616c686f73743a393230300d0a557365722d4167656e743a206375726c2f372e34372e300d0a4163636570743a202a2f2a0d0a0d0a
[2019-01-11T16:34:44,659][WARN ][t.b.r.e.SSLTransportNetty4] [8IJkFOL] io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 

Make sure that whatever is trying to connect to it is doing so over HTTPS.
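
The hex string in the first warning above is the raw bytes the server received, so decoding it shows what the client actually sent. A minimal Python sketch (the variable names are mine):

```python
# Hex payload copied from the NotSslRecordException warning in the log,
# split per header line for readability.
payload_hex = (
    "474554202f20485454502f312e310d0a"
    "486f73743a206c6f63616c686f73743a393230300d0a"
    "557365722d4167656e743a206375726c2f372e34372e300d0a"
    "4163636570743a202a2f2a0d0a0d0a"
)

# The bytes are plain ASCII, which is exactly why they are "not an SSL/TLS record".
request = bytes.fromhex(payload_hex).decode("ascii")
print(request)
```

It decodes to a plain "GET / HTTP/1.1" request with "Host: localhost:9200" and "User-Agent: curl/7.47.0", so at least this warning came from a curl call against localhost that used http:// instead of https://.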

What is the client? Kibana? If so, have you changed this in kibana.yml:

elasticsearch.url: "http://<es_host>:9200"

into this?

elasticsearch.url: "https://<es_host>:9200"