Error updating the Elastic stack + RoR

During our update from Elasticsearch 7.6.1 (OSS) to Elasticsearch 7.8.0 (OSS) we encountered this error in the Elasticsearch logs:

[2020-08-20T07:00:12.095] ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-22] [spicedpassion] Cannot get source of document [.readonlyrest ID=1]
java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.ClusterState.nodes()" because "clusterState" is null
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.<init>(TransportSingleShardAction.java:151) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.<init>(TransportSingleShardAction.java:136) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:103) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:62) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179) ~[elasticsearch-7.8.0.jar:7.8.0]
        at tech.beshu.ror.es.IndexLevelActionFilter.$anonfun$apply$1(IndexLevelActionFilter.scala:95) ~[?:?]
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[?:?]
        at tech.beshu.ror.utils.AccessControllerHelper$$anon$1.run(AccessControllerHelper.scala:25) ~[?:?]
        at java.security.AccessController.doPrivileged(AccessController.java:312) ~[?:?]
        at tech.beshu.ror.utils.AccessControllerHelper$.doPrivileged(AccessControllerHelper.scala:24) ~[?:?]
        at tech.beshu.ror.es.IndexLevelActionFilter.apply(IndexLevelActionFilter.scala:93) ~[?:?]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:155) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:83) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:399) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:388) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.support.AbstractClient.get(AbstractClient.java:492) ~[elasticsearch-7.8.0.jar:7.8.0]
        at tech.beshu.ror.es.services.EsIndexJsonContentService.$anonfun$sourceOf$1(EsIndexJsonContentService.scala:55) ~[?:?]
        at monix.eval.internal.TaskRunLoop$.startFull(TaskRunLoop.scala:81) ~[?:?]
        at monix.eval.internal.TaskRunLoop$.$anonfun$restartAsync$1(TaskRunLoop.scala:222) ~[?:?]
        at monix.execution.internal.InterceptRunnable.run(InterceptRunnable.scala:27) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]

This is something which can be improved, but I bet that ROR starts without issues, right?

Yes, RoR starts up and seems to run normally.

So there is nothing to worry about. The first try failed, but the next one succeeded: ES was not ready yet. As I said, in the future we will improve this and the error won't show up.

Thanks for the report.
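
To illustrate the idea (a minimal Python sketch, not the plugin's actual code; it assumes an unsecured node reachable on http://localhost:9200):

import time
import requests

ES = "http://localhost:9200"  # assumption: local node without authentication

def read_ror_settings(max_attempts=10, delay_s=3):
    """Retry reading the in-index ROR settings until the node can serve them."""
    for attempt in range(1, max_attempts + 1):
        try:
            # The same document the plugin reads: index .readonlyrest, id 1
            resp = requests.get(f"{ES}/.readonlyrest/_doc/1", timeout=5)
            if resp.ok:
                return resp.json().get("_source")   # a later attempt succeeds
            print(f"attempt {attempt}: HTTP {resp.status_code}, retrying")
        except requests.RequestException as exc:    # node not reachable yet
            print(f"attempt {attempt}: {exc}, retrying")
        time.sleep(delay_s)
    return None  # give up after max_attempts

if __name__ == "__main__":
    print(read_ror_settings())

The plugin retries this read internally; the logged error is just the failed first attempt.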

We have a lot of other errors of this kind while the Elasticsearch cluster is running.

The first try failed, but the next one succeeded.

What do you mean by "the next one succeeded"?
Is a cluster reboot needed?

Error Log Below

[2020-08-24T07:10:30.366] ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-60] [cavarzere] Cannot get source of document [.readonlyrest ID=1]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/read/get[s]] would be [4236005706/3.9gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4236005552/3.9gb], new bytes reserved: [154/154b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=154/154b, accounting=15434632/14.7mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:347) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundAggregator.checkBreaker(InboundAggregator.java:210) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:119) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:140) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:73) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]

I tried removing the .readonlyrest index and restarting the ES cluster, but we still see these errors.

Can you explain to us why the class tech.beshu.ror.es.services.EsIndexJsonContentService generates all these errors?

This is very annoying for our production monitoring, because this class generates dozens of stack traces (a small triage sketch follows the excerpts below):

ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-22] [spicedpassion] Cannot get source of document [.readonlyrest ID=1]
java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.ClusterState.nodes()" because "clusterState" is null

…

ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-108] [campofelice] Cannot get source of document [.readonlyrest ID=1]
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.readonlyrest][_doc][1]: routing [null]]

…

ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-84] [campofelice] Cannot get source of document [.readonlyrest ID=1]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/read/get[s]] would be [4165110002/3.8gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4165109848/3.8gb], new bytes reserved: [154/154b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=154/154b, accounting=15419396/14.7mb]
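
To get a handle on this for our monitoring, a small triage sketch that groups these errors by exception class (the log path is a placeholder; adjust it to the actual file):

import re
from collections import Counter

# Placeholder path; point this at the real Elasticsearch log file.
LOG_FILE = "/var/log/elasticsearch/elasticsearch.log"

# The exception class is printed on the line that follows the ERROR line.
exception_re = re.compile(r"^((?:\w+\.)+\w*Exception):")

counts = Counter()
with open(LOG_FILE, encoding="utf-8") as log:
    previous_was_ror_error = False
    for line in log:
        if previous_was_ror_error:
            match = exception_re.match(line)
            if match:
                counts[match.group(1)] += 1
        previous_was_ror_error = (
            "EsIndexJsonContentService" in line and "Cannot get source" in line
        )

for exception, count in counts.most_common():
    print(f"{count:6d}  {exception}")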

Do you use the Kibana plugin? What version?

Yes, we use the Kibana plugin.

esVersion=7.8.0
pluginVersion=1.20.0

Kibana Free, PRO or Enterprise?

We have a licence and use Kibana Enterprise.

This is interesting:
Data too large, data for [indices:data/read/get[s]]

How long is your configuration?
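
To check how large the in-index settings document actually is, and how close the parent breaker is to its limit (the 4080218931-byte limit in your log looks like the default 95% parent-breaker limit on a 4 GB heap), something like this can help. A minimal Python sketch, assuming an unsecured node on http://localhost:9200 and that the settings YAML lives in a field called settings:

import requests

ES = "http://localhost:9200"  # assumption: local node without authentication

# Size of the in-index ROR settings document (the one the errors refer to).
doc = requests.get(f"{ES}/.readonlyrest/_doc/1", timeout=10).json()
settings_yaml = doc.get("_source", {}).get("settings", "")  # field name assumed
print(f"settings document: {len(settings_yaml)} characters")

# Parent circuit-breaker usage per node.
stats = requests.get(f"{ES}/_nodes/stats/breaker", timeout=10).json()
for node in stats["nodes"].values():
    parent = node["breakers"]["parent"]
    print(f"{node['name']}: estimated={parent['estimated_size']} "
          f"limit={parent['limit_size']} tripped={parent['tripped']}")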

@erms77 Was the ROR settings YAML extraordinarily large?
Did you resolve this issue?

For now we have rolled back to ES 7.6.1 and ROR 1.19.4, where the cluster is running properly.

Along with the RoR errors, we also learned of a Lucene memory leak affecting Elasticsearch versions 7.8.0 and 7.9.0 …

In the coming days I will be testing ES 7.6.1 with ROR 1.22.1, because I need to use the Kibana API to save all the Kibana assets of the different tenants (a rough export sketch is below).

I will check then whether the errors above still appear.
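
For reference, the export I have in mind is roughly the Kibana saved objects export API (a minimal Python sketch, assuming Kibana on http://localhost:5601 and placeholder credentials; how this maps onto the different tenants depends on the ROR Kibana configuration, so that part is left out):

import requests

KIBANA = "http://localhost:5601"   # assumption: local Kibana instance
AUTH = ("admin", "changeme")       # placeholder credentials

# Export dashboards, visualizations, saved searches and index patterns as NDJSON.
resp = requests.post(
    f"{KIBANA}/api/saved_objects/_export",
    auth=AUTH,
    headers={"kbn-xsrf": "true"},
    json={
        "type": ["dashboard", "visualization", "search", "index-pattern"],
        "includeReferencesDeep": True,
    },
    timeout=60,
)
resp.raise_for_status()

with open("kibana_export.ndjson", "wb") as out:
    out.write(resp.content)
print(f"exported {len(resp.content)} bytes")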

OK, makes sense! Thanks for the update @erms77