Error updating the Elastic stack + RoR

During our update from Elasticsearch 7.6.1 (OSS) to Elasticsearch 7.8.0 (OSS) we encountered this error in the Elasticsearch logs:

[2020-08-20T07:00:12.095] ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-22] [spicedpassion] Cannot get source of document [.readonlyrest ID=1]
java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.ClusterState.nodes()" because "clusterState" is null
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.<init>(TransportSingleShardAction.java:151) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction$AsyncSingleAction.<init>(TransportSingleShardAction.java:136) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:103) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.single.shard.TransportSingleShardAction.doExecute(TransportSingleShardAction.java:62) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179) ~[elasticsearch-7.8.0.jar:7.8.0]
        at tech.beshu.ror.es.IndexLevelActionFilter.$anonfun$apply$1(IndexLevelActionFilter.scala:95) ~[?:?]
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) ~[?:?]
        at tech.beshu.ror.utils.AccessControllerHelper$$anon$1.run(AccessControllerHelper.scala:25) ~[?:?]
        at java.security.AccessController.doPrivileged(AccessController.java:312) ~[?:?]
        at tech.beshu.ror.utils.AccessControllerHelper$.doPrivileged(AccessControllerHelper.scala:24) ~[?:?]
        at tech.beshu.ror.es.IndexLevelActionFilter.apply(IndexLevelActionFilter.scala:93) ~[?:?]
        at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:155) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:83) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:83) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:72) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:399) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:388) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.client.support.AbstractClient.get(AbstractClient.java:492) ~[elasticsearch-7.8.0.jar:7.8.0]
        at tech.beshu.ror.es.services.EsIndexJsonContentService.$anonfun$sourceOf$1(EsIndexJsonContentService.scala:55) ~[?:?]
        at monix.eval.internal.TaskRunLoop$.startFull(TaskRunLoop.scala:81) ~[?:?]
        at monix.eval.internal.TaskRunLoop$.$anonfun$restartAsync$1(TaskRunLoop.scala:222) ~[?:?]
        at monix.execution.internal.InterceptRunnable.run(InterceptRunnable.scala:27) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]

This is something which can be improved, but I bet that ROR starts without issues, right?

Yes, RoR starts up and seems to run normally.

So there is nothing to worry about. The first try failed, but the next one succeeded: ES was not ready yet. As I said, in the future we will improve this and the error won't show up.

Thanks for the report.
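
To illustrate the idea (a minimal Python sketch, not the plugin's actual code; it assumes an unsecured node reachable on http://localhost:9200):

import time
import requests

ES = "http://localhost:9200"  # assumption: local node without authentication

def read_ror_settings(max_attempts=10, delay_s=3):
    """Retry reading the in-index ROR settings until the node can serve them."""
    for attempt in range(1, max_attempts + 1):
        try:
            # The same document the plugin reads: index .readonlyrest, id 1
            resp = requests.get(f"{ES}/.readonlyrest/_doc/1", timeout=5)
            if resp.ok:
                return resp.json().get("_source")   # a later attempt succeeds
            print(f"attempt {attempt}: HTTP {resp.status_code}, retrying")
        except requests.RequestException as exc:    # node not reachable yet
            print(f"attempt {attempt}: {exc}, retrying")
        time.sleep(delay_s)
    return None  # give up after max_attempts

if __name__ == "__main__":
    print(read_ror_settings())

The plugin retries this read internally; the logged error is just the failed first attempt.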

We have a lot of other errors of this kind while the Elasticsearch cluster is running.

The first try failed, but the next one succeeded.

What do you mean by "the next one succeeded"?
Is a cluster reboot needed?

Error Log Below

[2020-08-24T07:10:30.366] ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-60] [cavarzere] Cannot get source of document [.readonlyrest ID=1]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/read/get[s]] would be [4236005706/3.9gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4236005552/3.9gb], new bytes reserved: [154/154b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=154/154b, accounting=15434632/14.7mb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:347) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundAggregator.checkBreaker(InboundAggregator.java:210) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundAggregator.finishAggregation(InboundAggregator.java:119) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:140) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) ~[elasticsearch-7.8.0.jar:7.8.0]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:73) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[?:?]
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[?:?]
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
        at java.lang.Thread.run(Thread.java:832) ~[?:?]

I tried removing the .readonlyrest index and restarting the ES cluster, but we still see these errors.

Can you explain to us why the class tech.beshu.ror.es.services.EsIndexJsonContentService generates all these errors?

This is very annoying for our production monitoring, because this class generates dozens of stack traces (a small triage sketch follows the excerpts below):

ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-22] [spicedpassion] Cannot get source of document [.readonlyrest ID=1]
java.lang.NullPointerException: Cannot invoke "org.elasticsearch.cluster.ClusterState.nodes()" because "clusterState" is null

…

ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-108] [campofelice] Cannot get source of document [.readonlyrest ID=1]
org.elasticsearch.action.NoShardAvailableActionException: No shard available for [get [.readonlyrest][_doc][1]: routing [null]]

…

ERROR tech.beshu.ror.es.services.EsIndexJsonContentService [scala-execution-context-global-84] [campofelice] Cannot get source of document [.readonlyrest ID=1]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [indices:data/read/get[s]] would be [4165110002/3.8gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4165109848/3.8gb], new bytes reserved: [154/154b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=154/154b, accounting=15419396/14.7mb]
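
To get a handle on this for our monitoring, a small triage sketch that groups these errors by exception class (the log path is a placeholder; adjust it to the actual file):

import re
from collections import Counter

# Placeholder path; point this at the real Elasticsearch log file.
LOG_FILE = "/var/log/elasticsearch/elasticsearch.log"

# The exception class is printed on the line that follows the ERROR line.
exception_re = re.compile(r"^((?:\w+\.)+\w*Exception):")

counts = Counter()
with open(LOG_FILE, encoding="utf-8") as log:
    previous_was_ror_error = False
    for line in log:
        if previous_was_ror_error:
            match = exception_re.match(line)
            if match:
                counts[match.group(1)] += 1
        previous_was_ror_error = (
            "EsIndexJsonContentService" in line and "Cannot get source" in line
        )

for exception, count in counts.most_common():
    print(f"{count:6d}  {exception}")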

Do you use the Kibana plugin? What version?

Yes, we use the Kibana plugin.

esVersion=7.8.0
pluginVersion=1.20.0

Kibana Free, PRO or Enterprise?

We have a licence and use Kibana Enterprise.

This is interesting:
Data too large, data for [indices:data/read/get[s]]

How long is your configuration?
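
To check how large the in-index settings document actually is, and how close the parent breaker is to its limit (the 4080218931-byte limit in your log looks like the default 95% parent-breaker limit on a 4 GB heap), something like this can help. A minimal Python sketch, assuming an unsecured node on http://localhost:9200 and that the settings YAML lives in a field called settings:

import requests

ES = "http://localhost:9200"  # assumption: local node without authentication

# Size of the in-index ROR settings document (the one the errors refer to).
doc = requests.get(f"{ES}/.readonlyrest/_doc/1", timeout=10).json()
settings_yaml = doc.get("_source", {}).get("settings", "")  # field name assumed
print(f"settings document: {len(settings_yaml)} characters")

# Parent circuit-breaker usage per node.
stats = requests.get(f"{ES}/_nodes/stats/breaker", timeout=10).json()
for node in stats["nodes"].values():
    parent = node["breakers"]["parent"]
    print(f"{node['name']}: estimated={parent['estimated_size']} "
          f"limit={parent['limit_size']} tripped={parent['tripped']}")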

@erms77 Was the ROR settings YAML extraordinarily large?
Did you resolve this issue?

For now we have rolled back to ES 7.6.1 and ROR 1.19.4, where the cluster is running properly.

Along with the RoR errors, we also learned of a Lucene memory leak affecting Elasticsearch versions 7.8.0 and 7.9.0 …

In the coming days I will be testing ES 7.6.1 with ROR 1.22.1, because I need to use the Kibana API to save all the Kibana assets of the different tenants (a rough export sketch is below).

I will check then whether the errors above still appear.
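
For reference, the export I have in mind is roughly the Kibana saved objects export API (a minimal Python sketch, assuming Kibana on http://localhost:5601 and placeholder credentials; how this maps onto the different tenants depends on the ROR Kibana configuration, so that part is left out):

import requests

KIBANA = "http://localhost:5601"   # assumption: local Kibana instance
AUTH = ("admin", "changeme")       # placeholder credentials

# Export dashboards, visualizations, saved searches and index patterns as NDJSON.
resp = requests.post(
    f"{KIBANA}/api/saved_objects/_export",
    auth=AUTH,
    headers={"kbn-xsrf": "true"},
    json={
        "type": ["dashboard", "visualization", "search", "index-pattern"],
        "includeReferencesDeep": True,
    },
    timeout=60,
)
resp.raise_for_status()

with open("kibana_export.ndjson", "wb") as out:
    out.write(resp.content)
print(f"exported {len(resp.content)} bytes")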

OK, makes sense! Thanks for the update @erms77