Performance degradation after upgrading

schwicke · September 12, 2017, 4:01pm

Hi, all,

we’ve upgraded ROR today from 1.14.0 to 1.16.10, and we see a significant degradation of performance. The load of the node increased significantly since the upgrade, and apparently was no longer able to cope with the incoming bulk requests. After downgrading again things go back to normal it seems.
Does anybody else see such a behavior ? Any idea how to debug this ?

Thanks in advance!

ld57 · September 13, 2017, 9:50am

Hi,

interesting feedback.

did you just upgraded RoR, or something else in your stack ?
did you change something in your RoR config section?

also, may it be related to audit and logging activity (there is a huge difference between 1.14.x and 1.16.x

you may try to set log as info or basic only (refer to doc, I think it chaged a bit) and give a try.
LD

sscarduzio · September 13, 2017, 10:13am

Hello @schwicke!

If you have huge bulk requests, there might be one feature killing your performance. I checked the old 1.14.0 sources and a thing we didn’t have back then is the sub-request introspection.

The difference between the two versions is that in the newer one the indices rule goes through all the sub-requests and evicts the ones that are targeting non-permitted indices.
In the old version instead, it would just check the main request indices list and reason about that alone (which is a summary of all the indices involved in all the sub-requests).

I can prepare a build with a setting that disables this feature so we can confirm or not if this is the reason.

schwicke · September 13, 2017, 10:22am

Hi, Simone,
worth a try indeed. What you say sound quite reasonable. That would be great! Thanks in advance!

schwicke · September 13, 2017, 10:23am

Thanks for the feedback LD. The logging level is already very basic, and we are not using auditing on this cluster. There were no other changed, and we can see the effect nicely by up or downgrading ROR.

ld57 · September 13, 2017, 12:06pm

Well, I think Simone put the finger on the stuff.
@sscarduzio , your idea about a new option to be able to enable/disable “nested indices” authorization check is a good idea, but do you think it would be possible to do something more than an “all or nothing” way ?

I mean a more granular option, based on indice mask ?

 SubRequest_authorization_check :
    #enabled by default for all request
    enable : false    
    indices_mask : ["logstash_sec*", "logstash_hosting*"]   # would not check subrequest for these matching indices
    # could be "*" for all indices

It may cost some work, because a mask could match multiple indices ( example “log*” )

sscarduzio · September 13, 2017, 12:27pm

yep this sounds reasonable, but first let’s see if the “nothing” option brings us back to normality. I sent you a pre build for 5.2.1 to test this. Let me know how it goes.

schwicke · September 13, 2017, 12:37pm

Working on it. Struggling with the expected location of config/readonlyrest.yml
More news asap.

schwicke · September 13, 2017, 2:52pm

I think I got it working (thanks to Simone!) after all. We need to get some stats now to conclude.

ld57 · September 14, 2017, 7:09am

@schwicke , you mean , it works with the special release you got from Simone, or an optimisation you made on your side with the 1.16.10 ?

Fred

schwicke · September 14, 2017, 7:23am

I managed to deploy the test version which I got from Simone. There was an issue reading the configuration file which prevented ES to start up. We’re iterating on it. More news asap.

schwicke · September 14, 2017, 7:41am

Status update: With the latest version (with subrequest checking disabled) we still see a load which is about 2-4x higher than what we see for 1.14.0. With plain 1.1.6.10 it was about a factor 8-10. Looks better though a factor >2 in degradation is still quite high. I’ll try to get some more statistics.

schwicke · September 20, 2017, 2:18pm

Status update: the latest pre8 release with subrequest checking disabled looks much better. Good job, thanks a lot, Simone!

sscarduzio · September 20, 2017, 7:49pm

Great to hear! There’s been more or less half a dozen optimisations between 1.16.10 and pre8. But pre9 adds another important optimisation (plus a better implementation of the settings in readonlyrest.yml file). There’s just one more optimisation that I need to benchmark, and then I release 116.11.

sscarduzio · September 21, 2017, 10:59am

So guys, we have a new wildcard matcher that does not use regex, its’ just a tad faster than the previous one…

old: 6207 ms
new: 665 ms
Gain: 89.28629%

Over 10e7 iterations.

This is massive because it’s called N times for each request. With N=#of strings in indices/actions rule.

schwicke · September 22, 2017, 8:25am

Very nice! I gave it a try, it indeed helps. We found that subrequest checking is still rather expensive though.

It would be nice if it was easier to switch it on or off by configuration rather than a Java option (unless that introduces again a performance penalty of cause). What do you think ?

sscarduzio · September 22, 2017, 10:55am

Hey @schwicke! Glad we got to the bottom of this

And thanks for all the telemetry you were able to circle back to me in the process. Would have been really hard to spot these bottlenecks otherwise.

sscarduzio · September 22, 2017, 10:59am

I agree the java option is really crude way to switch the feature off. But I want to take a step back: the sub-request scanning feature originated when I was trying to make the index_rewrite rule work: we needed sub-request granularity to change every index field in every sub-request.

Of course, that turned out being a total failure due to the wide and ever changing set of ActionRequest types within ES, but also - clearly - for performance.

As of today, index_rewrite rule has been removed from documentation, but sub-request checking is still on by default. And I would be in fact very keen to disable this feature.

Some background about sub-request checking

The subrequest checking is trying to address the scenario when you have a bulk request that contains a set or requests that apply to different indices.

Without the subrequest checking, the indices rule evaluates the bulk request as a whole. So the global set of indices is evaluated and if one is prohibited, the whole request is dropped.

With subrequest checking If the bulk request contains 10 subrequests, one of which is trying to index/update/delete the wrong index, only that subrequest is discarded and the other nine will remain. Now the bulk request can be allowed by ROR.

Anybody against disabling this by default?

DISABLED BY DEFAULT
LEAVE IT ON BY DEFAULT
DELETE FEATURE

0 voters

Show results

sscarduzio · September 25, 2017, 11:30am

I will remove this feature from the codebase in 1.16.12 if no major objection arises. For now feel free to disable this behaviour using this java option:

-Dcom.readonlyrest.subreq.introspection=false

ld57 · September 26, 2017, 4:31pm

mmh @sscarduzio,

maybe I m wrong, but , would it be related if :

-existing indices :
log_de_xxxx
log_en_xxx
log_lu_ei_xxx

Ror rule indices allowing : log_lu_ei*

and in kibana I use a pattern log*
since kibana will query ES , it will try to display both 3 indices
and since the logged user can see only log_lu_ei_xxx,
Today kibana react as displaying only results gained from log_lu_ei_xxx .

is it the activity of sub-request checking ?

I mean, if it is the functionality of subrequest checking, and if it is disabled or deleted, the user will get no data display in kibana ?

fred