Upgrading 6.7 w/ 1.18 to 7.14 w/ 1.33. LDAP from MS Active Directory no longer understands multiple AD Group memberships?

JeffSaxe · September 8, 2021, 6:57pm

Good day. We are a RoR PRO subscriber, and I have been a bit lazy in keeping our production cluster upgraded, so we’ve been “coasting” on ES/Kibana 6.7 with RoR 1.18 for a while, and I am just now upgrading a test / clone cluster to 7.14 with RoR 1.33. Everything is mostly working, but it seems something we used to do in our RoR settings is not being recognized, or there’s a bug with the AD Group Membership coming from our LDAP queries, so I need some help.

Note that I am an experienced network and infrastructure engineer and so I generally know how to write permission ACLs, including “permit/deny all of these cases except a few of those unusual exceptions”, but of course every ACL-matching engine’s logic is different, so I am willing to reorganize our ACL config to accomplish the same goal, if need be. I don’t think our use case is very complicated, and I am happy to copy/paste our config or logs with appropriate bits redacted. If it’s easier for you to just provide a framework template of “typically this is how that would be done”, and leave it to me to adapt to my specific group names and such, that’s fine, too.

All my end users are authenticated and authorized through Microsoft AD LDAP. Obviously we’d prefer not to write any rules directly inside RoR that reference specific human usernames; we’d prefer to make all changes by simply moving users in and out of AD groups, and this had traditionally worked great for years. Our LDAP query is constrained to an OU that contains only the few group names under it, to keep the LDAP replies from including a pile of superfluous group names.
We are already using the special, rather old, Microsoft-proprietary LDAP_MATCHING_RULE_IN_CHAIN recursive membership filter field (unique_member_attribute: “member:1.2.840.113556.1.4.1941:”), to be sure to cover cases where user A is a member of group X, and then group X is a member (nested in AD) of group Y, and group Y is the one in the RoR config. I believe this is still working perfectly well right now, because I swapped it out for plain non-chaining “member” and the symptom persisted.
Generally we have 3 groups of users, of which a user would be expected to be a member of only one: Super Users, ReadAllIndices Users, and DashboardOnly Users. The first group should be able to do anything in ES, including writing data into indices (best practice, I’ve excluded editing the RoR settings, reserving that for the special Kibana admin superuser). The second should basically be able to query any data out of any-named indices and set up some quick visualizations, but not write to any indices (and not write back to Kibana, so they can’t mess up existing charts and dashboards). The third should only be able to view dashboards created by others, so the other application icons are suppressed; they probably do technically have read permissions on the underlying indices, but if they are unaware of that and don’t use manual “curl” commands, and they stay safely within the cocooned Kibana environment, then they remain pretty constrained.
However, there is a set of more-sensitive traffic logs in one (wildcarded set) of indices, so we want to keep even the “super users” from seeing them. So I made one additional AD group, and I added two rules ahead of the others, a Permit and a Deny, specifically matching on indices = “confidentiallogs-*” and groups = this 4th AD group name. So if the query is for one of these sensitive logs, then the user has to be in the 4th group; e.g. if they’re only a member of Super Users but not of Sensitive Logs Users, then they hit the Deny. Up until 6.7 w/1.18, this all worked terrific!
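For readers unfamiliar with the recursive membership filter mentioned above, here is a minimal Python sketch of the kind of LDAP search filter that the LDAP_MATCHING_RULE_IN_CHAIN OID produces. The function names and the example DN are hypothetical, and the thread does not show how ROR composes its actual query, so treat this as an illustration of the filter syntax only.

```python
# Sketch only: builds an Active Directory search filter string.
# It does not talk to any LDAP server and is not ROR's implementation.

MATCHING_RULE_IN_CHAIN = "1.2.840.113556.1.4.1941"  # Microsoft-specific OID


def escape_filter_value(value: str) -> str:
    """Escape characters that are special inside an LDAP filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def groups_filter(user_dn: str, recursive: bool = True) -> str:
    """Return a filter matching groups that contain user_dn.

    With recursive=True the membership test also follows nested groups
    (user in X, X in Y => Y matches). With recursive=False only direct
    membership is tested, like the plain "member" attribute.
    """
    attribute = f"member:{MATCHING_RULE_IN_CHAIN}:" if recursive else "member"
    return f"(&(objectClass=group)({attribute}={escape_filter_value(user_dn)}))"


# Hypothetical DN, for illustration only.
print(groups_filter("CN=Jane Doe,OU=Staff,DC=example,DC=com"))
print(groups_filter("CN=Jane Doe,OU=Staff,DC=example,DC=com", recursive=False))
```

Swapping between the two forms, as described above, changes only which groups the directory returns; it does not change how the returned groups are evaluated afterwards.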

So judging from the behavior and error messages in the ES logs, this setup has stopped working upon upgrading RoR to the latest.

If a user is a member of multiple AD groups, it seems to recognize only one of them. In particular, I am a member of both the Super Users and Sensitive Logs groups, but when I first tried to sign into Kibana after the upgrade, it accepted my username & password (so, was confirming my credentials via LDAP), but then it wouldn’t even paint the main Kibana screen at all. It seemed to think I was in Sensitive Logs but not in Super Users (and also not ReadAllIndices or Dashboard Users), so it didn’t have enough ES permissions to show even the welcome screen.
So I set aside the Sensitive Users access for a moment and just took myself out of that AD group, leaving me only in Super Users. Then I was fully able to log in and start looking around the new Kibana 7 to see which dashboards might need some modifications. Not being in Sensitive Users means that I am (correctly) forbidden from seeing the indices with the sensitive data, which is great. So the use case for most of our users (if you are not in Sensitive Users, you can do other things but can never see those logs) is working. But the use case for our sensitive-log users, those who are members of two AD groups, which is working today in our un-upgraded cluster, stopped working. It feels like a bug to me.

I will try to gather some more information and do some experiments (like, see if the alphabetical sort ordering of these two group names is affecting which one it thinks I’m in…???). But I wanted to see if you were already aware of a bug like this or if it’s a new thing I need to deal with after having not watched the releases of RoR for a while. Thanks in advance!

– Jeff Saxe

sscarduzio · September 9, 2021, 7:45am

Hello Jeff! Always nice to hear from you

I think I understand what is going on here. And I take the blame for the semantic quirk that has crept into ROR’s notion of “groups”.

When a user has multiple groups in ROR, it means that they have the freedom to impersonate any of those groups, but one group at a time. The x-ror-current-group request header can be set to modify the ACL behaviour: it declares the user agent’s intention to impersonate one specific group among those available to the current user.

This behaviour is at the foundation of the multi-tenancy feature: as a user assigned to multiple tenancies (i.e. groups), I want to login to Kibana, and use it as a member of the first group I belong to. Later, I want to be able to use the tenancy switcher to impersonate the other groups I belong to. And so forth.

Multi tenancy is an Enterprise feature that is not present in PRO. That’s why I believe you are surprised the ACL doesn’t behave as the classic RBAC mode (privileges of a user = Σ privileges of their groups).
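To make the difference concrete, here is a toy Python model of the two behaviours. It is not ROR code: the function names and group names are hypothetical, and the case without a current group is assumed to stand for the older behaviour in which every group of the user is considered.

```python
from typing import Iterable, Optional, Set


def groups_considered(user_groups: Iterable[str],
                      current_group: Optional[str]) -> Set[str]:
    """Groups the ACL may use when matching a 'groups' rule (toy model)."""
    groups = set(user_groups)
    if current_group is None:
        # Assumption: no current group declared => all memberships count.
        return groups
    # A current group is declared: the user acts as that one group only.
    return groups & {current_group}


def groups_rule_matches(rule_groups: Iterable[str],
                        user_groups: Iterable[str],
                        current_group: Optional[str] = None) -> bool:
    """True if the block's group list overlaps the groups being considered."""
    return bool(set(rule_groups) & groups_considered(user_groups, current_group))


# Hypothetical group names standing in for the AD groups in this thread.
user = ["SuperUsers", "SensitiveLogsUsers"]

# Acting as SensitiveLogsUsers only: a block for SuperUsers no longer matches.
print(groups_rule_matches(["SuperUsers"], user, current_group="SensitiveLogsUsers"))
# No current group: both memberships are visible, so the block matches.
print(groups_rule_matches(["SuperUsers"], user))
```

In this model the first call prints False and the second prints True, which mirrors the symptom reported above: a user in two groups is treated as a member of only one of them.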

Maybe to work around it, I would try an “inverted-pyramidal” approach: for each ACL block, list every group that you expect to be allowed the actions that block permits. I.e.

readonlyrest:
  access_control_rules:
  [...]
  - name: "basic read access to some indices"
    [...]
    groups: ["sensitive_data_viewers", "ror_admins", "all_indices_readers", "some_indices_readers"]

  - name: "read access to any index"
    [...]
    groups: ["sensitive_data_viewers", "ror_admins", "all_indices_readers"]

  - name: "administer ROR"
    [...]
    groups: ["sensitive_data_viewers", "ror_admins"]

  - name: "see top sensitive data"
    [...]
    groups: ["sensitive_data_viewers"]

WDYT?

JeffSaxe · September 9, 2021, 2:36pm

Ah, thanks, Simone. OK, I now see (in the Elasticsearch logs, comparing my old and my new/cloned clusters) this new x-ror-current-group header name that is now showing up. So it seems you are changing the meaning of “member of multiple groups” in a way fairly fundamental to the product, correct? I see this allows the end user to switch between group-based roles in the Enterprise product (it’s still the same “me”, but first I’m logging in as the Sales Group, and now I’m switching to the Marketing Group). But we don’t have Enterprise with multi-tenancy; we have PRO, and we liked the way multiple-group ACLs evaluated before. So I guess I will start trying to play with reformulating my ACL set per your suggestion, but let me ask first:

Is there a flag we can set to go back to the previous behavior? Exactly what you said, with no single (randomly chosen) “current group” that the user is member of, but the “classic RBAC, sum of privileges of their groups” behavior. Is this an option, or have you already sliced out that logic and replaced it with this Enterprise-derived logic?
You have some “[…]” in your template, but I don’t know what I should fill in there to accomplish my goals. In particular, I’m looking at the documentation, and I don’t see how to express “all index names except these few”. I can imagine an inverted-pyramid approach, so that all of the ACL clauses are “allow” clauses, with no longer any “forbids”, and the groups list gets progressively less inclusive. This would be a workaround, but I need to be able to allow all indices with any name except the sensitive / confidential ones. Can this be expressed in the indices config line of ROR? In (native) Elasticsearch, index wildcards have a comma-and-minus-sign notation, like “test*,-test3”; will that work?
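For reference, the comma-and-minus-sign notation can be modelled roughly as below. This is a simplified Python sketch of the native Elasticsearch-style behaviour only; the function name and index names are hypothetical, real Elasticsearch resolution has more rules than this, and the sketch says nothing about whether ROR’s indices rule accepts such an expression.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def resolve_index_expression(expression: str, existing: Sequence[str]) -> List[str]:
    """Expand 'a*,-b' style expressions against a list of existing index names.

    Parts are processed left to right: a plain part adds matching names,
    a part starting with '-' removes matching names selected so far.
    """
    selected: List[str] = []
    for part in expression.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("-"):
            pattern = part[1:]
            selected = [name for name in selected if not fnmatchcase(name, pattern)]
        else:
            for name in existing:
                if fnmatchcase(name, part) and name not in selected:
                    selected.append(name)
    return selected


indices = ["test1", "test2", "test3", "prod1"]
print(resolve_index_expression("test*,-test3", indices))
```

With the sample names above, the expression selects test1 and test2 and leaves out test3.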

sscarduzio · September 10, 2021, 2:16pm

Yes, correct.

Not really a fan of changing the behaviour of the whole ACL with a flag tbh. We already pivoted the semantics of the “groups” rule once. Not the best idea, as you can see. If I had to envision a comeback of the additive permissions, I would rather create a new rule to achieve the different result. That way, every snippet of configuration will be self-explanatory. Plus, we could have the two models co-exist in the same ACL (not a thing for the faint-hearted though).

In my experience, forbid-type ACL blocks are very difficult to reason about, except when the idea is to forbid something to everyone; and even then, you should take care to put the block at the beginning of the ACL to avoid leaks.

Negation in indices rules is extremely useful indeed. We have a task in the backlog for this (RORDEV-394). I asked @coutoPL to estimate the effort for this task. If it’s not a big deal we can squeeze it into this sprint or the next.

JeffSaxe · September 10, 2021, 10:26pm

Thanks for the confirmations. I guess you’re soliciting opinions from me (affected end users) for the product direction, so here goes.

I totally understand why you might find some end users have difficulty with “Forbid” clauses in an ACL. Some folks are approaching a list of access rules for the first time, and they might just think of all the rules in the rule set as being peers, regardless of position in the file, all evaluated at the same time in parallel and any of them can “win”. I see in your docs that you have to keep emphasizing to the user that the ACL blocks are evaluated top-down, in order, falling through on non-matches, and whichever one first matches all its conditions simultaneously “wins” and applies to this request. But all this is extremely familiar to me and other similar folks who do Router and Firewall access-lists, mail transport rules, stuff like that. So there is an audience for which Forbid clauses are utterly normal, and for which this logic feels completely natural:

1. If user is in Sensitive group, and the index being requested is Confidential, then allow it (and stop looking at any more rules).
2. If we got past rule 1, and the index being requested is Confidential, then forbid it, because obviously they’re not in Sensitive group – no need to even mention a group membership clause in this step. Denied.
3. OK, if we got past those rules, then obviously we’re not talking about the Confidential indices, so now go on with several other Allow clauses.
4. …and if you get to the end of all the clauses and nothing has matched, then you “fall off the end” and the query gets implicitly denied.
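The steps above can be sketched as a small first-match evaluator in Python. This is a toy model rather than ROR’s engine: the Block class, the evaluate function, and the group and index names are all hypothetical, and every group the user belongs to is assumed to be visible at each step.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class Block:
    name: str
    allow: bool                               # True = allow, False = forbid
    indices: Sequence[str] = ("*",)           # index name patterns
    groups: Optional[Sequence[str]] = None    # None = no group condition


def evaluate(blocks: Sequence[Block], user_groups: Sequence[str], index: str) -> str:
    """Walk the blocks top-down; the first block whose conditions all match wins."""
    for block in blocks:
        index_ok = any(fnmatchcase(index, pattern) for pattern in block.indices)
        group_ok = block.groups is None or bool(set(block.groups) & set(user_groups))
        if index_ok and group_ok:
            verdict = "ALLOW" if block.allow else "FORBID"
            return f"{verdict} by '{block.name}'"
    # Falling off the end of the list is an implicit deny.
    return "FORBID (no block matched)"


ACL = [
    Block("sensitive logs for sensitive group", True,
          ("confidentiallogs-*",), ("SensitiveLogsUsers",)),
    Block("sensitive logs for nobody else", False, ("confidentiallogs-*",)),
    Block("super users", True, ("*",), ("SuperUsers",)),
]

print(evaluate(ACL, ["SuperUsers", "SensitiveLogsUsers"], "confidentiallogs-2021.09"))
print(evaluate(ACL, ["SuperUsers"], "confidentiallogs-2021.09"))
print(evaluate(ACL, ["SuperUsers"], "weblogs-2021.09"))
print(evaluate(ACL, ["SomeOtherGroup"], "weblogs-2021.09"))
```

The matched permit/deny pair sits above everything else, so only members of the sensitive group reach the confidential indices, and everyone else falls through to the ordinary allow blocks for other index names.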

I find it quite easy to build logic like this, and you’ll probably find that if you explain it in the docs like this (“it’s like a Cisco ACL or route-map”), then you can show a few examples where you take an existing working structure, and the Kibana rules admin wants to add one more thing, they would add a matched pair like my first two rules (permit “this” to “them”, deny “same this” to anyone else), and insert that matched pair up above everything else they’ve written so far.

So with this structure, obviously at each step I need to be able to test membership in any of the groups the user belongs to. There isn’t just one primary group that’s more important; the user is at the same time member of many groups, and being in one group (Sensitive) affects the match evaluation of some rules, while being in another (SuperUsers) affects the match of different rules. It’s not quite that the permissions of all the group memberships are additive – like, for instance, in an NTFS filesystem ACL, where Windows calculates (in parallel) all the rights and privileges you have by virtue of being in a bunch of groups, and then smushes them together into a “resultant set” of all the permissions you are granted right now. That’s a different way of thinking of multi-group, and it has its place. But this sequential-and-first-match evaluation, with all group memberships being available for consideration at each matching stage, makes sense to me, and I expect to many others.

If you can’t envision going back to this sort of model (which, again, I wouldn’t name “additive permissions”), then I’m eager to hear what you mean by a new kind of rule, which would be able to co-exist with the current rule-type evaluations without breaking existing configs of your users. You can outline it here and I’ll be happy to poke and prod at the plan in a constructive way. We can also take it offline in email if you don’t want to speculate here in public.

On the subject of negation in the indices list, yes, I agree it could be useful to describe some common cases, so you might want to work on it. I don’t think it would completely cover my case, though, at least not without a particular assumption, namely that all the Sensitive Logs users are also SuperUsers. In our case they happen to be, so we can probably fake that inverted pyramid if we need to, but we would still absolutely have to have each user be unambiguously a member of only one of the groups under the LDAP query scope’s OU. But then it would be impossible to express, for instance, a DashboardOnly user, who cannot write to any indices, but who is nonetheless allowed to look at a dashboard of those confidential logs. Without simultaneous access at each Rule consideration to the entire list of LDAP groups they’re in, it’s not possible to distinguish between a Dashboard user who may not see the sensitive logs and another Dashboard user who may. Or we could make new group names combining the roles, like DashboardNonSensitive and DashboardSensitive, but that just explodes into a 2-to-the-n exponential role names problem.
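The growth in combined role names can be checked with a few lines of Python. This is a counting sketch only: the function name is hypothetical, and the extra flags beyond “Sensitive” are invented purely to show the trend.

```python
from itertools import product
from typing import List, Sequence


def combined_role_names(base_roles: Sequence[str], flags: Sequence[str]) -> List[str]:
    """Enumerate one group name per (base role, on/off choice for every flag)."""
    names = []
    for base in base_roles:
        for combo in product((False, True), repeat=len(flags)):
            suffix = "".join(
                flag if enabled else "Non" + flag
                for flag, enabled in zip(flags, combo)
            )
            names.append(base + suffix)
    return names


print(combined_role_names(["Dashboard"], ["Sensitive"]))

base = ["SuperUsers", "ReadAllIndices", "DashboardOnly"]
# "Finance" and "HR" are made-up extra restricted data sets.
for flags in (["Sensitive"], ["Sensitive", "Finance"], ["Sensitive", "Finance", "HR"]):
    print(len(flags), "flag(s) ->", len(combined_role_names(base, flags)), "group names")
```

Each additional restricted data set doubles the number of group names, whereas with all memberships visible to every rule it would add just one group.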

OK, wow, a long reply! I look forward to continuing this discussion. I’m on vacation much of next week, but I will try to reply promptly. Thank you, Simone.

sscarduzio · September 13, 2021, 10:59am

Yes, I was wrong in calling them additive, as they are purely ACL-style top-down evaluation. Which, as you said, is a well-understood algorithm if you have ever touched a firewall or similar security appliance.

And again: RBAC is an entirely distinct algorithm, and we have a design document on how to implement it side by side with the current ACL algorithm. I’d be glad to share it with you BTW, even though it’s not been implemented just yet.

Back to the current ACL algorithm in ROR. The modification we introduced to better support multi-tenancy in ROR Enterprise diverges from this well-understood algorithm, yes, but only in the presence of the x-ror-current-group request header.
Now, because this header is required to properly handle multi-tenancy, and ROR PRO has no multi-tenancy, we could just avoid setting the header in ROR PRO. So ES won’t see it and the ACL will behave as before. @coutoPL am I right?

coutoPL · September 13, 2021, 4:15pm

yes, on the ROR ES side the header is optional and our ACL takes its absence into consideration.

sscarduzio · September 13, 2021, 4:32pm

Ok I think this will solve @JeffSaxe’s issue then. It’s a small task, I can take care of it tomorrow and send a test build.

JeffSaxe · September 20, 2021, 9:12pm

Thanks, Simone, I appreciate this straightforward solution (simply refraining from sending the x-ror-current-group header from ROR PRO version into ES, so it will automatically revert to previous ACL evaluation). I’m glad it was easy to code, and thanks for sending that version a week ago! I am back in the office today and catching up, but I will make it a priority to try this and report back.

JeffSaxe · September 21, 2021, 4:11pm

Excellent, this has worked perfectly! After installing the 1.35.0-pre1 version, I no longer see that specific header being fed into ES, and now my combination membership in two ACL groups is back to working as desired. Thanks, gentlemen; I really appreciate the quick work, and sorry for the week’s delay in confirming.

I don’t know how you want to document this functionality difference to your customers; obviously it only affects PRO customers (the ones who never had the switch-between-groups ability anyway), and only some fraction of them (the ones who decided to do permit/deny/permit ACLs like a firewall). So if you already have a plan in mind for the future for seamlessly blending the switch-roles style with the multi-group-membership style, then maybe you don’t have to document it at all, aside from the existence of this forum thread that others like me can search for. In the meantime, I guess I got lucky that just by your removing that header, the previous ACL evaluation functionality was “uncovered” and is still working. As you continue to enhance the software, please remember to leave that evaluation logic in place until it’s fully supplanted by whatever the future solution is. If you’d like us to beta-test that future solution whenever it’s on the roadmap, we’ll be happy to.

Thanks!
– JeffS