How to secure aggregation queries?


(Barry Kaplan) #1

We have documents of the form

entity:
  plant: <string>
dimensions:
  a: <value>
  b: <value>
...

Rules like

- name: plant-enrich-data
  headers:
    - x-vi-plant:*
  proxy_auth: "*"
  indices:
    - timeseries-*
  filter: '{"query_string": {"query": "entity.plant: @{x-vi-plant}"}}'

Then issue a query like:

GET timeseries-2019-01-01/_search?size=0
{
  "aggs": {
    "plants": {
      "terms": {
        "field": "entity.plant",
        "size": 10,
        "min_doc_count": 0
      }
    }
  }
}

And get back:

...
"buckets" : [
        {
          "key" : "plant1",
          "doc_count" : 6322
        },
        {
          "key" : "plant2",
          "doc_count" : 0
        },
        {
          "key" : "plant3",
          "doc_count" : 0
        }
      ]

So this is clearly a confidentiality breach: the zero-count buckets reveal the names of plants that the filter is supposed to hide. Is there a mechanism in RoR to ensure that aggregation queries do not leak data?
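
To make the leak in the response above concrete, here is a minimal Python sketch that lists the bucket keys returned with a doc_count of 0. The helper name `zero_count_keys` is hypothetical, and the `aggregations` wrapper around the buckets is assumed from the standard Elasticsearch search response shape, since the original output elides it.

```python
def zero_count_keys(response, agg_name):
    """Return the keys of buckets whose doc_count is 0 for one terms aggregation."""
    # Assumption: the response follows the usual layout
    # {"aggregations": {<agg_name>: {"buckets": [...]}}}.
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [b["key"] for b in buckets if b.get("doc_count", 0) == 0]


# Only the relevant part of the response from this post is reproduced here.
sample = {
    "aggregations": {
        "plants": {
            "buckets": [
                {"key": "plant1", "doc_count": 6322},
                {"key": "plant2", "doc_count": 0},
                {"key": "plant3", "doc_count": 0},
            ]
        }
    }
}

print(zero_count_keys(sample, "plants"))  # ['plant2', 'plant3']
```

Every key this returns belongs to a plant with no visible documents for the requesting user, which is exactly the information the rule's filter was meant to withhold.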


(Simone Scarduzio) #2

Hi @memelet,

This should not happen in our latest pre-release, which we are about to release. Can you check?


(Anish Mashankar) #3

Hi Simone,
I tried the latest pre-release with rules similar to the ones @memelet posted. However, running the same query, I still see aggregation keys with doc_count 0. Do we have to change the ruleset?


(Barry Kaplan) #4

I looked at all the commits for the pre-release and found only this one that seemed relevant: https://github.com/sscarduzio/elasticsearch-readonlyrest-plugin/commit/da354e5660d5665818b832a8f72526499318d179

But I could find no tests (new or existing) that have anything to do with aggregation queries. Am I missing something?


(Simone Scarduzio) #5

Working on this at the moment, as I could reproduce the issue.


(Simone Scarduzio) unlisted #6

(Simone Scarduzio) #7

Hi @memelet, this is currently a bug in the document-level security model of Elasticsearch itself.
I reported this to the Elasticsearch security team. They acknowledged the situation, and the discussion about the best solution is still ongoing.

Unfortunately, fixing this requires substantial changes to how aggregations work inside the main Elasticsearch code base, so the decision has to come from Elastic's own architects.

I will report back as soon as I have more information from them. In the meantime, I will hide this topic from the forum until we have a solution.
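
Until a proper fix exists, one possible stopgap for the specific query in this thread is to sanitize request bodies before they reach Elasticsearch, so that no terms aggregation asks for zero-count buckets. The Python sketch below is an assumption for illustration only: it is not something ReadonlyREST or Elasticsearch is stated to do here, the function name `enforce_min_doc_count` is hypothetical, and it covers only the terms aggregation case shown above.

```python
import copy


def enforce_min_doc_count(body, minimum=1):
    """Return a copy of a search body in which every terms aggregation
    has min_doc_count of at least `minimum`."""
    body = copy.deepcopy(body)  # leave the caller's request untouched

    def visit(aggs):
        for agg in aggs.values():
            if not isinstance(agg, dict):
                continue
            terms = agg.get("terms")
            if isinstance(terms, dict):
                current = terms.get("min_doc_count", minimum)
                terms["min_doc_count"] = max(minimum, current)
            # Sub-aggregations may appear under either spelling.
            for key in ("aggs", "aggregations"):
                if isinstance(agg.get(key), dict):
                    visit(agg[key])

    for key in ("aggs", "aggregations"):
        if isinstance(body.get(key), dict):
            visit(body[key])
    return body


# The query from the first post, with min_doc_count set to 0.
query = {
    "aggs": {
        "plants": {
            "terms": {"field": "entity.plant", "size": 10, "min_doc_count": 0}
        }
    }
}

safe = enforce_min_doc_count(query)
print(safe["aggs"]["plants"]["terms"]["min_doc_count"])  # 1
```

With min_doc_count raised to 1, the terms aggregation returns only buckets that contain at least one matching document, so the zero-count keys are no longer part of the response. This does not address the underlying document-level security issue described above.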