How to secure aggregation queries?

memelet · January 28, 2019, 4:35pm

We have documents of the form

entity:
  plant: <string>
dimensions:
  a: <value>
  b: <value>
...

Rules like

- name: plant-enrich-data
  headers:
    - x-vi-plant:*
  proxy_auth: "*"
  indices:
    - timeseries-*
  filter: '{"query_string": {"query": "entity.plant: @{x-vi-plant}"}}'

We then issue a query like:

GET timeseries-2019-01-01/_search?size=0
{
  "aggs": {
    "plants": {
      "terms": {
        "field": "entity.plant",
        "size": 10,
        "min_doc_count": 0
      }
    }
  }
}

And get back:

...
"buckets" : [
        {
          "key" : "plant1",
          "doc_count" : 6322
        },
        {
          "key" : "plant2",
          "doc_count" : 0
        },
        {
          "key" : "plant3",
          "doc_count" : 0
        }
      ]

So this is clearly a confidentiality breach: the keys plant2 and plant3 are returned even though the filter matches none of their documents. Is there a mechanism in RoR to ensure that aggregation queries do not leak data?
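To illustrate why the zero-count buckets show up, here is a toy Python model of a terms aggregation. It is only a sketch and not Elasticsearch code: the assumption is that with "min_doc_count": 0 the bucket keys are taken from every term indexed for the field, while the filter only affects the per-bucket counts. The function name terms_agg and the sample documents are made up for illustration.

```python
from collections import Counter


def terms_agg(docs, field, doc_filter, min_doc_count=1):
    """Toy model of a filtered terms aggregation (not real Elasticsearch code)."""
    # Assumption: bucket keys come from every term indexed for the field,
    # regardless of which documents the filter lets through.
    all_terms = {doc[field] for doc in docs}
    # Only documents that pass the filter contribute to the counts.
    counts = Counter(doc[field] for doc in docs if doc_filter(doc))
    buckets = [
        {"key": term, "doc_count": counts[term]}
        for term in all_terms
        if counts[term] >= min_doc_count
    ]
    # Order by descending count, then by key, to get a stable result.
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


# Hypothetical sample data: three plants, the caller may only see plant1.
docs = (
    [{"plant": "plant1"}] * 3
    + [{"plant": "plant2"}] * 2
    + [{"plant": "plant3"}]
)


def only_plant1(doc):
    return doc["plant"] == "plant1"


leaky = terms_agg(docs, "plant", only_plant1, min_doc_count=0)
safe = terms_agg(docs, "plant", only_plant1, min_doc_count=1)
print(leaky)  # plant2 and plant3 appear with doc_count 0
print(safe)   # only plant1 remains
```

In this model the counts are filtered correctly, but the key list is not, which matches the response shown above.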

sscarduzio · January 28, 2019, 10:46pm

Hi @memelet,

This should not happen in our latest pre-release, which we are about to publish. Can you check?

anishm · January 29, 2019, 9:55am

Hi Simone,
I tried the latest pre-release with rules similar to the ones @memelet posted. However, running the same query, I still see aggregation keys with doc_count 0. Do we have to change the ruleset?

memelet · January 29, 2019, 3:40pm

I looked at all commits for the pre-release and found only one that seemed relevant: "Fields rule would not work for aggregation due to tag type prefix" (sscarduzio/elasticsearch-readonlyrest-plugin@da354e5 on GitHub).

But I could find no tests (new or existing) that have anything to do with aggregation queries. Am I missing something?

sscarduzio · February 3, 2019, 10:35am

Working on this at the moment, as I could reproduce the issue.

sscarduzio · February 20, 2019, 12:41pm

Hi @memelet, this is currently a bug in the document level security model of Elasticsearch itself.
I reported it to the Elasticsearch security team. They acknowledged the problem, and the discussion about the best solution is still ongoing.

Unfortunately, fixing this requires substantial changes to how aggregations work inside the main Elasticsearch code base, so the decision has to be made by Elastic's own architects.

I will report back as soon as I have more information from them. In the meantime I will hide this topic from the forum until we have a solution.
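Until there is a fix upstream, one possible stopgap is to sanitize search requests before they reach Elasticsearch, for example in a proxy or client wrapper you control. The Python sketch below is an assumption-laden illustration, not something RoR does and not an official recommendation: the helper name clamp_min_doc_count is hypothetical, it only handles terms aggregations, and it only narrows the specific leak shown in this thread.

```python
import copy


def clamp_min_doc_count(body, minimum=1):
    """Return a copy of a search body whose terms aggregations
    never ask for buckets with fewer than `minimum` documents.

    Hypothetical helper for illustration; it only looks at "terms"
    aggregations under "aggs" / "aggregations", including nested ones.
    """
    body = copy.deepcopy(body)  # leave the caller's request untouched

    def walk(node):
        for key in ("aggs", "aggregations"):
            for agg in node.get(key, {}).values():
                terms = agg.get("terms")
                # The terms aggregation defaults to min_doc_count = 1,
                # so only explicit lower values need to be raised.
                if isinstance(terms, dict) and terms.get("min_doc_count", 1) < minimum:
                    terms["min_doc_count"] = minimum
                walk(agg)  # descend into sub-aggregations

    walk(body)
    return body


# The request body from the first post.
request = {
    "aggs": {
        "plants": {
            "terms": {"field": "entity.plant", "size": 10, "min_doc_count": 0}
        }
    }
}

sanitized = clamp_min_doc_count(request)
print(sanitized["aggs"]["plants"]["terms"]["min_doc_count"])  # 1
```

With min_doc_count forced to at least 1, only buckets backed by documents that pass the rule's filter can be returned, so the plant2 and plant3 keys from the example would no longer appear.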