Audit log to optionally log executed queries


There are many cases where sensitive information is stored in Elasticsearch and some users have a legitimate reason to access it, but malicious or unnecessary access still needs to be detectable.

It would therefore be useful for the audit log to also record the queries executed against specific indices, so that users’ actions can be reviewed later. Recording the number of returned documents could also be useful.

There should be:

  1. a new multi-value setting, “audit_include_query”, which specifies which indices’ queries are logged. The setting should support wildcards.

  2. if a query targets an index matching one of these patterns, the request body (which contains the user’s query) is included in the audit log.
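To make the two points above concrete, here is a minimal Python sketch (Python is used only for illustration; this is not how ReadonlyREST is implemented). It assumes the patterns behave like shell-style wildcards and that an audit entry is a plain dictionary; `should_log_query`, `build_audit_entry` and the `content` field are hypothetical names.

```python
from fnmatch import fnmatchcase


def should_log_query(requested_indices, include_patterns):
    """True if any requested index matches any audit_include_query pattern."""
    return any(
        fnmatchcase(index, pattern)
        for index in requested_indices
        for pattern in include_patterns
    )


def build_audit_entry(base_entry, requested_indices, request_body, include_patterns):
    """Copy the regular audit entry and attach the request body only on a match."""
    entry = dict(base_entry)
    if should_log_query(requested_indices, include_patterns):
        entry["content"] = request_body  # hypothetical field for the logged query
    return entry


patterns = ["bla-*", "sensitive-index", "*-personal"]
print(should_log_query(["hr-personal"], patterns))  # True
print(should_log_query(["public-data"], patterns))  # False
```

Note that `fnmatchcase` also treats `?` and `[...]` as wildcards, which may be broader than what the setting would actually support.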

Considerations and possible side effects of the feature.

  • Performance impact
  • Log size
  • Information security when using the feature

:eyes: Example

Example configuration:

readonlyrest:
  audit_collector: true
  audit_include_query: ["bla-*", "sensitive-index", "*-personal"]

Request body, which would be added to the audit log entry:

{
  "query": {
    "query_string": {
      "default_field": "product_id",
      "query": "HFD-DE4"
    }
  }
}
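A sketch of how the example request above might end up in an audit entry, also recording the number of returned documents as suggested earlier. The field names `query` and `returned_docs` are hypothetical, and the response dictionary only mimics the shape of an Elasticsearch search response (`hits.hits` holds the returned documents).

```python
import json


def audit_entry_with_query(base_entry, request_body, search_response):
    """Attach the raw query and the number of returned hits to an audit entry (sketch)."""
    entry = dict(base_entry)
    # Store the query as a compact JSON string rather than as nested fields.
    entry["query"] = json.dumps(request_body, separators=(",", ":"))
    # hits.hits holds the documents actually returned (at most `size` of them).
    entry["returned_docs"] = len(search_response.get("hits", {}).get("hits", []))
    return entry


request_body = {
    "query": {"query_string": {"default_field": "product_id", "query": "HFD-DE4"}}
}
response = {"hits": {"total": {"value": 2, "relation": "eq"},
                     "hits": [{"_id": "1"}, {"_id": "2"}]}}
entry = audit_entry_with_query({"user": "dave"}, request_body, response)
print(entry["returned_docs"])  # 2
```

Storing the query as a string is one way to keep arbitrary query structures from adding new fields to the audit index mapping; counting `hits.hits` gives the documents actually returned, not the total number of matches.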

References

GitHub conversation
Discussion about a similar feature on the forum


@sscarduzio, is there any work going on for this feature?

Also, when you add this option, it may be beneficial to log these queries into a separate index rather than clogging the main audit index.

There has been some recent work on auditing (the custom audit index name and time granularity features), but not on this one yet.

I was thinking we could push this a bit further and create a sort of ACL for loggers. Unlike the regular ACL, it would not stop evaluation at the first matching block, so a request would be logged multiple times, in multiple indices, if more than one block matches.

...
audit:
  enable: true
  index_loggers:
  - name: "privacy audit"
    index_template: "'trace-'YYYY-MM"
    indices: ["sensitive-*", "*-private-*"]
    log_fields: ["content", "path", "oa", "indices" ]

  - name: "redflags audit"
    index_template: "'redflags-'YYYY-MM"
    actions: ["*delete*", "*flush*"]
    log_fields: ["*"] # the rule can be omitted altogether to mean "log every field"

  - name: "trace audit"
    index_template: "'trace-'YYYY-MM"
    log_fields: ["~content"] # don't log the body of less interesting requests (e.g. bulk indexing)
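A rough Python sketch of how such a logger ACL could be evaluated, assuming that `indices` and `actions` are wildcard patterns, that a logger without either rule matches everything, and that `YYYY`, `MM` and `DD` in `index_template` simply stand for year, month and day. All function names are hypothetical, and `log_fields` is not handled here.

```python
import re
from datetime import date
from fnmatch import fnmatchcase


def render_index(template, day):
    """Render e.g. "'trace-'YYYY-MM": quoted parts are literal, the rest is a date pattern."""
    parts = []
    for literal, pattern in re.findall(r"'([^']*)'|([^']+)", template):
        if literal:
            parts.append(literal)
        else:
            # Assumption: YYYY/MM/DD mean year, month and day of month.
            parts.append(
                pattern.replace("YYYY", f"{day.year:04d}")
                .replace("MM", f"{day.month:02d}")
                .replace("DD", f"{day.day:02d}")
            )
    return "".join(parts)


def matching_loggers(loggers, indices, action):
    """Return every logger whose rules match; evaluation does not stop at the first hit."""
    matched = []
    for logger in loggers:
        index_patterns = logger.get("indices")
        action_patterns = logger.get("actions")
        if index_patterns and not any(
            fnmatchcase(i, p) for i in indices for p in index_patterns
        ):
            continue
        if action_patterns and not any(fnmatchcase(action, p) for p in action_patterns):
            continue
        matched.append(logger)
    return matched


loggers = [
    {"name": "privacy audit", "index_template": "'trace-'YYYY-MM",
     "indices": ["sensitive-*", "*-private-*"]},
    {"name": "redflags audit", "index_template": "'redflags-'YYYY-MM",
     "actions": ["*delete*", "*flush*"]},
    {"name": "trace audit", "index_template": "'trace-'YYYY-MM"},
]
for logger in matching_loggers(loggers, ["sensitive-hr"], "indices:data/write/delete"):
    print(logger["name"], "->", render_index(logger["index_template"], date(2019, 3, 7)))
```

With the example configuration, a delete on a sensitive index would be picked up by all three loggers, and since "privacy audit" and "trace audit" share the `'trace-'YYYY-MM` template, it would land in the same `trace-` index twice.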

What do you think?
PS: I was trying to come up with a rule name and syntax to express a cleanup policy and its frequency, e.g. “clean up stuff older than 6 months, every 24 hours”. Any suggestions?

Yes, this looks like a cleaner approach. Also, ROR already creates an audit index today when the audit collector is enabled. Would this be in addition to it, or would it replace the existing audit index altogether? I would vote for keeping it in addition to the existing feature.

Regarding your other question, how about “expiry”, “purge_after”, “expiry_frequency”, or some variant of them?
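As an illustration of what a `purge_after`-style setting might compute, here is a sketch that selects monthly indices (such as those produced by `'trace-'YYYY-MM`) whose month lies more than N calendar months before the current one. The function name and the exact age rule are assumptions.

```python
from datetime import date


def indices_to_purge(index_names, prefix, today, purge_after_months):
    """Select '<prefix>YYYY-MM' indices more than purge_after_months months old."""
    doomed = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            year, month = (int(part) for part in name[len(prefix):].split("-"))
        except ValueError:
            continue  # not a monthly index created from this template
        age_in_months = (today.year - year) * 12 + (today.month - month)
        if age_in_months > purge_after_months:
            doomed.append(name)
    return doomed


names = ["trace-2018-12", "trace-2019-01", "trace-2019-07", "redflags-2018-01"]
print(indices_to_purge(names, "trace-", date(2019, 7, 15), 6))  # ['trace-2018-12']
```

The “every 24 hours” part would simply be the schedule on which such a check runs; the sketch only decides which indices would be deleted.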

Also, once you add these audit indices, some standard dashboards built on them would be a nice-to-have feature on the Kibana side :wink:


It would be very cool to extend the existing audit logs with a new field, ‘query’.

What would that contain? The body of the request?

Hi,
This may be an older topic, but I think it is still relevant.

It would be interesting to enable query logging for specific authorization blocks.

Was anything further done with the above?

A bit like:
Extended audit
If you want to log the request content, an additional serializer is provided. This will log the entire user request within the content field of the audit event. To enable it, configure the audit_serializer parameter as below:

readonlyrest:
  audit_collector: true
  audit_serializer: tech.beshu.ror.requestcontext.QueryAuditLogSerializer
  …

But perhaps only for certain auth blocks?

Maybe we should move the audit serializer setting from global scope to the individual ACL blocks. Or maybe even make serializers parametric: who wants to write a Java class just to add or remove fields anyway?

Having separate configuration logic for query auditing could be very flexible.
But it could also become quite complicated.

We currently have the audit log (in an Elasticsearch index), which already contains all the relevant context/metadata. Ideally we would enrich it with the relevant query information, i.e. the request body.
I am not at my laptop right now, but you can also put the search string in the URI, right? The feature should account for that too.
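A minimal sketch of collecting query information from both places, assuming the URI search `q` parameter (e.g. `GET /my-index/_search?q=user:dave`) and a JSON request body. `extract_query` and its result keys are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def extract_query(uri, body):
    """Collect query information from the URI (?q=...) and from the request body."""
    found = {}
    params = parse_qs(urlsplit(uri).query)
    if "q" in params:
        found["uri_query"] = params["q"][0]
    if body:
        try:
            parsed = json.loads(body)
        except ValueError:  # e.g. NDJSON bodies spanning several documents
            parsed = None
        if isinstance(parsed, dict):
            if "query" in parsed:
                found["body_query"] = parsed["query"]
        else:
            found["raw_body"] = body  # keep anything we could not interpret verbatim
    return found


print(extract_query("/products/_search?q=product_id:HFD-DE4", None))
# {'uri_query': 'product_id:HFD-DE4'}
```

Multi-line (NDJSON) bodies such as those of `_msearch` or `_bulk` do not parse as a single JSON document, so this sketch keeps them verbatim.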

Practically speaking, there are 4 scenarios:

  1. You are not interested in a query audit log.
    Great, this is the current situation.
  2. You want to be able to audit specific users, groups or ACL blocks.
    For this, it would be good to have an option in the existing config that says “store the query for this block”.
  3. You want to be able to audit specific security-sensitive indices.
    You don’t care which user (or anything else) accesses the index; you want an audit log of everything that happens on index X.
    For this, a separate configuration would perhaps be easier.
  4. You want a combination of 2 and 3.
    You want every query on index X audited.
    And you want to audit everything that Dave does; you just don’t trust Dave :smiley:
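The four scenarios can be expressed as a single predicate: scenario 1 is every list left empty, scenario 2 matches on identity, scenario 3 on index patterns, and scenario 4 on either. A Python sketch with hypothetical parameter names, covering users and groups but not ACL block names:

```python
from fnmatch import fnmatchcase


def should_audit_query(user, groups, indices, audited_users=(), audited_groups=(),
                       audited_index_patterns=()):
    """Scenario 1: nothing configured. 2: who. 3: which index. 4: either of the two."""
    by_identity = user in audited_users or any(g in audited_groups for g in groups)
    by_index = any(fnmatchcase(i, p) for i in indices for p in audited_index_patterns)
    return by_identity or by_index


# Scenario 4: audit everything on index-x, plus everything Dave does.
print(should_audit_query("dave", [], ["other-index"], audited_users={"dave"},
                         audited_index_patterns=["index-x"]))  # True
```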

One idea is to extend the type field in the current configuration to accept an array.
/pseudo config
You could then specify a block:
Index: index-x
Type: queryaudit

And a block:
User: dave
Type: allow, queryaudit
/end of pseudo config

This would enable option 4, the combination.
The first block would mark all requests involving index-x for query data logging, but it would not allow or forbid anything.

The second block would allow Dave access, but also mark all of his requests so that their query data is stored.

This isn’t perfect in my opinion, as the configuration could become more complicated for novice users. But it would make good use of existing functionality.

The documentation would need to clearly explain that a “type: queryaudit” block must come before any allow or forbid blocks covering the same indices.

Another useful thing might be the ability to explicitly NOT audit query data for certain blocks, but this would require more thought: should this be configurable at all, or should it be handled through block ordering?
Example scenario:
I want to keep track of everything that happens on index-x, except Logstash writing with the bulk writer.

I hope the above helps.

An alternative to the above would be to add a type “none” and a configuration item “markforqueryaudit”.

The 2 blocks above would look something like this:
/pseudo config
Index: index-x
Type: none
Markforqueryaudit: true

User: dave
Type: allow
Markforqueryaudit: true
/end of pseudo config
Perhaps this would be less intrusive in terms of the code that needs to change, and would keep the config a bit more understandable.
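A sketch of the “type: none” plus “markforqueryaudit” variant, assuming blocks are checked top to bottom, a `none` block only marks the request and evaluation continues, and the first allow/forbid block ends evaluation (default: forbid). `Block`, `evaluate` and `mark_for_query_audit` are hypothetical names, and index matching here is exact rather than wildcard-based.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str  # "allow", "forbid" or the proposed "none"
    users: set = field(default_factory=set)  # empty = any user
    indices: set = field(default_factory=set)  # empty = any index (exact names only)
    mark_for_query_audit: bool = False

    def matches(self, user, indices):
        return (not self.users or user in self.users) and (
            not self.indices or bool(self.indices & set(indices))
        )


def evaluate(blocks, user, indices):
    """Return (decision, audit_query) under the proposed "none" + markforqueryaudit rules."""
    audit_query = False
    for block in blocks:
        if not block.matches(user, indices):
            continue
        audit_query = audit_query or block.mark_for_query_audit
        if block.kind == "none":
            continue  # marks the request but neither allows nor forbids it
        return block.kind, audit_query
    return "forbid", audit_query  # rejected by default


blocks = [
    Block("none", indices={"index-x"}, mark_for_query_audit=True),
    Block("allow", users={"dave"}, mark_for_query_audit=True),
    Block("allow", users={"alice"}),
]
print(evaluate(blocks, "alice", ["index-x"]))  # ('allow', True)
```

Note that evaluation continues past a matching `none` block, which the current ACL never does.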

So what makes the ACL simpler and less error-prone than other systems?

  • Every block is evaluated sequentially, top to bottom. No parallel/concurrent execution.
  • Requests are rejected by default.
  • A block either allows or rejects, and optionally mutates the request (e.g. adds metadata).
  • The first block that matches ends the execution.
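The rules above can be sketched as a tiny evaluator in Python. This is a simplified model for discussion, not ReadonlyREST's actual code; `Block`, `matches` and `mutate` are hypothetical names.

```python
from typing import Callable, List, NamedTuple, Optional


class Block(NamedTuple):
    name: str
    policy: str  # "allow" or "forbid"
    matches: Callable[[dict], bool]
    mutate: Optional[Callable[[dict], None]] = None  # optional side effect, e.g. add metadata


def evaluate(blocks: List[Block], request: dict) -> str:
    for block in blocks:  # sequential, top to bottom, one block at a time
        if block.matches(request):
            if block.mutate is not None:
                block.mutate(request)
            return block.policy  # the first matching block ends the execution
    return "forbid"  # nothing matched: rejected by default


def tag_admin(request: dict) -> None:
    request.setdefault("metadata", {})["tag"] = "admin"


blocks = [
    Block("admins", "allow", lambda r: r["user"] == "admin", tag_admin),
    Block("no dave", "forbid", lambda r: r["user"] == "dave"),
    Block("kibana", "allow", lambda r: r["user"] == "kibana"),
]
request = {"user": "admin"}
print(evaluate(blocks, request), request)  # allow {'user': 'admin', 'metadata': {'tag': 'admin'}}
```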

If we add a queryaudit type, either we need to stop the ACL execution without a clear allow/forbid exit state, or we need to continue execution after a match. In both cases, this breaks the ACL model.

Another way would be to have a secondary logging ACL where the request goes through a set of blocks and the “winner” assigns the log level.

The simplest solution in the spirit of “least surprise” would be the introduction of a side-effecting rule that adds a log level flag to the request (when the block matches).
Turns out we already have one: “verbosity”. We can expand on it by optionally accepting a “sink”.

For example:

# PROPOSED SYNTAX NOT IMPLEMENTED YET
#####################################

access_control_rules:
- name: "Suspicious Dave"
  users: [dave]
  indices: [this, that]
  verbosity:
      level: info
      sink: myIndexSink

audit_log_sinks:
- name: myIndexSink
  type: local-index
  index_pattern: ror_audit_YYYY_MM_DD
  fields: [~body, ~history, custom_severity:average_suspicious ]  # <-- negating some of the available JSON fields in the standard log document, adding some custom ones.

- name: myRemoteIndexSink
  type: remote-index
  hosts: ["https://remote-server:9200"]
  index_pattern: ror_audit_YYYY_MM_DD
  fields: [~body, ~history, custom_severity:very_suspicious ] 

- name: localESlog
  type: es-logfile
  fields: [~body]
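A rough sketch of the proposed flow, assuming the semantics described in the comments. The “verbosity” rule stays a side effect of a regular first-match ACL, the chosen sink name is attached to the request, and the sink's `fields` list then shapes the standard audit document: `~name` drops a field, `key:value` adds a constant one, and plain names select fields as in the earlier `log_fields` idea. Every name here is hypothetical, blocks match on users only, and the sink types (`local-index`, `remote-index`, `es-logfile`) are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Verbosity:
    level: str = "info"
    sink: Optional[str] = None  # proposed: name of an audit_log_sinks entry


@dataclass
class Block:
    name: str
    users: frozenset
    verbosity: Optional[Verbosity] = None


def evaluate(blocks: List[Block], request: dict) -> str:
    """Unchanged first-match ACL; the verbosity rule is only a side effect."""
    for block in blocks:
        if request["user"] in block.users:
            if block.verbosity is not None:
                request["audit_level"] = block.verbosity.level
                request["audit_sink"] = block.verbosity.sink
            return "allow"
    return "forbid"


def apply_field_spec(document: dict, spec: List[str]) -> dict:
    """Plain names select fields ("*" or none keeps all), "~name" drops one, "key:value" adds one."""
    selected = [f for f in spec if not f.startswith("~") and ":" not in f]
    if selected and "*" not in selected:
        result = {k: v for k, v in document.items() if k in selected}
    else:
        result = dict(document)
    for f in spec:
        if f.startswith("~"):
            result.pop(f[1:], None)
        elif ":" in f:
            key, _, value = f.partition(":")
            result[key] = value
    return result


def audit_document(request: dict, document: dict, sinks: Dict[str, List[str]]) -> Optional[dict]:
    """Shape the standard audit document for the sink chosen by the ACL, if any."""
    sink = request.get("audit_sink")
    if sink is None:
        return None  # no sink selected: keep the existing audit behaviour
    return apply_field_spec(document, sinks[sink])


blocks = [
    Block("Suspicious Dave", frozenset({"dave"}), Verbosity("info", "myIndexSink")),
    Block("Alice", frozenset({"alice"})),
]
sinks = {"myIndexSink": ["~body", "~history", "custom_severity:average_suspicious"]}
request = {"user": "dave"}
doc = {"user": "dave", "path": "/that/_search", "body": "{}", "history": []}
print(evaluate(blocks, request), audit_document(request, doc, sinks))
```

Because the allow/forbid decision is untouched, this keeps the ACL model described above intact: the sink only changes where and how the request is logged.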