Here is the issue:
If simple obfuscation or hashing were enough it would be fine, but this is what I am facing:
- multiple docs, multiple indices
- a similar field in each of them, “lastname”
- Kibana and bulk requests must not see the content of lastname (GDPR rule), but users still need to correlate docs based on “lastname” (for security and tracking purposes)
- if the correlation reveals an issue, a request for authorisation is made to identify this “lastname”
So in my use case I cannot just hide lastname: I need to transform it into a hash so that information can still be correlated,
and if I get authorisation to reveal lastname, I must be able to “decode” the hash, i.e. get back the original value.
Here is the approach I have chosen:
- in my Logstash filter, I use the prune, fingerprint and clone plugins
- fingerprint hashes the value of the field, then prune and clone let me store:
  - in the public index: all fields except the original “lastname”, which is replaced by “hash_lastname”
  - in the restricted index (written as an update): the original “lastname” plus “hash_lastname”
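The pipeline above could be sketched roughly like this. This is only an illustration, not my exact config: the key, index names and hosts are placeholders, and it assumes the clone filter's legacy behaviour of setting the `type` field to the clone name:

```
filter {
  # keyed SHA-256 (HMAC): the hash cannot be brute-forced without the key
  fingerprint {
    source => "lastname"
    target => "hash_lastname"
    method => "SHA256"
    key    => "mysecretkey"      # placeholder
  }
  # duplicate the event for the restricted index
  clone {
    clones => ["restricted"]
  }
  if [type] == "restricted" {
    # restricted copy keeps only the correlation pair
    prune {
      whitelist_names => ["lastname", "hash_lastname", "type"]
    }
  } else {
    # public copy drops the original value
    mutate {
      remove_field => ["lastname"]
    }
  }
}
output {
  if [type] == "restricted" {
    elasticsearch {
      hosts         => ["localhost:9200"]      # placeholder
      index         => "restricted-index"
      document_id   => "%{hash_lastname}"      # one doc per lastname
      action        => "update"
      doc_as_upsert => true
    }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "public-index"
    }
  }
}
```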
When users read messages from the public index, they do not see the original lastname content, only a hash.
Scripts can then correlate multiple searches based on that hash.
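A correlation script never needs the original value, only the keyed hash. A minimal Python sketch, assuming the same SHA-256 key as the fingerprint filter (when the fingerprint filter has a `key`, it computes HMAC-SHA256, which `hmac` reproduces); the key and index names are placeholders:

```python
import hashlib
import hmac

KEY = b"mysecretkey"  # placeholder; must match the Logstash fingerprint key

def hash_lastname(lastname: str) -> str:
    # HMAC-SHA256, same scheme as the fingerprint filter with a key set
    return hmac.new(KEY, lastname.encode("utf-8"), hashlib.sha256).hexdigest()

# docs coming from two different public indices expose only hash_lastname
doc_a = {"index": "public-app1", "hash_lastname": hash_lastname("Dupont")}
doc_b = {"index": "public-app2", "hash_lastname": hash_lastname("Dupont")}

# correlation works without ever seeing the original value
assert doc_a["hash_lastname"] == doc_b["hash_lastname"]
```

The same lastname always produces the same hash, so equality on hash_lastname is enough to join events across indices.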
If an authorisation is raised, a “superuser” gets access to the restricted index and searches for the hash to retrieve the original lastname.
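The superuser lookup is then a simple exact-match query on the restricted index, something like the following (assuming hash_lastname is mapped as a keyword; the index name is a placeholder):

```
GET restricted-index/_search
{
  "query": {
    "term": { "hash_lastname": "<hash value from the public docs>" }
  }
}
```

The matching document contains both hash_lastname and the original lastname.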
This works, but it is quite complicated and eats resources on the Logstash side.
If it were possible to have RoR replace a field value on the fly with a hash, based on a rule, that would be very useful, but I guess performance would drop dramatically.
What do you think?