If simple obfuscation or hashing were enough, it would be fine, but here is what I am facing.
Use case:
multiple documents, multiple indices
a common field type, "lastname"
Kibana and bulk requests must not see the content of "lastname" (GDPR rule), but they need to correlate documents based on "lastname" (for security and tracking purposes)
if the correlation reveals an issue, an authorisation request is made to identify this "lastname".
In my use case, I cannot just hide "lastname"; I need to transform it into a hash so I can still correlate information.
And if I get authorisation to reveal "lastname", I must be able to recover the original value from the hash (a hash cannot be decoded as such, hence the lookup via a restricted index described below).
Here is the approach I have chosen:
in my Logstash filter, I use the prune, fingerprint and clone plugins.
I use fingerprint to hash the value of the field, then I use prune and clone to store:
in the public index: all fields except the original "lastname", which is replaced by "hash_lastname"
in the restricted index, as an update: the original "lastname" and "hash_lastname"
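For reference, a minimal sketch of such a pipeline, using the standard fingerprint, clone and prune plugins. The index names, the key variable, and the routing on [type] (which the clone plugin sets to the clone name in its legacy/default mode) are assumptions to adapt:

```
filter {
  # Keyed hash (HMAC-SHA256): the same input always yields the same digest,
  # so documents remain correlatable, but the digest cannot be brute-forced
  # without the key.
  fingerprint {
    source => "lastname"
    target => "hash_lastname"
    method => "SHA256"
    key    => "${FINGERPRINT_KEY}"   # keep the key out of the pipeline file
  }

  # Duplicate the event; the clone will be routed to the restricted index.
  clone {
    clones => ["restricted"]
  }

  # Public copy: strip the clear-text lastname, keep only the hash.
  if [type] != "restricted" {
    prune {
      blacklist_names => ["^lastname$"]
    }
  }
}

output {
  if [type] == "restricted" {
    elasticsearch { index => "restricted-lastnames" }   # hypothetical name
  } else {
    elasticsearch { index => "public-events" }          # hypothetical name
  }
}
```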
If users read messages from the public index, they do not see the original "lastname" content, only a hash.
Scripts can then correlate multiple searches based on the hash.
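The correlation-without-disclosure property can be illustrated with a short Python sketch; the key, names and documents are made up for illustration:

```python
import hashlib
import hmac

# Hypothetical key; in production this comes from a secret store, not the code.
KEY = b"ingest-secret"

def hash_lastname(lastname: str) -> str:
    """Keyed hash: identical inputs produce identical digests, so documents
    correlate, but the value cannot be reversed without the restricted index."""
    return hmac.new(KEY, lastname.lower().encode("utf-8"), hashlib.sha256).hexdigest()

# Public index holds only the hash; restricted index maps hash -> original.
public_docs = [
    {"hash_lastname": hash_lastname("Durand"), "event": "login"},
    {"hash_lastname": hash_lastname("Durand"), "event": "export"},
]
restricted = {hash_lastname("Durand"): "Durand"}

# Correlation works on the hash alone...
assert public_docs[0]["hash_lastname"] == public_docs[1]["hash_lastname"]
# ...and only a superuser with access to the restricted index recovers the name.
assert restricted[public_docs[0]["hash_lastname"]] == "Durand"
```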
If an authorisation request is raised, a "superuser" gets access to the restricted index and searches for the hash to retrieve the original "lastname".
It is quite complicated and eats resources on the Logstash side.
If it were possible to replace a field value on the fly with a hash in RoR, based on a rule, it would be useful, but I guess performance would be dramatically decreased.
I agree that the best way to do this is to create an extra field at ingest time (for performance reasons), which is exactly what you are doing.
This, in conjunction with the fields rule, would certainly help you, in the sense that you can write two blocks: one for superusers who see all the fields, and one for regular users and scripts that excludes the non-hashed sensitive fields.
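A sketch of what those two ACL blocks could look like, assuming ReadonlyREST's fields (FLS) rule in blacklist mode with the `~` prefix; the credentials, block names and index patterns are placeholders:

```yaml
readonlyrest:
  access_control_rules:

    # Superusers: full access, all fields visible, restricted index included.
    - name: "superusers"
      auth_key: "admin:dev"              # placeholder credentials
      indices: ["public-*", "restricted-*"]

    # Regular users and scripts: public index only; the clear-text
    # lastname is hidden, hash_lastname stays visible for correlation.
    - name: "regular_users"
      auth_key: "user:dev"               # placeholder credentials
      indices: ["public-*"]
      fields: ["~lastname"]              # blacklist mode: hide this field only
```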
So yes, I think you are all set in your use case. Thanks for sharing, finally a very concrete and sensible application of GDPR.