RoR/es-hadoop connector/Hive problem


(Vadim) #1

I’m experiencing the issue when querying ES with RoR from external Hive table with property ‘es.net.http.header.Authorization’=‘Basic (base64 encoded user:pwd)’ in hive table definition.

Running select statement from table outputs the error “Failed with exception Java.io.IOException:org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [HEAD] on [index_name] failed; server[server_name:port] returned [401|Unauthorized:]”

Running cURL on the same es index with the same user/pwd authorization (curl -HGET -H Authorization:Basic (base64 encoded user:pwd)’ ‘http://server:port/index/_search’) returns correct response and proves that RoR/ES have been configured properly.
Also, the same table without the ‘es.net.http.header.Authorization’ in its definition works well too.

Environment: RHEL 6.6, open jdk 1.8.0_141, ES 5.6.3, RoR 1.16.14_es5.6.3, Hadoop 2.7, Hive 1.2.100.2.6.3, elasticsearch-hadoop-6.0.0.jar

Had anyone faced this issue or successfully queried Elasticsearch 5.6 with Readonlyrest plugin from external Hive table with the es.net.http.header.Authorization ?
Any help or suggestion will be appreciated.


(Ld57) #2

Hi,

could be nothing but just to check, would it be possible you have a difference related to case on index_name ?
maybe a capital letter when you ran the select ?


(Vadim) #3

Appreciate your help Ld57.

The goal is to test es-hadoop connector along with ES and Readonlyrest plugin while enabling/disabling access to ES in readonlyrest.yml on server level (hosts: [ip addr of hive server]) or on user/index level (using sha256 encrypted user:pwd value and authorization header in the request) as follows:
(1)Reading from ES to Hive
(a)server level enabling/disabling reading - works well
(b)user/index level enabling/disabling reading - works well, but when disabling reading the response does not contain the message declared in readonlyrest.yml
(1)Writing to ES from Hive
(a)server level enabling/disabling writing - doesn’t work
(b)user/index level enabling/disabling writing - doesn’t work


(Ld57) #4

mmmh,

would it be possible to copy/paste your readonlyrest.yml file here ?


(Vadim) #5

(1) readonlyrest.yml (server level access)

readonlyrest:
    enable: true
    response_if_req_forbidden: access denied by ror
    access_control_rules:
    - name: "accept requests from host xyz"
    hosts: ["11.111.11.55"]

Note: 11.111.11.55 is ip of the server where hadoop/hive external table are running

(2) readonlyrest.yml (user/index level access)

readonlgrest:
    enable: true
    response_if_req_forbidden: access denied by ror
    access_control_rules:
    - name: "accept requests from users in group team2 on index books"
      type: allow
      groups: ["team2"]
      actions: ["indices:data/write/*", "indices:data/read/*"]
      indices: ["books","flowers"]
    users:
    - username: tom
      auth_key_sha256: ab3.....1k
      groups: ["team2"]

(Ld57) #6

does the typo “readonlgrest” on the second code is also in your code ?

it should be “readonlyrest”


(Simone Scarduzio) #7

@dzyubanv you should read the log lines of ES where it says “FORBIDDEN” when Hive attempts to write. Moreover, if you enable the debug mode in ES, you will see the Authorization header value.

I suspect that the credentials don’t reach ES, and by analising such log line you could inspect the presence of the credentials.


(Vadim) #8

Yes ldp57, this is just typo here. The file name, I used, is readonlyrest.yml. I’m typing all text here manually rather then copying/pasting to communicate with you because of the security issues.
Appreciate your help.


(Vadim) #9

Hi Simone,
I enabled debug mode in ES, but after I ran Hive insert there was no relevant entry in ES.log at all.

Readonlyrest deployed and enabled on master node listening on port:9200 on server with IP 11.111.11.88
(1) readonlyrest.yml

readonlyrest:
    enable: true
    response_if_req_forbidden: **access denied by ror**
    access_control_rules:
    - name: "accept requests from host xyz"
    hosts: ["11.111.11.55"]

11.111.11.55 is the IP address of the server where Hadoop/Hive external table are declared and insert statement is executed.

The table definition

hive> CREATE EXTERNAL TABLE IF NOT EXISTS write_to_es_ror_enable_xyz_9200 (
title string,
author string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.nodes'='11.111.11.88',
'es.port'='9200',
'es.resource.write.'='books/kindle',
'es.index.auto.create'='true',
'es.write.operation'='index',
'es.mapping.names'='title:title, author:author');

hive> insert into write_to_es_ror_enable_xyz_9200 values ('vdtitle', 'vdauthor');

After running this insert statement the data has not been written to ES and there was no relevant entry in the ES.log

Hive.log shows the error:

Caused by: org.apache.hadoophive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {'tmp_values_col1':'vdtitle','tmp_values_col2':'vdauthor',}
     at org.apache.hadoop .....
Caused by org.elasticserch.hadoop.rest.EsHadoopInvalidRequest: **access denied by ror**
null
      at org.elasticserch.hadoop.rest.RestClient.checkResponse (restClient.java:510)

What is interesting here - the line with the access denied by ror, which is exact value of the response_if_req_forbidden, defined in (1)readonlyrest.yml.

Also, after running cURL -XPUT ... " on allowed server (IP 11.111.11.55) and not allowed (11.111.11.99) I can see the corresponding “ALLOWED” and “FORBIDDEN” entries accordingly in ES.log.

I’m not sure what does this mean.
Might be that request from Hadoop/Hive reached ES+ROR, somehow handled and ES+ROR sent the response back without writing anything in ES.log?

I’m continue to work on the issue and any help will be appreciated.


(Simone Scarduzio) #10

server lever an user level ROR configuration means you installed ROR in two different nodes with different settings, right?


(Vadim) #11

Yes, I’m trying to test two use cases when writing index/data to ES from Hadoop/Hive:
use case (1) - server level ROR config on port 9200 and Hive table with No line es.net.http.header.Authorization in its definition
use case (2) - user level ROR config on port 9201 and Hive table with the line es.net.http.header.Authorization in its definition

I’m running these two use cases separately and subsequently - disabled ROR on port 9201 and ran use case (1), then disabled ROR on port 9200 and ran use case (2) - to exclude any conflicts and collision.
I’m focusing on use case (1) now, trying to resolve the issue.


(Simone Scarduzio) #12

OK as far as I understood, the use case 1 access control is done via hosts rule. Which should definitely work, unless you have a load balancer or proxy in between (for which we have support via X-Forwarded-For header).

Can you confirm that you can curl the server level ROR from the Hive machine and get through?


(Vadim) #13

That is correct. In use case (1) I implemented access control in readonlyrest.yml via host rule and there is neither a load balancer nor proxy in between.
I can successfully run cURL from Hive machine
cURL -XPUT ‘http://11.111.11.88:9200/books/kindle/5’ -d '{“title”:“testtitle”, “author”:“testauthor”}'
and the document has been written to ES and the appropriate “ALLOWED…” entry has been written into ES.log.
Running the cURL from another machine, which IP address is not listed in readonlyrest.yml host rule, gave me the correct response “access denied by ror” and the corresponding “FORBIDDEN…” entry has been written into ES.log.
Writing a document from Hadoop/Hive machine to ES with Hive external table and access control via hosts rule still not working.


(Simone Scarduzio) #14

this is way too strange, could you please take a snoop with wireshark or tcp dump between the nodes to see the http traffic on the said port?


(Vadim) #15

Hi Simone,
I can’t monitor traffic over master node, but I captured the request from Hive/insert inside es.log
and here is how it looks like

FORBIDDEN by default req={ID:123…, TYP:NodesInfoRequest, CGR:N/A, USR:[no basic auth header], BRS:true, ACT:cluster:monitor/nodes/info, OA:11.111.11.77, IDX:<N/A>, MET:GET, PTH:/_nodes/http, CNT:<N/A>, HDR:{Accept=application/json, content-length=0, Content-type=application/json,
Host=11.111.11.88:9200, User-Agent+Jakarta Commons-HttpClient/3.1}, HIS:[Accept all requestsfrom host hive-host-machine-name->[hosts->false]]}

This same entry appeared in es.log 4 times with only different OAs, like OA:11.111.11.77, OA:11.111.11.78, OA:11.111.11.79, OA:11.111.11.80.

The attributes of this request is not good:

TYP:NodesInfoRequest should be TYP:IndexRequest
ACT:cluster:monitor/nodes/info should be ACT:indices:data/write/index
OA:11.111.11.77 should be OA:11.111.11.55
IDX:<N/A> should be IDX:books
MET:GET should be MET:PUT
PTH:/_nodes/http should be PTH:/books/kindle
CNT:<N/A> should be CNT:{‘TITLE’:‘vdtitle’, ‘author’:‘vdauthor’}
HDR:{Accept=application/json, content-length=0, Content-type=application/json,
Host=11.111.11.88:9200, User-Agent+Jakarta Commons-HttpClient/3.1} should be HDR:{Accept=/,…}
HIS:[Accept all requestsfrom host hive-host-machine-name->[hosts->false] should be HIS:[Accept all requestsfrom host hive-host-machine-name->[hosts->true]

It looks like when I’m writing from Hive table to node with this RoR configuration then ES sees wrong request?
At the same time I can write to master node with this RoR configuration from Hive server using the cURL successfully. Also, if I remove RoR from this master node then I can write from Hive table to ES successfully too.

Any help on this issue will be appreciated.


(Simone Scarduzio) #16

So ROR is reading the wrong originating address? How many IPs does the hive server have configured? Why don’t we just enable authentication via basic auth instead of the hosts rule? See this: https://discuss.elastic.co/t/how-hive-connect-to-elasticsearch-when-x-pack-installed/92407


(Vadim) #17

Hi Simone,
I’m not aware how many IPs does the hive server have configured.
If I use Hive table against ES node without installed RoR - it works fine and data is written in ES and there is no appropriate entry about request from Hive in ES.log.
Using the same table against ES node with installed RoR (access control is done via hosts rule) gives me bad request entry in es.log described earlier.

Here I tested RoR with user/groups level access rule as it is in (2)readonlyrest.yml at the beginning of this thread.
I’m using the same table, just added one line to it - ‘es.net.http.header.Authorization’=‘Basic <value of Base64(tom:tompwd)>’

hive> CREATE EXTERNAL TABLE IF NOT EXISTS write_to_es_ror_enabled_9200 (
title string,
author string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler’
TBLPROPERTIES(
‘es.nodes’=‘11.111.11.88’,
‘es.port’=‘9200’,
‘es.resource.write.’=‘books/kindle’,
‘es.index.auto.create’=‘true’,
‘es.write.operation’=‘index’,
‘es.mapping.names’=‘title:title, author:author’,
‘es.net.http.header.Authorization’=‘Basic <Base64(tom:tompwd)>’);

hive> insert into write_to_es_ror_enabled_9200 values (‘vdtitle’, ‘vdauthor’);

Here is the entry inside es.log. It contains lots of wrong attributes (some of them are correct) and the data has not been written into ES.

FORBIDDEN by default req={ID:123…, TYP:MainRequest, CGR:N/A, USR:tom, BRS:true, ACT:cluster:monitor/main, OA:11.111.11.55, IDX:<N/A>, MET:GET, PTH:/, CNT:<N/A>, HDR:{Accept=application/json, Authorization=Basic <Base64(tom:tompwd)>, content-length=0, Content-type=application/json,
Host=11.111.11.88:9200, User-Agent+Jakarta Commons-HttpClient/3.1}, HIS:[Accept all requests from users in team2 on index books->[auth_key_sha256->true, indices->true, actions->false]]}

Also, I looked at refernced link which is not relevant to my use case because the guy used xPack and he was not able to make it workable.

I tested and used RoR already with (1)hosts rule and (2)user/group rule along with JEST client and it worked well and now I’d like to integrate Hive and ES With RoR.

When you advised to enable authentication via basic auth instead of the hosts rule, did you mean this my use case I just described or something different ?
Or you meant using basic auth. by adding to the hive table es.net.http.auth.user and es.net.http.auth.pass with the values “tom” and “Base64(tom:tompwd)” which I tested successfully when writing to ES via cURL -XPUT … with header ?

Thank you for your time, patience and help.