How to use params/properties flag values when executing hive job on google dataproc

Submitted by 断了今生、忘了曾经 on 2019-12-12 04:47:53

Question


I am trying to execute a hive job in google dataproc using following gcloud command:

gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties=[bucket1=abcd]

gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params=[bucket1=abcd]

But neither of the two commands above manages to set the 'bucket1' variable so that 'x' picks it up.

The hive script is as follows:

set x=${bucket1};
set x;
drop table T1;
create external table T1( column1 bigint, column2 float, column3 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 'gs://${hiveconf:x}/output/prod';

But the variable 'x' does not receive the value of the 'bucket1' variable that I passed in the gcloud command.

How do I do this? Please suggest.


Answer 1:


Both examples should work with minor tweaks.

  • In gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties bucket1=abcd, you can access the variable as ${bucket1}

  • In gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params bucket1=abcd, you can access variable as ${hivevar:bucket1}
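Putting the two bullets together, here is a sketch of how the question's commands could be corrected (assuming the cluster name and bucket value from the question; note the square brackets from the original commands are dropped, since --properties/--params take a plain KEY=VALUE list):

```shell
# Variant 1: --properties populates the hiveconf namespace.
# In hive.sql, reference it as ${bucket1} (or ${hiveconf:bucket1}), e.g.:
#   ... STORED AS TEXTFILE LOCATION 'gs://${bucket1}/output/prod';
gcloud dataproc jobs submit hive \
  --cluster=msm-test-cluster \
  --file hive.sql \
  --properties bucket1=abcd

# Variant 2: --params populates the hivevar namespace.
# In hive.sql, reference it as ${hivevar:bucket1}, e.g.:
#   ... STORED AS TEXTFILE LOCATION 'gs://${hivevar:bucket1}/output/prod';
gcloud dataproc jobs submit hive \
  --cluster=msm-test-cluster \
  --file hive.sql \
  --params bucket1=abcd
```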

An easy way to test this is to submit a script like the following, which dumps all variables:

gcloud dataproc jobs submit hive --cluster msm-test-cluster -e "set;" --properties foo=bar --params bar=baz

The output should contain:

| foo=bar    
| hivevar:bar=baz
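The behavior above can be illustrated with a small Python sketch of how ${...} references resolve against the two namespaces. This is not Hive's actual implementation, just a toy model of the lookup: --properties values land in hiveconf (reachable without a prefix), while --params values land in hivevar and need the explicit hivevar: prefix in this example.

```python
import re

# Toy model of Hive variable namespaces (illustrative, not Hive internals):
# --properties foo=bar  -> hiveconf namespace
# --params bar=baz      -> hivevar namespace
namespaces = {
    "hiveconf": {"foo": "bar"},
    "hivevar": {"bar": "baz"},
}

def expand(text):
    """Replace ${ns:key} and unprefixed ${key} references in text."""
    def repl(match):
        ref = match.group(1)
        if ":" in ref:
            ns, key = ref.split(":", 1)
            return namespaces.get(ns, {}).get(key, match.group(0))
        # Unprefixed references fall back across namespaces
        # (the fallback order here is illustrative).
        for ns in ("hiveconf", "hivevar"):
            if ref in namespaces[ns]:
                return namespaces[ns][ref]
        return match.group(0)  # leave unknown references untouched
    return re.sub(r"\$\{([^}]+)\}", repl, text)

print(expand("gs://${foo}/output/prod"))          # gs://bar/output/prod
print(expand("gs://${hivevar:bar}/output/prod"))  # gs://baz/output/prod
```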

Related question: How to set variables in HIVE scripts



Source: https://stackoverflow.com/questions/44969376/how-to-use-params-properties-flag-values-when-executing-hive-job-on-google-datap
