Question
I am trying to execute a Hive job in Google Dataproc using the following gcloud commands:
gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties=[bucket1=abcd]
gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params=[bucket1=abcd]
But neither of the two commands above sets the 'bucket1' variable, so the 'x' variable never picks up its value.
The hive script is as follows:
set x=${bucket1};
set x;
drop table T1;
create external table T1( column1 bigint, column2 float, column3 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 'gs://${hiveconf:x}/output/prod';
But the variable 'x' does not pick up the value of the 'bucket1' variable that I passed in the gcloud command.
How do I do this? Please suggest.
Answer 1:
Both examples should work with minor tweaks.
In
gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --properties bucket1=abcd
you can access the variable as ${bucket1}.
In
gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params bucket1=abcd
you can access the variable as ${hivevar:bucket1}.
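Applied to the script in the question, the first line would change depending on which flag is used. A minimal sketch, assuming the same bucket1=abcd value from the question:

-- with --properties bucket1=abcd
set x=${bucket1};

-- with --params bucket1=abcd
set x=${hivevar:bucket1};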
An easy way to test this is to submit a script like the following to dump all variables:
gcloud dataproc jobs submit hive --cluster msm-test-cluster -e "set;" --properties foo=bar --params bar=baz
The output should contain:
| foo=bar
| hivevar:bar=baz
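Putting it together with the question's script, a minimal end-to-end sketch using the --params flag (keeping the cluster name, bucket value, and table definition from the question, with the brackets dropped from the flag value) could be:

gcloud dataproc jobs submit hive --cluster=msm-test-cluster --file hive.sql --params bucket1=abcd

with hive.sql reading the value from the hivevar namespace:

set x=${hivevar:bucket1};
set x;
drop table T1;
create external table T1( column1 bigint, column2 float, column3 int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' STORED AS TEXTFILE LOCATION 'gs://${hiveconf:x}/output/prod';

Here ${hiveconf:x} still resolves because "set x=..." writes into the hiveconf namespace.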
Related question: How to set variables in HIVE scripts
Source: https://stackoverflow.com/questions/44969376/how-to-use-params-properties-flag-values-when-executing-hive-job-on-google-datap