Question
I would like to automate my Hive script to run every day, and one option for doing that is Data Pipeline. The problem is that I am exporting data from DynamoDB to S3 and then manipulating that data with a Hive script. The input and output locations are given inside the script file itself, and that is where the trouble starts, because a HiveActivity requires its own input and output, while I need to define them in the script.
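For context, such a script defines its own read and write locations; a rough sketch (the table name, columns, and S3 paths here are hypothetical) might look like:

-- hypothetical sketch: read the DynamoDB export that was placed in S3
-- and write the transformed result back to another S3 prefix
CREATE EXTERNAL TABLE IF NOT EXISTS ddb_export (
  id STRING,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://my-bucket/dynamodb-export/';

INSERT OVERWRITE DIRECTORY 's3://my-bucket/hive-output/'
SELECT id, payload
FROM ddb_export
WHERE payload IS NOT NULL;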
I am trying to find a way to automate this Hive script and would appreciate any ideas.
Cheers,
Answer 1:
You can disable staging on the HiveActivity to run an arbitrary Hive script:
stage = false
Do something like:
{
  "name": "DefaultActivity1",
  "id": "ActivityId_1",
  "type": "HiveActivity",
  "stage": "false",
  "scriptUri": "s3://bucket/query.hql",
  "scriptVariable": [
    "param1=value1",
    "param2=value2"
  ],
  "schedule": {
    "ref": "ScheduleId_1"
  },
  "runsOn": {
    "ref": "EmrClusterId_1"
  }
},
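The entries in scriptVariable are passed to Hive when the script runs, so query.hql can reference them as ${param1} and ${param2}; a hypothetical snippet (table and column names assumed) could use them like this:

-- hypothetical contents of s3://bucket/query.hql, consuming the
-- scriptVariable values passed by the HiveActivity above
INSERT OVERWRITE DIRECTORY '${param2}'
SELECT *
FROM source_table
WHERE dt = '${param1}';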
Answer 2:
An alternative to the HiveActivity is to use an EmrActivity, as in the following example:
{
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "name": "EMR Activity name",
  "step": "command-runner.jar,hive-script,--run-hive-script,--args,-f,s3://bucket/path/query.hql",
  "runsOn": {
    "ref": "EmrClusterId"
  },
  "id": "EmrActivityId",
  "type": "EmrActivity"
}
Source: https://stackoverflow.com/questions/19709651/automating-hive-activity-using-aws