Question
I want to load Hive tables using Pig. I think this can be done through HCatLoader, but my input to Pig is XML files, and for that I have to use XMLLoader. Can I combine the two when loading XML files in Pig?
I am extracting data from the XML files using my own UDF, and once all the data is extracted, I have to load the Pig data into Hive tables.
I can't use Hive to extract the XML data because the XML I receive is quite complex, so I wrote my own UDF to parse it. Any suggestions or pointers on how to load Hive tables from Pig data?
I am using AWS.
Answer 1:
You can STORE the loaded data into a text file using a delimiter (a comma, for example) and then create an external table in Hive pointing to the file's location.
Create external table YOURTABLE (schema)
row format delimited
fields terminated by ','
location '/your/file/directory';
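The Pig side of this approach might look like the sketch below. It assumes the relation produced by your XML-parsing UDF is flat (one field per Hive column); the loader name `xmlUDF`, the alias `extracted`, and both paths are placeholders, not names from a real setup.

```pig
-- Placeholder loader: substitute your own XML-parsing UDF and its jar.
register 's3n://bucket/path/xmlUDF.jar';
extracted = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();

-- Write comma-delimited text to the directory the external table points at.
-- The output directory must not already exist, or the STORE will fail.
STORE extracted INTO '/your/file/directory' USING PigStorage(',');
```

If any field value can itself contain a comma, a different delimiter (passed to both `PigStorage` and `fields terminated by`) is probably safer.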
Answer 2:
You can store data from Pig into Hive tables using HCatStorer. For example:
register 's3n://bucket/path/xmlUDF.jar';
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();
STORE xml INTO 'database.table' USING org.apache.hive.hcatalog.pig.HCatStorer();
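A slightly fuller sketch of the same idea follows. It assumes the Hive table already exists, that the script is launched with `pig -useHCatalog` so the HCatalog jars are on the classpath, and that the UDF emits at least two fields; the column names `id` and `name` and the alias `named` are hypothetical and should be replaced with the columns of your own table.

```pig
-- Launch with: pig -useHCatalog yourscript.pig
register 's3n://bucket/path/xmlUDF.jar';
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();

-- Inspect the schema the UDF actually produces before storing.
DESCRIBE xml;

-- Hypothetical projection: give the fields the same names as the Hive
-- table's columns; their types must also be compatible with the column types.
named = FOREACH xml GENERATE $0 AS id, $1 AS name;

STORE named INTO 'database.table' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

HCatStorer checks the relation's schema against the table definition, so a mismatch in field names or types typically surfaces as an error at store time rather than as silently misaligned columns.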
Your question isn't quite clear. Are you hoping to work with both the XML and the Hive data within Pig, do something, and then store the result in Hive? Or are you just trying to store the XML data in Hive and work with it there?
Source: https://stackoverflow.com/questions/32921201/hadoop-load-hive-tables-using-pig