Exception while using lateral view in Hive

眉间皱痕 submitted on 2019-11-29 17:31:38
gobrewers14

You didn't say what your data looks like in Hive, so here is how I loaded your XML into Hive.

Loader:

ADD JAR /path/to/jar/hivexmlserde-1.0.5.3.jar;

DROP TABLE IF EXISTS db.tbl;
CREATE TABLE IF NOT EXISTS db.tbl (
  code STRING,
  entryInfo ARRAY<MAP<STRING,STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerde'
WITH SERDEPROPERTIES (
  "column.xpath.code"="/document/code/text()",
  "column.xpath.entryInfo"="/document/entryInfo/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
  "xmlinput.start"="<document>",
  "xmlinput.end"="</document>"
);

LOAD DATA LOCAL INPATH 'someFile.xml' INTO TABLE db.tbl;

In the Hive-XML-SerDe documentation, section 3 (Arrays) shows that an array structure is used to handle repeated tags, and section 4 (Maps) shows that maps are used to handle entries under a sub-tag. So entryInfo will be of type ARRAY<MAP<STRING,STRING>>.
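
A quick way to confirm that shape after loading is to look at the array's length and its first element. This is only a sketch against the db.tbl table defined above; the column aliases n_entries and first_entry are made up for illustration.

```sql
-- The SerDe jar must be on the session classpath to read the table.
ADD JAR /path/to/jar/hivexmlserde-1.0.5.3.jar;

SELECT code
  , SIZE(entryInfo) AS n_entries    -- number of maps in the array
  , entryInfo[0]    AS first_entry  -- first MAP<STRING,STRING> element
FROM db.tbl;
```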

You can then explode this array, collect the key/value pairs that belong to the same entry, and recombine them into one map per entry.
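
Before grouping anything, it can help to look at the exploded rows on their own. A minimal sketch, assuming (as the query below also does) that each array element is a map holding a single key/value pair; the aliases pos, m_key, and m_val are illustrative.

```sql
ADD JAR /path/to/jar/hivexmlserde-1.0.5.3.jar;

-- POSEXPLODE emits one row per array element, plus its 0-based position.
SELECT code
  , pos
  , MAP_KEYS(entry_info_map)[0]   AS m_key
  , MAP_VALUES(entry_info_map)[0] AS m_val
FROM db.tbl
LATERAL VIEW POSEXPLODE(entryInfo) exptbl AS pos, entry_info_map;
```

The position column is what the grouping step relies on: the query treats every five consecutive positions as one entry.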

Query:

ADD JAR /path/to/jar/hivexmlserde-1.0.5.3.jar;
ADD JAR /path/to/jars/brickhouse-0.7.1.jar;

CREATE TEMPORARY FUNCTION COLLECT AS 'brickhouse.udf.collect.CollectUDAF';

SELECT code
  , m_map['statusCode']    AS status_code
  , m_map['startTime']     AS start_time
  , m_map['endTime']       AS end_time
  , m_map['strengthValue'] AS strength_value
  , m_map['strengthUnits'] AS strength_units
FROM (
  SELECT code
    , COLLECT(m_keys, m_vals) AS m_map
  FROM (
    SELECT code
      , idx
      , MAP_KEYS(entry_info_map)[0]   AS m_keys
      , MAP_VALUES(entry_info_map)[0] AS m_vals
    FROM (
      SELECT code
        , entry_info_map
        , CASE
            WHEN FLOOR(tmp / 5) = 0 THEN 0
            WHEN FLOOR(tmp / 5) = 1 THEN 1
            WHEN FLOOR(tmp / 5) = 2 THEN 2
            ELSE -1
          END AS idx
      FROM db.tbl
      LATERAL VIEW POSEXPLODE(entryInfo) exptbl AS tmp, entry_info_map ) x ) y
  GROUP BY code, idx ) z

Output:

code    status_code     start_time      end_time    strength_value  strength_units
10160-0 completed       20110729        20110822    24              h
10160-0 completed       20120130        20120326    12              h
10160-0 completed       20100412        20110822    8               d

Also, you've asked essentially the same question four times. That is not a good idea. Ask once, edit the question to add more information, and be patient.
