问题
I was able to run query in presto to read the non-float columns from Hive ORC(snappy) table. However, when I select all float datatype columns through the presto cli, gives me the below error message. Any suggestions what is the alternative other than changing the filed type to double in the targetHive table
presto:sample> select * from emp_detail;
Query 20200107_112537_00009_2zpay failed: Error opening Hive split hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc (offset=0, length=1999857): Malformed ORC file. Can not read SQL type real from ORC stream .salary of type DOUBLE [hdfs://ip_address/warehouse/tablespace/managed/hive/sample.db/emp_detail/part-00079-5b0c6005-0943-4181-951f-43bcfcfe741f-c000.snappy.orc]
回答1:
Please try to add this property
hive.orc.use-column-names=true
to presto-server/conf/catalog/hive.properties
,
and restart your presto server.
To test it without restarting the server run this from presto-cli
SET SESSION hive.orc_use_column_names=true;
Release notes from Presto regarding these attribute.
来源:https://stackoverflow.com/questions/59647638/presto-query-error-on-hive-orc-can-not-read-sql-type-real-from-orc-stream-of-ty