How to query data from gz file of Amazon S3 using Qubole Hive query？

问题

I need get specific data from gz. how to write the sql? can I just sql as table database?:

  Select * from gz_File_Name where key = 'keyname' limit 10.

but it always turn back with an error.

回答1:

You need to create Hive external table over this file location(folder) to be able to query using Hive. Hive will recognize gzip format. Like this:

create external table hive_schema.your_table (
col_one string, 
col_two string
)
stored as textfile  --specify your file type, or use serde
LOCATION
  's3://your_s3_path_to_the_folder_where_the_file_is_located'
;

See the manual on Hive table here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable

To be precise s3 under the hood does not store folders, filename containing /s in s3 represented by different tools such as Hive like a folder structure. See here: https://stackoverflow.com/a/42877381/2700344

来源：https://stackoverflow.com/questions/42943445/how-to-query-data-from-gz-file-of-amazon-s3-using-qubole-hive-query

标签

amazon-s3

Hive

gzip

qubole

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!