Hive 2.1
I have the following table definition:
CREATE EXTERNAL TABLE table_snappy (
  a STRING,
  b INT
)
PARTITIONED BY (c STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '/'
TBLPROPERTIES ('parquet.compress'='SNAPPY');
Now I would like to insert data into it:
INSERT INTO table_snappy PARTITION (c='something') VALUES ('xyz', 1);
However, when I look at the data file, all I see is a plain Parquet file without any compression. How can I enable Snappy compression in this case?
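For reference, this is how I check which properties Hive actually stored for the table (DESCRIBE FORMATTED lists the SerDe, the input/output formats, and the table parameters):

```sql
-- Shows storage information and table parameters,
-- including whatever TBLPROPERTIES the table was created with.
DESCRIBE FORMATTED table_snappy;
```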
Goal: Hive table data stored as Parquet and SNAPPY-compressed.
I have tried setting multiple properties as well:
SET parquet.compression=SNAPPY;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET mapreduce.output.fileoutputformat.compress=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET PARQUET_COMPRESSION_CODEC=snappy;
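Put together, the sequence I run is essentially this (the partition value and row are just the example from above):

```sql
-- Session-level settings, issued before the insert.
SET parquet.compression=SNAPPY;
SET hive.exec.compress.output=true;

-- The insert whose output files still come out uncompressed.
INSERT INTO table_snappy PARTITION (c='something')
VALUES ('xyz', 1);
```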
as well as
TBLPROPERTIES ('parquet.compression'='SNAPPY');
but none of it helps. I tried the same with GZIP compression and it does not work either. I am starting to wonder whether this is possible at all. Any help is appreciated.