orc

Query fails on presto-cli for a table created in Hive in ORC format with data residing in S3

Submitted by 旧城冷巷雨未停 on 2020-02-25 03:55:37
Question: I set up an Amazon EMR instance with 1 Master and 1 Core node (m4 Large) and the following version details:

EMR: 5.5.0
Presto: 0.170
Hadoop: 2.7.3 (HDFS)
Hive: 2.1.1 (Metastore)

My Spark app wrote out the data in ORC to Amazon S3. Then I created the table in Hive (create external table TABLE ... partition() stored as ORC location 's3a://...') and tried to query it from presto-cli, and I get the following error for the query SELECT * FROM TABLE:

Query 20170615_033508_00016_dbhsn failed: com
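For context, the following is a minimal sketch of the setup described above: a Spark job writing ORC to S3, then an external Hive table declared over the same location. The bucket, path, table name, and schema are hypothetical placeholders, not values from the question.

import org.apache.spark.sql.SparkSession

object OrcOnS3Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-on-s3-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Write a DataFrame out as ORC to S3 (bucket and prefix are placeholders).
    val df = spark.range(100).toDF("id")
    df.write.mode("overwrite").orc("s3a://my-bucket/orc/events/")

    // Declare an external Hive table over the same location, so that Presto
    // can see it through the Hive metastore.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS events (id BIGINT)
        |STORED AS ORC
        |LOCATION 's3a://my-bucket/orc/events/'""".stripMargin)
  }
}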

Hive: Merging Configuration Settings not working

Submitted by 别等时光非礼了梦想. on 2020-01-28 10:13:55
Question: On Hive 2.2.0, I am filling an ORC table from another source table of size 1.34 GB, using the query

INSERT INTO TABLE TableOrc SELECT * FROM Table; ---- (1)

The query creates the TableOrc table with 6 ORC files, which are much smaller than the block size of 256 MB:

-- FolderList1
-rwxr-xr-x user1 supergroup 65.01 MB 1/1/2016, 10:14:21 AM 1 256 MB 000000_0
-rwxr-xr-x user1 supergroup 67.48 MB 1/1/2016, 10:14:55 AM 1 256 MB 000001_0
-rwxr-xr-x user1 supergroup 66.3 MB 1/1/2016, 10:15:18 AM 1 256 MB
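As a point of comparison, a Spark-side way to end up with a few large ORC files is to coalesce before the insert. This is only a sketch of that alternative (table names are taken from the question, the partition count is arbitrary), not the Hive merge settings the question is actually about.

import org.apache.spark.sql.SparkSession

object CompactOrcInsertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compact-orc-insert-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Coalesce to a small number of partitions so the ORC output lands in a
    // few large files instead of many files well below the block size.
    spark.table("Table")
      .coalesce(2)
      .write
      .insertInto("TableOrc")
  }
}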

I am using Spark 1.4 and trying to save as an ORC file with Snappy compression but it saves as ZLIB

Submitted by 安稳与你 on 2020-01-06 14:50:46
Question: Here is my code:

val df = hiveContext.write.format("orc").options("orc.compression","SNAPPY").save("xyz")

but the file is saved as ZLIB.

Answer 1: You could try adding the extra conf "spark.io.compression.codec=snappy" to spark-shell / spark-submit:

spark-shell --conf spark.io.compression.codec=snappy # rest of your command..

Also, for writing to ORC format (assuming you are on Spark >= 1.5) you can use:

myDf.write.orc("/some/path")

The "orc" method is exactly like doing '.format("orc").save("/some/path")'.
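For reference, on Spark 2.x and later the ORC writer accepts a compression option directly; whether an equivalent exists in 1.4 is left open, so treat this as a sketch against the newer API (the output path is a placeholder).

import org.apache.spark.sql.SparkSession

object OrcSnappySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-snappy-sketch")
      .getOrCreate()

    val df = spark.range(1000).toDF("id")

    // "compression" here selects the ORC file codec, which is distinct from
    // spark.io.compression.codec (the codec for Spark's internal data).
    df.write
      .option("compression", "snappy")
      .mode("overwrite")
      .orc("/tmp/orc_snappy_sketch")
  }
}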

Hadoop ORC file - How it works - How to fetch metadata

Submitted by 与世无争的帅哥 on 2020-01-01 03:35:20
Question: I am new to ORC files. I went through many blogs, but didn't get a clear understanding. Please help and clarify the questions below.

1. Can I fetch the schema from an ORC file? I know that in Avro the schema can be fetched.
2. How does it actually provide schema evolution? I know that a few columns can be added, but how do I do it? The only way I know of creating an ORC file is by loading data into a Hive table that stores data in ORC format.
3. How does the ORC file index work? What I know is that for every stripe an index will be maintained. But as the file is
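On the first question: the schema and stripe-level metadata are stored in the ORC file footer and can be read without scanning the data (the hive --orcfiledump <path> utility prints the same information). Below is a minimal sketch using the ORC Java reader API; the file path is a placeholder and the orc-core library is assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.OrcFile
import scala.collection.JavaConverters._

object OrcMetadataSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val reader = OrcFile.createReader(
      new Path("/tmp/orc_snappy_sketch/part-00000.orc"),
      OrcFile.readerOptions(conf))

    // The schema lives in the file footer, so no data has to be decoded to get it.
    println("Schema: " + reader.getSchema)
    println("Rows: " + reader.getNumberOfRows)
    println("Compression: " + reader.getCompressionKind)

    // Per-stripe metadata (offsets and row counts) also comes from the footer.
    reader.getStripes.asScala.foreach { stripe =>
      println(s"stripe offset=${stripe.getOffset} rows=${stripe.getNumberOfRows}")
    }
  }
}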

Sqoop import as ORC ERROR java.io.IOException: HCat exited with status 1

Submitted by 坚强是说给别人听的谎言 on 2019-12-24 05:45:07
Question: I am trying to import a table from a Netezza DB using sqoop with HCatalog (see below) in ORC format, as suggested here. Sqoop command:

sqoop import -m 1 --connect <jdbc_url> --driver <database_driver> --connection-manager org.apache.sqoop.manager.GenericJdbcManager --username <db_username> --password <db_password> --table <table_name> --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database <hcat_db> --hcatalog-table <table_name> --create-hcatalog-table --hcatalog-storage-stanza 'stored as
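Not what the question asks, but for comparison: the same Netezza-to-ORC load can be done without HCatalog by reading over JDBC in Spark and writing into an ORC-backed Hive table. A sketch under that assumption follows; the URL, driver class, credentials, and table names are placeholders.

import org.apache.spark.sql.SparkSession

object JdbcToOrcSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-to-orc-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Pull the source table over JDBC (connection details are placeholders).
    val src = spark.read.format("jdbc")
      .option("url", "jdbc:netezza://host:5480/db")
      .option("driver", "org.netezza.Driver")
      .option("dbtable", "source_table")
      .option("user", "db_username")
      .option("password", "db_password")
      .load()

    // Write straight into an ORC-backed Hive table, bypassing HCatalog.
    src.write.mode("overwrite").format("orc").saveAsTable("hcat_db.target_table")
  }
}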

Spark 2.0 DataSourceRegister configuration error while saving DataFrame as CSV

Submitted by 泄露秘密 on 2019-12-22 09:27:25
Question: I'm trying to save a DataFrame to CSV in Spark 2.0, Scala 2.11 (in the process of migrating code from Spark 1.6).

sparkSession.sql("SELECT * FROM myTable").
  coalesce(1).
  write.
  format("com.databricks.spark.csv").
  option("header", "true").
  save(config.resultLayer)

Is the Spark session built correctly?

implicit val sparkSession = SparkSession.builder
  .master("local")
  .appName("com.yo.go")
  .enableHiveSupport()
  .getOrCreate()

The error is received only at runtime (the code compiles). Exception in thread
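One plausible migration path, offered here as a sketch rather than as the accepted fix, is to drop the external com.databricks.spark.csv package and use the CSV data source built into Spark 2.0 (the output path is a placeholder):

import org.apache.spark.sql.SparkSession

object CsvWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("csv-write-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Spark 2.0 ships its own CSV data source, so the external
    // com.databricks.spark.csv package is no longer needed on the classpath.
    spark.sql("SELECT * FROM myTable")
      .coalesce(1)
      .write
      .option("header", "true")
      .csv("/tmp/csv_write_sketch")
  }
}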

How does Hive 'alter table <table name> concatenate' work?

Submitted by ε祈祈猫儿з on 2019-12-22 01:30:32
Question: I have n (a large number of) small ORC files which I want to merge into k (a small number of) large ORC files. This is done using the ALTER TABLE table_name CONCATENATE command in Hive. I want to understand how Hive implements this. I'm looking to implement this using Spark, with any changes if required. Any pointers would be great.

Answer 1: As per AlterTable/PartitionConcatenate: if the table or partition contains many small RCFiles or ORC files, then the above command will merge them into
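A Spark-side approximation, which is plainly not Hive's stripe-level merge (CONCATENATE merges at the stripe level rather than rewriting rows), is to read the small files and rewrite them into k partitions. A sketch with hypothetical paths:

import org.apache.spark.sql.SparkSession

object OrcCompactionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-compaction-sketch")
      .getOrCreate()

    val input = "/warehouse/db/table_name" // directory holding many small ORC files
    val k = 4                              // target number of output files

    // Unlike CONCATENATE, this decodes and re-encodes every row.
    spark.read.orc(input)
      .repartition(k)
      .write
      .mode("overwrite")
      .orc(input + "_compacted")
  }
}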

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

Submitted by 家住魔仙堡 on 2019-12-21 10:19:09
Question: There is an issue when executing SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using SHOW CREATE TABLE, you get this:

STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with those clauses, you will then get a casting error when selecting. The error looks like:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop
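One common explanation for this ClassCastException is that spelling out only the input and output formats leaves the table without the ORC SerDe, which STORED AS ORC would otherwise set implicitly. A sketch under that assumption, with hypothetical table and column names, issued through Spark's Hive support:

import org.apache.spark.sql.SparkSession

object OrcDdlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("orc-ddl-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // When the INPUTFORMAT/OUTPUTFORMAT clauses are written out explicitly,
    // the ORC SerDe is declared alongside them; STORED AS ORC implies all three.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS orc_ddl_sketch (id BIGINT, name STRING)
        |ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
        |STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
        |OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'""".stripMargin)
  }
}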

How to create a Schema file in Spark

Submitted by 岁酱吖の on 2019-12-18 09:30:14
Question: I am trying to read a schema file (which is a text file) and apply it to my CSV file, which has no header. Since I already have a schema file, I don't want to use the inferSchema option, which is an overhead. My input schema file looks like below:

"num IntegerType","letter StringType"

I am trying the below code to build the schema:

val schema_file = spark.read.textFile("D:\\Users\\Documents\\schemaFile.txt")
val struct_type = schema_file.flatMap(x => x.split(",")).map(b => (b.split(" ")(0)
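One way to finish this, as a sketch assuming the two-entry schema-file format shown above (paths kept as placeholders from the question), is to parse each "name Type" pair into a StructField and hand the resulting StructType to the CSV reader:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SchemaFromFileSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-from-file-sketch")
      .getOrCreate()

    // Each entry in the schema file looks like "num IntegerType".
    val fields = spark.read.textFile("D:\\Users\\Documents\\schemaFile.txt")
      .collect()
      .flatMap(_.split(","))
      .map(_.replaceAll("\"", "").trim)
      .map { entry =>
        val Array(name, typeName) = entry.split(" ")
        val dataType = typeName match {
          case "IntegerType" => IntegerType
          case "StringType"  => StringType
          case other         => throw new IllegalArgumentException(s"Unhandled type: $other")
        }
        StructField(name, dataType, nullable = true)
      }

    // Apply the parsed schema to the header-less CSV file (path is a placeholder).
    val df = spark.read.schema(StructType(fields)).csv("D:\\Users\\Documents\\data.csv")
    df.printSchema()
  }
}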

CTAS with Dynamic Partition

Submitted by 给你一囗甜甜゛ on 2019-12-12 18:17:07
Question: I want to change an existing table that is in text format into ORC format. I was able to do it by: (1) manually creating a table in ORC format with the partitions, and then (2) using the INSERT OVERWRITE statement to populate the table. I am trying to use a CTAS (CREATE TABLE ... AS SELECT ...) statement for this. Is there any way I can include dynamic partitioning with a CTAS statement? So, if my text data set has multiple partitions (for example: year and month), can I point this in
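For reference, here is a sketch of the two-step approach the question already describes: create the partitioned ORC table, then populate it with a dynamic-partition INSERT OVERWRITE, expressed as SQL issued from Spark's Hive support. Table and column names are hypothetical, and this is not a CTAS.

import org.apache.spark.sql.SparkSession

object DynamicPartitionInsertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-insert-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic-partition inserts need these Hive settings.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Step 1: create the ORC table with the partition columns declared.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS target_orc (id BIGINT, payload STRING)
        |PARTITIONED BY (year INT, month INT)
        |STORED AS ORC""".stripMargin)

    // Step 2: populate it; the partition columns come last in the SELECT list,
    // so each (year, month) value lands in its own partition.
    spark.sql(
      """INSERT OVERWRITE TABLE target_orc PARTITION (year, month)
        |SELECT id, payload, year, month FROM source_text""".stripMargin)
  }
}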