hive-serde

Backslash character is not being read by Hive when using OpenCSVSerde

﹥>﹥吖頭↗ · Posted on 2020-08-26 06:51:51
Question: I have defined a table on top of files present in HDFS, using the OpenCSV SerDe to read from the file. But '\' (backslash) characters in the data are getting omitted in the final result set. Is there a Hive SerDe property that I am not using correctly? As per the documentation, escapeChar = '\' should fix this problem, but the problem persists. CREATE EXTERNAL TABLE `tsr`( `last_update_user` string COMMENT 'from deserializer', `last_update_datetime` string COMMENT 'from deserializer') ROW
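A minimal sketch of the DDL the question is aiming for, reusing the `tsr` table from the excerpt (the HDFS location is an assumption). Note that inside the DDL string literal the backslash itself must be escaped, so the property value is written '\\':

```sql
-- Sketch (location assumed): declaring the escape character explicitly
-- for OpenCSVSerde; '\\' in the DDL denotes a single backslash.
CREATE EXTERNAL TABLE `tsr` (
  `last_update_user`     string,
  `last_update_datetime` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
LOCATION '/path/to/hdfs/dir';
```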

Hive table from JSON error

99封情书 · Posted on 2019-12-25 09:40:10
Question: I can't get this JSON into a Hive table somehow; either all the data becomes null or it can't be selected at all. I just need the fields to match my DDL, and if a field is structured (nested), I want to keep it as a string instead of trying to parse it. The only near-success was with hive-hcatalog-core-1.1.0-cdh5.10.0.jar. Since some data are blank, I'm able to query with LIMIT, but when I remove the LIMIT it returns this kind of error: org.apache.hadoop.hive.serde2.SerDeException: java.io
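An illustrative sketch of the approach the question describes, using the HCatalog JSON SerDe shipped in the jar mentioned above (column names are assumptions; whether a nested object survives as a plain string depends on the data and SerDe version):

```sql
-- The jar must be on Hive's classpath first:
--   ADD JAR /path/to/hive-hcatalog-core-1.1.0-cdh5.10.0.jar;
CREATE EXTERNAL TABLE events_json (
  id      string,
  payload string   -- keeping nested JSON as a string is the questioner's goal,
                   -- not guaranteed behavior of this SerDe
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/path/to/json/files';
```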

Hive table source delimited by multiple spaces

谁都会走 · Posted on 2019-12-25 08:29:41
Question: How can I make the following table source delimited by one or more white spaces: CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
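Hive's ROW FORMAT DELIMITED takes only a single-character delimiter, so a run of spaces needs RegexSerDe instead. A shortened illustrative sketch with just the first few weather columns (the full table would extend the pattern with one capture group per column):

```sql
-- Sketch: one capture group per column, separated by one-or-more
-- whitespace characters (\\s+); location is assumed.
CREATE EXTERNAL TABLE weather_sample (
  USAF   INT,
  WBAN   INT,
  `Date` STRING,
  DIR    STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+).*$'
)
LOCATION '/path/to/weather';
```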

Custom Hive SerDe unable to SELECT column but works when I do SELECT *

南笙酒味 · Posted on 2019-12-25 00:12:50
Question: I'm writing a custom SerDe and will only be using it to deserialize. The underlying data is Thrift binary; each row is an event log. Each event has a schema which I have access to, but we wrap the event in another schema, call it Message, before storing. The reason I'm writing a SerDe instead of using the ThriftDeserializer is that, as mentioned, the underlying event is wrapped in a Message. So we first need to deserialize using the schema of Message and then deserialize the data for
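For context, this is how a custom deserializer-only SerDe would be attached to a table; the class name, jar path, and column names below are purely hypothetical:

```sql
-- Hypothetical sketch: wiring a custom SerDe class into a Hive table.
ADD JAR /path/to/custom-serde.jar;

CREATE EXTERNAL TABLE event_log (
  event_time string,
  event_body string
)
ROW FORMAT SERDE 'com.example.MessageEventSerDe'
LOCATION '/path/to/thrift/logs';
```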

PySpark/Hive: how to CREATE TABLE with LazySimpleSerDe to convert boolean 't' / 'f'?

拟墨画扇 · Posted on 2019-12-22 17:46:25
Question: Hello dear Stack Overflow community, here is my problem: A) I have data in CSV with some boolean columns; unfortunately, the values in these columns are t or f (single letters); this is an artifact (from Redshift) that I cannot control. B) I need to create a Spark dataframe from this data, ideally converting t -> true and f -> false. For that, I create a Hive DB and a temp Hive table and then SELECT * from it, like this: sql_str = """SELECT * FROM {db}.{s}_{t} """.format( db=hive_db_name, s
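One hedged workaround, independent of whether LazySimpleSerDe can be taught custom boolean literals: keep the columns as string in the Hive table and convert 't'/'f' in the SELECT that feeds the dataframe (table and column names below are assumptions):

```sql
-- Sketch: explicit conversion of single-letter booleans at query time.
SELECT
  id,
  CASE is_active WHEN 't' THEN true
                 WHEN 'f' THEN false
  END AS is_active
FROM my_db.my_table;
```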

SerDe properties list for AWS Athena (JSON)

我的未来我决定 · Posted on 2019-12-22 09:52:23
Question: I'm testing AWS's Athena product, and so far it is working very well. But I want to know the list of SerDe properties. I've searched far and wide and couldn't find it. I'm using one, for example "ignore.malformed.json" = "true", but I'm pretty sure there are a ton of other options to tune the queries. I couldn't find info, for example, on what the "path" property does, so having the full list would be amazing. I have looked at the Apache Hive docs but couldn't find this, and neither on the AWS docs
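For reference, an illustrative Athena DDL using the OpenX JSON SerDe, with the ignore.malformed.json property from the question (column names and S3 location are assumptions):

```sql
-- Sketch: Athena external table over JSON, skipping unparseable rows
-- instead of failing the query.
CREATE EXTERNAL TABLE logs (
  id      string,
  message string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
)
LOCATION 's3://my-bucket/logs/';
```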

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

家住魔仙堡 · Posted on 2019-12-21 10:19:09
Question: Issue when executing SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using SHOW CREATE TABLE, you get this: STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'. But if you create the table with those clauses, you then get a casting error when selecting. The error looks like: Failed with exception java.io.IOException: java.lang.ClassCastException: org.apache.hadoop
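The usual explanation: STORED AS ORC sets the input format, output format, and the SerDe, while spelling out only INPUTFORMAT/OUTPUTFORMAT leaves the default LazySimpleSerDe in place, which cannot read ORC data and triggers the ClassCastException. A sketch of the explicit equivalent (table and columns assumed):

```sql
-- Adding the SerDe clause makes the spelled-out form equivalent
-- to STORED AS ORC.
CREATE TABLE my_orc_table (id int, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```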

Why do all columns get created as string when I use OpenCSVSerde in Hive?

杀马特。学长 韩版系。学妹 · Posted on 2019-12-21 04:43:08
Question: I am trying to create a table using OpenCSVSerde with some integer and date columns, but the columns get converted to string. Is this an expected outcome? As a workaround, I do an explicit type cast after this step (which makes the complete run slower). hive> create external table if not exists response(response_id int,lead_id int,creat_date date ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ('quoteChar' = '"', 'separatorChar' = '\,', 'serialization
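OpenCSVSerde is documented to expose every column as string regardless of the declared types. One common workaround, reusing the column names from the question (the view name is an assumption), is a casting view, so the casts live in one place instead of in every query:

```sql
-- Sketch: typed view over the all-string OpenCSVSerde table.
CREATE VIEW response_typed AS
SELECT
  CAST(response_id AS int)  AS response_id,
  CAST(lead_id     AS int)  AS lead_id,
  CAST(creat_date  AS date) AS creat_date
FROM response;
```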

Can I use two field terminators (like ',' and '.') at a time in Hive while creating a table?

被刻印的时光 ゝ · Posted on 2019-12-20 03:56:10
Question: I have a file with id and year. My fields are separated by , and . . Is there any chance that, in place of FIELDS TERMINATED BY, I can use both , and . ? Answer 1: This is possible using RegexSerDe. hive> CREATE EXTERNAL TABLE citiesr1 (id int, city_org string, ppl float) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ('input.regex'='^(\\d+)\\.(\\S+),(\\d++.\\d++)\\t.*') LOCATION '/user/it1/hive/serde/regex'; In the regex above three regex groups are defined. (\\d+
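Since the answer's regex is cut off above, here is an independent, complete sketch of the same idea: one capture group per column, with the literal delimiters '.' and ',' escaped in the pattern. A line like "1.Boston,650000" would map to (id, city_org, ppl):

```sql
-- Illustrative sketch (simplified pattern, not the answer's exact regex).
CREATE EXTERNAL TABLE citiesr1 (id int, city_org string, ppl float)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\d+)\\.([^,]+),(\\d+)$'
)
LOCATION '/user/it1/hive/serde/regex';
```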

How to build a Hive table on data separated by the '^P' delimiter

一曲冷凌霜 · Posted on 2019-12-12 12:25:29
Question: My query is: CREATE EXTERNAL TABLE gateway_staging ( poll int, total int, transaction_id int, create_time timestamp, update_time timestamp ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^P'; (I am not sure whether '^P' can be used as a delimiter, but I tried it out.) The result shows all fields as 'none' when I load the data into the Hive table. The data looks like: 4307421698^P200^P138193920770^P2017-03-08 02:46:18.021204^P2017-03-08 02:46:18.021204 Please help me out. Answer 1: Here are the options: ...
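'^P' in the data is the control character DLE (ASCII 16), not the two literal characters '^' and 'P'. Hive delimiters can be written as an octal escape, so one hedged fix is to spell it as '\020' (octal 20 = decimal 16):

```sql
-- Sketch: the delimiter written as an octal escape for ASCII 16.
-- Note: the sample values above (e.g. 4307421698) exceed Hive's int
-- range, so bigint may also be needed for those columns.
CREATE EXTERNAL TABLE gateway_staging (
  poll           int,
  total          int,
  transaction_id int,
  create_time    timestamp,
  update_time    timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\020';
```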