hive-serde

Backslash character is not being read by Hive when using OpenCSVSerde

﹥>﹥吖頭↗ · Posted on 2020-08-26 06:51:51
Question: I have defined a table on top of files present in HDFS, using the OpenCSV SerDe to read from the file. But '\' (backslash) characters in the data are getting omitted in the final result set. Is there a Hive SerDe property that I am not using correctly? As per the documentation, escapeChar = '\' should fix this problem, but the problem persists. CREATE EXTERNAL TABLE `tsr`( `last_update_user` string COMMENT 'from deserializer', `last_update_datetime` string COMMENT 'from deserializer') ROW
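A minimal sketch of the DDL the question is aiming for, reusing the `tsr` table from the excerpt (the HDFS location is an assumption). Note that inside the DDL string literal the backslash itself must be escaped, so the property value is written '\\':

```sql
-- Sketch (location assumed): declaring the escape character explicitly
-- for OpenCSVSerde; '\\' in the DDL denotes a single backslash.
CREATE EXTERNAL TABLE `tsr` (
  `last_update_user`     string,
  `last_update_datetime` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"',
  'escapeChar'    = '\\'
)
LOCATION '/path/to/hdfs/dir';
```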

Hive table from JSON error

99封情书 · Posted on 2019-12-25 09:40:10
Question: I can't get this JSON into a Hive table somehow; either all the data becomes null or it can't be selected at all. I just need the fields to match my DDL, and if a field is structured (nested), I want to keep it as a string instead of trying to parse it. The only near-success was with hive-hcatalog-core-1.1.0-cdh5.10.0.jar. Since some data are blank, I'm able to query with LIMIT, but when I remove the LIMIT it returns this kind of error: org.apache.hadoop.hive.serde2.SerDeException: java.io
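An illustrative sketch of the approach the question describes, using the HCatalog JSON SerDe shipped in the jar mentioned above (column names are assumptions; whether a nested object survives as a plain string depends on the data and SerDe version):

```sql
-- The jar must be on Hive's classpath first:
--   ADD JAR /path/to/hive-hcatalog-core-1.1.0-cdh5.10.0.jar;
CREATE EXTERNAL TABLE events_json (
  id      string,
  payload string   -- keeping nested JSON as a string is the questioner's goal,
                   -- not guaranteed behavior of this SerDe
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/path/to/json/files';
```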

Hive table source delimited by multiple spaces

谁都会走 · Posted on 2019-12-25 08:29:41
Question: How can I make the following table source delimited by one or more white spaces: CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
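Hive's ROW FORMAT DELIMITED takes only a single-character delimiter, so a run of spaces needs RegexSerDe instead. A shortened illustrative sketch with just the first few weather columns (the full table would extend the pattern with one capture group per column):

```sql
-- Sketch: one capture group per column, separated by one-or-more
-- whitespace characters (\\s+); location is assumed.
CREATE EXTERNAL TABLE weather_sample (
  USAF   INT,
  WBAN   INT,
  `Date` STRING,
  DIR    STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+).*$'
)
LOCATION '/path/to/weather';
```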

Custom Hive SerDe unable to SELECT column but works when I do SELECT *

南笙酒味 · Posted on 2019-12-25 00:12:50
Question: I'm writing a custom SerDe and will only be using it to deserialize. The underlying data is Thrift binary; each row is an event log. Each event has a schema which I have access to, but we wrap the event in another schema, call it Message, before storing. The reason I'm writing a SerDe instead of using the ThriftDeserializer is that, as mentioned, the underlying event is wrapped in a Message. So we first need to deserialize using the schema of Message and then deserialize the data for
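For context, this is how a custom deserializer-only SerDe would be attached to a table; the class name, jar path, and column names below are purely hypothetical:

```sql
-- Hypothetical sketch: wiring a custom SerDe class into a Hive table.
ADD JAR /path/to/custom-serde.jar;

CREATE EXTERNAL TABLE event_log (
  event_time string,
  event_body string
)
ROW FORMAT SERDE 'com.example.MessageEventSerDe'
LOCATION '/path/to/thrift/logs';
```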

PySpark/Hive: how to CREATE TABLE with LazySimpleSerDe to convert boolean 't' / 'f'?

拟墨画扇 · Posted on 2019-12-22 17:46:25
Question: Hello dear Stack Overflow community, here is my problem: A) I have data in CSV with some boolean columns; unfortunately, the values in these columns are t or f (single letters); this is an artifact (from Redshift) that I cannot control. B) I need to create a Spark dataframe from this data, ideally converting t -> true and f -> false. For that, I create a Hive DB and a temp Hive table and then SELECT * from it, like this: sql_str = """SELECT * FROM {db}.{s}_{t} """.format( db=hive_db_name, s
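One hedged workaround, independent of whether LazySimpleSerDe can be taught custom boolean literals: keep the columns as string in the Hive table and convert 't'/'f' in the SELECT that feeds the dataframe (table and column names below are assumptions):

```sql
-- Sketch: explicit conversion of single-letter booleans at query time.
SELECT
  id,
  CASE is_active WHEN 't' THEN true
                 WHEN 'f' THEN false
  END AS is_active
FROM my_db.my_table;
```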

SerDe properties list for AWS Athena (JSON)

我的未来我决定 · Posted on 2019-12-22 09:52:23
Question: I'm testing AWS's Athena product, and so far it is working very well. But I want to know the list of SerDe properties. I've searched far and wide and couldn't find it. I'm using one, for example "ignore.malformed.json" = "true", but I'm pretty sure there are a ton of other options to tune the queries. I couldn't find info, for example, on what the "path" property does, so having the full list would be amazing. I have looked at the Apache Hive docs but couldn't find this, and neither on the AWS docs
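For reference, an illustrative Athena DDL using the OpenX JSON SerDe, with the ignore.malformed.json property from the question (column names and S3 location are assumptions):

```sql
-- Sketch: Athena external table over JSON, skipping unparseable rows
-- instead of failing the query.
CREATE EXTERNAL TABLE logs (
  id      string,
  message string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
)
LOCATION 's3://my-bucket/logs/';
```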

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

家住魔仙堡 · Posted on 2019-12-21 10:19:09
Question: Issue when executing SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using SHOW CREATE TABLE, you get this: STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'. But if you create the table with those clauses, you then get a casting error when selecting. The error looks like: Failed with exception java.io.IOException: java.lang.ClassCastException: org.apache.hadoop
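The usual explanation: STORED AS ORC sets the input format, output format, and the SerDe, while spelling out only INPUTFORMAT/OUTPUTFORMAT leaves the default LazySimpleSerDe in place, which cannot read ORC data and triggers the ClassCastException. A sketch of the explicit equivalent (table and columns assumed):

```sql
-- Adding the SerDe clause makes the spelled-out form equivalent
-- to STORED AS ORC.
CREATE TABLE my_orc_table (id int, name string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
```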

Why do all columns get created as string when I use OpenCSVSerde in Hive?

杀马特。学长 韩版系。学妹 · Posted on 2019-12-21 04:43:08
Question: I am trying to create a table using OpenCSVSerde with some integer and date columns, but the columns get converted to string. Is this an expected outcome? As a workaround, I do an explicit type cast after this step (which makes the complete run slower). hive> create external table if not exists response(response_id int,lead_id int,creat_date date ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ('quoteChar' = '"', 'separatorChar' = '\,', 'serialization
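OpenCSVSerde is documented to expose every column as string regardless of the declared types. One common workaround, reusing the column names from the question (the view name is an assumption), is a casting view, so the casts live in one place instead of in every query:

```sql
-- Sketch: typed view over the all-string OpenCSVSerde table.
CREATE VIEW response_typed AS
SELECT
  CAST(response_id AS int)  AS response_id,
  CAST(lead_id     AS int)  AS lead_id,
  CAST(creat_date  AS date) AS creat_date
FROM response;
```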

Can I use two field terminators (like ',' and '.') at a time in Hive while creating a table?

被刻印的时光 ゝ · Posted on 2019-12-20 03:56:10
Question: I have a file with id and year. My fields are separated by , and . . Is there any chance that, in place of FIELDS TERMINATED BY, I can use both , and . ? Answer 1: This is possible using RegexSerDe. hive> CREATE EXTERNAL TABLE citiesr1 (id int, city_org string, ppl float) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ('input.regex'='^(\\d+)\\.(\\S+),(\\d++.\\d++)\\t.*') LOCATION '/user/it1/hive/serde/regex'; In the regex above three regex groups are defined. (\\d+
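Since the answer's regex is cut off above, here is an independent, complete sketch of the same idea: one capture group per column, with the literal delimiters '.' and ',' escaped in the pattern. A line like "1.Boston,650000" would map to (id, city_org, ppl):

```sql
-- Illustrative sketch (simplified pattern, not the answer's exact regex).
CREATE EXTERNAL TABLE citiesr1 (id int, city_org string, ppl float)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^(\\d+)\\.([^,]+),(\\d+)$'
)
LOCATION '/user/it1/hive/serde/regex';
```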

How to build a Hive table on data separated by the '^P' delimiter

一曲冷凌霜 · Posted on 2019-12-12 12:25:29
Question: My query is: CREATE EXTERNAL TABLE gateway_staging ( poll int, total int, transaction_id int, create_time timestamp, update_time timestamp ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^P'; (I am not sure whether '^P' can be used as a delimiter, but I tried it out.) The result shows all fields as 'none' when I load the data into the Hive table. The data looks like: 4307421698^P200^P138193920770^P2017-03-08 02:46:18.021204^P2017-03-08 02:46:18.021204 Please help me out. Answer 1: Here are the options: ...
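'^P' in the data is the control character DLE (ASCII 16), not the two literal characters '^' and 'P'. Hive delimiters can be written as an octal escape, so one hedged fix is to spell it as '\020' (octal 20 = decimal 16):

```sql
-- Sketch: the delimiter written as an octal escape for ASCII 16.
-- Note: the sample values above (e.g. 4307421698) exceed Hive's int
-- range, so bigint may also be needed for those columns.
CREATE EXTERNAL TABLE gateway_staging (
  poll           int,
  total          int,
  transaction_id int,
  create_time    timestamp,
  update_time    timestamp
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\020';
```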