hortonworks-data-platform

Hive sort operation on high volume skewed dataset

Submitted by 早过忘川 on 2019-12-11 03:36:32
Question: I am working on a big dataset of around 3 TB on Hortonworks 2.6.5; the layout of the dataset is pretty straightforward. The hierarchy of the data is as follows:

- Country
- Warehouse
- Product
- Product Type
- Product Serial Id

We have transaction data in the above hierarchy for 30 countries; each country has more than 200 warehouses, and a single country, USA, contributes around 75% of the entire data set. Problem: 1) We have transaction data with a transaction date column (trans_dt) for the above data …
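
As an illustrative aside (not from the question): a common way to tame this kind of skew is to salt the dominant key so that work on USA (~75% of the rows) is spread across many tasks instead of one. A minimal PySpark sketch, where the table and column names are assumptions based on the hierarchy above:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Assumed table/column names; only trans_dt and the hierarchy are given above.
    df = spark.table("transactions")

    # Add a random salt column so the skewed country (USA) is spread
    # across many partitions rather than landing in a single one.
    salted = df.withColumn("salt", (F.rand() * 32).cast("int"))

    # Repartition on (country, salt) instead of country alone, then sort
    # within each partition; this avoids one giant reducer for USA.
    result = (salted
              .repartition("country", "salt")
              .sortWithinPartitions("country", "trans_dt"))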

Spark on YARN: Less executor memory than set via spark-submit

Submitted by 北慕城南 on 2019-12-10 23:00:40
Question: I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:

- 1 master node: 64 GB RAM (48 GB usable), 12 cores (8 cores usable)
- 5 slave nodes: 64 GB RAM (48 GB usable) each, 12 cores (8 cores usable) each

YARN settings:

- memory of all containers (on one host): 48 GB
- minimum container size = maximum container size = 6 GB
- vcores in the cluster = 40 (5 × 8 worker cores)
- minimum #vcores/container = maximum #vcores/container = 1

When I run my Spark application with the command spark-submit -…
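
The spark-submit flags are cut off above, but as a hedged note that may frame the question: with the Spark 1.6 defaults that HDP 2.4 ships (300 MB reserved heap, spark.memory.fraction = 0.75, YARN overhead = max(384 MB, 10% of executor memory)), the web UI is expected to report noticeably less memory than --executor-memory. A back-of-the-envelope check, with a hypothetical 4 GB executor:

    # Assumed Spark 1.6 defaults; the 4 GB executor size is hypothetical.
    executor_memory_mb = 4 * 1024                          # e.g. --executor-memory 4g
    overhead_mb = max(384, int(0.10 * executor_memory_mb))
    container_mb = executor_memory_mb + overhead_mb        # must fit the 6 GB YARN container
    ui_storage_mb = (executor_memory_mb - 300) * 0.75      # roughly what the UI reports
    print(container_mb, ui_storage_mb)                     # 4505 2847.0 -> UI shows ~2.8 GB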

Oracle VirtualBox error: failure to open a session with Hortonworks

Submitted by 廉价感情. on 2019-12-10 22:46:02
Question: I've researched the existing Stack Overflow questions that suggest upgrading to the most recent version of VirtualBox; one question at the time suggested upgrading to v4.3.14. Well, I'm on v4.3.20. I've reinstalled about 5 times and ensured virtualization was enabled in the BIOS. I continue to get the error message below:

    Failed to open a session for the virtual machine Hortonworks Sandbox with HDP 2.2.
    The virtual machine 'Hortonworks Sandbox with HDP 2.2' has terminated unexpectedly during …

Hadoop streaming with Python on Windows

Submitted by 巧了我就是萌 on 2019-12-10 19:05:48
Question: I'm using Hortonworks HDP for Windows and have it successfully configured with a master and 2 slaves. I'm using the following command:

    bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reduce.py" -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27

The mapper runs through fine, but the log …
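
The scripts themselves are not shown in the question, so here is a minimal word-count style mapper sketch of the shape Hadoop Streaming expects, written for Python 2.7 to match the PYTHONPATH=c:\python27 above (note, possibly related, that the command ships reducer.py via -files but invokes "python reduce.py"):

    #!/usr/bin/env python
    # mapper.py -- minimal Hadoop Streaming mapper sketch (illustrative only).
    # Streaming feeds input lines on stdin and expects tab-separated
    # key/value pairs on stdout.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print "%s\t1" % word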

Table loaded through Spark not accessible in Hive

Submitted by ぐ巨炮叔叔 on 2019-12-10 13:25:13
Question: A Hive table created through Spark (pyspark) is not accessible from Hive.

    df.write.format("orc").mode("overwrite").saveAsTable("db.table")

Error while accessing it from Hive:

    Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

The table gets created successfully in Hive, and I am able to read it back in Spark. The table metadata is accessible (in Hive), as are the table's data files (in the HDFS directory). The TBLPROPERTIES of the Hive table are: 'bucketing …
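
The TBLPROPERTIES line is cut off, but this error is typically seen when Hive expects a transactional (ACID) table layout that plain Spark-written ORC files do not have. As a hedged workaround sketch (not confirmed as this asker's fix), giving saveAsTable an explicit path makes Spark create an external, non-transactional table that Hive can read as plain ORC; the path below is hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    df = spark.table("db.source_table")   # placeholder for the asker's DataFrame

    # An explicit location makes saveAsTable create an EXTERNAL table,
    # so Hive reads plain ORC instead of expecting ACID bucket metadata.
    (df.write.format("orc")
       .mode("overwrite")
       .option("path", "/apps/spark/warehouse/db_table")   # hypothetical HDFS path
       .saveAsTable("db.table"))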

Cannot retrieve repository metadata (repomd.xml) for repository: sandbox. Please verify its path and try again

Submitted by 社会主义新天地 on 2019-12-10 09:55:45
Question: I have HDP 2.6.1 installed on VirtualBox and am attempting to run yum install python-pip. However, the error below appears:

    http://dev2.hortonworks.com.s3.amazonaws.com/repo/dev/master/utils/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 403 Forbidden"
    Trying other mirror.
    To address this issue please refer to the below knowledge base article
    https://access.redhat.com/solutions/69319
    If above article doesn't help to resolve this issue please open a ticket …

Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe

Submitted by ⅰ亾dé卋堺 on 2019-12-08 07:55:22
Question: I am getting "Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe" while creating a table in Hive. Below is the table creation script:

    CREATE EXTERNAL TABLE ratings(user_id INT, movie_id INT, rating INT, rating_time STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    WITH SERDEPROPERTIES ("field.delim"="::")
    LOCATION '/user/hive/ratings';

HDP version: 2.1.1

Answer 1: You are facing this problem because your Hive lib does not have the hive-contrib jar, or hive-site.xml is not pointing …

ExecuteSQL doesn't select a table if it has a dateTimeOffset value?

Submitted by 笑着哭i on 2019-12-08 06:01:37
Question: I have created a table with a single column of data type datetimeoffset and inserted some values:

    create table dto (dto datetimeoffset(7))
    insert into dto values (GETDATE())                    -- inserts date and time with 0 offset
    insert into dto values (SYSDATETIMEOFFSET())          -- current date, time, and offset
    insert into dto values ('20131114 08:54:00 +10:00')   -- manual way

In NiFi, I have specified the query "Select * from dto" in ExecuteSQL. It shows the error below:

    java.lang.IllegalArgumentException: createSchema: Unknown SQL type -155 cannot be converted to Avro type

If I change that column into dateTime …
