hortonworks-data-platform

Hive sort operation on high volume skewed dataset

Submitted by 早过忘川 on 2019-12-11 03:36:32
Question: I am working on a big dataset of around 3 TB on Hortonworks 2.6.5; the layout of the dataset is pretty straightforward. The hierarchy of the data is as follows:

- Country
- Warehouse
- Product
- Product Type
- Product Serial Id

We have transaction data in the above hierarchy for 30 countries; each country has more than 200 warehouses, and a single country, USA, contributes around 75% of the entire data set. Problem: 1) We have transaction data with a transaction date column (trans_dt) for the above data …
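
As an illustrative aside (not from the question): a common way to tame this kind of skew is to salt the dominant key so that work on USA (~75% of the rows) is spread across many tasks instead of one. A minimal PySpark sketch, where the table and column names are assumptions based on the hierarchy above:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Assumed table/column names; only trans_dt and the hierarchy are given above.
    df = spark.table("transactions")

    # Add a random salt column so the skewed country (USA) is spread
    # across many partitions rather than landing in a single one.
    salted = df.withColumn("salt", (F.rand() * 32).cast("int"))

    # Repartition on (country, salt) instead of country alone, then sort
    # within each partition; this avoids one giant reducer for USA.
    result = (salted
              .repartition("country", "salt")
              .sortWithinPartitions("country", "trans_dt"))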

Spark on YARN: Less executor memory than set via spark-submit

Submitted by 北慕城南 on 2019-12-10 23:00:40
Question: I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:

- 1 master node: 64 GB RAM (48 GB usable), 12 cores (8 cores usable)
- 5 slave nodes: 64 GB RAM (48 GB usable) each, 12 cores (8 cores usable) each

YARN settings:

- memory of all containers (on one host): 48 GB
- minimum container size = maximum container size = 6 GB
- vcores in the cluster = 40 (5 × 8 worker cores)
- minimum #vcores/container = maximum #vcores/container = 1

When I run my Spark application with the command spark-submit -…
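
The spark-submit flags are cut off above, but as a hedged note that may frame the question: with the Spark 1.6 defaults that HDP 2.4 ships (300 MB reserved heap, spark.memory.fraction = 0.75, YARN overhead = max(384 MB, 10% of executor memory)), the web UI is expected to report noticeably less memory than --executor-memory. A back-of-the-envelope check, with a hypothetical 4 GB executor:

    # Assumed Spark 1.6 defaults; the 4 GB executor size is hypothetical.
    executor_memory_mb = 4 * 1024                          # e.g. --executor-memory 4g
    overhead_mb = max(384, int(0.10 * executor_memory_mb))
    container_mb = executor_memory_mb + overhead_mb        # must fit the 6 GB YARN container
    ui_storage_mb = (executor_memory_mb - 300) * 0.75      # roughly what the UI reports
    print(container_mb, ui_storage_mb)                     # 4505 2847.0 -> UI shows ~2.8 GB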

Oracle VirtualBox error: failure to open a session with Hortonworks

Submitted by 廉价感情. on 2019-12-10 22:46:02
Question: I've researched the existing Stack Overflow questions that suggest upgrading to the most recent version of VirtualBox; one question at the time suggested upgrading to v4.3.14. Well, I'm on v4.3.20. I've reinstalled about 5 times and ensured virtualization was enabled in the BIOS. I continue to get the error message below:

    Failed to open a session for the virtual machine Hortonworks Sandbox with HDP 2.2.
    The virtual machine 'Hortonworks Sandbox with HDP 2.2' has terminated unexpectedly during …

Hadoop streaming with Python on Windows

Submitted by 巧了我就是萌 on 2019-12-10 19:05:48
Question: I'm using Hortonworks HDP for Windows and have it successfully configured with a master and 2 slaves. I'm using the following command:

    bin\hadoop jar contrib\streaming\hadoop-streaming-1.1.0-SNAPSHOT.jar -files file:///d:/dev/python/mapper.py,file:///d:/dev/python/reducer.py -mapper "python mapper.py" -reducer "python reduce.py" -input /flume/0424/userlog.MDAC-HD1.MDAC.local..20130424.1366789040945 -output /flume/o%1 -cmdenv PYTHONPATH=c:\python27

The mapper runs through fine, but the log …
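
The scripts themselves are not shown in the question, so here is a minimal word-count style mapper sketch of the shape Hadoop Streaming expects, written for Python 2.7 to match the PYTHONPATH=c:\python27 above (note, possibly related, that the command ships reducer.py via -files but invokes "python reduce.py"):

    #!/usr/bin/env python
    # mapper.py -- minimal Hadoop Streaming mapper sketch (illustrative only).
    # Streaming feeds input lines on stdin and expects tab-separated
    # key/value pairs on stdout.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print "%s\t1" % word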

Table loaded through Spark not accessible in Hive

Submitted by ぐ巨炮叔叔 on 2019-12-10 13:25:13
Question: A Hive table created through Spark (pyspark) is not accessible from Hive.

    df.write.format("orc").mode("overwrite").saveAsTable("db.table")

Error while accessing it from Hive:

    Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)

The table gets created successfully in Hive, and I am able to read it back in Spark. The table metadata is accessible (in Hive), as are the table's data files (in the HDFS directory). The TBLPROPERTIES of the Hive table are: 'bucketing …
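
The TBLPROPERTIES line is cut off, but this error is typically seen when Hive expects a transactional (ACID) table layout that plain Spark-written ORC files do not have. As a hedged workaround sketch (not confirmed as this asker's fix), giving saveAsTable an explicit path makes Spark create an external, non-transactional table that Hive can read as plain ORC; the path below is hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    df = spark.table("db.source_table")   # placeholder for the asker's DataFrame

    # An explicit location makes saveAsTable create an EXTERNAL table,
    # so Hive reads plain ORC instead of expecting ACID bucket metadata.
    (df.write.format("orc")
       .mode("overwrite")
       .option("path", "/apps/spark/warehouse/db_table")   # hypothetical HDFS path
       .saveAsTable("db.table"))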

Cannot retrieve repository metadata (repomd.xml) for repository: sandbox. Please verify its path and try again

Submitted by 社会主义新天地 on 2019-12-10 09:55:45
Question: I have HDP 2.6.1 installed on VirtualBox and am attempting to run yum install python-pip. However, the error below appears:

    http://dev2.hortonworks.com.s3.amazonaws.com/repo/dev/master/utils/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 403 Forbidden"
    Trying other mirror.
    To address this issue please refer to the below knowledge base article
    https://access.redhat.com/solutions/69319
    If above article doesn't help to resolve this issue please open a ticket …

Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe

Submitted by ⅰ亾dé卋堺 on 2019-12-08 07:55:22
Question: I am getting "Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe" while creating a table in Hive. Below is the table creation script:

    CREATE EXTERNAL TABLE ratings(user_id INT, movie_id INT, rating INT, rating_time STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    WITH SERDEPROPERTIES ("field.delim"="::")
    LOCATION '/user/hive/ratings';

HDP version: 2.1.1

Answer 1: You are facing this problem because your Hive lib does not have the hive-contrib jar, or hive-site.xml is not pointing …

ExecuteSQL doesn't select a table if it has a dateTimeOffset value?

Submitted by 笑着哭i on 2019-12-08 06:01:37
Question: I have created a table with a single column of data type datetimeoffset and inserted some values:

    create table dto (dto datetimeoffset(7))
    insert into dto values (GETDATE())                    -- inserts date and time with 0 offset
    insert into dto values (SYSDATETIMEOFFSET())          -- current date, time, and offset
    insert into dto values ('20131114 08:54:00 +10:00')   -- manual way

In NiFi, I have specified the query "Select * from dto" in ExecuteSQL. It shows the error below:

    java.lang.IllegalArgumentException: createSchema: Unknown SQL type -155 cannot be converted to Avro type

If I change that column into dateTime …
