hortonworks-sandbox

Error “No such container sandbox-hdp” when trying to install docker image on RHEL7

Submitted by 魔方 西西 on 2021-01-29 05:12:27
Question: I am trying to get the HDP sandbox running on RHEL7. However, I get a "no such container sandbox-hdp" error message when I run `docker-deploy-hdp30.sh`: `sudo sh docker-deploy-hdp30.sh + registry=hortonworks + name=sandbox-hdp + version=3.0.1 + proxyName=sandbox-proxy + proxyVersion=1.0 + flavor=hdp + echo hdp + mkdir -p sandbox/proxy/conf.d + mkdir -p sandbox/proxy/conf.stream.d + docker pull hortonworks/sandbox-hdp:3.0.1 3.0.1: Pulling from hortonworks/sandbox-hdp 70799bbf2226: Pull
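In many reports, this message appears because an earlier step of the deploy script failed, most often the multi-gigabyte `docker pull`, so the `sandbox-hdp` container was never created. A minimal pre-flight sketch before rerunning the script (the image tag is taken from the trace above; adjust it to your version):

```shell
# Check whether the sandbox image actually finished pulling before rerunning
# docker-deploy-hdp30.sh; if the pull was interrupted, the later step that
# starts the sandbox-hdp container fails with "No such container".
image="hortonworks/sandbox-hdp:3.0.1"

if ! command -v docker >/dev/null 2>&1; then
    status_msg="docker is not installed or not on PATH"
elif ! docker image inspect "$image" >/dev/null 2>&1; then
    status_msg="image $image missing: run 'docker pull $image', then rerun the deploy script"
else
    status_msg="image present: rerun 'sudo bash docker-deploy-hdp30.sh'"
fi
echo "$status_msg"
```

Also check that Docker's storage location has enough free space; the sandbox image unpacks to tens of gigabytes, and a pull that runs out of disk fails the same way.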

How do I fix this Kryo exception when using a UDF on Hive?

Submitted by 倖福魔咒の on 2019-12-24 19:29:50
Question: I have a Hive query that worked in the Hortonworks 2.6 sandbox, but it does not work on sandbox version 3.0 because of this exception: `Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 95 Serialization trace: parentOperators (org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkLongOperator) childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)`
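The serialization trace references vectorized operators (`VectorReduceSinkLongOperator`, `VectorFilterOperator`), so a commonly suggested workaround is to disable vectorized execution for the session before running the UDF query. A sketch; the JDBC URL, `my_udf`, and `my_table` are placeholders:

```shell
# Disable vectorization for this session, then run the UDF query.
# Placeholders: the connection URL, my_udf, and my_table are illustrative only.
query='SET hive.vectorized.execution.enabled=false;
SET hive.vectorized.execution.reduce.enabled=false;
SELECT my_udf(col) FROM my_table;'

if command -v beeline >/dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000 -e "$query"
else
    printf 'beeline not found; would run:\n%s\n' "$query"
fi
```

If disabling vectorization resolves the error, the longer-term fix is usually to rebuild the UDF jar against the Hive 3 libraries shipped with the 3.0 sandbox.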

Using a Hive database in Spark

Submitted by 社会主义新天地 on 2019-12-24 00:50:05
Question: I am new to Spark and am trying to run some queries on the TPC-DS benchmark tables (http://www.tpc.org/tpcds/) using the Hortonworks Sandbox. There is no problem when using Hive through the shell or the Hive view on the sandbox. The problem is that I don't know how to connect to the database if I want to use Spark. How can I use a Hive database in Spark to run the queries? The only solution I know of so far is to rebuild each table manually and load data into it using the following Scala code, which is
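Rebuilding the tables should not be necessary: the Spark builds shipped with the HDP sandbox are compiled with Hive support, so databases in the Hive metastore are visible to Spark directly (in code, via `SparkSession.builder().enableHiveSupport()`, or `HiveContext` in older Spark). A sketch using the `spark-sql` shell; `tpcds` and `store_sales` are assumed names from the TPC-DS setup:

```shell
# Query an existing Hive database from Spark without reloading any data.
stmts='SHOW DATABASES;
USE tpcds;
SELECT COUNT(*) FROM store_sales;'

if command -v spark-sql >/dev/null 2>&1; then
    spark-sql -e "$stmts"
else
    printf 'spark-sql not found; would run:\n%s\n' "$stmts"
fi
```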

Hortonworks sandbox install on a Linux VM?

Submitted by 只谈情不闲聊 on 2019-12-12 04:21:28
Question: How do I install the Hortonworks sandbox on a Linux VM? Any video tutorials would be highly appreciated. Answer 1: Hortonworks Sandbox installation on an Oracle virtual machine: Download the HDP sandbox from here and extract it. Download VirtualBox from here and install it on Windows. Now open Oracle VirtualBox, go to the "File" menu, and click "Import Appliance". Set the name, CPU, RAM, etc. as per your configuration and click the "Import" button. (It will take time; please wait.) After installation,
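The GUI import described above can also be scripted with VirtualBox's CLI. A sketch, where the `.ova` file name and the memory/CPU figures are assumptions to adjust for your download and hardware:

```shell
# Import the sandbox appliance from the command line instead of the GUI.
ova="HDP_Sandbox.ova"   # placeholder file name for the downloaded appliance
cmd="VBoxManage import $ova --vsys 0 --memory 8192 --cpus 4"

if command -v VBoxManage >/dev/null 2>&1 && [ -f "$ova" ]; then
    $cmd
else
    echo "VBoxManage or $ova not available; would run: $cmd"
fi
```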

How can I fix java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0?

Submitted by 徘徊边缘 on 2019-12-11 17:03:32
Question: I get a `java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0` exception in my query. Here's the query: `WITH t1 as (select * from browserdata join citydata on cityid=id), t2 as (select uap.device as device, uap.os as os, uap.browser as browser, name as cityname from t1 lateral view ParseUserAgentUDTF(UserAgent) uap as device, os, browser), t3 as (select t2.cityname as cityname, t2.device as device, t2.browser as browser, t2.os as os, count(*) as count from t2
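This exception comes from Hive's `ShimLoader`, which only recognizes the Hadoop major versions it was built for; a frequent cause is a UDF/UDTF jar (here, the one providing `ParseUserAgentUDTF`) that bundles an older `hive-exec` compiled against Hadoop 2. One commonly suggested fix is to mark Hive as a provided dependency so the cluster's own Hadoop-3-aware jars are used at runtime; a hypothetical Maven fragment:

```xml
<!-- Sketch: do not bundle hive-exec inside the UDTF jar; let the cluster
     supply its own (Hadoop 3-compatible) copy. The version is illustrative
     and should match the Hive shipped with your HDP release. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>3.1.0</version>
  <scope>provided</scope>
</dependency>
```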

Installing Apache Spark using yum

Submitted by 会有一股神秘感。 on 2019-12-11 04:33:59
Question: I am in the process of installing Spark on my organization's HDP box. I run `yum install spark` and it installs Spark 1.4.1. How do I install Spark 2.0? Please help! Answer 1: Spark 2 is supported (as a technical preview) in HDP 2.5. You can add the specific HDP 2.5 repo to your yum repo directory and then install it; Spark 1.6.2 is the default version in HDP 2.5. `wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0/hdp.repo` `sudo cp hdp.repo /etc/yum.repos.d/hdp.repo`
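The excerpt ends at adding the repo; the follow-up step is installing the Spark 2 preview package through yum. The exact package name can vary between HDP builds, so searching first is safer; a sketch:

```shell
# After hdp.repo is in /etc/yum.repos.d/, locate and install the Spark 2
# technical-preview package. "spark2" is the expected package prefix, but
# verify it with the search before installing.
search_cmd="yum search spark2"
install_cmd="sudo yum install spark2"

if command -v yum >/dev/null 2>&1; then
    $search_cmd || echo "no spark2 package found in the configured repos"
else
    echo "yum not available here; would run: $search_cmd && $install_cmd"
fi
```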

Not able to send JSON tweet events to a Kafka topic/producer using the Kafka command line

Submitted by 大城市里の小女人 on 2019-12-10 23:21:11
Question: I have created a Python script, `raw_tweets_stream.py`, to stream Twitter data using the Twitter API. The JSON data from Twitter is piped to a Kafka producer using the command below: `python raw_tweets_stream.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list localhost:2181 --topic raw_json_tweets` `raw_json_tweets` is the Kafka topic created for these tweets. The Python script `raw_tweets_stream.py` runs just fine, but it throws an error when sending to the Kafka producer. I am
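One detail in the command above stands out: `--broker-list localhost:2181` points at ZooKeeper's port, while the console producer needs a Kafka broker address (on HDP the broker default is 6667; stock Kafka uses 9092). A sketch of the corrected pipeline, assuming the HDP default; verify the actual port against `listeners` in your broker's `server.properties`:

```shell
# Pipe the tweet stream to the console producer using a broker address
# (port 6667 assumed for HDP) rather than the ZooKeeper port 2181.
producer=/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh
broker="localhost:6667"
cmd="python raw_tweets_stream.py | $producer --broker-list $broker --topic raw_json_tweets"

if [ -x "$producer" ]; then
    eval "$cmd"
else
    echo "producer script not found; would run: $cmd"
fi
```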

Apache NiFi - OutOfMemory Error: GC overhead limit exceeded on SplitText processor

Submitted by 旧巷老猫 on 2019-12-06 06:49:47
Question: I am trying to use NiFi to process large CSV files (potentially billions of records each) using HDF 1.2. I've implemented my flow, and everything works fine for small files. The problem is that if I push the file size to 100 MB (1M records), I get a `java.lang.OutOfMemoryError: GC overhead limit exceeded` from the SplitText processor responsible for splitting the file into single records. I've searched for this, and it basically means that the garbage collector runs for too long without obtaining much heap space; I expect this means that too many flow files are being generated
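Two mitigations are commonly suggested for this SplitText pattern: split in stages, so a first SplitText emits coarse chunks (for example 10,000 lines each) that feed a second SplitText producing single lines, which bounds how many flow files exist at once; and/or raise the NiFi JVM heap in `conf/bootstrap.conf`. A hypothetical heap fragment (the `java.arg.N` indices must match your existing file, and 4 GB is an illustrative size):

```
# conf/bootstrap.conf -- larger heap for wide SplitText fan-outs
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```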
