hortonworks-sandbox

Error “No such container sandbox-hdp” when trying to install docker image on RHEL7

Submitted by 魔方 西西 on 2021-01-29 05:12:27
Question: I am trying to get the HDP sandbox running on RHEL7. However, I get a "no such container sandbox-hdp" error message when I run `docker-deploy-hdp30.sh`: `sudo sh docker-deploy-hdp30.sh + registry=hortonworks + name=sandbox-hdp + version=3.0.1 + proxyName=sandbox-proxy + proxyVersion=1.0 + flavor=hdp + echo hdp + mkdir -p sandbox/proxy/conf.d + mkdir -p sandbox/proxy/conf.stream.d + docker pull hortonworks/sandbox-hdp:3.0.1 3.0.1: Pulling from hortonworks/sandbox-hdp 70799bbf2226: Pull
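In many reports, this message appears because an earlier step of the deploy script failed, most often the multi-gigabyte `docker pull`, so the `sandbox-hdp` container was never created. A minimal pre-flight sketch before rerunning the script (the image tag is taken from the trace above; adjust it to your version):

```shell
# Check whether the sandbox image actually finished pulling before rerunning
# docker-deploy-hdp30.sh; if the pull was interrupted, the later step that
# starts the sandbox-hdp container fails with "No such container".
image="hortonworks/sandbox-hdp:3.0.1"

if ! command -v docker >/dev/null 2>&1; then
    status_msg="docker is not installed or not on PATH"
elif ! docker image inspect "$image" >/dev/null 2>&1; then
    status_msg="image $image missing: run 'docker pull $image', then rerun the deploy script"
else
    status_msg="image present: rerun 'sudo bash docker-deploy-hdp30.sh'"
fi
echo "$status_msg"
```

Also check that Docker's storage location has enough free space; the sandbox image unpacks to tens of gigabytes, and a pull that runs out of disk fails the same way.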

How do I fix this Kryo exception when using a UDF on Hive?

Submitted by 倖福魔咒の on 2019-12-24 19:29:50
Question: I have a Hive query that worked in the Hortonworks 2.6 sandbox, but it does not work on sandbox version 3.0 because of this exception: `Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 95 Serialization trace: parentOperators (org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkLongOperator) childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator) childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)`
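The serialization trace references vectorized operators (`VectorReduceSinkLongOperator`, `VectorFilterOperator`), so a commonly suggested workaround is to disable vectorized execution for the session before running the UDF query. A sketch; the JDBC URL, `my_udf`, and `my_table` are placeholders:

```shell
# Disable vectorization for this session, then run the UDF query.
# Placeholders: the connection URL, my_udf, and my_table are illustrative only.
query='SET hive.vectorized.execution.enabled=false;
SET hive.vectorized.execution.reduce.enabled=false;
SELECT my_udf(col) FROM my_table;'

if command -v beeline >/dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000 -e "$query"
else
    printf 'beeline not found; would run:\n%s\n' "$query"
fi
```

If disabling vectorization resolves the error, the longer-term fix is usually to rebuild the UDF jar against the Hive 3 libraries shipped with the 3.0 sandbox.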

Using a Hive database in Spark

Submitted by 社会主义新天地 on 2019-12-24 00:50:05
Question: I am new to Spark and am trying to run some queries on the TPC-DS benchmark tables (http://www.tpc.org/tpcds/) using the Hortonworks Sandbox. There is no problem when using Hive through the shell or the Hive view on the sandbox. The problem is that I don't know how to connect to the database if I want to use Spark. How can I use a Hive database in Spark to run the queries? The only solution I know of so far is to rebuild each table manually and load data into it using the following Scala code, which is
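Rebuilding the tables should not be necessary: the Spark builds shipped with the HDP sandbox are compiled with Hive support, so databases in the Hive metastore are visible to Spark directly (in code, via `SparkSession.builder().enableHiveSupport()`, or `HiveContext` in older Spark). A sketch using the `spark-sql` shell; `tpcds` and `store_sales` are assumed names from the TPC-DS setup:

```shell
# Query an existing Hive database from Spark without reloading any data.
stmts='SHOW DATABASES;
USE tpcds;
SELECT COUNT(*) FROM store_sales;'

if command -v spark-sql >/dev/null 2>&1; then
    spark-sql -e "$stmts"
else
    printf 'spark-sql not found; would run:\n%s\n' "$stmts"
fi
```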

Hortonworks sandbox install on a Linux VM?

Submitted by 只谈情不闲聊 on 2019-12-12 04:21:28
Question: How do I install the Hortonworks sandbox on a Linux VM? Any video tutorials would be highly appreciated. Answer 1: Hortonworks Sandbox installation on an Oracle virtual machine: Download the HDP sandbox from here and extract it. Download VirtualBox from here and install it on Windows. Now open Oracle VirtualBox, go to the "File" menu, and click "Import Appliance". Set the name, CPU, RAM, etc. as per your configuration and click the "Import" button. (It will take time; please wait.) After installation,
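The GUI import described above can also be scripted with VirtualBox's CLI. A sketch, where the `.ova` file name and the memory/CPU figures are assumptions to adjust for your download and hardware:

```shell
# Import the sandbox appliance from the command line instead of the GUI.
ova="HDP_Sandbox.ova"   # placeholder file name for the downloaded appliance
cmd="VBoxManage import $ova --vsys 0 --memory 8192 --cpus 4"

if command -v VBoxManage >/dev/null 2>&1 && [ -f "$ova" ]; then
    $cmd
else
    echo "VBoxManage or $ova not available; would run: $cmd"
fi
```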

How can I fix java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0?

Submitted by 徘徊边缘 on 2019-12-11 17:03:32
Question: I get a `java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.0` exception in my query. Here's the query: `WITH t1 as (select * from browserdata join citydata on cityid=id), t2 as (select uap.device as device, uap.os as os, uap.browser as browser, name as cityname from t1 lateral view ParseUserAgentUDTF(UserAgent) uap as device, os, browser), t3 as (select t2.cityname as cityname, t2.device as device, t2.browser as browser, t2.os as os, count(*) as count from t2
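This exception comes from Hive's `ShimLoader`, which only recognizes the Hadoop major versions it was built for; a frequent cause is a UDF/UDTF jar (here, the one providing `ParseUserAgentUDTF`) that bundles an older `hive-exec` compiled against Hadoop 2. One commonly suggested fix is to mark Hive as a provided dependency so the cluster's own Hadoop-3-aware jars are used at runtime; a hypothetical Maven fragment:

```xml
<!-- Sketch: do not bundle hive-exec inside the UDTF jar; let the cluster
     supply its own (Hadoop 3-compatible) copy. The version is illustrative
     and should match the Hive shipped with your HDP release. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>3.1.0</version>
  <scope>provided</scope>
</dependency>
```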

Installing Apache Spark using yum

Submitted by 会有一股神秘感。 on 2019-12-11 04:33:59
Question: I am in the process of installing Spark on my organization's HDP box. I run `yum install spark` and it installs Spark 1.4.1. How do I install Spark 2.0? Please help! Answer 1: Spark 2 is supported (as a technical preview) in HDP 2.5. You can add the specific HDP 2.5 repo to your yum repo directory and then install it; Spark 1.6.2 is the default version in HDP 2.5. `wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.0.0/hdp.repo` `sudo cp hdp.repo /etc/yum.repos.d/hdp.repo`
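The excerpt ends at adding the repo; the follow-up step is installing the Spark 2 preview package through yum. The exact package name can vary between HDP builds, so searching first is safer; a sketch:

```shell
# After hdp.repo is in /etc/yum.repos.d/, locate and install the Spark 2
# technical-preview package. "spark2" is the expected package prefix, but
# verify it with the search before installing.
search_cmd="yum search spark2"
install_cmd="sudo yum install spark2"

if command -v yum >/dev/null 2>&1; then
    $search_cmd || echo "no spark2 package found in the configured repos"
else
    echo "yum not available here; would run: $search_cmd && $install_cmd"
fi
```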

Not able to send JSON tweet events to a Kafka topic/producer using the Kafka command line

Submitted by 大城市里の小女人 on 2019-12-10 23:21:11
Question: I have created a Python script, `raw_tweets_stream.py`, to stream Twitter data using the Twitter API. The JSON data from Twitter is piped to a Kafka producer using the command below: `python raw_tweets_stream.py | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list localhost:2181 --topic raw_json_tweets` `raw_json_tweets` is the Kafka topic created for these tweets. The Python script `raw_tweets_stream.py` runs just fine, but it throws an error when sending to the Kafka producer. I am
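One detail in the command above stands out: `--broker-list localhost:2181` points at ZooKeeper's port, while the console producer needs a Kafka broker address (on HDP the broker default is 6667; stock Kafka uses 9092). A sketch of the corrected pipeline, assuming the HDP default; verify the actual port against `listeners` in your broker's `server.properties`:

```shell
# Pipe the tweet stream to the console producer using a broker address
# (port 6667 assumed for HDP) rather than the ZooKeeper port 2181.
producer=/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh
broker="localhost:6667"
cmd="python raw_tweets_stream.py | $producer --broker-list $broker --topic raw_json_tweets"

if [ -x "$producer" ]; then
    eval "$cmd"
else
    echo "producer script not found; would run: $cmd"
fi
```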

Apache NiFi - OutOfMemory Error: GC overhead limit exceeded on SplitText processor

Submitted by 旧巷老猫 on 2019-12-06 06:49:47
Question: I am trying to use NiFi to process large CSV files (potentially billions of records each) using HDF 1.2. I've implemented my flow, and everything works fine for small files. The problem is that if I push the file size to 100 MB (1M records), I get a `java.lang.OutOfMemoryError: GC overhead limit exceeded` from the SplitText processor responsible for splitting the file into single records. I've searched for this, and it basically means that the garbage collector runs for too long without obtaining much heap space; I expect this means that too many flow files are being generated
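Two mitigations are commonly suggested for this SplitText pattern: split in stages, so a first SplitText emits coarse chunks (for example 10,000 lines each) that feed a second SplitText producing single lines, which bounds how many flow files exist at once; and/or raise the NiFi JVM heap in `conf/bootstrap.conf`. A hypothetical heap fragment (the `java.arg.N` indices must match your existing file, and 4 GB is an illustrative size):

```
# conf/bootstrap.conf -- larger heap for wide SplitText fan-outs
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```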
