hortonworks-data-platform

pyspark interpreter not found in apache zeppelin

旧巷老猫 submitted on 2019-12-22 10:46:04
Question: I am having an issue using pyspark in an Apache Zeppelin (version 0.6.0) notebook. Running the following simple code gives me a "pyspark interpreter not found" error: %pyspark a = 1+3. Running sc.version gave me res2: String = 1.6.0, which is the version of Spark installed on my machine, and running z returned res0: org.apache.zeppelin.spark.ZeppelinContext = {}. Pyspark works from the CLI (using Spark 1.6.0 and Python 2.6.6). The default Python on the machine is 2.6.6, while Anaconda Python 3.5 is also installed…
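
A common cause on Zeppelin 0.6.0 is that the Spark interpreter group was started without SPARK_HOME (or an explicit Python) configured, so the %pyspark sub-interpreter never registers. A minimal sketch of the usual fix, assuming a tarball install under $ZEPPELIN_HOME (both paths below are illustrative):

    # Point Zeppelin at the Spark install and an explicit Python binary
    echo 'export SPARK_HOME=/usr/hdp/current/spark-client' >> "$ZEPPELIN_HOME/conf/zeppelin-env.sh"
    echo 'export PYSPARK_PYTHON=/usr/bin/python' >> "$ZEPPELIN_HOME/conf/zeppelin-env.sh"

    # Restart Zeppelin so the interpreter list is rebuilt
    "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" restart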

Install error: ftheader.h: No such file or directory

我的梦境 submitted on 2019-12-21 07:25:24
Question: When trying to build matplotlib-1.3.1, I am getting the FreeType header errors below. It is probably not finding ftheader.h. Any idea how to solve this problem? NOTE: I just installed FreeType 2.5.0.1 following the instructions in the FreeType install guide, because manually building matplotlib-1.3.1 from source was initially failing due to the required 'freetype' package not being found. In file included from src/ft2font.h:16, from src/ft2font.cpp:3: /usr/include/ft2build.h…
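
FreeType 2.5 moved its headers under an extra freetype2/ directory, so a build that hard-codes the old include layout cannot find ftheader.h. A hedged sketch of the usual workaround before rebuilding matplotlib (paths assume a pkg-config-aware FreeType install; adjust to where your headers actually landed):

    # Let the compiler find the relocated headers
    export CFLAGS="$(pkg-config --cflags freetype2) $CFLAGS"

    # Some guides instead symlink the header tree back to the old location:
    # ln -s /usr/local/include/freetype2/freetype /usr/local/include/freetype

    python setup.py build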

Issue connecting to Kafka from outside

末鹿安然 submitted on 2019-12-21 02:41:02
Question: I am using the Hortonworks Sandbox as a Kafka server and am trying to connect to Kafka from Eclipse with Java code. I use this configuration for the producer that sends the message: metadata.broker.list=sandbox.hortonworks.com:45000 serializer.class=kafka.serializer.DefaultEncoder zk.connect=sandbox.hortonworks.com:2181 request.required.acks=0 producer.type=sync where sandbox.hortonworks.com is the sandbox hostname I connect to. In the Kafka server.properties I changed this configuration: host.name=sandbox…
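
When a client outside the VM connects, the broker replies with the host/port from its own metadata, so the advertised address must be reachable from the client machine, not just inside the sandbox. A sketch of the pieces that usually need to line up for old (0.8.x-era) Kafka (names and ports are illustrative, not taken from the question):

    # In the sandbox's server.properties:
    #   advertised.host.name=sandbox.hortonworks.com
    #   advertised.port=45000
    # On the client machine, make the sandbox name resolvable:
    echo '127.0.0.1 sandbox.hortonworks.com' | sudo tee -a /etc/hosts
    # And forward the broker port in the VirtualBox NAT rules (e.g. 45000 -> 6667).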

How to disable Transparent Huge Pages (THP) in Ubuntu 16.04LTS

微笑、不失礼 submitted on 2019-12-18 19:00:55
Question: I am setting up an Ambari cluster with 3 VirtualBox VMs running Ubuntu 16.04 LTS. However, I get the warning below: "The following hosts have Transparent Huge Pages (THP) enabled. THP should be disabled to avoid potential Hadoop performance issues." How can I disable THP in Ubuntu 16.04? Answer 1: Did you try this command: sudo su; echo never > /sys/kernel/mm/transparent_hugepage/enabled? Alternatively, you may install hugepages: sudo su; apt-get install hugepages; hugeadm --thp-never. As mentioned by…
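
Note that echoing into sysfs only lasts until the next reboot. A sketch of one way to apply the change now and keep it across reboots on Ubuntu 16.04 (using the stock /etc/rc.local; the sed edit assumes the default file ending in "exit 0"):

    # Disable THP immediately (both knobs Ambari checks)
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

    # Persist across reboots by inserting the command before 'exit 0' in rc.local
    sudo sed -i '/^exit 0/i echo never > /sys/kernel/mm/transparent_hugepage/enabled' /etc/rc.local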

Hive tables not found when running in YARN-Cluster mode

↘锁芯ラ submitted on 2019-12-17 20:28:43
Question: I have a Spark (version 1.4.1) application on HDP 2.3. It works fine when running in YARN-client mode. However, when running in YARN-cluster mode, none of my Hive tables can be found by the application. I submit the application like so: ./bin/spark-submit --class com.myCompany.Main --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 10g --executor-cores 1 --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar…
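
In yarn-cluster mode the driver runs inside a container on the cluster, so it cannot read the hive-site.xml on the submitting machine; that is the classic reason Hive tables "disappear". A hedged sketch of the commonly cited fix for Spark 1.x, shipping the file with the job (the config path and app jar name are illustrative):

    ./bin/spark-submit --class com.myCompany.Main \
      --master yarn-cluster \
      --num-executors 3 --driver-memory 4g --executor-memory 10g --executor-cores 1 \
      --files /etc/hive/conf/hive-site.xml \
      --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar \
      myApp.jar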

Missing hive-site when using spark-submit YARN cluster mode

↘锁芯ラ submitted on 2019-12-17 16:49:14
Question: Using HDP 2.5.3, I've been trying to debug some YARN container classpath issues. Since HDP includes both Spark 1.6 and 2.0.0, there have been some conflicting versions. The users I support can successfully run Spark2 with Hive queries in YARN client mode, but not in cluster mode: there they get errors such as tables not found, because the metastore connection isn't established. I am guessing that setting either --driver-class-path /etc/spark2/conf:/etc/hive/conf or…
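
The same pattern applies to Spark2 on HDP: in cluster mode the driver needs hive-site.xml distributed into its container rather than read from the local /etc paths. A sketch under that assumption (the HDP config path, class, and jar are illustrative):

    spark-submit --master yarn --deploy-mode cluster \
      --files /usr/hdp/current/spark2-client/conf/hive-site.xml \
      --class com.myCompany.Main myApp.jar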

Spark read file from S3 using sc.textFile ("s3n://…)

陌路散爱 submitted on 2019-12-17 02:29:37
Question: Trying to read a file located in S3 using spark-shell: scala> val myRdd = sc.textFile("s3n://myBucket/myFile1.log") lyrics: org.apache.spark.rdd.RDD[String] = s3n://myBucket/myFile1.log MappedRDD[55] at textFile at <console>:12 scala> myRdd.count java.io.IOException: No FileSystem for scheme: s3n at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2607) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2614) at org.apache.hadoop.fs.FileSystem.access$200…
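
The s3n:// filesystem implementation is not on spark-shell's classpath by default; it lives in the hadoop-aws module (plus its jets3t dependency). A hedged sketch of one way to pull it in and pass credentials, with a version that is illustrative and should match your Hadoop build:

    spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3 \
      --conf spark.hadoop.fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
      --conf spark.hadoop.fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY

Spark forwards any spark.hadoop.* setting into the Hadoop configuration, so sc.textFile("s3n://…") can then resolve the scheme.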

Hortonworks shc unresolved dependencies

只愿长相守 submitted on 2019-12-14 01:56:05
Question: I would like to use the Hortonworks HBase connector (shc); see the GitHub guide. But I don't know how to import it into my project. I have the following build.sbt: name := "project" version := "1.0" scalaVersion := "2.11.8" libraryDependencies ++= Seq( "org.apache.spark" % "spark-core_2.11" % "2.2.0", "org.apache.spark" % "spark-sql_2.11" % "2.2.0", "org.scala-lang" % "scala-compiler" % "2.11.8", "com.hortonworks" % "shc" % "1.1.2-2.1-s_2.11-SNAPSHOT" ) And it gives me the following unresolved dependencies:…
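
shc artifacts are not published to Maven Central, so sbt cannot resolve them without an extra resolver. A sketch of the commonly cited fix, adding the Hortonworks repository (verify the URL and the exact artifact coordinates against the shc README for your version):

    # Append the resolver to build.sbt (the leading newline keeps older sbt happy)
    printf '\nresolvers += "Hortonworks Repo" at "http://repo.hortonworks.com/content/groups/public/"\n' >> build.sbt
    sbt update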

HDP 2.2@Linux/CentOS@OracleVM (Hortonworks) fails on remote submission from Eclipse@Windows

两盒软妹~` submitted on 2019-12-13 15:34:15
Question: I have HDP 2.2 running on CentOS within OracleVM on my local machine (Windows 7) in pseudo-distributed mode. I wanted to test remote submission, so I created a WordCount example in Eclipse running outside the VM and submitted it as follows (the example I chose is from elsewhere on the net): Path inputPath = new Path("/hdfsinput"); Path outputDir = new Path("/hdfsoutput"); // Create configuration Configuration conf = new Configuration(true); // create inputPath on HDFS if needed…
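
For remote submission the driver needs the cluster's client configuration (fs.defaultFS, the ResourceManager address, and so on) on its classpath; with only a bare new Configuration(true) it will target the local machine. One hedged approach is to copy the configs out of the sandbox (hostname and paths are illustrative):

    # Fetch the client configs from the sandbox VM
    scp root@sandbox.hortonworks.com:/etc/hadoop/conf/core-site.xml ./conf/
    scp root@sandbox.hortonworks.com:/etc/hadoop/conf/yarn-site.xml ./conf/
    scp root@sandbox.hortonworks.com:/etc/hadoop/conf/mapred-site.xml ./conf/
    # Then add ./conf to the Eclipse run configuration's classpath, or load the
    # files explicitly via conf.addResource(new Path("conf/core-site.xml")).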