hortonworks-data-platform

pyspark interpreter not found in apache zeppelin

旧巷老猫 submitted on 2019-12-22 10:46:04
Question: I am having an issue using pyspark in an Apache Zeppelin (version 0.6.0) notebook. Running the following simple code gives me a "pyspark interpreter not found" error: %pyspark a = 1+3. Running sc.version gave me res2: String = 1.6.0, which is the version of Spark installed on my machine, and running z returned res0: org.apache.zeppelin.spark.ZeppelinContext = {}. Pyspark works from the CLI (using Spark 1.6.0 and Python 2.6.6). The default Python on the machine is 2.6.6, while Anaconda Python 3.5 is also installed…
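
A common cause on Zeppelin 0.6.0 is that the Spark interpreter group was started without SPARK_HOME (or an explicit Python) configured, so the %pyspark sub-interpreter never registers. A minimal sketch of the usual fix, assuming a tarball install under $ZEPPELIN_HOME (both paths below are illustrative):

    # Point Zeppelin at the Spark install and an explicit Python binary
    echo 'export SPARK_HOME=/usr/hdp/current/spark-client' >> "$ZEPPELIN_HOME/conf/zeppelin-env.sh"
    echo 'export PYSPARK_PYTHON=/usr/bin/python' >> "$ZEPPELIN_HOME/conf/zeppelin-env.sh"

    # Restart Zeppelin so the interpreter list is rebuilt
    "$ZEPPELIN_HOME/bin/zeppelin-daemon.sh" restart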

Install error: ftheader.h: No such file or directory

我的梦境 submitted on 2019-12-21 07:25:24
Question: When trying to build matplotlib-1.3.1, I am getting the FreeType header errors below. It is probably not finding ftheader.h. Any idea how to solve this problem? NOTE: I just installed FreeType 2.5.0.1 following the instructions in the FreeType install guide, because manually building matplotlib-1.3.1 from source was initially failing due to the required 'freetype' package not being found. In file included from src/ft2font.h:16, from src/ft2font.cpp:3: /usr/include/ft2build.h…
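
FreeType 2.5 moved its headers under an extra freetype2/ directory, so a build that hard-codes the old include layout cannot find ftheader.h. A hedged sketch of the usual workaround before rebuilding matplotlib (paths assume a pkg-config-aware FreeType install; adjust to where your headers actually landed):

    # Let the compiler find the relocated headers
    export CFLAGS="$(pkg-config --cflags freetype2) $CFLAGS"

    # Some guides instead symlink the header tree back to the old location:
    # ln -s /usr/local/include/freetype2/freetype /usr/local/include/freetype

    python setup.py build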

Issue connecting to Kafka from outside

末鹿安然 submitted on 2019-12-21 02:41:02
Question: I am using the Hortonworks Sandbox as a Kafka server and am trying to connect to Kafka from Eclipse with Java code. I use this configuration for the producer that sends the message: metadata.broker.list=sandbox.hortonworks.com:45000 serializer.class=kafka.serializer.DefaultEncoder zk.connect=sandbox.hortonworks.com:2181 request.required.acks=0 producer.type=sync where sandbox.hortonworks.com is the sandbox hostname I connect to. In the Kafka server.properties I changed this configuration: host.name=sandbox…
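
When a client outside the VM connects, the broker replies with the host/port from its own metadata, so the advertised address must be reachable from the client machine, not just inside the sandbox. A sketch of the pieces that usually need to line up for old (0.8.x-era) Kafka (names and ports are illustrative, not taken from the question):

    # In the sandbox's server.properties:
    #   advertised.host.name=sandbox.hortonworks.com
    #   advertised.port=45000
    # On the client machine, make the sandbox name resolvable:
    echo '127.0.0.1 sandbox.hortonworks.com' | sudo tee -a /etc/hosts
    # And forward the broker port in the VirtualBox NAT rules (e.g. 45000 -> 6667).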

How to disable Transparent Huge Pages (THP) in Ubuntu 16.04LTS

微笑、不失礼 submitted on 2019-12-18 19:00:55
Question: I am setting up an Ambari cluster with 3 VirtualBox VMs running Ubuntu 16.04 LTS. However, I get the warning below: "The following hosts have Transparent Huge Pages (THP) enabled. THP should be disabled to avoid potential Hadoop performance issues." How can I disable THP in Ubuntu 16.04? Answer 1: Did you try this command: sudo su; echo never > /sys/kernel/mm/transparent_hugepage/enabled? Alternatively, you may install hugepages: sudo su; apt-get install hugepages; hugeadm --thp-never. As mentioned by…
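
Note that echoing into sysfs only lasts until the next reboot. A sketch of one way to apply the change now and keep it across reboots on Ubuntu 16.04 (using the stock /etc/rc.local; the sed edit assumes the default file ending in "exit 0"):

    # Disable THP immediately (both knobs Ambari checks)
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

    # Persist across reboots by inserting the command before 'exit 0' in rc.local
    sudo sed -i '/^exit 0/i echo never > /sys/kernel/mm/transparent_hugepage/enabled' /etc/rc.local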

Hive tables not found when running in YARN-Cluster mode

↘锁芯ラ submitted on 2019-12-17 20:28:43
Question: I have a Spark (version 1.4.1) application on HDP 2.3. It works fine when running in YARN-client mode. However, when running in YARN-cluster mode, none of my Hive tables can be found by the application. I submit the application like so: ./bin/spark-submit --class com.myCompany.Main --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 10g --executor-cores 1 --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar…
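
In yarn-cluster mode the driver runs inside a container on the cluster, so it cannot read the hive-site.xml on the submitting machine; that is the classic reason Hive tables "disappear". A hedged sketch of the commonly cited fix for Spark 1.x, shipping the file with the job (the config path and app jar name are illustrative):

    ./bin/spark-submit --class com.myCompany.Main \
      --master yarn-cluster \
      --num-executors 3 --driver-memory 4g --executor-memory 10g --executor-cores 1 \
      --files /etc/hive/conf/hive-site.xml \
      --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar \
      myApp.jar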

Missing hive-site when using spark-submit YARN cluster mode

↘锁芯ラ submitted on 2019-12-17 16:49:14
Question: Using HDP 2.5.3, I've been trying to debug some YARN container classpath issues. Since HDP includes both Spark 1.6 and 2.0.0, there have been some conflicting versions. The users I support can successfully run Spark2 with Hive queries in YARN client mode, but not in cluster mode: there they get errors such as tables not found, because the metastore connection isn't established. I am guessing that setting either --driver-class-path /etc/spark2/conf:/etc/hive/conf or…
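
The same pattern applies to Spark2 on HDP: in cluster mode the driver needs hive-site.xml distributed into its container rather than read from the local /etc paths. A sketch under that assumption (the HDP config path, class, and jar are illustrative):

    spark-submit --master yarn --deploy-mode cluster \
      --files /usr/hdp/current/spark2-client/conf/hive-site.xml \
      --class com.myCompany.Main myApp.jar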

Spark read file from S3 using sc.textFile ("s3n://…)

陌路散爱 submitted on 2019-12-17 02:29:37
Question: Trying to read a file located in S3 using spark-shell: scala> val myRdd = sc.textFile("s3n://myBucket/myFile1.log") lyrics: org.apache.spark.rdd.RDD[String] = s3n://myBucket/myFile1.log MappedRDD[55] at textFile at <console>:12 scala> myRdd.count java.io.IOException: No FileSystem for scheme: s3n at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2607) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2614) at org.apache.hadoop.fs.FileSystem.access$200…
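
The s3n:// filesystem implementation is not on spark-shell's classpath by default; it lives in the hadoop-aws module (plus its jets3t dependency). A hedged sketch of one way to pull it in and pass credentials, with a version that is illustrative and should match your Hadoop build:

    spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3 \
      --conf spark.hadoop.fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY \
      --conf spark.hadoop.fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY

Spark forwards any spark.hadoop.* setting into the Hadoop configuration, so sc.textFile("s3n://…") can then resolve the scheme.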

Hortonworks shc unresolved dependencies

只愿长相守 submitted on 2019-12-14 01:56:05
Question: I would like to use the Hortonworks HBase connector (shc); see the GitHub guide. But I don't know how to import it into my project. I have the following build.sbt: name := "project" version := "1.0" scalaVersion := "2.11.8" libraryDependencies ++= Seq( "org.apache.spark" % "spark-core_2.11" % "2.2.0", "org.apache.spark" % "spark-sql_2.11" % "2.2.0", "org.scala-lang" % "scala-compiler" % "2.11.8", "com.hortonworks" % "shc" % "1.1.2-2.1-s_2.11-SNAPSHOT" ) And it gives me the following unresolved dependencies:…
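
shc artifacts are not published to Maven Central, so sbt cannot resolve them without an extra resolver. A sketch of the commonly cited fix, adding the Hortonworks repository (verify the URL and the exact artifact coordinates against the shc README for your version):

    # Append the resolver to build.sbt (the leading newline keeps older sbt happy)
    printf '\nresolvers += "Hortonworks Repo" at "http://repo.hortonworks.com/content/groups/public/"\n' >> build.sbt
    sbt update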

HDP 2.2@Linux/CentOS@OracleVM (Hortonworks) fails on remote submission from Eclipse@Windows

两盒软妹~` submitted on 2019-12-13 15:34:15
Question: I have HDP 2.2 running on CentOS within OracleVM on my local machine (Windows 7) in pseudo-distributed mode. I wanted to test remote submission, so I created a WordCount example in Eclipse running outside the VM and submitted it as follows (the example I chose is from elsewhere on the net): Path inputPath = new Path("/hdfsinput"); Path outputDir = new Path("/hdfsoutput"); // Create configuration Configuration conf = new Configuration(true); // create inputPath on HDFS if needed…
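
For remote submission the driver needs the cluster's client configuration (fs.defaultFS, the ResourceManager address, and so on) on its classpath; with only a bare new Configuration(true) it will target the local machine. One hedged approach is to copy the configs out of the sandbox (hostname and paths are illustrative):

    # Fetch the client configs from the sandbox VM
    scp root@sandbox.hortonworks.com:/etc/hadoop/conf/core-site.xml ./conf/
    scp root@sandbox.hortonworks.com:/etc/hadoop/conf/yarn-site.xml ./conf/
    scp root@sandbox.hortonworks.com:/etc/hadoop/conf/mapred-site.xml ./conf/
    # Then add ./conf to the Eclipse run configuration's classpath, or load the
    # files explicitly via conf.addResource(new Path("conf/core-site.xml")).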