hadoop2

How do I scale my AWS EMR cluster with 1 master and 2 core nodes using AWS auto scaling? Is there a way?

不羁岁月 submitted on 2019-12-25 11:15:24
Question: I have implemented a cluster using AWS EMR. I have a master node with 2 core nodes, set up with a Hadoop bootstrap action. Now I would like to use auto scaling and adjust the cluster size dynamically based on a CPU threshold and some other constraints. But I have no idea how, as there isn't much information on the web about how to use Auto Scaling with an already existing cluster. Any help? Answer 1: Currently you can't launch an EMR cluster in an Auto Scaling group, but you can achieve a very similar goal by delivering your …
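
The answer excerpt is cut off above. As a hedged sketch of the usual workaround (not necessarily the accepted answer), an existing cluster's core or task instance group can be resized through the EMR CLI, with the scaling decision driven by your own automation such as a CloudWatch alarm handler; the cluster and instance-group IDs below are placeholders:

    # List the instance groups of the running cluster (placeholder cluster ID).
    aws emr list-instance-groups --cluster-id j-XXXXXXXXXXXXX

    # Resize the core (or a task) instance group to 4 nodes; when and how much
    # to scale has to be decided by your own monitoring logic.
    aws emr modify-instance-groups \
      --cluster-id j-XXXXXXXXXXXXX \
      --instance-groups InstanceGroupId=ig-XXXXXXXXXXXXX,InstanceCount=4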

Hive table source delimited by multiple spaces

谁都会走 submitted on 2019-12-25 08:29:41
Question: How can I make the following table treat one or more whitespace characters as the field delimiter: CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT) …
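
The usual answer to this kind of question (a hedged sketch, not taken from the excerpt above) is RegexSerDe, which splits each row with a regular expression instead of a single delimiter character. Note that RegexSerDe exposes every column as STRING, so numeric columns have to be cast afterwards; only a few of the weather columns are shown and the LOCATION is a placeholder:

    -- One capture group per column, separated by one or more whitespace characters.
    CREATE EXTERNAL TABLE weather_raw (
      usaf STRING, wban STRING, `date` STRING, dir STRING, spd STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+).*"
    )
    STORED AS TEXTFILE
    LOCATION '/path/to/weather';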

Execution Error, return code 1 while executing query in Hive for Twitter sentiment analysis

我的未来我决定 submitted on 2019-12-25 08:07:14
Question: I am doing Twitter sentiment analysis using Hadoop, Flume and Hive. I have created the table using hive -f tweets.sql. tweets.sql: -- create the tweets_raw table containing the records as received from Twitter SET hive.support.sql11.reserved.keywords=false; CREATE EXTERNAL TABLE Mytweets_raw ( id BIGINT, created_at STRING, source STRING, favorited BOOLEAN, retweet_count INT, retweeted_status STRUCT< text:STRING, user:STRUCT<screen_name:STRING,name:STRING>>, entities STRUCT< urls:ARRAY<STRUCT …
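
The excerpt is cut off before the error itself, so the actual cause cannot be confirmed here. As a hedged guess based on the usual Twitter/Flume/Hive tutorials, "return code 1" on this table often means the JSON SerDe jar the table depends on is not on Hive's classpath. A sketch assuming the Cloudera hive-serdes jar used in those tutorials (the jar path is a placeholder):

    -- Make the SerDe visible to the current Hive session before creating or querying the table.
    ADD JAR /path/to/hive-serdes-1.0-SNAPSHOT.jar;

    SET hive.support.sql11.reserved.keywords=false;
    -- The table DDL must then reference the SerDe class contained in that jar,
    -- e.g. ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'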

Parsing data in Apache Spark (Scala): org.apache.spark.SparkException: Task not serializable error when trying to use textinputformat.record.delimiter

只谈情不闲聊 submitted on 2019-12-25 03:28:08
Question: Input file: ___DATE___ 2018-11-16T06:3937 Linux hortonworks 3.10.0-514.26.2.el7.x86_64 #1 SMP Fri Jun 30 05:26:04 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux 06:39:37 up 100 days, 1:04, 2 users, load average: 9.01, 8.30, 8.48 06:30:01 AM all 6.08 0.00 2.83 0.04 0.00 91.06 ___DATE___ 2018-11-16T06:4037 Linux cloudera 3.10.0-514.26.2.el7.x86_64 #1 SMP Fri Jun 30 05:26:04 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux 06:40:37 up 100 days, 1:05, 28 users, load average: 8.39, 8.26, 8.45 06:40:01 AM all 6.92 …
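
The excerpt stops before the code, but the usual shape of this error is that a Hadoop Configuration (which is not serializable) ends up captured in a closure. A hedged sketch of the common fix, assuming sc is the shell's SparkContext and that ___DATE___ is the intended record delimiter: pass the delimiter through a fresh Configuration to newAPIHadoopFile and convert the non-serializable Text values to String immediately (the input path is a placeholder):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Build a standalone Configuration so the SparkContext's Hadoop conf is not dragged into closures.
    val conf = new Configuration(sc.hadoopConfiguration)
    conf.set("textinputformat.record.delimiter", "___DATE___")

    val records = sc
      .newAPIHadoopFile("/path/to/input", classOf[TextInputFormat],
        classOf[LongWritable], classOf[Text], conf)
      .map { case (_, text) => text.toString } // Text objects are reused and not serializable; copy to String right away
      .filter(_.trim.nonEmpty)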

Too many open files in Spark aborting Spark job

Deadly submitted on 2019-12-25 00:18:52
Question: In my application I am reading 40 GB of text data spread across 188 files. I split these files and create an XML file per line in Spark using a pair RDD. For 40 GB of input this creates many millions of small XML files, and this is my requirement. Everything works fine, but when Spark saves the files to S3 it throws an error and the job fails. Here is the exception I get: Caused by: java.nio.file.FileSystemException: /mnt/s3/emrfs-2408623010549537848/0000000000: Too many open files at sun.nio.fs …
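
The excerpt stops at the stack trace. As a hedged sketch (not the accepted answer, which is cut off), the two usual mitigations are raising the open-file limit for the user running the executors on the EMR nodes and reducing how many output files each task keeps open at once (for example by repartitioning before writing). The limit change on a node looks roughly like this; the user name and values are illustrative:

    # Check the current per-process open-file limit for the user running the executors.
    ulimit -n

    # Raise it persistently (illustrative user and values; needs root and a re-login / service restart).
    echo "hadoop soft nofile 65536" | sudo tee -a /etc/security/limits.conf
    echo "hadoop hard nofile 65536" | sudo tee -a /etc/security/limits.conf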

How to merge all part files in a folder created by a Spark DataFrame and rename them as the folder name in Scala

♀尐吖头ヾ submitted on 2019-12-24 18:31:53
Question: Hi, the output of my Spark DataFrame creates a folder structure with many part files. Now I have to merge all the part files inside each folder and rename that one file with the folder path name. This is how I do the partitioning: df.write.partitionBy("DataPartition","PartitionYear") .format("csv") .option("nullValue", "") .option("header", "true") .option("codec", "gzip") .save("hdfs:///user/zeppelin/FinancialLineItem/output") It creates a folder structure like this: hdfs:///user/zeppelin …
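
The excerpt ends before the answer. One common approach (a hedged sketch, not necessarily the accepted answer) is to let Spark write the partitioned output as above and then merge each folder's part files with Hadoop's FileUtil.copyMerge, naming the merged file after the folder; the partition path below is a placeholder:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Merge every part file under srcDir into a single file named after the folder.
    // copyMerge exists in Hadoop 2.x; it was removed in Hadoop 3.
    def mergeFolder(srcDir: String): Unit = {
      val src = new Path(srcDir)
      val dst = new Path(src.getParent, src.getName + ".csv")
      FileUtil.copyMerge(fs, src, fs, dst, false, conf, null)
    }

    mergeFolder("hdfs:///user/zeppelin/FinancialLineItem/output/DataPartition=XX/PartitionYear=2017")

Note that with option("header", "true") every part file carries its own header row, so a naive merge will contain repeated headers unless they are stripped.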

Hadoop Mahout Clustering

╄→гoц情女王★ submitted on 2019-12-24 17:25:35
Question: I am trying to apply canopy clustering in Mahout. I already converted a text file into a sequence file, but I cannot view the sequence file. Anyway, I tried to apply canopy clustering with the following command: hduser@ubuntu:/usr/local/mahout/trunk$ mahout canopy -i /user/Hadoop/mahout_seq/seqdata -o /user/Hadoop/clustered_data -t1 5 -t2 3 I got the following error: 16/05/10 17:02:03 INFO mapreduce.Job: Task Id : attempt_1462850486830_0008_m_000000_1, Status : FAILED Error: java.lang …
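
The error message is truncated above, so the actual cause of the failed job cannot be confirmed here. As a small aside for the "cannot view the sequence file" part, Mahout ships a seqdumper utility that prints the key/value pairs of a sequence file; the path is the one from the question:

    # Dump the contents of the generated sequence file to stdout for inspection.
    mahout seqdumper -i /user/Hadoop/mahout_seq/seqdata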

Hive shell not starting

时光怂恿深爱的人放手 submitted on 2019-12-24 17:18:08
Question: hadoop_1@shubho-HP-Notebook:~$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/home/hadoop_1/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/home/hadoop_1/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type …
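
The log excerpt ends mid-line. The multiple SLF4J bindings shown are only a warning; with Hive 2.x the shell most often fails to start because the metastore schema has not been initialised. A hedged sketch of the usual first check, assuming an embedded Derby metastore:

    # Initialise the metastore schema once (use -dbType mysql/postgres if hive-site.xml points at an external DB).
    $HIVE_HOME/bin/schematool -dbType derby -initSchema

    # Optionally silence the duplicate-binding warning by removing Hive's copy of the SLF4J binding jar:
    # rm ~/apache-hive-2.3.2-bin/lib/log4j-slf4j-impl-2.6.2.jar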

Ambari shows service as stopped

 ̄綄美尐妖づ submitted on 2019-12-24 15:31:08
Question: We are using Hortonworks HDP 2.1 with Ambari 1.6.1. After a crash in our underlying hardware we restarted our cluster some days ago. We got everything back up again; however, Ambari shows that two services are still down, the YARN Resource Manager and the MapReduce History Server. Both of those services are running, verified both by checking running processes on the server and by checking the provided functionality. Nagios health checks are also OK. Still, Ambari shows the services as being …
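
The excerpt is cut off. When the underlying services are verifiably healthy, this is usually a stale-status problem between the Ambari agents and the server; a hedged sketch of the standard first step on the affected hosts (stock Ambari commands, run as root or via sudo):

    # Restart the agent so it re-reports component status to the Ambari server.
    sudo ambari-agent restart

    # If the status is still wrong, restarting the server can clear its cached state.
    sudo ambari-server restart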