hadoop2

Hadoop Error - All data nodes are aborting

Submitted by ♀尐吖头ヾ on 2019-12-06 05:56:17
Question: I am using Hadoop version 2.3.0. Sometimes when I execute a MapReduce job, the error below is displayed.

14/08/10 12:14:59 INFO mapreduce.Job: Task Id : attempt_1407694955806_0002_m_000780_0, Status : FAILED
Error: java.io.IOException: All datanodes 192.168.30.2:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError
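
One commonly suggested check for this error is whether the datanodes are exhausting their transfer threads or open file descriptors. A hedged hdfs-site.xml sketch of the property that is often raised in that situation; the value is illustrative, not a confirmed fix for this cluster:

<property>
  <!-- Illustrative value; the Hadoop 2.x default is 4096. -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>

Raising the OS open-file limit (ulimit -n) for the datanode user is usually checked alongside this setting.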

YARN log aggregation on AWS EMR - UnsupportedFileSystemException

Submitted by 守給你的承諾、 on 2019-12-06 03:59:50
Question: I am struggling to enable YARN log aggregation for my Amazon EMR cluster. I am following this documentation for the configuration: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive under the section titled "To aggregate logs in Amazon S3 using the AWS CLI". I've verified that the hadoop-config bootstrap action puts the following in yarn-site.xml:

<property><name>yarn.log-aggregation-enable</name><value>true</value><
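
For reference, a hedged sketch of the yarn-site.xml properties involved in log aggregation. The directory value is a placeholder, and an UnsupportedFileSystemException generally means the scheme used in yarn.nodemanager.remote-app-log-dir is not one the NodeManager's filesystem layer can resolve:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- Placeholder location. The scheme must be one the NodeManager can
       resolve; if it cannot, aggregation fails with
       UnsupportedFileSystemException. -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs:///var/log/hadoop-yarn/apps</value>
</property>
<property>
  <!-- -1 keeps aggregated logs indefinitely. -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>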

Is there an equivalent of the `find` command in `hadoop`?

Submitted by 前提是你 on 2019-12-06 01:27:19
Question: I know that from the terminal one can use the find command to locate files, for example: find . -type d -name "*something*" -maxdepth 4. But when I am in the Hadoop file system, I have not found a way to do this; hadoop fs -find .... throws an error. How do people traverse files in Hadoop? I'm using hadoop 2.6.0-cdh5.4.1.

Answer 1: hadoop fs -find was introduced in Apache Hadoop 2.7.0. Most likely you're using an older version and therefore don't have it yet. See HADOOP-8989 for more information. In the
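
On releases that predate hadoop fs -find, a common workaround is to pipe a recursive listing through grep; a sketch with a placeholder path and pattern:

hadoop fs -ls -R /user/someuser | grep "something"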

What is --direct mode in sqoop?

Submitted by 爱⌒轻易说出口 on 2019-12-05 23:35:43
Question: As I understand it, Sqoop is used to import or export tables/data between a database and HDFS, Hive, or HBase, and we can directly import a single table or a list of tables. Internally a MapReduce program runs (I think only map tasks). My question is: what is Sqoop's direct mode, and when should I use the --direct option?

Answer 1: Just read the Sqoop documentation! General principles are located here for imports and there for exports. Some databases can perform imports in a more high-performance fashion
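
In direct mode Sqoop delegates the transfer to the database's native bulk tooling (for MySQL, for example, mysqldump/mysqlimport) instead of plain JDBC reads inside the map tasks, which is usually faster but supports fewer column types. A hedged example invocation; the connection string, credentials, table, and target directory are placeholders:

sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username dbuser -P \
  --table customers \
  --target-dir /user/hive/warehouse/customers \
  --direct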

Trouble writing a temp file on the datanode with Hadoop

Submitted by 半城伤御伤魂 on 2019-12-05 19:57:35
I would like to create a file during my program. However, I don't want this file to be written to HDFS but to the local filesystem of the datanode where the map operation is executed. I tried the following approach:

public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // do some hadoop stuff, like counting words
    String path = "newFile.txt";
    try {
        File f = new File(path);
        f.createNewFile();
    } catch (IOException e) {
        System.out.println("Message easy to look up in the logs.");
        System.err.println("Error easy to look up in the logs.");
        e.printStackTrace();
        throw e
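
If the intent is to be explicit that the file targets the node-local disk rather than HDFS, here is a minimal sketch of my own using Hadoop's LocalFileSystem; the /tmp path is an assumption for illustration, and inside a Mapper you would pass context.getConfiguration() instead of a fresh Configuration:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalTempWrite {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // LocalFileSystem targets the local disk of the node this code runs on,
        // not HDFS, so the file never leaves the machine executing the task.
        FileSystem localFs = FileSystem.getLocal(conf);
        Path tmp = new Path("/tmp/newFile.txt"); // illustrative path, adjust as needed
        try (FSDataOutputStream out = localFs.create(tmp, true)) {
            out.writeUTF("scratch data");
        }
    }
}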

Spark Indefinite Waiting with “Asked to send map output locations for shuffle”

Submitted by 我与影子孤独终老i on 2019-12-05 17:47:14
Question: My jobs often hang with this kind of message:

14/09/01 00:32:18 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to spark@*:37619

It would be great if someone could explain what Spark is doing when it spits out this message. What does this message mean? What could the user be doing wrong to cause this? Which configuration settings should be tuned? It's really hard to debug because it doesn't OOM and it doesn't give a stack trace; it just sits and sits and sits. This has been

Kafka network Processor error in producer program (ArrayIndexOutOfBoundsException: 18)

Submitted by こ雲淡風輕ζ on 2019-12-05 15:51:36
I have the Kafka producer API program below, and I am new to Kafka itself. The code fetches data from an API and sends messages to a Kafka topic.

package kafka_Demo;

import java.util.Properties;
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.kafka.clients.producer.*;
import java.net.URL;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HttpBasicAuth {
    public static void main(String[] args) {
        try {
            Properties props = new Properties();
            props.put("bootstrap
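
For comparison, a minimal standalone producer sketch of my own; the broker address and topic name are placeholders. An ArrayIndexOutOfBoundsException: 18 in the broker's network Processor is often reported when the client library and broker versions do not match, so checking the versions is worth doing before changing the code:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MinimalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes pending records.
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // "demo-topic" is a placeholder topic name.
            producer.send(new ProducerRecord<>("demo-topic", "key", "hello kafka"));
        }
    }
}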

Is the output of the map phase of a MapReduce job always sorted?

Submitted by 半世苍凉 on 2019-12-05 10:30:39
I am a bit confused by the output I get from the Mapper. For example, when I run a simple wordcount program with this input text:

hello world Hadoop programming mapreduce wordcount lets see if this works 12345678 hello world mapreduce wordcount

this is the output that I get:

12345678 1
Hadoop 1
hello 1
hello 1
if 1
lets 1
mapreduce 1
mapreduce 1
programming 1
see 1
this 1
wordcount 1
wordcount 1
works 1
world 1
world 1

As you can see, the output from the mapper is already sorted. I did not run a Reducer at all. But I find in a different project that the output from the mapper is not sorted. So I am
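
The framework sorts map output only as part of the shuffle that feeds reducers, so whether mapper output appears sorted generally comes down to whether the job has any reduce tasks. A minimal driver sketch of my own illustrating the setting involved (WordCountDriver and the nested TokenMapper are illustrative names, not code from the question):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordcount");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // With at least one reducer the map output is partitioned and sorted
        // by key before it reaches the reducer (the shuffle/sort phase).
        job.setNumReduceTasks(1);

        // With zero reducers the shuffle/sort phase is skipped entirely and
        // the map output is written as-is, so it is not sorted:
        // job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}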

Namenode high availability client request

Submitted by 送分小仙女□ on 2019-12-05 04:28:17
Can anyone please tell me: if I am using a Java application to request file upload/download operations against HDFS with a NameNode HA setup, where does this request go first? I mean, how does the client know which NameNode is active? It would be great if you could provide a workflow-type diagram or something that explains the request steps in detail (start to end). If the Hadoop cluster is configured with HA, then it will have NameNode IDs in hdfs-site.xml like this:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>namenode1,namenode2</value>
</property>

Whichever NameNode is started first will
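
For context, a hedged sketch of the client-side hdfs-site.xml settings that normally accompany that nameservice entry; the host names are placeholders. The client does not know up front which NameNode is active: the configured failover proxy provider tries the listed NameNodes in turn, the standby rejects the call, and the client ends up talking to the active one.

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.namenode1</name>
  <value>nn1.example.com:8020</value> <!-- placeholder host -->
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.namenode2</name>
  <value>nn2.example.com:8020</value> <!-- placeholder host -->
</property>
<property>
  <!-- Tells the HDFS client how to pick a NameNode; standby NameNodes reject
       calls and the provider fails over to the next one in the list. -->
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>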

How do you retrieve the replication factor info for HDFS files?

Submitted by a 夏天 on 2019-12-05 03:38:19
I have set the replication factor for my file as follows:

hadoop fs -D dfs.replication=5 -copyFromLocal file.txt /user/xxxx

When a NameNode restarts, it makes sure under-replicated blocks are replicated, so the replication info for the file must be stored somewhere (possibly in the NameNode). How can I get that information?

Try the command hadoop fs -stat %r /path/to/file ; it should print the replication factor.

You can also run the following command to get the replication factor:

hadoop fs -ls /user/xxxx

The second column in the output signifies the replication factor for a file; for a folder it shows - , as shown
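
If you need the same information programmatically, a minimal sketch of my own using the HDFS Java API; the file path is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowReplication {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Reads fs.defaultFS from core-site.xml on the classpath.
        FileSystem fs = FileSystem.get(conf);
        // Placeholder path; pass your own file instead.
        FileStatus status = fs.getFileStatus(new Path("/user/xxxx/file.txt"));
        // getReplication() returns the replication factor recorded by the NameNode.
        System.out.println("Replication factor: " + status.getReplication());
    }
}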