hadoop2

HDFS IO error org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

断了今生、忘了曾经 submitted on 2019-12-10 12:01:48
Question: I am using Flume 1.6.0 in one virtual machine and Hadoop 2.7.1 in another virtual machine. When I send Avro events to Flume 1.6.0 and it tries to write to the Hadoop 2.7.1 HDFS, the following exception occurs:
    (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4 at org.apache.hadoop.ipc.Client.call
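The "Server IPC version 9 cannot communicate with client version 4" message usually means the Flume agent is loading Hadoop 1.x client jars (IPC version 4) while the cluster itself runs Hadoop 2.x (IPC version 9). A minimal sketch of one common fix, assuming Flume lives in /opt/flume and Hadoop 2.7.1 in /opt/hadoop (both paths and the agent name are illustrative), is to point the agent at the matching client libraries via flume-env.sh:

    # flume-env.sh -- make the HDFS sink use the Hadoop 2.7.1 client jars
    # instead of any older 1.x jars on the classpath (paths are assumptions).
    export HADOOP_HOME=/opt/hadoop
    export FLUME_CLASSPATH="$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/hdfs/*:$HADOOP_HOME/share/hadoop/hdfs/lib/*"

    # Restart the agent afterwards so the new classpath takes effect:
    /opt/flume/bin/flume-ng agent -n agent1 -c conf -f conf/flume.conf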

How do you retrieve the replication factor info for HDFS files?

我怕爱的太早我们不能终老 submitted on 2019-12-10 03:34:33
Question: I have set the replication factor for my file as follows:
    hadoop fs -D dfs.replication=5 -copyFromLocal file.txt /user/xxxx
When a NameNode restarts, it makes sure under-replicated blocks are replicated, so the replication info for the file must be stored somewhere (possibly in the NameNode). How can I get that information?
Answer 1: Try the command hadoop fs -stat %r /path/to/file; it should print the replication factor.
Answer 2: You can run the following command to get the replication factor: hadoop fs -ls /user/xxxx
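A quick sketch of what both approaches look like in practice (the path and the output values below are illustrative):

    # Replication factor only:
    hadoop fs -stat %r /user/xxxx/file.txt
    # 5
    # `hadoop fs -ls` prints the replication factor in the second column:
    hadoop fs -ls /user/xxxx/file.txt
    # -rw-r--r--   5 xxxx supergroup   1048576 2019-12-10 03:34 /user/xxxx/file.txt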

Running Hadoop MR jobs without Admin privilege on Windows

懵懂的女人 submitted on 2019-12-09 20:55:29
Question: I have installed Hadoop 2.3.0 on Windows and am able to execute MR jobs successfully. But when I try to execute MR jobs with normal privileges (without admin privileges), the job fails with the following exception. Here I tried with a sample Pig script.
    2014-10-15 12:02:32,822 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:kaveen (auth:SIMPLE) cause:java.io.IOException: Split class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit not

Where is the classpath set for hadoop

我的未来我决定 submitted on 2019-12-09 17:27:44
Question: Where is the classpath for Hadoop set? When I run the command below it prints the classpath, but where is it set?
    bin/hadoop classpath
I'm using Hadoop 2.6.0.
Answer 1: As said by almas shaikh, it's set in hadoop-config.sh, but you can add more jars to it in hadoop-env.sh. Here is the relevant code from hadoop-env.sh, which adds additional jars such as the capacity-scheduler and AWS jars:
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}
    # Extra Java CLASSPATH elements.  Automatically insert
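As a concrete sketch of the hadoop-env.sh route, extra jars are normally appended through the HADOOP_CLASSPATH variable (the jar path below is purely illustrative):

    # etc/hadoop/hadoop-env.sh -- anything appended to HADOOP_CLASSPATH shows up
    # in the output of `bin/hadoop classpath` (jar location is an assumption).
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/opt/custom/lib/my-extra-lib.jar"

    # Verify:
    bin/hadoop classpath | tr ':' '\n' | grep my-extra-lib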

hive.HiveImport: FAILED: SemanticException [Error 10072]: Database does not exist:

佐手、 submitted on 2019-12-09 04:10:30
I am trying to import a MySQL database into Hive to analyze a large amount of MySQL data. According to a blog there are a couple of ways to do this:
    Non-realtime: Sqoop
    Realtime: Hadoop Applier for MySQL
So I decided to go with the 'Non-realtime' approach. I have set up a Hadoop cluster with 4 nodes plus Sqoop and Hive, which work fine with the following versions:
    Apache Hadoop 2.6.0
    Apache Hive hive-0.14.0
    Apache Sqoop sqoop-1.4.5.bin__hadoop-2.0.4-alpha
Now when I try to import data using the following command:
    sqoop-import-all-tables --verbose --connect jdbc:mysql://X.X.X.X
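The "Database does not exist" error from hive.HiveImport generally means the Hive database the import writes into has not been created before Sqoop hands off to Hive. A minimal sketch of the workaround, assuming the target database is called mydb (the database name, credentials, and host are illustrative; --hive-database is available in recent Sqoop 1.4.x releases):

    # Create the target Hive database first, then run the import against it.
    hive -e "CREATE DATABASE IF NOT EXISTS mydb;"
    sqoop-import-all-tables --verbose \
      --connect jdbc:mysql://X.X.X.X/mydb \
      --username sqoopuser -P \
      --hive-import \
      --hive-database mydb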

Importing CSV file into Hadoop

非 Y 不嫁゛ submitted on 2019-12-08 21:08:53
Question: I am new to Hadoop. I have a file to import into Hadoop via the command line (I access the machine through SSH). How can I import the file into Hadoop, and how can I check it afterwards (which command)?
Answer 1: Two steps to import a CSV file:
    1. Move the CSV file to the Hadoop sandbox (/home/username) using WinSCP or Cyberduck.
    2. Use the -put command to move the file from the local filesystem to HDFS: hdfs dfs -put /home/username/file.csv /user/data/file.csv
Source: https://stackoverflow.com/questions/34277239/importing-csv-file-into-hadoop
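To cover the "how can I check afterwards" part of the question, a short sketch using the same illustrative paths as the answer above:

    # Confirm the file landed in HDFS and peek at its first lines:
    hdfs dfs -ls /user/data/
    hdfs dfs -cat /user/data/file.csv | head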

how many blocks are made in Hadoop for the following example?

守給你的承諾、 submitted on 2019-12-08 12:50:47
Question: Assume my HDFS block size is 64 MB. I have 4 files:
    File A: 64 MB * 3 + 2 MB
    File B: 62 MB
There should be 4 blocks for File A each with 64 MB and one with 2 MB. There should be one block for File B with 62 MB. So in total there should be 6 blocks. Just because there is "free" space in one of the blocks of File A, which stores only 2 MB, File B does NOT get appended to that same block. Is that correct? I have seen some tutorials which say the "free" space in the block is utilized.
Answer 1:
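One way to check the actual block layout directly, rather than reasoning about it, is hdfs fsck (the file paths below are illustrative):

    # Print the blocks backing each file; a block never holds data from two
    # files, so File B shows up with its own block regardless of the space
    # left in File A's last block.
    hdfs fsck /user/xxxx/fileA -files -blocks
    hdfs fsck /user/xxxx/fileB -files -blocks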

One reducer in Custom Partitioner makes mapreduce jobs slower

好久不见. submitted on 2019-12-08 12:41:59
Question: Hi, I have an application that reads records from HBase and writes them into text files. The application works as expected, but when I tested it with a huge data set it takes around 1.20 hours to complete the job. Here are the details of my application:
    The size of the data in HBase is approx. 400 GB, about 2 billion records.
    I have created 400 regions in the HBase table, so 400 mappers.
    I have used a custom Partitioner that puts records into 194 text files.
    I have lzo compression for map output and gzip for the final output.
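For reference, the compression setup described above typically maps to properties along these lines (a sketch only: the jar, driver class, and paths are hypothetical, the generic -D options require a ToolRunner-based driver, and the LZO codec assumes the hadoop-lzo library is installed):

    # LZO for intermediate map output, gzip for the final job output
    # (Hadoop 2.x property names).
    hadoop jar myjob.jar MyDriver \
      -D mapreduce.map.output.compress=true \
      -D mapreduce.map.output.compress.codec=com.hadoop.compression.lzo.LzoCodec \
      -D mapreduce.output.fileoutputformat.compress=true \
      -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
      /input /output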

RM job was stuck when running with oozie

♀尐吖头ヾ submitted on 2019-12-08 12:24:56
Question: I'm running a MapReduce wordcount job through Oozie. Two jobs were submitted to YARN, and the monitoring task ran up to 99% and then got stuck, while the wordcount job stayed at 0%. When I kill the monitor job, the wordcount job runs smoothly. I use a cluster of 3 virtual machines; the configuration is as follows:
    Profile per VM: cores=2 memory=2048MB reserved=0GB usableMem=0GB disks=1
    Num Container=3 Container Ram=640MB Used Ram=1GB Unused Ram=0GB
    yarn.scheduler.minimum-allocation-mb=640
    yarn.scheduler
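This pattern, with the Oozie launcher sitting at 99% while the actual wordcount job never leaves 0%, usually means the small cluster cannot fit the second application master while the launcher holds its container; the longer-term fix is typically to give the node managers more memory or to cap the share of resources application masters may use. A sketch of how to confirm the symptom from the command line (the application ID is illustrative):

    # Jobs waiting for containers stay in the ACCEPTED state:
    yarn application -list -appStates RUNNING,ACCEPTED
    # Killing the launcher frees its container, which matches the observation
    # that the wordcount job then runs smoothly:
    yarn application -kill application_1575800000000_0001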
