Testing Word Count on Hadoop 2.2


Open the official download page http://hadoop.apache.org/releases.html#Download , choose the 2.2.0 release package, download it, and extract it to the target path:

$ tar -zxf hadoop-2.2.0.tar.gz -C /usr/local/
$ cd /usr/local
$ ln -s hadoop-2.2.0 hadoop

So throughout this article, HADOOP_HOME = /usr/local/hadoop/.

3. Configure the environment variables for the hadoop user: run vi ~/.bash_profile and add the following:

# set java environment
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

# Michael@micmiu.com
# Hadoop 
export HADOOP_PREFIX="/usr/local/hadoop" 
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX} 
export HADOOP_HDFS_HOME=${HADOOP_PREFIX} 
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
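
After saving, reload the profile and run a quick sanity check that the hadoop command is on the PATH (assuming the paths above match your install):

$ source ~/.bash_profile
$ hadoop version    # should report Hadoop 2.2.0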

4. Edit <HADOOP_HOME>/etc/hadoop/hadoop-env.sh

Update the JAVA_HOME setting:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64

5. Edit <HADOOP_HOME>/etc/hadoop/yarn-env.sh

Update the JAVA_HOME setting:

export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64
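
A quick way to confirm that both env scripts picked up the new JAVA_HOME (paths assume the layout above):

$ grep -n 'export JAVA_HOME' /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/local/hadoop/etc/hadoop/yarn-env.sh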

6. Edit <HADOOP_HOME>/etc/hadoop/core-site.xml

Add or update the following properties under the <configuration> node:

<!-- The new property fs.defaultFS replaces the old fs.default.name |micmiu.com -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master.Hadoop:9000</value>
    <description>The name of the default file system.</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <!-- Make sure this directory exists -->
    <value>/usr/local/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
</property>
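
As the comment above notes, create the hadoop.tmp.dir directory on every node before formatting and starting the cluster, for example:

$ mkdir -p /usr/local/hadoop/temp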

7. Edit <HADOOP_HOME>/etc/hadoop/hdfs-site.xml

Add or update the following properties under the <configuration> node:

<property>
    <name>dfs.replication</name>
    <!-- This value should match the actual number of DataNodes; 3 in this article -->
    <value>3</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <!-- Make sure this directory exists -->
    <value>file:/usr/local/hadoop/dfs/name</value>
    <final>true</final>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <!-- Make sure this directory exists -->
    <value>file:/usr/local/hadoop/dfs/data</value>
</property>
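
Likewise, create the name and data directories referenced above before the first start (the name directory on the NameNode, the data directory on each DataNode):

$ mkdir -p /usr/local/hadoop/dfs/name    # on Master.Hadoop (NameNode)
$ mkdir -p /usr/local/hadoop/dfs/data    # on each SlaveX.Hadoop (DataNode)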

8. Edit <HADOOP_HOME>/etc/hadoop/yarn-site.xml

Add or update the following properties under the <configuration> node:

<!-- micmiu.com -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<!-- ResourceManager hostname or IP address -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master.Hadoop</value>
</property>

9. Edit <HADOOP_HOME>/etc/hadoop/mapred-site.xml

There is no mapred-site.xml by default; simply copy mapred-site.xml.template to mapred-site.xml.
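
For example:

$ cd /usr/local/hadoop/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml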

Add or update the following properties under the <configuration> node:

<!-- micmiu.com -->
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
</property>
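
The same configuration files must be present on every node. If you have not kept them in sync some other way, one option is to copy them from Master.Hadoop over SSH (a sketch, assuming passwordless SSH is already set up and every node uses the same /usr/local/hadoop path):

$ for host in Slave5.Hadoop Slave6.Hadoop Slave7.Hadoop; do
      scp /usr/local/hadoop/etc/hadoop/*-site.xml /usr/local/hadoop/etc/hadoop/*-env.sh ${host}:/usr/local/hadoop/etc/hadoop/
  done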

[III]. Startup and Testing

1. Start Hadoop

1.1 The first time you start the cluster, format HDFS on Master.Hadoop with hdfs namenode -format:

[hadoop@Master ~]$ hdfs namenode -format
14/01/22 15:43:10 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/192.168.6.77
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.2.0
STARTUP_MSG:   classpath =
........................................
............micmiu.com.............
........................................
STARTUP_MSG:   java = 1.6.0_20
************************************************************/
14/01/22 15:43:10 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Formatting using clusterid: CID-645f2ed2-6f02-4c24-8cbc-82b09eca963d
14/01/22 15:43:11 INFO namenode.HostFileManager: read includes:
HostSet(
)
14/01/22 15:43:11 INFO namenode.HostFileManager: read excludes:
HostSet(
)
14/01/22 15:43:11 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map BlocksMap
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 2.0% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^21 = 2097152 entries
14/01/22 15:43:11 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
14/01/22 15:43:11 INFO blockmanagement.BlockManager: defaultReplication         = 3
14/01/22 15:43:11 INFO blockmanagement.BlockManager: maxReplication             = 512
14/01/22 15:43:11 INFO blockmanagement.BlockManager: minReplication             = 1
14/01/22 15:43:11 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
14/01/22 15:43:11 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
14/01/22 15:43:11 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
14/01/22 15:43:11 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
14/01/22 15:43:11 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
14/01/22 15:43:11 INFO namenode.FSNamesystem: supergroup          = supergroup
14/01/22 15:43:11 INFO namenode.FSNamesystem: isPermissionEnabled = true
14/01/22 15:43:11 INFO namenode.FSNamesystem: HA Enabled: false
14/01/22 15:43:11 INFO namenode.FSNamesystem: Append Enabled: true
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map INodeMap
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 1.0% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^20 = 1048576 entries
14/01/22 15:43:11 INFO namenode.NameNode: Caching file names occuring more than 10 times
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
14/01/22 15:43:11 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
14/01/22 15:43:11 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/01/22 15:43:11 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
14/01/22 15:43:11 INFO util.GSet: Computing capacity for map Namenode Retry Cache
14/01/22 15:43:11 INFO util.GSet: VM type       = 64-bit
14/01/22 15:43:11 INFO util.GSet: 0.029999999329447746% max memory = 888.9 MB
14/01/22 15:43:11 INFO util.GSet: capacity      = 2^15 = 32768 entries
14/01/22 15:43:11 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
14/01/22 15:43:11 INFO namenode.FSImage: Saving image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
14/01/22 15:43:11 INFO namenode.FSImage: Image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 198 bytes saved in 0 seconds.
14/01/22 15:43:11 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
14/01/22 15:43:11 INFO util.ExitUtil: Exiting with status 0
14/01/22 15:43:11 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/192.168.6.77
************************************************************/

1.2 Run start-dfs.sh on Master.Hadoop:

[hadoop@Master ~]$ start-dfs.sh 
Starting namenodes on [Master.Hadoop]
Master.Hadoop: starting namenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave7.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave7.Hadoop.out
Slave5.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave5.Hadoop.out
Slave6.Hadoop: starting datanode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-datanode-Slave6.Hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.2.0/logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out

Verify the running processes on Master.Hadoop:

[hadoop@Master ~]$ jps
7695 Jps
7589 SecondaryNameNode
7403 NameNode

Verify the running processes on SlaveX.Hadoop:

[hadoop@Slave5 ~]$ jps
8724 DataNode
8815 Jps

1.3 Run start-yarn.sh on Master.Hadoop:

[hadoop@Master ~]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-resourcemanager-Master.Hadoop.out
Slave7.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave7.Hadoop.out
Slave5.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave5.Hadoop.out
Slave6.Hadoop: starting nodemanager, logging to /usr/local/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-Slave6.Hadoop.out

Verify the running processes on Master.Hadoop:

[hadoop@Master ~]$ jps
8071 Jps
7589 SecondaryNameNode
7821 ResourceManager
7403 NameNode

Verify the running processes on SlaveX.Hadoop:

[hadoop@Slave5 ~]$ jps
9013 Jps
8724 DataNode
8882 NodeManager
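
Optionally, confirm from Master.Hadoop that all three DataNodes and NodeManagers have registered with the cluster:

$ hdfs dfsadmin -report    # should list 3 live DataNodes
$ yarn node -list          # should list 3 NodeManagers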

2. Demo

2.1 Run a few common HDFS commands to prepare for the wordcount demo:

[hadoop@Master ~]$ hdfs dfs -ls /
[hadoop@Master ~]$ hdfs dfs -mkdir /user
[hadoop@Master ~]$ hdfs dfs -mkdir -p /user/micmiu/wordcount/in
[hadoop@Master ~]$ hdfs dfs -ls /user/micmiu/wordcount
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-01-22 16:01 /user/micmiu/wordcount/in

2.2 Create three local files, micmiu-01.txt, micmiu-02.txt and micmiu-03.txt, with the following contents:

micmiu-01.txt:

Hi Michael welcome to Hadoop 
more see micmiu.com

micmiu-02.txt:

Hi Michael welcome to BigData
more see micmiu.com

micmiu-03.txt:

Hi Michael welcome to Spark 
more see micmiu.com
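
If you prefer to script this step, the following reproduces the three files exactly (the trailing space after "Hadoop" and "Spark" accounts for the 50/50/49-byte file sizes in the listing below):

$ printf 'Hi Michael welcome to Hadoop \nmore see micmiu.com\n' > micmiu-01.txt
$ printf 'Hi Michael welcome to BigData\nmore see micmiu.com\n' > micmiu-02.txt
$ printf 'Hi Michael welcome to Spark \nmore see micmiu.com\n' > micmiu-03.txt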

Upload the three micmiu-* files to HDFS:

[hadoop@Master ~]$ hdfs dfs -put micmiu*.txt /user/micmiu/wordcount/in
[hadoop@Master ~]$ hdfs dfs -ls /user/micmiu/wordcount/in
Found 3 items
-rw-r--r--   3 hadoop supergroup         50 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-01.txt
-rw-r--r--   3 hadoop supergroup         50 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-02.txt
-rw-r--r--   3 hadoop supergroup         49 2014-01-22 16:06 /user/micmiu/wordcount/in/micmiu-03.txt

2.3 Then cd into the Hadoop home directory and run:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount  /user/micmiu/wordcount/in /user/micmiu/wordcount/out

PS: the /user/micmiu/wordcount/out directory must not already exist in HDFS, otherwise the job fails with an error.
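
If a previous run left that directory behind, delete it before re-running:

$ hdfs dfs -rm -r /user/micmiu/wordcount/out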

You should see log output similar to the following:

[hadoop@Master hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount  /user/micmiu/wordcount/in /user/micmiu/wordcount/out
14/01/22 16:36:28 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/01/22 16:36:29 INFO input.FileInputFormat: Total input paths to process : 3
14/01/22 16:36:29 INFO mapreduce.JobSubmitter: number of splits:3
............................
.....micmiu.com........
............................
File System Counters
		FILE: Number of bytes read=297
		FILE: Number of bytes written=317359
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=536
		HDFS: Number of bytes written=83
		HDFS: Number of read operations=12
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=55742
		Total time spent by all reduces in occupied slots (ms)=3933
	Map-Reduce Framework
		Map input records=6
		Map output records=24
		Map output bytes=243
		Map output materialized bytes=309
		Input split bytes=387
		Combine input records=24
		Combine output records=24
		Reduce input groups=10
		Reduce shuffle bytes=309
		Reduce input records=24
		Reduce output records=10
		Spilled Records=48
		Shuffled Maps =3
		Failed Shuffles=0
		Merged Map outputs=3
		GC time elapsed (ms)=1069
		CPU time spent (ms)=12390
		Physical memory (bytes) snapshot=846753792
		Virtual memory (bytes) snapshot=5155561472
		Total committed heap usage (bytes)=499580928
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=149
	File Output Format Counters 
		Bytes Written=83

At this point the wordcount job has finished. Run the following commands to view its output:

[hadoop@Master hadoop]$ hdfs dfs -ls /user/micmiu/wordcount/out
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2014-01-22 16:38 /user/micmiu/wordcount/out/_SUCCESS
-rw-r--r--   3 hadoop supergroup         83 2014-01-22 16:38 /user/micmiu/wordcount/out/part-r-00000
[hadoop@Master hadoop]$ hdfs dfs -cat /user/micmiu/wordcount/out/part-r-00000
BigData	1
Hadoop	1
Hi	3
Michael	3
Spark	1
micmiu.com	3
more	3
see	3
to	3
welcome	3

Open a browser and go to http://192.168.6.77:8088 (Master.Hadoop) to see the status of the running applications.
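
Roughly the same information is available from the command line; for applications that are still running or submitted, for example:

$ yarn application -list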
