Environment
- CentOS 6.5
- Apache Hadoop 2.6.5
- JDK 7 (Hadoop 3.0 and later requires JDK 8)
- hadoop-2.6.5.tar.gz
Setup Steps
Refer to the official documentation: https://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation
- Install JDK 7
# Install the JDK
rpm -i jdk-7u67-linux-x64.rpm
whereis java
# Configure the environment variables
vi + /etc/profile
# Append the following to the end of /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_67
PATH=$PATH:$JAVA_HOME/bin
# Reload /etc/profile
. /etc/profile
# Verify the configuration
jps
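Before moving on, it is worth double-checking the JDK install with a few commands (a minimal sketch; the exact version string depends on the RPM used):
echo $JAVA_HOME    # should print /usr/java/jdk1.7.0_67
java -version      # should report a 1.7.0_67 JVM
which jps          # should resolve to a path under $JAVA_HOME/bin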
- Set up passwordless SSH login
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# SSH into localhost once so that the .ssh directory gets created
ssh localhost
# Go back to the home directory
cd
# Check that the .ssh directory exists and enter it
ll -a
cd ~/.ssh/
# Generate a DSA key pair
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# Append the public key to authorized_keys to enable passwordless login
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
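If logging in still prompts for a password after this, it is usually a permissions issue: sshd refuses keys when ~/.ssh or authorized_keys is group- or world-writable. A minimal check under that assumption:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Non-interactive test: prints "ssh ok" only if no password/passphrase is required
ssh -o BatchMode=yes localhost 'echo ssh ok'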
- Install Hadoop
# Create the installation directory
mkdir -p /opt/hadoop
# Extract the tarball
tar xf hadoop-2.6.5.tar.gz -C /opt/hadoop
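A quick sanity check that the extraction landed where expected (the listing below is the usual top-level layout of the 2.6.5 binary tarball):
ls /opt/hadoop/hadoop-2.6.5
# bin  etc  include  lib  libexec  sbin  share  ...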
- Configure the Hadoop environment variables
vi + /etc/profile
# Update the environment variables in /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_67
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.5
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload /etc/profile
. /etc/profile
# Verify that the hdfs command is on the PATH
hdfs
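Besides running hdfs with no arguments, the following confirms the variables took effect in the current shell (a minimal sketch):
echo $HADOOP_HOME     # should print /opt/hadoop/hadoop-2.6.5
hadoop version        # should report Hadoop 2.6.5
which start-dfs.sh    # confirms $HADOOP_HOME/sbin is on the PATH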
- Set the JDK path in the Hadoop environment scripts
# In hadoop-env.sh, set the value of export JAVA_HOME= to the JDK path
vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
# In mapred-env.sh (MapReduce framework), set the commented-out
# export JAVA_HOME to the JDK path
vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/mapred-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
# In yarn-env.sh (resource management framework), set the
# export JAVA_HOME= value to the JDK path
vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
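After editing the three files, a quick grep confirms that every daemon script now points at the same JDK (note that in mapred-env.sh and yarn-env.sh the line ships commented out, so it has to be uncommented as well as edited):
cd /opt/hadoop/hadoop-2.6.5/etc/hadoop
grep -n "JAVA_HOME=" hadoop-env.sh mapred-env.sh yarn-env.sh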
- Configure the master node and the replication factor
Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Defaults from hdfs-default.xml for reference:
dfs.namenode.secondary.http-address   0.0.0.0:50090   The secondary namenode http server address and port.
dfs.namenode.secondary.https-address  0.0.0.0:50091   The secondary namenode HTTPS server address and port.
# Configure the master node: paste the following into core-site.xml and set the
# directory where block data and metadata are stored
vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/core-site.xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/pseudo</value>
</property>
# Set the replication factor and the SecondaryNameNode address in hdfs-site.xml
vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
</property>
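For reference, a sketch of what the two files end up looking like after the edits; the snippets above go inside the <configuration> element that each file already contains, and the hostname node01 has to resolve to this machine (for example via /etc/hosts):
<!-- etc/hadoop/core-site.xml -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/pseudo</value>
    </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node01:50090</value>
    </property>
</configuration>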
- Configure the slave nodes (DataNodes)
# Configure the slave nodes: replace localhost in the slaves file with the
# hostname(s) of the DataNode machines
vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/slaves
node01
Startup
Execution
The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.
Format the filesystem:
$ bin/hdfs namenode -format
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
Browse the web interface for the NameNode; by default it is available at:
- NameNode - http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
When you’re done, stop the daemons with:
$ sbin/stop-dfs.sh
# Format the NameNode
hdfs namenode -format
# Each format (and subsequent startup) produces the in-memory image (fsimage), the edit log (operation records) and the clusterID (VERSION) files here
cd /var/hadoop/pseudo/dfs/name/current
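Immediately after a successful format, the directory typically contains something like the following (a sketch; transaction IDs vary, and the edits_* files only appear once the NameNode has been started):
ls /var/hadoop/pseudo/dfs/name/current
# fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION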
# Start the cluster
[root@Node001 ~]# /opt/hadoop/hadoop-2.6.5/sbin/start-dfs.sh
Starting namenodes on [node0001]
node0001: starting namenode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-namenode-Node001.out
node0001: starting datanode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-Node001.out
Starting secondary namenodes [node0001]
node0001: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-Node001.out
# Verify startup: jps should list NameNode, DataNode and SecondaryNameNode
[root@node0001 ~]# jps
1434 SecondaryNameNode
1195 NameNode
1536 Jps
1273 DataNode
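If any of the three daemons is missing from the jps output, the usual first step is to inspect its log under the logs directory reported by start-dfs.sh (a sketch; the exact file name contains the user and hostname, so the pattern below may differ on your machine):
# Example: look for startup errors in the DataNode log
tail -n 50 /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-*.log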
# Check which ports the web interfaces are listening on
ss -nal
Access via Browser
- Check the listening ports
[root@node0001 ~]# ss -nal
State      Recv-Q Send-Q    Local Address:Port        Peer Address:Port
LISTEN     0      128       *:50070                   *:*
LISTEN     0      128       :::22                     :::*
LISTEN     0      128       *:22                      *:*
LISTEN     0      100       ::1:25                    :::*
LISTEN     0      100       127.0.0.1:25              *:*
LISTEN     0      128       *:50010                   *:*
LISTEN     0      128       *:50075                   *:*
LISTEN     0      128       *:50020                   *:*
LISTEN     0      128       192.168.9.101:9000        *:*
LISTEN     0      128       192.168.9.101:50090       *:*
[root@node0001 ~]#
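For orientation, the Hadoop-related ports in this listing correspond to the following services (standard Hadoop 2.x defaults, apart from 9000 and 50090 which were configured explicitly above):
- 9000: NameNode RPC (fs.defaultFS)
- 50070: NameNode web UI
- 50075: DataNode web UI
- 50010: DataNode data transfer
- 50020: DataNode IPC
- 50090: SecondaryNameNode web UI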
- Open http://node0001:50070/ in a browser to view the NameNode web console
Upload Files
# /user/root is the default HDFS directory for the root user, so it can be omitted
hdfs dfs -mkdir -p /user/root
# List the directories created in HDFS
hdfs dfs -ls /
# Create a test file
for i in `seq 100000`;do echo "test hello $i" >> test.txt;done
# Upload the test file; /user/root is the default target directory and can be omitted
hdfs dfs -put test.txt /user/root
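To confirm the upload worked, list the file in HDFS; since this is a single-node setup, you can also see the corresponding block files on the local disk (a sketch; the BP-* directory name is generated at format time and will differ on your machine):
# The file should appear under /user/root
hdfs dfs -ls /user/root
hdfs dfs -cat /user/root/test.txt | head
# Block files are kept under the DataNode data directory on the local filesystem
find /var/hadoop/pseudo/dfs/data -name 'blk_*' | head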
Miscellaneous
- Stop Hadoop
stop-dfs.sh
- Log file location
hadoop-2.6.5/logs
- Block file storage path
/var/hadoop/pseudo/dfs/data/current/BP-1247525349-127.0.0.1-1572210034470/current/finalized/subdir0/subdir0
- Storage path for the edit log (operation records) and the fsimage
/var/hadoop/pseudo/dfs/name/current
- The NameNode clusterID is stored in
/var/hadoop/pseudo/dfs/name/current/VERSION
- The DataNode clusterID is stored in
/var/hadoop/pseudo/dfs/data/current/VERSION
(compare the two as shown below)
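A quick way to compare the two clusterIDs (a minimal check): if they differ, which typically happens after the NameNode is re-formatted while an old DataNode data directory is kept, the DataNode will fail to register with the NameNode.
grep clusterID /var/hadoop/pseudo/dfs/name/current/VERSION
grep clusterID /var/hadoop/pseudo/dfs/data/current/VERSION
# The two values must match for the DataNode to join the cluster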
Source: CSDN
Author: 得过且过1223
Link: https://blog.csdn.net/dgqg1223/article/details/104188377