2. Hadoop HDFS 1.x Pseudo-Distributed Setup


Environment

  • CentOS 6.5
  • Apache Hadoop 2.6.5
  • JDK 7 (Hadoop 3.0 and later requires JDK 8)
  • hadoop-2.6.5.tar.gz

Setup Steps

Refer to the official documentation: https://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

  1. Install JDK 7

    # Install the RPM
    rpm -i jdk-7u67-linux-x64.rpm 
    whereis java
    # Configure environment variables
    vi + /etc/profile
    # Append the following lines to the end of profile
    export JAVA_HOME=/usr/java/jdk1.7.0_67
    PATH=$PATH:$JAVA_HOME/bin
    # Reload profile
    . /etc/profile
    # Verify the configuration (jps ships with the JDK)
    jps
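
    A quick way to confirm that both the JDK and the profile change took effect (a minimal sketch):

     # should print the configured path and report version 1.7.0_67
     echo $JAVA_HOME
     java -version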
    
  2. Set up passwordless SSH login

    Now check that you can ssh to the localhost without a passphrase:

     $ ssh localhost
    

    If you cannot ssh to localhost without a passphrase, execute the following commands:

     $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
     $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    
    # SSH into localhost once so that the .ssh directory gets created
    ssh localhost
    # Go to the home directory
    cd 
    # Check that .ssh exists, then enter it
    ll -a
    cd ~/.ssh/
    # Generate a DSA key pair
    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    # Append the public key to authorized_keys to complete the passwordless setup
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
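
    If passwordless login still prompts for a password, directory permissions are the usual culprit; a quick fix and re-check (a minimal sketch, assuming the default ~/.ssh location):

     # sshd rejects keys when these are group- or world-writable
     chmod 700 ~/.ssh
     chmod 600 ~/.ssh/authorized_keys
     # should now return without asking for a passphrase
     ssh localhost exit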
    
  3. Install Hadoop

    # Create the target directory
    mkdir -p /opt/hadoop
    # Extract the tarball into it
    tar xf hadoop-2.6.5.tar.gz -C /opt/hadoop
    
  4. Configure Hadoop environment variables

    vi + /etc/profile
    # Update the environment variables in profile
    export JAVA_HOME=/usr/java/jdk1.7.0_67
    export HADOOP_HOME=/opt/hadoop/hadoop-2.6.5
    PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    # Reload profile
    . /etc/profile
    # Test: the hdfs command should print its usage
    hdfs
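
    A quick sanity check that the new PATH entries resolve (a minimal sketch; this only confirms the shell environment, not a running cluster):

     # both should point into /opt/hadoop/hadoop-2.6.5
     which hdfs
     hadoop version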
    
  5. Edit the JDK path in the Hadoop configuration scripts

    # In hadoop-env.sh, set export JAVA_HOME= to the JDK path
    vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_67
    
    # In mapred-env.sh (MapReduce framework), set the commented-out # export JAVA_HOME to the JDK path
    vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/mapred-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_67
    
    # In yarn-env.sh (resource-management framework), set the commented-out # export JAVA_HOME= to the JDK path
    vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/yarn-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_67
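
    Editing the three files by hand works fine; a scripted alternative (a minimal sketch, assuming GNU sed and that each file contains a possibly commented-out export JAVA_HOME= line):

     cd /opt/hadoop/hadoop-2.6.5/etc/hadoop
     # rewrite any (commented or not) export JAVA_HOME= line in the three env scripts
     sed -i 's|^#\? *export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_67|' hadoop-env.sh mapred-env.sh yarn-env.sh
     # confirm the result
     grep '^export JAVA_HOME' hadoop-env.sh mapred-env.sh yarn-env.sh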
    
  6. Configure the NameNode address and replication factor

    Pseudo-Distributed Operation

    Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

    Configuration

    Use the following:

    etc/hadoop/core-site.xml:

    <configuration>
       <property>
           <name>fs.defaultFS</name>
           <value>hdfs://localhost:9000</value>
       </property>
    </configuration>
    

    etc/hadoop/hdfs-site.xml:

    <configuration>
       <property>
           <name>dfs.replication</name>
           <value>1</value>
       </property> 
    </configuration>
    

    The relevant defaults from hdfs-default.xml:

    dfs.namenode.secondary.http-address    0.0.0.0:50090   The secondary namenode http server address and port.
    dfs.namenode.secondary.https-address   0.0.0.0:50091   The secondary namenode HTTPS server address and port.
    # Paste the NameNode address into core-site.xml and set where HDFS stores block and metadata files
    vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/core-site.xml
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://node01:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/var/hadoop/pseudo</value>
        </property>
    
    # Paste the replication factor into hdfs-site.xml and set the SecondaryNameNode address
    vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/hdfs-site.xml
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
         <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>node01:50090</value>
        </property>
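
    With /etc/profile loaded, the effective values can be checked before starting anything (a minimal sketch; hdfs getconf reads the XML configuration on the client side):

     hdfs getconf -confKey fs.defaultFS
     hdfs getconf -confKey dfs.replication
     hdfs getconf -confKey dfs.namenode.secondary.http-address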
    
  7. Configure the slave node (DataNode)

    # Replace localhost in the slaves file with the DataNode hostname
    vi /opt/hadoop/hadoop-2.6.5/etc/hadoop/slaves 
    node01
    

Startup

Execution

The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

  1. Format the filesystem:

      $ bin/hdfs namenode -format
    
  2. Start NameNode daemon and DataNode daemon:

      $ sbin/start-dfs.sh
    

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at:

    • NameNode - http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:

      $ bin/hdfs dfs -mkdir /user
      $ bin/hdfs dfs -mkdir /user/<username>
    
  5. Copy the input files into the distributed filesystem:

      $ bin/hdfs dfs -put etc/hadoop input
    
  6. Run some of the examples provided:

      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar grep input output 'dfs[a-z.]+'
    
  7. Examine the output files:

    Copy the output files from the distributed filesystem to the local filesystem and examine them:

      $ bin/hdfs dfs -get output output
      $ cat output/*
    

    or

    View the output files on the distributed filesystem:

      $ bin/hdfs dfs -cat output/*
    
  8. When you’re done, stop the daemons with:

      $ sbin/stop-dfs.sh
    
# Format the NameNode
hdfs namenode -format
# After formatting and starting, this directory holds the fsimage, edit logs, and the VERSION file with the clusterID
cd /var/hadoop/pseudo/dfs/name/current
# Start the cluster
[root@Node001 ~]# /opt/hadoop/hadoop-2.6.5/sbin/start-dfs.sh
Starting namenodes on [node0001]
node0001: starting namenode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-namenode-Node001.out
node0001: starting datanode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-Node001.out
Starting secondary namenodes [node0001]
node0001: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-secondarynamenode-Node001.out

# Verify startup: NameNode, DataNode, and SecondaryNameNode should all appear
[root@node0001 ~]# jps
1434 SecondaryNameNode
1195 NameNode
1536 Jps
1273 DataNode


# Check the ports the web UI is listening on
ss -nal
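
If one of the three daemons is missing from the jps output, its log under $HADOOP_HOME/logs usually explains why (a minimal sketch; the .log files share the base names of the .out files printed by start-dfs.sh):

# e.g. inspect the NameNode and DataNode logs
tail -n 50 /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-namenode-*.log
tail -n 50 /opt/hadoop/hadoop-2.6.5/logs/hadoop-root-datanode-*.log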

Access from a Browser

  • Check the listening ports

    [root@node0001 ~]# ss -nal
    State      Recv-Q Send-Q                        Local Address:Port                          Peer Address:Port 
    LISTEN     0      128                                       *:50070                                    *:*     
    LISTEN     0      128                                      :::22                                      :::*     
    LISTEN     0      128                                       *:22                                       *:*     
    LISTEN     0      100                                     ::1:25                                      :::*     
    LISTEN     0      100                               127.0.0.1:25                                       *:*     
    LISTEN     0      128                                       *:50010                                    *:*     
    LISTEN     0      128                                       *:50075                                    *:*     
    LISTEN     0      128                                       *:50020                                    *:*     
    LISTEN     0      128                           192.168.9.101:9000                                     *:*     
    LISTEN     0      128                           192.168.9.101:50090                                    *:*     
    [root@node0001 ~]# 
    
  • Open http://node0001:50070/ in a browser to view the NameNode web UI


Upload Files

# /user/root is the default HDFS working directory, so it can be omitted
hdfs dfs -mkdir -p /user/root
# List the directories created in HDFS
hdfs dfs -ls /
# Create a test file
for i in `seq 100000`;do echo "test hello $i" >> test.txt;done
# Upload the file as a test; /user/root is the default directory and can be omitted
hdfs dfs -put test.txt /user/root 
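
After the upload, the listing and block layout of the file can be inspected (a minimal sketch; fsck with -files -blocks -locations shows how test.txt was split into blocks and where the replicas live):

hdfs dfs -ls /user/root
hdfs fsck /user/root/test.txt -files -blocks -locations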

Miscellaneous

  • Stop Hadoop: stop-dfs.sh
  • Log file path: hadoop-2.6.5/logs
  • Block file storage path: /var/hadoop/pseudo/dfs/data/current/BP-1247525349-127.0.0.1-1572210034470/current/finalized/subdir0/subdir0
  • Edit log and fsimage storage path: /var/hadoop/pseudo/dfs/name/current
  • NameNode clusterID: see /var/hadoop/pseudo/dfs/name/current/VERSION
  • DataNode clusterID: see /var/hadoop/pseudo/dfs/data/current/VERSION (it must match the NameNode's; see the check below)
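
If the two clusterIDs differ (typically after re-running hdfs namenode -format without clearing the DataNode data directory), the DataNode will refuse to register with the NameNode; a quick comparison (a minimal sketch):

    # both lines should report the same clusterID value
    grep clusterID /var/hadoop/pseudo/dfs/name/current/VERSION /var/hadoop/pseudo/dfs/data/current/VERSION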