MapReduce Overview

Anonymous (unverified), submitted 2019-12-02 23:55:01

1 MapReduce Overview

1.1 MapReduce

1.2 MapReduce Advantages and Disadvantages

1.2.1 Advantages

1.2.2 Disadvantages

1.3 MapReduce Core Ideas

MapReduce

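1) A distributed computation usually has to be split into at least two stages.

2) In the first stage, the concurrent MapTask instances run fully in parallel and are independent of one another.

3) In the second stage, the concurrent ReduceTask instances are also independent of one another, but their input depends on the output of all MapTask instances from the first stage.

4) The MapReduce programming model allows only one Map stage and one Reduce stage per job. If the business logic is very complex, the only option is to chain several MapReduce programs and run them one after another.

Summary: tracing the data flow of WordCount is a good way to understand the MapReduce core ideas in depth.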
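The two stages and the dependency between them can be sketched in plain Java without Hadoop. This is only an illustration under stated assumptions: the class `TwoPhaseSketch`, its methods, the thread pool size, and the sample splits are all hypothetical, and a real job would run MapTask and ReduceTask instances on cluster nodes rather than threads. The partitioning expression follows the same hash-modulo idea as Hadoop's default hash partitioner.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoPhaseSketch {

    // Stage 1: one "map task" per input split; emits (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapTask(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Stage 2: one "reduce task" per partition; sums the values of each key it owns.
    static Map<String, Integer> reduceTask(List<Map.Entry<String, Integer>> partition) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : partition) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    public static Map<String, Integer> run(List<String> splits, int numReduceTasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Map tasks do not depend on each other, so all of them are submitted at once.
            List<Callable<List<Map.Entry<String, Integer>>>> mapJobs = new ArrayList<>();
            for (String split : splits) {
                mapJobs.add(() -> mapTask(split));
            }
            // invokeAll returns only after every map task has finished:
            // this is the barrier between the two stages.
            List<Future<List<Map.Entry<String, Integer>>>> mapResults = pool.invokeAll(mapJobs);

            // Route each pair by key hash so that one reduce task sees all values of a key.
            List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
            for (int i = 0; i < numReduceTasks; i++) {
                partitions.add(new ArrayList<>());
            }
            for (Future<List<Map.Entry<String, Integer>>> result : mapResults) {
                for (Map.Entry<String, Integer> pair : result.get()) {
                    int p = (pair.getKey().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
                    partitions.get(p).add(pair);
                }
            }

            // Reduce tasks are independent of each other as well.
            List<Callable<Map<String, Integer>>> reduceJobs = new ArrayList<>();
            for (List<Map.Entry<String, Integer>> partition : partitions) {
                reduceJobs.add(() -> reduceTask(partition));
            }
            Map<String, Integer> merged = new TreeMap<>();
            for (Future<Map<String, Integer>> result : pool.invokeAll(reduceJobs)) {
                merged.putAll(result.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("hello world", "hello hadoop", "hadoop hello");
        System.out.println(run(splits, 2)); // prints {hadoop=2, hello=3, world=1}
    }
}
```

The sketch merges the reduce outputs into one sorted map only for display; in a real job each ReduceTask writes its own output file.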

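1.4 MapReduce Processes

1.5 WordCount Source Code

The WordCount example is built from a Map class and a Reduce class, and its data types are Hadoop's own types.

1.6 Common Data Types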

Common Java types and the corresponding Hadoop Writable types:

Java type      Hadoop Writable type
boolean        BooleanWritable
byte           ByteWritable
int            IntWritable
float          FloatWritable
long           LongWritable
double         DoubleWritable
String         Text
map            MapWritable
array          ArrayWritable
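To show what a Writable wrapper does compared with a plain Java value, here is a minimal sketch that needs no Hadoop dependency. `SimpleWritable` and `IntBox` are hypothetical stand-ins, not Hadoop classes: the interface mirrors the two methods of `org.apache.hadoop.io.Writable` (`write(DataOutput)` and `readFields(DataInput)`), and `IntBox` plays the role of `IntWritable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    // Hypothetical stand-in with the same two methods as Hadoop's Writable interface.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical mutable int wrapper, analogous in spirit to IntWritable.
    static class IntBox implements SimpleWritable {
        private int value;

        IntBox() {}

        IntBox(int value) { this.value = value; }

        int get() { return value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value); // 4 bytes, big-endian
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt(); // overwrites the state of this reusable object
        }
    }

    // Serializes the value to bytes and reads it back into a fresh object.
    public static int roundTrip(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                new IntBox(value).write(out);
            }
            IntBox copy = new IntBox();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42)); // prints 42
    }
}
```

Because `readFields` overwrites an existing object, one instance can be reused for many records, which is the pattern the Mapper and Reducer in the WordCount example follow with their `Text` and `IntWritable` fields.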

1.7 MapReduce Programming Conventions

A user program is written in three parts: Mapper, Reducer, and Driver.

1.8 WordCount Example

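1. Requirement

Count the total number of occurrences of each word in a given text file and output the result.

(1) Input data

atguigu atguigu ss ss cls cls jiao banzhang xue hadoop

(2) Expected output

atguigu 2
banzhang 1
cls 2
hadoop 1
jiao 1
ss 2
xue 1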

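2. Requirement analysis

Following the MapReduce programming conventions, write a Mapper, a Reducer, and a Driver.

3. Environment setup

(1) Create a Maven project.

(2) Add the following dependencies to the pom.xml file: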

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>

(3) In the project's src/main/resources directory, create a file named log4j.properties with the following content:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

4. Write the program

(1) Write the Mapper class

package com.atguigu.mapreduce.wordcount;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Output key and value, reused across map() calls
    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // 1 Get one line
        String line = value.toString();

        // 2 Split the line into words
        String[] words = line.split(" ");

        // 3 Emit (word, 1) for each word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
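One detail of the split step is worth a quick check: `split(" ")` treats every single space as a separator, so consecutive spaces produce empty tokens and tabs are not separators at all. The plain-Java sketch below compares that call with a whitespace-tolerant alternative. The class and method names are hypothetical, and whether the stricter variant is wanted depends on the input data; the sample input in this article is separated by single spaces, so the original call works for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSketch {

    // Same call as in the Mapper: split on exactly one space.
    public static List<String> splitOnSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Alternative: split on runs of whitespace and drop empty tokens.
    public static List<String> splitOnWhitespace(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "ss  ss\tcls"; // two spaces, then a tab
        System.out.println(splitOnSpace(line).size());  // prints 3: "ss", "", "ss<TAB>cls"
        System.out.println(splitOnWhitespace(line));    // prints [ss, ss, cls]
    }
}
```

With the first variant, the empty string would be emitted as a word and counted by the Reducer.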

(2) Write the Reducer class

package com.atguigu.mapreduce.wordcount;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    int sum;
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        // 1 Sum the counts for this key
        sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }

        // 2 Emit (word, total)
        v.set(sum);
        context.write(key, v);
    }
}

(3) Write the Driver class

package com.atguigu.mapreduce.wordcount;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordcountDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        // 1 Get the configuration and wrap it in a job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2 Set the jar by the driver class
        job.setJarByClass(WordcountDriver.class);

        // 3 Set the map and reduce classes
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);

        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7 Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);

        System.exit(result ? 0 : 1);
    }
}
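The Driver reads args[0] and args[1] directly, and Hadoop's FileOutputFormat refuses to run when the output directory already exists. A small pre-check, sketched below in plain Java, can surface both problems before the job is submitted. This is an assumption-laden illustration rather than part of the original program: `DriverArgsSketch` and `validate` are hypothetical names, and the existence check uses `java.nio.file`, so it applies only to a local-filesystem run; an HDFS path would have to be checked through the Hadoop FileSystem API instead.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class DriverArgsSketch {

    // Returns null when the arguments look usable, otherwise a message describing the problem.
    public static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: WordcountDriver <input path> <output path>";
        }
        // Local-filesystem check only (assumption: the job runs in local mode).
        if (Files.exists(Paths.get(args[1]))) {
            return "Output path already exists: " + args[1];
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("Arguments look fine: " + args[0] + " -> " + args[1]);
    }
}
```

In a real driver, such a check would sit at the top of main, before the Job is created.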

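5. Local testing

(1) On a Windows 7 machine, unpack the Hadoop package built for Win7 and configure the HADOOP_HOME environment variable in Windows. On a Windows 10 machine, unpack the Hadoop package built for Win10 and configure HADOOP_HOME in the same way.

  Note: Windows 8 machines and Windows 10 Home edition may cause problems.

(2) Run the program in Eclipse/IDEA.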

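6. Testing on the cluster

(0) To build the jar with Maven, add the following packaging plugin configuration.

Note: replace the mainClass value with the main class of your own project.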

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>com.atguigu.mapreduce.wordcount.WordcountDriver</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

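Note: if the project shows an error marker, right-click the project -> Maven -> Update Project.

(1) Package the program into a jar and copy it to the Hadoop cluster.

Steps: right-click the project -> Run As -> Maven install. The jar is generated in the project's target folder; if it does not appear, right-click the project -> Refresh. Rename the jar without dependencies to wc.jar and copy it to the Hadoop cluster.

(2) Start the Hadoop cluster.

(3) Run the WordCount program.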

[atguigu@hadoop102 software]$ hadoop jar wc.jar com.atguigu.mapreduce.wordcount.WordcountDriver /user/atguigu/input /user/atguigu/output

Source: https://www.cnblogs.com/Diyo/p/11374849.html
