1 MapReduce Overview
1.1 MapReduce Definition
MapReduce is a programming framework for distributed computation: it combines the user's business-logic code with its own built-in default components into a complete distributed program that runs concurrently on a Hadoop cluster.
1.2 Advantages and Disadvantages of MapReduce
1.2.1 Advantages
MapReduce is easy to program (implementing a few simple interfaces yields a distributed program), scales out simply by adding machines, is highly fault-tolerant (tasks on a failed node are automatically rerun elsewhere), and is well suited to offline processing of massive data sets.
1.2.2 Disadvantages
MapReduce is not good at real-time, low-latency computation; it cannot process streaming input (its input data set must be static); and it handles iterative or DAG-style computation poorly, because each job writes its intermediate results to disk, causing heavy disk IO.
1.3 Core Idea of MapReduce
The core programming model of MapReduce works as follows:
1) A distributed computation program usually needs to be split into at least two phases.
2) The concurrent MapTask instances of the first phase run fully in parallel and are independent of one another.
3) The concurrent ReduceTask instances of the second phase are also independent of one another, but their input depends on the output of all MapTask instances from the previous phase.
4) The MapReduce programming model can contain only one Map phase and one Reduce phase; if the user's business logic is more complex, the only option is to chain multiple MapReduce programs and run them one after another.
Summary: tracing the data flow of WordCount is the best way to understand the MapReduce core idea in depth; a small worked trace follows.
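As a minimal illustration of that data flow, consider a hypothetical input split across two MapTasks (the two-way split is an assumption for the sketch; the real split depends on input size):

    Input split 1: "atguigu atguigu ss"      Input split 2: "cls cls jiao"

    Map phase (parallel, independent):
      MapTask 1 emits: (atguigu,1) (atguigu,1) (ss,1)
      MapTask 2 emits: (cls,1) (cls,1) (jiao,1)

    Shuffle: values are grouped by key across all MapTask outputs:
      (atguigu,[1,1]) (ss,[1]) (cls,[1,1]) (jiao,[1])

    Reduce phase: each ReduceTask sums the value list for its keys:
      (atguigu,2) (ss,1) (cls,2) (jiao,1)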
1.4 MapReduce Processes
A complete MapReduce program running in distributed mode has three kinds of instance processes:
1) MrAppMaster: schedules the whole job and coordinates its state.
2) MapTask: handles the entire data-processing flow of the Map phase.
3) ReduceTask: handles the entire data-processing flow of the Reduce phase.
1.5 WordCount Source Code
The WordCount example consists of a Map class, a Reduce class, and a driver class, and the data types it uses are Hadoop's own serializable wrapper types (listed in the next section) rather than plain Java types.
1.6 Common Data Serialization Types
Commonly used Java types and the Hadoop Writable types they correspond to:

| Java type | Hadoop Writable type |
| --- | --- |
| boolean | BooleanWritable |
| byte | ByteWritable |
| int | IntWritable |
| float | FloatWritable |
| long | LongWritable |
| double | DoubleWritable |
| String | Text |
| map | MapWritable |
| array | ArrayWritable |
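A minimal sketch of how these wrapper types behave (the class name WritableDemo is made up for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
    public static void main(String[] args) {
        // Text wraps a String; IntWritable wraps an int.
        Text word = new Text("hadoop");
        IntWritable count = new IntWritable(1);

        // Writables are mutable: set() replaces the value in place,
        // which is why Mapper/Reducer code reuses one instance
        // instead of allocating a new object per record.
        count.set(count.get() + 1);

        System.out.println(word + "\t" + count); // prints: hadoop	2
    }
}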
1.7 MapReduce Programming Conventions
A user-written program is divided into three parts: the Mapper, the Reducer, and the Driver.
1) Mapper: extends Hadoop's Mapper class; its input and output are both KV pairs whose types are Hadoop Writables; the business logic goes in the map() method, which is called once for every input <K,V> pair.
2) Reducer: extends Hadoop's Reducer class; its input KV types match the Mapper's output KV types; the business logic goes in the reduce() method, which is called once for every group of values sharing the same key.
3) Driver: the client that submits the whole program; it builds a Job object describing the job's parameters and submits it to the cluster.
Section 1.8 implements these conventions end to end.
1.8 WordCount Hands-On Example
1. Requirement
Count the total number of occurrences of each word in a given text file.
(1) Input data:
atguigu atguigu ss ss cls cls jiao banzhang xue hadoop
(2) Expected output:
atguigu 2
banzhang 1
cls 2
hadoop 1
jiao 1
ss 2
xue 1
2. Requirement Analysis
Following the MapReduce programming conventions, write a Mapper, a Reducer, and a Driver; the KV type design is sketched below.
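A sketch of the KV design implied by the requirement (the concrete types match the code in step 4; the example pairs use the sample input above):

    Mapper  input:  <LongWritable offset, Text line>         e.g. <0, "atguigu atguigu ss ...">
    Mapper  output: <Text word, IntWritable 1>                e.g. <atguigu, 1> <atguigu, 1> <ss, 1> ...
    Reducer input:  <Text word, Iterable<IntWritable> ones>   e.g. <atguigu, (1, 1)>
    Reducer output: <Text word, IntWritable total>            e.g. <atguigu, 2>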
3. Environment Setup
(1) Create a Maven project.
(2) Add the following dependencies to the pom.xml file:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>
(3) In the project's src/main/resources directory, create a file named log4j.properties with the following content:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
4. Write the Program
(1) Write the Mapper class:
package com.atguigu.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // The output key/value objects are created once and reused across map()
    // calls; context.write() serializes them immediately, so reuse is safe
    // and avoids allocating two objects per input record.
    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // 1. Get one line of input
        String line = value.toString();

        // 2. Split the line into words
        String[] words = line.split(" ");

        // 3. Emit a <word, 1> pair for each word
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}
(2) Write the Reducer class:
package com.atguigu.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    int sum;
    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        // 1. Sum all the counts for this word
        sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }

        // 2. Emit <word, total>
        v.set(sum);
        context.write(key, v);
    }
}
(3) Write the Driver class:
package com.atguigu.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordcountDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {

        // 1. Get the configuration and create the Job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2. Tell Hadoop which jar to ship (located via this class)
        job.setJarByClass(WordcountDriver.class);

        // 3. Set the Mapper and Reducer classes
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);

        // 4. Set the Mapper output KV types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the final output KV types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths from the command line
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
5. Local Testing
(1) If your computer runs Windows 7, unpack the Windows 7 build of the Hadoop jar package and configure the HADOOP_HOME environment variable in Windows. If your computer runs Windows 10, unpack the Windows 10 build of the Hadoop jar package and configure HADOOP_HOME the same way.
Note: Windows 8 machines and Windows 10 Home edition may run into problems with these prebuilt packages.
(2) Run the program in Eclipse/IDEA.
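A minimal sketch of the IDE run configuration, assuming hypothetical local paths:

    Program arguments: d:/input/hello.txt d:/output

args[0] is the input path and args[1] the output path (step 6 of the Driver). The output directory must not exist beforehand: FileOutputFormat refuses to overwrite an existing directory, and the job fails with a FileAlreadyExistsException.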
6. Testing on the Cluster
(0) To build the jar with Maven, add the following packaging plugin configuration to pom.xml.
Note: replace the mainClass value with your own project's driver class.
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>com.atguigu.mapreduce.wordcount.WordcountDriver</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Note: if the project shows a red error cross after this change, right-click the project -> Maven -> Update Project.
(1) Package the program into a jar and copy it to the Hadoop cluster.
Steps: right-click the project -> Run As -> Maven install. When the build finishes, the jar packages appear in the project's target folder; if they do not show up, right-click the project -> Refresh. Rename the jar without dependencies to wc.jar and copy it to the Hadoop cluster, as sketched below.
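A hedged example of the copy step; the user and host come from the shell prompt in step (3) below, while the target directory /opt/software is an assumption inferred from that prompt:

    scp wc.jar atguigu@hadoop102:/opt/software/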
(2) Start the Hadoop cluster.
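As a hedged reminder, the standard start scripts in Hadoop 2.7.2, run from the Hadoop installation directory (which node runs which script depends on your deployment and is an assumption here):

    sbin/start-dfs.sh     # starts HDFS: NameNode, DataNodes, SecondaryNameNode
    sbin/start-yarn.sh    # starts YARN: ResourceManager, NodeManagers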
(3) Run the WordCount program:
[atguigu@hadoop102 software]$ hadoop jar wc.jar com.atguigu.mapreduce.wordcount.WordcountDriver /user/atguigu/input /user/atguigu/output
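The input directory must already hold the source text before the job is submitted. The job writes its result to part files under the output directory; a hedged check (part-r-00000 is the first reducer's output file):

    [atguigu@hadoop102 software]$ hadoop fs -cat /user/atguigu/output/part-r-00000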