Setting the Number of Reducers

## 1. Lab Content
This section shows how to control the number of reducers. When a job produces many result files, you will usually see several files named part-r-00000, part-r-00001, and so on. That count is tied to the number of reducers: the more reducers the job uses, the more result files it produces.
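Each reduce task writes exactly one part-r-NNNNN file, which is why the file count tracks the reducer count. As a minimal sketch of the mechanism, Hadoop's default HashPartitioner routes every key to one of the reduce tasks like this (the class name HashPartitionerSketch is only for illustration):

```java
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of the routing performed by Hadoop's default HashPartitioner:
// each key lands in one of numReduceTasks partitions, and each reduce
// task writes its partition to its own part-r-NNNNN output file.
public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the result is non-negative, then
        // take the remainder modulo the number of reduce tasks
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```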

The number of reducers can be set when the job is initialized. Copy the MapReduce3 project and name the copy MapReduce4.

LogJob.java:

```java
package com.shiyanlou.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class LogJob {

    public static void main(String[] args) throws Exception {
        // Input path (strip a trailing slash if present)
        String inputPath = args[0];
        if (inputPath.endsWith("/")) {
            inputPath = inputPath.substring(0, inputPath.length() - 1);
        }
        // Output path
        String outputPath = inputPath + "/output";
        // Number of reducers, taken from the second command-line argument
        int numReducer = Integer.parseInt(args[1]);

        Configuration conf = new Configuration();
        Job job = new Job(conf, "sum_did_from_log_file");
        job.setJarByClass(LogJob.class);

        job.setMapperClass(LogMapper.class);
        job.setReducerClass(LogReducer.class);
        // The key call: it controls how many reduce tasks run, and
        // therefore how many part-r-NNNNN result files are produced
        job.setNumReduceTasks(numReducer);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        Path path1 = new Path(inputPath);
        Path path2 = new Path(outputPath);

        // Remove any previous output so the job does not fail on an existing directory
        removeFolder(path2, conf);

        MultipleOutputs.addNamedOutput(job, "result", TextOutputFormat.class,
                Text.class, IntWritable.class);

        FileInputFormat.addInputPath(job, path1);
        FileOutputFormat.setOutputPath(job, path2);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    /**
     * Delete the given directory if it exists.
     *
     * @param path the directory to remove
     * @param conf the Hadoop configuration used to resolve the file system
     * @throws IOException
     */
    private static void removeFolder(Path path, Configuration conf)
            throws IOException {
        FileSystem fs = path.getFileSystem(conf);
        if (fs.exists(path)) {
            fs.delete(path, true); // recursive delete
        }
    }
}
```
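
Assuming the project is packaged as MapReduce4.jar (the jar name here is hypothetical), the job can then be launched with the reducer count as the second argument, for example:

```
./hadoop jar MapReduce4.jar com.shiyanlou.mapreduce.LogJob input 1
```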

Run the job (here with 1 passed as the reducer count) and check the JobTracker: the number of reducers is indeed 1, and the output directory now contains only a single result file:

```
./hadoop dfs -ls input/output/
```

## 2. Summary

In MapReduce, the number of reducers can be set with setNumReduceTasks, which in turn changes the number of result files the job writes.
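
Since setNumReduceTasks simply records the reducer count in the job's configuration, the count can also be set through the configuration directly. A minimal sketch, assuming Hadoop 1.x (where the property key is mapred.reduce.tasks; on Hadoop 2.x it is mapreduce.job.reduces):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Via the configuration, before the Job is created
// (assumes the Hadoop 1.x property key; use "mapreduce.job.reduces" on 2.x)
conf.setInt("mapred.reduce.tasks", 4);
Job job = new Job(conf, "sum_did_from_log_file");
// ...or via the Job API, which overrides the configuration value
job.setNumReduceTasks(4);
```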
