Using a local filesystem directory as Mapper input on a cluster


I gave the mapper an input path on the local filesystem. The job runs successfully from Eclipse, but not on the cluster, because it is unable to find the local input path s

7 answers

面向向阳花

2021-01-12 13:48
I tried the following code and it solved the problem. Please try it and let me know.

You need to get a FileSystem object for the local filesystem and then use its makeQualified method to build the path. Since the input format has to receive a local filesystem path (there is no other way to pass it), I used makeQualified, which returns a path explicitly bound to the local filesystem.

The code is shown below:
```
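Configuration conf = new Configuration();
// Get a handle on the local filesystem instead of the cluster default
FileSystem fs = FileSystem.getLocal(conf);
// Qualify the path so that it carries the local file: scheme
Path inputPath = fs.makeQualified(new Path("/usr/local/srini/"));  // local path

// job is the job object that has already been created elsewhere
FileInputFormat.setInputPaths(job, inputPath);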
```
I hope this meets your requirement, even though I am posting it late. It worked fine for me, and I believe it does not need any configuration changes.
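To make it clearer why qualifying the path matters, here is a minimal sketch that uses only java.net.URI and no Hadoop classes. It loosely mimics how a path without a scheme gets resolved against a default filesystem, which is presumably why a bare local path ends up being looked for in HDFS on the cluster. The qualify helper, the class name, and the hdfs://namenode:8020 address are assumptions made up for illustration; this is not Hadoop's actual implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QualifyPathSketch {

    // Hypothetical helper: resolves a path string against a default filesystem URI.
    // It only imitates the idea of path qualification; it is not Hadoop code.
    static URI qualify(String path, URI defaultFs) {
        URI parsed = URI.create(path);
        if (parsed.getScheme() != null) {
            // Already qualified (for example file:/... or hdfs://...): keep it unchanged.
            return parsed;
        }
        try {
            // No scheme: borrow scheme and authority from the default filesystem.
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(),
                    parsed.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI clusterDefault = URI.create("hdfs://namenode:8020"); // assumed cluster default
        URI localDefault = URI.create("file:///");

        // Against the cluster default, the bare path points into HDFS.
        System.out.println(qualify("/usr/local/srini/", clusterDefault));
        // Against the local filesystem, the same path stays on the local disk.
        System.out.println(qualify("/usr/local/srini/", localDefault));
        // A path that already carries a scheme is unaffected by the default.
        System.out.println(qualify("file:/usr/local/srini/", clusterDefault));
    }
}
```

The point of the sketch is only that a path which already carries the file scheme no longer depends on whatever default filesystem the cluster configuration supplies.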

醉梦人生

2021-01-12 13:53
You might want to try setting the configuration like this:
```
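Configuration conf = new Configuration();
// Run the job with the local job runner (Hadoop 1.x property name)
conf.set("mapred.job.tracker", "local");
// Use the local filesystem as the default filesystem
conf.set("fs.default.name", "file:///");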
```
After this, you can point FileInputFormat at the local path and you are good to go.

闹比i

2021-01-12 13:57
The question is an interesting one. One can have data on S3 and access this data without an explicit copy to HDFS prior to running the job. In the wordcount example, one would specify this as follows:

hadoop jar example.jar wordcount s3n://bucket/input s3n://bucket/output

What happens here is that the mappers read records directly from S3.

If this can be done with S3, why wouldn't Hadoop similarly accept this syntax in place of s3n
```
file:///input file:///output
```
?

But empirically, this seems to fail in an interesting way: Hadoop throws a file-not-found exception for a file that is indeed in the input directory. That is, it seems able to list the files in the input directory on my local disk, but when it comes time to open them and read the records, the file is not found (or not accessible).

我寻月下人不归

2021-01-12 14:02

Running on a cluster requires the data to be loaded into distributed storage (HDFS). Copy the data to HDFS first using hadoop fs -copyFromLocal, and then run your job again, giving it the path of the data in HDFS.
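If you want to trigger that copy from a Java driver rather than typing it by hand, a rough sketch could look like the following. It only builds the hadoop fs -copyFromLocal argument list and, when asked, launches it as an external process. The class name, the --run switch, and both example paths (in particular the HDFS destination /user/srini/input) are hypothetical placeholders, and the sketch assumes a hadoop executable is on the PATH.

```java
import java.util.Arrays;
import java.util.List;

public class CopyToHdfsSketch {

    // Builds the argument list for: hadoop fs -copyFromLocal <localSrc> <hdfsDst>
    static List<String> copyFromLocalCommand(String localSrc, String hdfsDst) {
        return Arrays.asList("hadoop", "fs", "-copyFromLocal", localSrc, hdfsDst);
    }

    public static void main(String[] args) throws Exception {
        // Both paths are placeholders; the HDFS destination is hypothetical.
        List<String> command = copyFromLocalCommand("/usr/local/srini", "/user/srini/input");
        System.out.println(String.join(" ", command));

        // Only launch the process when explicitly asked, since it needs a
        // hadoop executable on the PATH and a reachable cluster.
        if (args.length > 0 && args[0].equals("--run")) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            System.exit(process.waitFor());
        }
    }
}
```

Once the copy has finished, the job would be given the HDFS destination as its input path instead of the local directory.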

盖世英雄少女心

2021-01-12 14:02

The data must be on HDFS for any MapReduce job to process it. So even if your source is a local filesystem, a network path, or a web-based store (such as Azure Blob Storage or Amazon block storage), you need to copy the data to HDFS first and then run the job. The bottom line is that you must push the data to HDFS first. There are several ways to do the transfer, depending on the data source; from the local filesystem you would use the following command:

$ hadoop fs -copyFromLocal SourceFileOrStoragePath HDFSDirectoryOrFilePath

失恋的感觉

2021-01-12 14:03

Try setting the input path like this:

FileInputFormat.addInputPath(conf, new Path("file:///the directory on your local filesystem"));

If you include the file:// scheme prefix, it can access files from the local filesystem.

