Question
I am trying to run a job on an AWS EMR cluster. The problem I'm getting is the following:
java.io.IOException: No FileSystem for scheme: hdfs
I don't know exactly where my problem resides (in my Java jar job or in the configuration of the job).
In my S3 bucket I'm creating a folder (input) and putting a number of data files in it. Then in the job arguments I'm passing the path of the input folder, and that same path is what I use in FileInputFormat.addInputPath(job, new Path(args[0])).
My first question: will the job pick up all the files in the input folder and process them all, or do I have to supply the path of each file individually?
Second question: how can I resolve the exception above?
Thanks
Answer 1:
Keep your input files in S3, e.g. s3://mybucket/input/. Put every file you want processed in the input folder under your bucket.
In your MapReduce driver, add the input path like this:
FileInputFormat.addInputPath(job, new Path("s3n://mybucket/input/"));
This will automatically process all files under the input folder.
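To show where that one line fits, here is a minimal sketch of a MapReduce driver that takes the S3 input folder as its first argument. The class names (MyJobDriver, MyMapper, MyReducer) and the key/value types are placeholders, not from the question; the sketch assumes the Hadoop dependencies (and, on EMR, the bundled S3 filesystem support) are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my-emr-job");
        job.setJarByClass(MyJobDriver.class);

        // Hypothetical mapper/reducer classes stand in for your own.
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // args[0] is the S3 folder, e.g. s3n://mybucket/input/.
        // addInputPath accepts a directory: every file under it is
        // turned into input splits, so you do not need to list the
        // files one by one.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

This only runs as a step on an EMR cluster (or against a Hadoop installation), not standalone, so treat it as a template rather than a drop-in program.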
Source: https://stackoverflow.com/questions/26460177/amazon-web-service-emr-filesystem