YARN log aggregation on AWS EMR - UnsupportedFileSystemException


Question


I am struggling to enable YARN log aggregation for my Amazon EMR cluster. I am following this documentation for the configuration:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive

Under the section titled: "To aggregate logs in Amazon S3 using the AWS CLI".

I've verified that the hadoop-config bootstrap action puts the following into yarn-site.xml:

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>3000</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>s3://mybucket/logs</value>
</property>

I can run a sample job (pi from hadoop-examples.jar) and see that it completes successfully in the ResourceManager's GUI.

It even creates a folder under s3://mybucket/logs named with the application ID, but the folder is empty. And if I run yarn logs -applicationId <applicationId>, I get a stack trace:

14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
    at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
    at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
    at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
    at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
    at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199) 

This doesn't make any sense to me: I can run hdfs dfs -ls s3://mybucket/ and it lists the contents just fine. The machines get their credentials from AWS IAM roles, and I've tried adding fs.s3n.awsAccessKeyId and the like to core-site.xml with no change in behavior.
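For the record, this is roughly what I added (values redacted; fs.s3n.awsSecretAccessKey is assumed here as the natural companion property to the one named above):

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>REDACTED</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>REDACTED</value>
</property>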

Any advice is much appreciated.


Answer 1:


Hadoop provides two filesystem interfaces: FileSystem and AbstractFileSystem. Most of the time we work with FileSystem, using configuration options like fs.s3.impl to plug in custom adapters.

yarn logs, however, goes through the AbstractFileSystem interface, and no implementation is registered for the s3 scheme by default; that is exactly what the UnsupportedFileSystemException is complaining about.

If you can find (or write) an AbstractFileSystem implementation for S3, you can register it with the fs.AbstractFileSystem.s3.impl property.
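One way to get such an implementation is Hadoop's DelegateToFileSystem, which wraps an ordinary FileSystem behind the AbstractFileSystem interface. A minimal sketch follows; note the assumptions: the class name com.example.fs.S3 is hypothetical, and NativeS3FileSystem stands in for whatever FileSystem actually backs the s3 scheme on your EMR cluster (EMR's own S3 client class may differ).

package com.example.fs;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.s3native.NativeS3FileSystem;

// Hypothetical adapter: exposes a FileSystem-based S3 client through the
// AbstractFileSystem interface that `yarn logs` requires. Hadoop instantiates
// it reflectively via this (URI, Configuration) constructor.
public class S3 extends DelegateToFileSystem {
  public S3(URI theUri, Configuration conf)
      throws IOException, URISyntaxException {
    // "s3" is the scheme to serve; `false` means no authority is required.
    super(theUri, new NativeS3FileSystem(), conf, "s3", false);
  }
}

You would then register it in core-site.xml:

<property>
  <name>fs.AbstractFileSystem.s3.impl</name>
  <value>com.example.fs.S3</value>
</property>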

See core-default.xml for examples of fs.AbstractFileSystem.hdfs.impl etc.
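For instance, core-default.xml binds the hdfs scheme like this:

<property>
  <name>fs.AbstractFileSystem.hdfs.impl</name>
  <value>org.apache.hadoop.fs.Hdfs</value>
</property>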



Source: https://stackoverflow.com/questions/26476728/yarn-log-aggregation-on-aws-emr-unsupportedfilesystemexception
