Error starting Spark in EMR 4.0

Posted by 心不动则不痛 on 2019-12-13 14:29:29

Question


I created an EMR 4.0 instance in AWS with all available applications, including Spark. I did it manually, through AWS Console. I started the cluster and SSHed to the master node when it was up. There I ran pyspark. I am getting the following error when pyspark tries to create SparkContext:

2015-09-03 19:36:04,195 ERROR Thread-3 spark.SparkContext (Logging.scala:logError(96)) - Permission denied: user=ec2-user, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)

I haven't added any custom applications, nor bootstrapping and expected everything to work without errors. Not sure what's going on. Any suggestions will be greatly appreciated.


Answer 1:


Log in as the user "hadoop" (http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-connect-master-node-ssh.html). That account has the environment and related settings configured for everything to work as expected. The error you are receiving is caused by logging in as "ec2-user", which does not have a writable home directory under /user in HDFS.
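Following the linked AWS guide, the connection looks roughly like this (the key path and master public DNS name below are placeholders, not values from the question):

```shell
# Connect to the EMR master node as "hadoop" instead of "ec2-user".
# Replace the key file and hostname with your own values.
ssh -i ~/mykey.pem hadoop@ec2-xx-xxx-xx-xx.compute-1.amazonaws.com

# Once logged in as hadoop, start pyspark; the HDFS write to /user/hadoop
# now succeeds because that directory exists and is owned by hadoop.
pyspark
```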




Answer 2:


I've been working with Spark on EMR this week, and found a few weird things relating to user permissions and relative paths.

It seems that running Spark from a directory you don't own as a user is problematic. In some situations Spark (or some of the underlying Java pieces) wants to create files or folders, and it uses pwd, the current directory, as the place to do that.

Try going to the home directory

cd ~

then running pyspark.
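A quick way to sanity-check this before launching pyspark (a minimal sketch; exactly where Spark writes scratch files can vary by version) is to test whether the current directory is writable:

```shell
# Spark and some of its Java components may create scratch files under $PWD,
# so confirm the working directory is one you can write to before starting.
if [ -w "$PWD" ]; then
    echo "OK: $PWD is writable"
else
    echo "Not writable: $PWD -- try 'cd ~' first"
fi
```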



Source: https://stackoverflow.com/questions/32405768/error-starting-spark-in-emr-4-0
