Question
I created an EMR 4.0 instance in AWS with all available applications, including Spark. I did it manually, through the AWS Console. I started the cluster and SSHed into the master node once it was up. There I ran pyspark. I get the following error when pyspark tries to create the SparkContext:
2015-09-03 19:36:04,195 ERROR Thread-3 spark.SparkContext (Logging.scala:logError(96)) - Permission denied: user=ec2-user, access=WRITE, inode="/user":hdfs:hadoop:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
I haven't added any custom applications or bootstrap actions, and expected everything to work without errors. Not sure what's going on. Any suggestions will be greatly appreciated.
Answer 1:
Log in as the user "hadoop" (http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-connect-master-node-ssh.html). It has all the proper environment and related settings to work as expected. The error you are receiving is due to logging in as "ec2-user".
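As a sketch of that fix, the SSH invocation just swaps the user name; the key file path and the master node's public DNS below are placeholders, not values from the question:

```shell
# Placeholders -- substitute your own key pair file and the cluster's
# master public DNS (shown on the EMR console's cluster details page).
KEY="$HOME/mykey.pem"
MASTER="ec2-xx-xx-xx-xx.compute-1.amazonaws.com"

# Connect as "hadoop" instead of "ec2-user". The echo prints the
# command so you can inspect it; drop the echo to actually connect.
echo ssh -i "$KEY" "hadoop@$MASTER"
```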
Answer 2:
I've been working with Spark on EMR this week, and found a few weird things relating to user permissions and relative paths.
It seems that running Spark from a directory which you don't 'own', as a user, is problematic. In some situations Spark (or some of the underlying Java pieces) wants to create files or folders, and it decides that pwd - the current directory - is the best place to do that.
Try going to the home directory:

cd ~

then running pyspark.
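A minimal sketch of that workaround as one step, with a quick check that the target directory is actually writable before launching (the writability test is my addition, not part of the original answer):

```shell
# Move to the home directory, which the current user owns, so Spark
# can create its scratch files and folders there.
cd ~ || exit 1

# Sanity check: confirm the current directory is writable by this user
# before starting pyspark.
if [ -w "$(pwd)" ]; then
    echo "$(pwd) is writable; safe to run pyspark here"
else
    echo "warning: $(pwd) is not writable; pick another directory"
fi
```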
Source: https://stackoverflow.com/questions/32405768/error-starting-spark-in-emr-4-0