amazon-emr

Folder won't delete on Amazon S3

Submitted by 馋奶兔 on 2019-12-21 03:37:40
Question: I'm trying to delete a folder created as a result of a MapReduce job. Other files in the bucket delete just fine, but this folder won't delete. When I try to delete it from the console, the progress bar next to its status just stays at 0. I have made multiple attempts, including logging out and back in between them. Answer 1: First and foremost, Amazon S3 doesn't actually have a native concept of folders/directories; rather, it is a flat storage architecture comprised of buckets and objects/keys only. The…
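Because S3 is flat, deleting a "folder" really means deleting every object that shares its key prefix. A minimal sketch of that loop using the AWS SDK for Java v1 from Scala, with a hypothetical bucket and prefix (a real job would also page through truncated listings, since listObjectsV2 returns at most 1000 keys per call):

    import com.amazonaws.services.s3.AmazonS3ClientBuilder
    import scala.collection.JavaConverters._

    object DeletePrefix extends App {
      val s3 = AmazonS3ClientBuilder.defaultClient()
      // List the objects under the "folder" prefix and delete each key;
      // once the last key is gone, the folder disappears from the console.
      val listing = s3.listObjectsV2("my-bucket", "job-output/")
      listing.getObjectSummaries.asScala.foreach { summary =>
        s3.deleteObject("my-bucket", summary.getKey)
      }
    }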

Any Scala SDK or interface for AWS?

Submitted by 岁酱吖の on 2019-12-20 17:36:22
Question: Does anyone know of a Scala SDK for Amazon Web Services? I am particularly interested in EMR jobs. Answer 1: Take a look at AWScala (it's a simple wrapper on top of the AWS SDK for Java): https://github.com/seratch/AWScala [UPDATE from 04/07/2015]: Another very promising library from @dwhjames: Asynchronous Scala Clients for Amazon Web Services, https://dwhjames.github.io/aws-wrap/ Answer 2: You could use the standard Java SDK directly from Scala without any problems; however, I'm not aware of any Scala…
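For a feel of the AWScala API, here is a minimal sketch along the lines of its README (the region, bucket name, and file are placeholders):

    import awscala._, s3._

    object AwscalaDemo extends App {
      implicit val s3 = S3.at(Region.Tokyo)

      // Buckets and objects come back as plain Scala values.
      val buckets: Seq[Bucket] = s3.buckets
      val bucket: Bucket = s3.createBucket("my-unique-bucket-name")
      bucket.put("sample.txt", new java.io.File("sample.txt"))
    }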

Can we add more Amazon Elastic MapReduce instances to an existing Amazon Elastic MapReduce cluster?

Submitted by 元气小坏坏 on 2019-12-20 17:26:05
Question: I am new to Amazon Web Services and facing some issues. Suppose I am running a job flow on Amazon Elastic MapReduce with a total of 3 instances. While running the job flow, I found that my job is taking a long time to execute, so I need to add more instances to make it finish faster. My question is: how do I add instances to an existing cluster? If we terminate the existing instances and create new ones…
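You do not have to terminate anything: a running cluster can be resized in place by modifying one of its instance groups (the console exposes this as Resize, the CLI as aws emr modify-instance-groups). A minimal sketch with the AWS SDK for Java v1 from Scala; the instance-group id is hypothetical and can be looked up from the cluster's details:

    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder
    import com.amazonaws.services.elasticmapreduce.model.{InstanceGroupModifyConfig, ModifyInstanceGroupsRequest}

    object ResizeCluster extends App {
      val emr = AmazonElasticMapReduceClientBuilder.defaultClient()
      // Grow the CORE instance group from 3 to 5 nodes in place;
      // the running job flow keeps executing while the new nodes join.
      val request = new ModifyInstanceGroupsRequest().withInstanceGroups(
        new InstanceGroupModifyConfig()
          .withInstanceGroupId("ig-XXXXXXXXXXXXX") // hypothetical id
          .withInstanceCount(5))
      emr.modifyInstanceGroups(request)
    }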

Spark History Server behind Load Balancer is redirecting to HTTP

Submitted by 北城余情 on 2019-12-20 05:41:35
Question: I am currently running Spark on AWS EMR, but when it sits behind a Load Balancer (AWS ELB), traffic is redirected from HTTPS to HTTP, which then ends up getting denied because I don't allow HTTP traffic through the load balancer for the given port. It appears that this might stem from YARN acting as a proxy as well, but I am not sure. Source: https://stackoverflow.com/questions/56412083/spark-history-server-behind-load-balancer-is-redirecting-to-http

AWS EMR using Spark steps in cluster mode. Application application_ finished with failed status

Submitted by 跟風遠走 on 2019-12-20 05:12:20
Question: I'm trying to launch a cluster using the AWS CLI. I use the following command: aws emr create-cluster --name "Config1" --release-label emr-5.0.0 --applications Name=Spark --use-default-role --log-uri 's3://aws-logs-813591802533-us-west-2/elasticmapreduce/' --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.medium The cluster is created successfully. Then I add this command: aws emr add-steps --cluster-id ID…
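Since the excerpt cuts off before the step definition, here is a hedged sketch of what adding a Spark step in cluster mode can look like, done programmatically with the AWS SDK for Java v1 from Scala (the cluster id, main class, and jar path are hypothetical); on EMR, Spark steps run through command-runner.jar, which invokes spark-submit on the master node:

    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder
    import com.amazonaws.services.elasticmapreduce.model.{AddJobFlowStepsRequest, HadoopJarStepConfig, StepConfig}

    object AddSparkStep extends App {
      val emr = AmazonElasticMapReduceClientBuilder.defaultClient()
      // command-runner.jar relays its arguments to spark-submit.
      val sparkSubmit = new HadoopJarStepConfig()
        .withJar("command-runner.jar")
        .withArgs("spark-submit", "--deploy-mode", "cluster",
                  "--class", "com.example.Main", "s3://myBucket/app.jar")
      emr.addJobFlowSteps(new AddJobFlowStepsRequest()
        .withJobFlowId("j-XXXXXXXXXXXXX") // hypothetical cluster id
        .withSteps(new StepConfig("Spark step", sparkSubmit)
          .withActionOnFailure("CONTINUE")))
    }

When a step like this finishes with a failed status, the step's stdout/stderr under the configured --log-uri is usually the first place to look.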

How to avoid reading old files from S3 when appending new data?

Submitted by 允我心安 on 2019-12-19 12:06:15
Question: Once every 2 hours, a Spark job runs to convert some tgz files to Parquet. The job appends the new data to an existing Parquet dataset in S3: df.write.mode("append").partitionBy("id","day").parquet("s3://myBucket/foo.parquet") In the spark-submit output I can see that significant time is spent reading the old Parquet files, for example: 16/11/27 14:06:15 INFO S3NativeFileSystem: Opening 's3://myBucket/foo.parquet/id=123/day=2016-11-26/part-r-00003-b20752e9-5d70-43f5-b8b4-50b5b4d0c7da.snappy.parquet'…
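One common mitigation, offered as a sketch rather than the asker's actual fix: write each batch directly into its target partition directory instead of appending at the dataset root, so the writer never lists the existing files, and disable Parquet schema merging so later reads don't touch every old footer. The paths and partition values below are hypothetical:

    import org.apache.spark.sql.SparkSession

    object AppendBatch extends App {
      val spark = SparkSession.builder().appName("append-batch").getOrCreate()

      // Don't merge schemas across all existing files when reading back.
      spark.conf.set("spark.sql.parquet.mergeSchema", "false")

      // Hypothetical: the freshly converted batch for one (id, day) pair.
      val newBatch = spark.read.parquet("s3://myBucket/staging/current-batch")

      newBatch
        .drop("id", "day") // the partition values live in the path itself
        .write
        .mode("overwrite")
        // Writing to the leaf directory avoids listing s3://myBucket/foo.parquet/
        .parquet("s3://myBucket/foo.parquet/id=123/day=2016-11-26")
    }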

AWS EMR 5.11.0 - Apache Hive on Spark

Submitted by 旧街凉风 on 2019-12-19 09:44:10
Question: I am trying to set up Apache Hive on Spark on AWS EMR 5.11.0. Apache Spark version: 2.2.1. Apache Hive version: 2.3.2. The YARN logs show the error below: 18/01/28 21:55:28 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS java.lang.NoSuchFieldError: SPARK_RPC_SERVER_ADDRESS at org.apache.hive.spark.client.rpc.RpcConfiguration.<init>(RpcConfiguration.java:47) at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:134) at org.apache.hive.spark…