EMRFS file sync with S3 not working

Submitted by 不羁的心 on 2019-12-21 07:56:46

Question


After running a Spark job on an Amazon EMR cluster, I deleted the output files directly from S3 and tried to rerun the job. When the job tried to write its output to S3 in Parquet format using sqlContext.write, I received the following error:

'bucket/folder' present in the metadata but not s3
at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:455)

I tried running

emrfs sync s3://bucket/folder

This did not appear to resolve the error, even though it did remove some records from the DynamoDB table that keeps track of the metadata. I'm not sure what else to try. How do I resolve this error?


Answer 1:


It turned out that I needed to run

emrfs delete s3://bucket/folder

before running sync. Running delete and then sync solved the issue.
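
The delete-then-sync order could be wrapped in a small shell helper. This is only a sketch: the function name clear_and_sync and the EMRFS_BIN variable are made up for illustration, and it assumes the emrfs CLI is on the PATH of the node where it runs.

```shell
# Hypothetical helper; EMRFS_BIN is an illustrative override for the emrfs binary.
EMRFS_BIN="${EMRFS_BIN:-emrfs}"

# Remove the stale metadata for a path, then sync the metadata with S3.
# sync only runs if delete exited successfully.
clear_and_sync() {
  local path="${1:-}"
  if [ -z "$path" ]; then
    echo "usage: clear_and_sync s3://bucket/folder" >&2
    return 2
  fi
  "$EMRFS_BIN" delete "$path" && "$EMRFS_BIN" sync "$path"
}
```

It would be called as clear_and_sync s3://bucket/folder before rerunning the job.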




Answer 2:


Consistency problems like this mostly come from the retry logic in Spark and Hadoop. When an attempt to create a file on S3 fails after its entry has already been written to DynamoDB, the metadata is left pointing at an object that does not exist. When Hadoop retries the operation, it finds the entry already present in DynamoDB and throws the consistency error.

If you want to delete the metadata stored in DynamoDB for S3 objects that have already been removed, follow these steps.

Delete all the metadata for the path. emrfs delete uses a hash function to find the records to delete, so it may also delete unwanted entries, which is why the import and sync steps follow.

emrfs delete s3://path

Import the metadata for the objects that are physically present in S3 into DynamoDB.

emrfs import s3://path

Sync the metadata with the data in S3.

emrfs sync s3://path      

After all these operations, check whether the objects are present in both S3 and the metadata:

emrfs diff s3://path 

http://docs.aws.amazon.com/emr/latest/ManagementGuide/emrfs-cli-reference.html
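
The four steps in this answer could be chained so that a failure in one step stops the rest. The following is a sketch under assumptions, not an official procedure: the function name rebuild_emrfs_metadata and the EMRFS_BIN variable are hypothetical, and only the delete, import, sync, and diff subcommands shown in this answer are used.

```shell
# Hypothetical helper; EMRFS_BIN is an illustrative override for the emrfs binary.
EMRFS_BIN="${EMRFS_BIN:-emrfs}"

# Run delete, import and sync in order for one S3 path, then show the diff.
rebuild_emrfs_metadata() {
  local path="${1:-}"
  local step
  if [ -z "$path" ]; then
    echo "usage: rebuild_emrfs_metadata s3://path" >&2
    return 2
  fi
  for step in delete import sync; do
    # Progress goes to stderr so stdout carries only emrfs output.
    echo "running: emrfs $step $path" >&2
    if ! "$EMRFS_BIN" "$step" "$path"; then
      echo "emrfs $step failed for $path, stopping" >&2
      return 1
    fi
  done
  # The diff output is what gets inspected; its exit status is passed
  # through unchanged because no particular meaning is assumed for it here.
  "$EMRFS_BIN" diff "$path"
}
```

A call such as rebuild_emrfs_metadata s3://path runs the same commands as the manual steps, in the same order.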



Source: https://stackoverflow.com/questions/39823283/emrfs-file-sync-with-s3-not-working
