How to download data from Amazon's Requester Pays buckets?

感情败类 2021-02-08 04:01

I have been struggling for about a week to download arXiv articles as mentioned here: http://arxiv.org/help/bulk_data_s3#src.

I have tried lots of things: s3Browse

4 answers
  • 2021-02-08 04:35

    At the bottom of this page, arXiv explains that s3cmd is denied access because it does not support accessing Requester Pays buckets as a non-owner, so you have to patch the s3cmd source code. However, the version of s3cmd they used is outdated, and their patch does not apply to the latest version of s3cmd.

    Basically, you need to let s3cmd add the "x-amz-request-payer" header to the HTTP requests it sends to the bucket. Here is how to fix it:

    1. Download the source code of s3cmd.
    2. Open S3/S3.py with a text editor.
    3. Add these two lines of code at the bottom of the __init__ function:

      if self.s3.config.extra_headers:
          self.headers.update(self.s3.config.extra_headers)
      
    4. Install s3cmd as instructed.
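
With the patch in place, the header still has to be supplied when s3cmd is invoked. The sketch below assumes that s3cmd's `--add-header=NAME:VALUE` option is what fills `extra_headers`, which the two patched lines then copy into each request. The wrapper name `s3cmd_rp` is hypothetical.

```shell
# Hypothetical wrapper: run any s3cmd subcommand with the Requester Pays header.
# Assumes --add-header=NAME:VALUE populates config.extra_headers, which the
# patched __init__ merges into the request headers.
s3cmd_rp() {
    s3cmd --add-header="x-amz-request-payer:requester" "$@"
}
```

For example, `s3cmd_rp get s3://arxiv/pdf/arXiv_pdf_1001_001.tar` would request one archive, with the transfer billed to the account whose credentials s3cmd is configured with.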
  • 2021-02-08 04:37

    Requester Pays is a feature on Amazon S3 buckets that requires the user of the bucket to pay Data Transfer costs associated with accessing data.

    Normally, the owner of an S3 bucket pays Data Transfer costs, but this can be expensive for free / Open Source projects. Thus, the bucket owner can activate Requester Pays to reduce the portion of costs they are charged.

    Therefore, when accessing a Requester Pays bucket, you will need to authenticate yourself so that S3 knows whom to charge.

    I recommend using the official AWS Command-Line Interface (CLI) to access AWS services. You can provide your credentials via:

    aws configure
    

    and then view the bucket via:

    aws s3 ls s3://arxiv/pdf/
    

    and download via:

    aws s3 cp s3://arxiv/pdf/arXiv_pdf_1001_001.tar .
    

    UPDATE: I just tried the above myself, and received Access Denied error messages (both on the bucket listing and the download command). When using s3cmd, it says ERROR: S3 error: Access Denied. It would appear that the permissions on the bucket no longer permit access. You should contact the owners of the bucket to request access.
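
One thing worth checking before contacting the owners: S3 rejects requests to a Requester Pays bucket unless the caller explicitly acknowledges the charges, and the AWS CLI only sends that acknowledgement when given `--request-payer requester`. A minimal sketch, assuming the bucket is still configured as Requester Pays and that credentials were set up with `aws configure`; the wrapper name `s3rp` is hypothetical:

```shell
# Hypothetical wrapper: append the Requester Pays flag to any `aws s3` command.
# The caller's AWS account is billed for the request and the data transfer.
s3rp() {
    aws s3 "$@" --request-payer requester
}
```

With this defined, `s3rp ls s3://arxiv/pdf/` lists the bucket and `s3rp cp s3://arxiv/pdf/arXiv_pdf_1001_001.tar .` downloads one archive. If these still return Access Denied, the IAM user's own permissions are the next thing to check.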

  • 2021-02-08 04:47

    For me, the problem was that my IAM user didn't have enough permissions. Attaching the AmazonS3FullAccess policy to the user solved it.

    Hope this saves someone some time.
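
For reference, attaching that managed policy from the command line might look like the sketch below. The function name `grant_s3_full_access` and the user name in the usage note are hypothetical, and the credentials running it need IAM permission to attach policies.

```shell
# Hypothetical helper: attach the AWS-managed AmazonS3FullAccess policy
# to the IAM user whose name is passed as the first argument.
grant_s3_full_access() {
    aws iam attach-user-policy \
        --user-name "$1" \
        --policy-arn "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}
```

Usage would be `grant_s3_full_access my-arxiv-user`. Full access is broader than downloading requires, so a read-only managed policy such as AmazonS3ReadOnlyAccess may be enough; that is an assumption worth testing rather than something confirmed above.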

  • 2021-02-08 04:53

    Try downloading s3cmd version 1.6.0: http://sourceforge.net/projects/s3tools/files/s3cmd/

    $ s3cmd --configure
    

    Enter your credentials found in the account management tab of the Amazon AWS website interface.

    $ s3cmd get --recursive --skip-existing s3://arxiv/src/ --requester-pays
    