I have an amazon s3 bucket that has tens of thousands of filenames in it. What\'s the easiest way to get a text file that lists all the filenames in the bucket?
aws s3api list-objects --bucket bucket-name
For more details see here - http://docs.aws.amazon.com/cli/latest/reference/s3api/list-objects.html
You can use standard s3 api -
aws s3 ls s3://root/folder1/folder2/
I'd recommend using boto. Then it's a quick couple of lines of python:
from boto.s3.connection import S3Connection
conn = S3Connection('access-key','secret-access-key')
bucket = conn.get_bucket('bucket')
for key in bucket.list():
print key.name.encode('utf-8')
Save this as list.py, open a terminal, and then run:
$ python list.py > results.txt
please try this bash script. it uses curl command with no need for any external dependencies
bucket=<bucket_name>
region=<region_name>
awsAccess=<access_key>
awsSecret=<secret_key>
awsRegion="${region}"
baseUrl="s3.${awsRegion}.amazonaws.com"
m_sed() {
if which gsed > /dev/null 2>&1; then
gsed "$@"
else
sed "$@"
fi
}
awsStringSign4() {
kSecret="AWS4$1"
kDate=$(printf '%s' "$2" | openssl dgst -sha256 -hex -mac HMAC -macopt "key:${kSecret}" 2>/dev/null | m_sed 's/^.* //')
kRegion=$(printf '%s' "$3" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kDate}" 2>/dev/null | m_sed 's/^.* //')
kService=$(printf '%s' "$4" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kRegion}" 2>/dev/null | m_sed 's/^.* //')
kSigning=$(printf 'aws4_request' | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kService}" 2>/dev/null | m_sed 's/^.* //')
signedString=$(printf '%s' "$5" | openssl dgst -sha256 -hex -mac HMAC -macopt "hexkey:${kSigning}" 2>/dev/null | m_sed 's/^.* //')
printf '%s' "${signedString}"
}
if [ -z "${region}" ]; then
region="${awsRegion}"
fi
# Initialize helper variables
authType='AWS4-HMAC-SHA256'
service="s3"
dateValueS=$(date -u +'%Y%m%d')
dateValueL=$(date -u +'%Y%m%dT%H%M%SZ')
# 0. Hash the file to be uploaded
# 1. Create canonical request
# NOTE: order significant in ${signedHeaders} and ${canonicalRequest}
signedHeaders='host;x-amz-content-sha256;x-amz-date'
canonicalRequest="\
GET
/
host:${bucket}.s3.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:${dateValueL}
${signedHeaders}
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
# Hash it
canonicalRequestHash=$(printf '%s' "${canonicalRequest}" | openssl dgst -sha256 -hex 2>/dev/null | m_sed 's/^.* //')
# 2. Create string to sign
stringToSign="\
${authType}
${dateValueL}
${dateValueS}/${region}/${service}/aws4_request
${canonicalRequestHash}"
# 3. Sign the string
signature=$(awsStringSign4 "${awsSecret}" "${dateValueS}" "${region}" "${service}" "${stringToSign}")
# Upload
curl -g -k "https://${baseUrl}/${bucket}" \
-H "x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" \
-H "x-amz-Date: ${dateValueL}" \
-H "Authorization: ${authType} Credential=${awsAccess}/${dateValueS}/${region}/${service}/aws4_request,SignedHeaders=${signedHeaders},Signature=${signature}"
In Java you can get the keys using ListObjects (see AWS documentation)
FileWriter fileWriter;
BufferedWriter bufferedWriter;
// [...]
AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
.withBucketName(bucketName)
.withPrefix("myprefix");
ObjectListing objectListing;
do {
objectListing = s3client.listObjects(listObjectsRequest);
for (S3ObjectSummary objectSummary :
objectListing.getObjectSummaries()) {
// write to file with e.g. a bufferedWriter
bufferedWriter.write(objectSummary.getKey());
}
listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
AWS CLI can let you see all files of an S3 bucket quickly and help in performing other operations too.
To use AWS CLI follow steps below:
To see all files of an S3 bucket use command
aws s3 ls s3://your_bucket_name --recursive
Reference to use AWS cli for different AWS services: https://docs.aws.amazon.com/cli/latest/reference/