I have an amazon s3 bucket that has tens of thousands of filenames in it. What\'s the easiest way to get a text file that lists all the filenames in the bucket?
Documentation for aws s3 ls
AWS have recently release their Command Line Tools. This works much like boto and can be installed using sudo easy_install awscli
or sudo pip install awscli
Once you have installed, you can then simply run
aws s3 ls
Which will show you all of your available buckets
CreationTime Bucket
------------ ------
2013-07-11 17:08:50 mybucket
2013-07-24 14:55:44 mybucket2
You can then query a specific bucket for files.
Command:
aws s3 ls s3://mybucket
Output:
Bucket: mybucket
Prefix:
LastWriteTime Length Name
------------- ------ ----
PRE somePrefix/
2013-07-25 17:06:27 88 test.txt
This will show you all of your files.
You can list all the files, in the aws s3 bucket using the command
aws s3 ls path/to/file
and to save it in a file, use
aws s3 ls path/to/file >> save_result.txt
if you want to append your result in a file otherwise:
aws s3 ls path/to/file > save_result.txt
if you want to clear what was written before.
It will work both in windows and Linux.
Be carefull, amazon list only returns 1000 files. If you want to iterate over all files you have to paginate the results using markers :
In ruby using aws-s3
bucket_name = 'yourBucket'
marker = ""
AWS::S3::Base.establish_connection!(
:access_key_id => 'your_access_key_id',
:secret_access_key => 'your_secret_access_key'
)
loop do
objects = Bucket.objects(bucket_name, :marker=>marker, :max_keys=>1000)
break if objects.size == 0
marker = objects.last.key
objects.each do |obj|
puts "#{obj.key}"
end
end
end
Hope this helps, vincent
function showUploads(){
if (!class_exists('S3')) require_once 'S3.php';
// AWS access info
if (!defined('awsAccessKey')) define('awsAccessKey', '234567665464tg');
if (!defined('awsSecretKey')) define('awsSecretKey', 'dfshgfhfghdgfhrt463457');
$bucketName = 'my_bucket1234';
$s3 = new S3(awsAccessKey, awsSecretKey);
$contents = $s3->getBucket($bucketName);
echo "<hr/>List of Files in bucket : {$bucketName} <hr/>";
$n = 1;
foreach ($contents as $p => $v):
echo $p."<br/>";
$n++;
endforeach;
}
Alternatively you can use Minio Client aka mc. Its Open Source and compatible with AWS S3. It is available for Linux, Windows, Mac, FreeBSD.
All you have do do is to run mc ls command for listing the contents.
$ mc ls s3/kline/ [2016-04-30 13:20:47 IST] 1.1MiB 1.jpg [2016-04-30 16:03:55 IST] 7.5KiB docker.png [2016-04-30 15:16:17 IST] 50KiB pi.png [2016-05-10 14:34:39 IST] 365KiB upton.pdf
Note:
Installing Minio Client Linux Download mc for:
$ chmod 755 mc $ ./mc --help
Setting up AWS credentials with Minio Client
$ mc config host add mys3 https://s3.amazonaws.com BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12
Note: Please replace mys3 with alias you would like for this account and ,BKIKJAA5BMMU2RHO6IBB, V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12 with your AWS ACCESS-KEY and SECRET-KEY
Hope it helps.
Disclaimer: I work for Minio
Simplified and updated version of the Scala answer by Paolo:
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, S3ObjectSummary}
def buildListing(s3: AmazonS3, request: ListObjectsRequest): List[S3ObjectSummary] = {
def buildList(listIn: List[S3ObjectSummary], bucketList:ObjectListing): List[S3ObjectSummary] = {
val latestList: List[S3ObjectSummary] = bucketList.getObjectSummaries.toList
if (!bucketList.isTruncated) listIn ::: latestList
else buildList(listIn ::: latestList, s3.listNextBatchOfObjects(bucketList))
}
buildList(List(), s3.listObjects(request))
}
Stripping out the generics and using the ListObjectRequest generated by the SDK builders.