Question
I'm doing some processing on Hive. Usually, the result of this process is a folder (on S3), with multiple files (named with some random letters and numbers, in order) that I can just 'cat' together.
But for reports, I only need the first and the last file in the folder. If the files number in the hundreds, I can simply download them via the web GUI.
But if they number in the thousands, scrolling down is a pain, not to mention that Amazon loads the listing on the fly as needed rather than showing it all at once.
I tried s3cmd get, but my experience with it is basic at best, and I end up downloading the contents of the entire folder.
As far as I know one can pipe in extra commands, but I'm not sure how to do that.
So, how do I use s3cmd get to download only the last file in a specific folder?
Thanks.
Answer 1:
This command should work for you:
s3cmd get $(s3cmd ls s3://bucket_name/folder_name/ | tail -1 | awk '{ print $4 }')
tail -1 picks the last line of the folder listing, and awk '{ print $4 }' picks the fourth field of that line, which is the file's full S3 URL.
For the first file, just replace tail -1 with head -1.
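If you want both files from a single listing, here is a minimal sketch along the same lines. The function name first_and_last is made up for illustration, and it assumes that s3cmd ls prints file entries as four whitespace-separated fields (date, time, size, URL), that the listing comes back in name order, and that the file names contain no spaces.

```shell
# Hypothetical helper (not part of s3cmd): reads `s3cmd ls` output on stdin
# and prints the S3 URL of the first and of the last file entry.
first_and_last() {
    # Keep only lines whose 4th field is an S3 URL; this skips "DIR" lines,
    # which have just two fields. Remember the first and the latest match.
    awk '$4 ~ /^s3:\/\// { if (!n++) first = $4; last = $4 }
         END { if (n) print first; if (n > 1) print last }'
}

# Usage (assumes s3cmd is installed and configured; bucket and folder
# names are placeholders):
#   s3cmd ls s3://bucket_name/folder_name/ | first_and_last |
#       while read -r url; do s3cmd get "$url"; done
```

This lists the folder only once instead of twice, and if the folder holds a single file it is fetched once rather than twice.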
Source: https://stackoverflow.com/questions/24544577/using-s3cmd-how-do-i-get-the-first-and-last-file-in-a-folder