“head” command for aws s3 to view file contents

不知归路 2021-02-06 22:44

On Linux, we generally use the head/tail commands to preview the contents of a file. It helps in viewing a part of the file (to inspect the format, for instance) rather than opening the entire file. Is there a similar capability for files stored on AWS S3?

8 Answers
  • 2021-02-06 23:17

    If you don't want to download the whole file, you can download a portion of it with the --range option of the aws s3api get-object command, and then run head on the downloaded portion.

    Example:

    aws s3api get-object --bucket my_s3_bucket --key s3_folder/file.txt --range bytes=0-1000000 tmp_file.txt && head tmp_file.txt
    

    Explanation:

    The aws s3api get-object command downloads the portion of the S3 object specified by --range from the given bucket and key to the specified output file. The && executes the second command only if the first one succeeded. The second command prints the first 10 lines of the output file just created.

  • 2021-02-06 23:18

    You can use the --range switch of the lower-level s3api get-object command to bring back the first bytes of an S3 object. (AFAICT the higher-level aws s3 commands don't support the switch.)

    You can pass /dev/stdout as the target filename if you simply want to view the S3 object by piping it to head. Here's an example:

    aws s3api get-object --bucket mybucket_name --key path/to/the/file.log --range bytes=0-10000 /dev/stdout | head

    Finally, if, like me, you're dealing with compressed .gz files, the same technique also works with zless, letting you view the head of the decompressed file:

    aws s3api get-object --bucket mybucket_name --key path/to/the/file.log.gz --range bytes=0-10000 /dev/stdout | zless

    One tip with zless: if it isn't working, try increasing the size of the range.

  • 2021-02-06 23:18

    One easy way to do it:

    aws s3api get-object --bucket bucket_name --key path/to/file.txt --range bytes=0-10000 /path/to/local/t3.txt && head -100 /path/to/local/t3.txt


    For a .gz file, you can do:

    aws s3api get-object --bucket bucket_name --key path/to/file.gz --range bytes=0-10000 /path/to/local/t3.gz && zcat /path/to/local/t3.gz | head -100


    If you get back less data than you expect, increase the byte range.

  • 2021-02-06 23:19

    There is no such capability. You can only retrieve the entire object. You can perform an HTTP HEAD request to view object metadata, but that isn't what you're looking for.
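
    For reference, the metadata lookup mentioned above is available from the CLI; a minimal sketch, with placeholder bucket and key names:

    # Sends an HTTP HEAD request; prints size, content type, ETag, etc. as JSON.
    aws s3api head-object --bucket my-bucket --key path/to/file.txt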

  • 2021-02-06 23:22

    As others have answered, assuming the file is large, use the get-object command with --range bytes=0-1000 to download only part of the file.

    Example:

    aws s3api get-object --profile opsrep --region eu-west-1 --bucket <MY-BUCKET> --key <DIR/MY-FILE.CSV> --range bytes=0-10000 "OUTPUT.csv"

    As of 2018 you can also run S3 Select queries from the AWS CLI. Use LIMIT 10 to preview the "head" of your file.

    Example:

    aws s3api select-object-content --bucket <MY-BUCKET> --key <DIR/MY-FILE.CSV> --expression "select * from s3object limit 10" --expression-type "SQL" --input-serialization "CSV={}" --output-serialization "CSV={}" "OUTPUT.csv"

    Now you can quickly run head OUTPUT.csv on the small local file.

  • 2021-02-06 23:39

    You can specify a byte range when retrieving data from S3 to get the first N bytes, the last N bytes or anything in between. (This is also helpful since it allows you to download files in parallel – just start multiple threads or processes, each of which retrieves part of the total file.)
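
    A minimal shell sketch of that parallel idea, reusing the s3api get-object command from the other answers (bucket, key, and sizes here are placeholders):

    # Fetch two non-overlapping byte ranges of a ~2 MB object in parallel,
    # then stitch the parts back together in order.
    aws s3api get-object --bucket my-bucket --key big/file.bin --range bytes=0-1048575 part0 &
    aws s3api get-object --bucket my-bucket --key big/file.bin --range bytes=1048576-2097151 part1 &
    wait    # block until both background downloads finish
    cat part0 part1 > file.bin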

    I don't know which of the various CLI tools support this directly but a range retrieval does what you want.

    The AWS CLI tools ("aws s3 cp" to be precise) do not allow you to do range retrieval, but s3curl (http://aws.amazon.com/code/128) should do the trick. (So does plain curl, e.g., using the --range parameter, but then you would have to do the request signing on your own.)
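
    If you'd rather not implement request signing yourself, one workaround (assuming your AWS CLI version has the presign command; bucket and key are placeholders) is to generate a presigned URL and let plain curl do the range request:

    # Create a temporary signed URL, valid for 5 minutes.
    url=$(aws s3 presign s3://my-bucket/path/to/file.log --expires-in 300)
    # curl's --range sends an HTTP Range header; fetch the first 10 KB and preview it.
    curl --silent --range 0-10239 "$url" | head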

    0 讨论(0)