Question
I need to grep for a term over thousands of files in S3 and list the matching file names in an output file. I'm quite new to the CLI, so I've been testing both locally and on a small subset in S3.
So far I've got this:
aws s3 cp s3://mybucket/path/to/file.csv - | grep -iln searchterm > output.txt
The problem is the hyphen: since I'm copying the file to standard output, the -l switch in grep returns (standard input) instead of file.csv.
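You can reproduce this locally without S3; grep -l prints the placeholder (standard input) whenever it reads from a pipe rather than a named file:
echo searchterm | grep -il searchterm
# prints: (standard input)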
My desired output is
file.csv
Eventually, I'll need to iterate this over the whole bucket, and then all buckets, to get
file1.csv
file2.csv
file3.csv
But I need to get over this hurdle first. Thanks!
Answer 1:
Because you print the file to STDOUT and pipe that into grep's STDIN, grep has no idea that the original file was file.csv. If you have a long list of files, I would do:
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done < files_list.txt
I cannot try it because I do not have access to an AWS S3 instance, but the trick is to run grep quietly (-q): it returns true if it finds at least one match and false otherwise, so you can then print the name of the file yourself.
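If you do not already have files_list.txt, it could be built from the bucket listing; a sketch, assuming the default aws s3 ls output (date, time, size, key name in the fourth column) and keys without spaces:
aws s3 ls s3://mybucket/path/to/ | awk '{print $4}' > files_list.txt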
EDIT: Explanation
- The while loop iterates over each line of files_list.txt.
- The aws command prints that file to stdout.
- stdout is piped to grep in quiet mode (-q), which acts as a pattern matcher, returning true if a match was found and false otherwise.
- If grep returns true, we append the name of the file (${file}) to our output file.
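To eventually cover the whole bucket, the same loop can consume a recursive listing instead of a prepared file; a sketch, again assuming keys without spaces (with --recursive the fourth column is the full key, so the prefix is dropped from the cp path):
aws s3 ls s3://mybucket/ --recursive | awk '{print $4}' | while read -r file; do aws s3 cp "s3://mybucket/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done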
EDIT2: Another solution
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n '/searchpattern/{F;q}' >> output.txt; done < files_list.txt
Explanation
Steps 1 and 2 are the same; then stdout is piped to sed, which reads the input line by line until it finds the first match of the search pattern, prints the input file name (F), and quits (q), with the result redirected to the output file.
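One caveat: F is a GNU sed extension that prints the name of the current input file, and because the data arrives on a pipe here, it would print the literal - rather than the S3 key. A variant that substitutes the key itself; a sketch, assuming GNU sed and keys containing no sed metacharacters:
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n "/searchpattern/{s|.*|${file}|p;q}" >> output.txt; done < files_list.txt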
Source: https://stackoverflow.com/questions/42707646/how-to-grep-a-term-from-s3-and-output-object-name