How to grep into files stored in S3

北恋 2021-02-07 13:50

Does anybody know how to run grep on S3 files with aws s3, directly in the bucket? For example, I have FILE1.csv and FILE2.csv with many rows and want to look for the rows that

3 Answers
  •  鱼传尺愫
    2021-02-07 14:26

    You can also use the Glue/Athena combination, which lets you run the search entirely within AWS. Depending on data volume, the queries can be costly and slow.

    Basically

    • Create a Glue classifier that reads the files line by line
    • Create a crawler that points at your S3 data directory and targets a database (csvdumpdb); it will create a table containing every line from all the CSV files it finds
    • Use Athena to query, e.g.

      select "$path", line from <your table> where line like '%some%fancy%string%'

    • and get something like

      $path line

      s3://mybucket/mydir/my.csv "some I did find some,yes, "fancy, yes, string"

    This saves you from having to run any external infrastructure.
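
The Athena step above could also be scripted rather than run from the console. The following is only a minimal sketch using boto3, assuming the crawler has already created the table and that the table has a single column named line, as in the query above. The function names, the table name and the results location are hypothetical placeholders, not something the answer specifies.

```python
import time


def build_grep_query(table: str, like_pattern: str) -> str:
    """Build the SELECT that returns each matching line with its S3 path."""
    # Double any single quote so the pattern stays inside the SQL string literal.
    escaped = like_pattern.replace("'", "''")
    return (
        'SELECT "$path", line FROM "' + table + '" '
        "WHERE line LIKE '" + escaped + "'"
    )


def grep_s3_with_athena(database, table, like_pattern, output_location,
                        poll_seconds=1.0):
    """Run the query in Athena; return (path, line) tuples from the first results page."""
    import boto3  # imported lazily so build_grep_query works without boto3 installed

    athena = boto3.client("athena")
    started = athena.start_query_execution(
        QueryString=build_grep_query(table, like_pattern),
        QueryExecutionContext={"Database": database},
        # Assumption: an S3 location you own where Athena may write result files.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError("Athena query ended in state " + state)

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # The first row holds the column headers; the rest are data rows.
    return [
        tuple(cell.get("VarCharValue") for cell in row["Data"])
        for row in rows[1:]
    ]
```

A call might look like grep_s3_with_athena("csvdumpdb", "my_table", "%some%fancy%string%", "s3://mybucket/athena-results/"), where my_table and the results prefix are made up for illustration. Only the first page of results is read here; a large result set would need pagination.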
