S3: How to do a partial read / seek without downloading the complete file?

Front-end · Unresolved · 3 answers · 1826 views

不思量自难忘° · asked 2020-12-02 17:12

Although they resemble files, objects in Amazon S3 aren't really "files", just like S3 buckets aren't really directories. On a Unix system I can use head to preview the first part of a file.

3 Answers
  • 2020-12-02 17:49

    The AWS .NET SDK appears to allow only fixed-ended ranges (see public ByteRange(long start, long end)). What if I want to start in the middle and read to the end? An HTTP range of Range: bytes=1000- is perfectly acceptable for "start at 1000 and read to the end", but I do not believe the .NET library allows for this.
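    If you construct the HTTP request yourself, an open-ended range is easy to send. A minimal sketch in Python (the `byte_range_header` helper is my own illustration, not part of any SDK):

    ```python
    def byte_range_header(start, end=None):
        """Build an HTTP Range header value.

        end=None produces an open-ended range ("start here, read to
        the end of the object"), which HTTP allows even though a
        fixed-ended ByteRange(long, long) constructor cannot express it.
        """
        if end is None:
            return "bytes=%d-" % start
        return "bytes=%d-%d" % (start, end)

    # Open-ended: start at byte 1000 and read to the end.
    print(byte_range_header(1000))    # bytes=1000-
    # Fixed-ended: bytes 0 through 999 inclusive.
    print(byte_range_header(0, 999))  # bytes=0-999
    ```

    With boto3 you could pass the resulting string as the Range parameter of get_object; per the answer above, the .NET SDK's two-argument ByteRange forces you to supply an explicit end offset instead.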

  • 2020-12-02 17:58

    Using Python you can preview the first records of a compressed file.

    Connect using boto (the classic boto library, not boto3):

    # Connect:
    import csv
    import io
    import boto
    from boto.s3.key import Key
    from gzip import GzipFile

    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my_bucket', validate=False)
    

    Read the first 20 lines from the gzip-compressed file:

    # Read the first 20 records
    limit = 20
    k = Key(bucket)
    k.key = 'my_file.gz'
    k.open()
    gzipped = GzipFile(None, 'rb', fileobj=k)
    reader = csv.reader(io.TextIOWrapper(gzipped, newline="", encoding="utf-8"), delimiter='^')
    for i, line in enumerate(reader):
        if i >= limit:
            break
        print(i, line)
    

    So it's equivalent to the following Unix command:

    zcat my_file.gz|head -20
    
  • 2020-12-02 17:59

    S3 objects can be huge, but you don't have to fetch the entire object just to read the first few bytes. The S3 API supports the HTTP Range header (see RFC 2616), which takes a byte-range argument.

    Just add a Range: bytes=0-NN header to your S3 GET request, where NN is the offset of the last byte you want (byte ranges are inclusive, so bytes=0-NN returns NN+1 bytes), and you'll fetch only those bytes rather than the whole file. Now you can preview that 900 GB CSV file you left in an S3 bucket without waiting for the entire thing to download. Read the full GET Object docs in Amazon's developer documentation.
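    A sketch of the same idea with boto3 (the current AWS SDK for Python); the bucket and key names are placeholders, the helper names are mine, and you need AWS credentials configured for the request to succeed:

    ```python
    def first_n_range(nbytes):
        """Range header value for the first nbytes of an object.

        HTTP byte ranges are inclusive, so the first N bytes are
        bytes=0-(N-1).
        """
        return "bytes=0-%d" % (nbytes - 1)

    def preview_first_bytes(bucket, key, nbytes):
        """Fetch only the first nbytes bytes of an S3 object."""
        import boto3  # imported here so the helper above stays standalone
        s3 = boto3.client("s3")
        resp = s3.get_object(Bucket=bucket, Key=key,
                             Range=first_n_range(nbytes))
        return resp["Body"].read()

    # head = preview_first_bytes("my_bucket", "my_file.csv", 1024)
    ```

    S3 replies with 206 Partial Content and transfers only the requested bytes, so the preview is fast regardless of the object's total size.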
