Accessing Google Cloud Storage using the Hadoop FileSystem API


From my machine, I've configured the Hadoop core-site.xml to recognize the gs:// scheme and added gcs-connector-1.2.8.jar as a Hadoop lib. I can run …
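
For reference, the core-site.xml registration maps the gs:// scheme to the connector's FileSystem class. A minimal programmatic sketch of the same configuration, assuming the property names from the gcs-connector documentation (the project id value is a placeholder):

    import org.apache.hadoop.conf.Configuration;

    public class GcsConf {
        // Programmatic equivalent of the core-site.xml entries; property
        // names are from the gcs-connector docs, the project id is made up.
        public static Configuration gcsConfiguration() {
            Configuration conf = new Configuration();
            conf.set("fs.gs.impl",
                    "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
            conf.set("fs.gs.project.id", "my-gcp-project");
            return conf;
        }
    }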

1 Answer

    As to your first question, "expected" is questionable, but I think I can at least explain. When FileSystem.get(conf) is called, it returns the default FileSystem, which is HDFS unless configured otherwise. My guess is that the HDFS client (DistributedFileSystem) has code that automatically prepends scheme + authority to all paths in the filesystem.

    Instead of using FileSystem.get(conf), try

        FileSystem gcsFs = new Path("gs://mybucket/").getFileSystem(conf);
    
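    A fuller sketch of the same idea, assuming the connector setup from the question (the bucket name is a placeholder):

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ListGcsBucket {
            public static void main(String[] args) throws IOException {
                Configuration conf = new Configuration(); // picks up core-site.xml
                // Resolve the FileSystem from the path's scheme (gs://) rather
                // than taking the configured default (typically HDFS).
                Path bucketRoot = new Path("gs://mybucket/");
                FileSystem gcsFs = bucketRoot.getFileSystem(conf);
                for (FileStatus status : gcsFs.listStatus(bucketRoot)) {
                    System.out.println(status.getPath());
                }
            }
        }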

    On disadvantages, I could argue that if you end up needing to access the object store directly, you'll end up writing code against the storage APIs anyway (and there are things that do not translate well to the Hadoop FS API, e.g., object composition, as in the sketch below, or complex object-write preconditions beyond simple overwrite protection).
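
    To make the composition point concrete, here is a sketch using the standalone google-cloud-storage Java client rather than the Hadoop connector; the bucket and object names are placeholders, and it assumes default application credentials:

        import com.google.cloud.storage.Blob;
        import com.google.cloud.storage.BlobInfo;
        import com.google.cloud.storage.Storage;
        import com.google.cloud.storage.StorageOptions;

        public class ComposeExample {
            public static void main(String[] args) {
                Storage storage = StorageOptions.getDefaultInstance().getService();
                // Server-side compose: concatenate existing objects into a new
                // object without re-downloading or re-uploading any bytes.
                // The Hadoop FS API has no equivalent operation.
                Blob composed = storage.compose(
                        Storage.ComposeRequest.newBuilder()
                                .setTarget(BlobInfo.newBuilder("mybucket", "combined.log").build())
                                .addSource("part-00000.log")
                                .addSource("part-00001.log")
                                .build());
                System.out.println("Composed generation: " + composed.getGeneration());
            }
        }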

    I am admittedly biased (I work on the team), but if you intend to use GCS from Hadoop MapReduce, Spark, etc., the GCS connector for Hadoop should be a fairly safe bet.
