Confusion about distributed cache in Hadoop

Posted by 扶醉桌前 on 2019-12-01 16:14:19

DistributedCache is a facility provided by the MapReduce framework to cache files needed by applications. Once you cache a file for your job, the Hadoop framework makes it available on every data node (on the local file system, not in memory) where your map/reduce tasks run. You can then access the cached file as a local file in your Mapper or Reducer, read it, and populate a collection (e.g., an array or HashMap) in your code.
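For illustration, here is a minimal sketch of that pattern using the newer Job.addCacheFile API (the DistributedCache class linked below is deprecated in the Hadoop 2.x MapReduce API, but the mechanism is the same). The HDFS path /data/lookup.txt, the tab-separated key/value file format, and the symlink name "lookup" are all assumptions made for the example:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CacheLookupExample {

    public static class LookupMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> lookup = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // The cached file has been copied to the task node's local disk;
            // the "#lookup" fragment in the URI below creates a symlink named
            // "lookup" in the task's working directory, so we can open it
            // like any ordinary local file.
            try (BufferedReader reader =
                     new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);  // assumed format: key<TAB>value
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Join each input record against the in-memory lookup table.
            String id = value.toString().split("\t", 2)[0];
            context.write(new Text(id),
                          new Text(lookup.getOrDefault(id, "UNKNOWN")));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache lookup example");
        job.setJarByClass(CacheLookupExample.class);
        job.setMapperClass(LookupMapper.class);
        job.setNumReduceTasks(0);  // map-only job for this sketch
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Ship the (hypothetical) lookup file to every task node.
        job.addCacheFile(new URI("/data/lookup.txt#lookup"));
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```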

Refer to https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/filecache/DistributedCache.html

Let me know if you still have questions.

You can read the cache file as a local file in your UDF code. After reading the file using Java APIs, just populate a collection in memory.
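As a rough sketch of that Hive pattern: the class name MyLookupUDF, the file name lookup.txt, and the tab-separated format are hypothetical. The file is assumed to be distributed with ADD FILE, which ships it through the distributed cache and makes it readable in each task's working directory under its base name:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical lookup UDF. Register it in Hive with:
//   ADD FILE /local/path/lookup.txt;
//   CREATE TEMPORARY FUNCTION my_lookup AS 'MyLookupUDF';
//   SELECT my_lookup(id) FROM some_table;
public class MyLookupUDF extends UDF {

    private Map<String, String> lookup;

    public Text evaluate(Text key) throws IOException {
        if (key == null) {
            return null;
        }
        if (lookup == null) {
            // Lazily load the cached file once per task. ADD FILE makes it
            // available in the task's working directory by its base name.
            lookup = new HashMap<>();
            try (BufferedReader reader =
                     new BufferedReader(new FileReader("lookup.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);  // assumed format: key<TAB>value
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }
        String value = lookup.get(key.toString());
        return value == null ? null : new Text(value);
    }
}
```

Loading the map lazily inside evaluate (rather than in a constructor) matters because the cached file only exists on the task node where the UDF actually runs.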

Refer to http://www.lichun.cc/blog/2013/06/use-a-lookup-hashmap-in-hive-script/

-Ashish
