Confusion about distributed cache in Hadoop

一整个雨季 2021-01-17 22:43

What does the distributed cache actually mean? Does having a file in the distributed cache mean that it is available on every datanode, and hence there will be no internode communication?

1 Answer
  • 2021-01-17 23:11

    DistributedCache is a facility provided by the MapReduce framework to cache files needed by applications. Once you cache a file for your job, the Hadoop framework makes it available on every data node (on the local file system, not in memory) where your map/reduce tasks are running. You can then access the cached file as a local file in your Mapper or Reducer. From there you can easily read the file and populate some in-memory collection (e.g. an array or a HashMap) in your code.

    Refer to https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/filecache/DistributedCache.html
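
    The loading step described above can be sketched in plain Java. This is a minimal, runnable simulation, not a real Hadoop job: the file name `lookup.txt`, the tab-separated key/value format, and the class names are hypothetical. In an actual job you would call `job.addCacheFile(...)` at submission time, and the framework would localize the file into each task's working directory, where a Mapper's `setup()` could read it exactly like this:

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: loading a distributed-cache file into an in-memory HashMap,
    // the way a Mapper's setup() method typically does. A temp file stands
    // in for the localized cache file so the example runs standalone.
    public class CacheLookupSketch {

        // Parse one tab-separated "key<TAB>value" line of the cached file.
        static Map.Entry<String, String> parseLine(String line) {
            String[] parts = line.split("\t", 2);
            return Map.entry(parts[0], parts[1]);
        }

        // Read the whole cached file into an in-memory lookup table --
        // the "populate a HashMap" step described in the answer.
        static Map<String, String> loadLookup(Path cached) throws IOException {
            Map<String, String> lookup = new HashMap<>();
            for (String line : Files.readAllLines(cached)) {
                Map.Entry<String, String> e = parseLine(line);
                lookup.put(e.getKey(), e.getValue());
            }
            return lookup;
        }

        public static void main(String[] args) throws IOException {
            // Simulate the localized cache file with a temp file.
            Path cached = Files.createTempFile("lookup", ".txt");
            Files.write(cached, List.of("us\tUnited States", "de\tGermany"));
            Map<String, String> lookup = loadLookup(cached);
            System.out.println(lookup.get("us")); // prints "United States"
        }
    }
    ```

    Because every task reads its own local copy, the lookup never involves a network round trip at map time.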

    Let me know if you still have questions.

    You can read the cache file as a local file in your UDF code as well. After reading the file using the Java APIs, just populate an in-memory collection.

    Reference: http://www.lichun.cc/blog/2013/06/use-a-lookup-hashmap-in-hive-script/
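
    Once the HashMap is populated, using it during record processing is a purely local operation. A minimal sketch of that map-side lookup step (the record format `"countryCode,amount"` and the class name are hypothetical examples, not anything from the linked post):

    ```java
    import java.util.Map;

    // Sketch of the map-side join step: with the lookup HashMap already
    // filled from the cached file, each record can be enriched locally,
    // with no internode communication.
    public class MapSideJoinSketch {

        // Replace a record's country code with its full name via the
        // cached lookup table; unknown codes fall back to "UNKNOWN".
        static String enrich(String record, Map<String, String> lookup) {
            String[] fields = record.split(",", 2);
            String country = lookup.getOrDefault(fields[0], "UNKNOWN");
            return country + "," + fields[1];
        }

        public static void main(String[] args) {
            Map<String, String> lookup = Map.of("us", "United States");
            System.out.println(enrich("us,42", lookup)); // prints "United States,42"
        }
    }
    ```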

    -Ashish
