Distcp - Container is running beyond physical memory limits
I've been struggling with distcp for several days, and I swear I have googled enough. Here is my use case:

USE CASE

I have a main folder in a certain location, say /hdfs/root, with a lot of subdirectories (the depth is not fixed) and files.
Volume: 200,000 files ~= 30 GB

I need to copy only a subset of /hdfs/root, for a client, to another location, say /hdfs/dest. This subset is defined by a list of absolute paths that can be updated over time.
Volume: 50,000 files ~= 5 GB

You understand that I can't use a simple hdfs dfs -cp /hdfs/root /hdfs/dest because it is not optimized, it will take every file, and it