In my job, I have the need to parse many historical logsets. Individual customers (there are thousands) may have hundreds of log subdirectories broken out by date. For example
I guess you need to investigate writing a custom InputFormat which you can pass the root directory too, it will create a split for each customer, and then the record reader for each split will do the directory walk and push the file contents to your mappers