I have datasets that can be larger than 100s of terabytes. Each dataset can contain 1000s of files, and each file comes in its own format. Let's say file1 has D1, X1,