I am attempting to Tika parse dozens of millions of office documents. Pdfs, docs, excels, xmls, etc. Wide assortment of types.
Throughput is very important. I need to