I am working on a project that involves parsing through a LARGE amount of data rapidly. Currently this data is on disk and broken down into a directory hierarchy:
what is the fastest way I can selectively load entries from my filesystem from varying DataSources and Days?
selectively means filtering, so my answer is a localhost database. Generally speaking if you filter, sort, paginate or extract distinct records from a large number of records, it's hard to beat a localhost SQL server. You get a query optimizer (nobody does that Java), a cache (which requires effort in Java, especially the invalidation), database indexes (have not seen that being done in Java either) etc. It's possible to implement these things manually, but then your are writing a database in Java.
On top of this you gain access to higher level SQL functions like window aggegrates etc., so in most cases there is no need to post-process data in Java.
The issue could be solved both ways but it depends on few factors
go for FileIO.
go for DB
Depending on architecture you are using you can implement different ways of caching, in the Jboss there is a built-in Jboss Caching, there are also third party opensource software that lets utilizes caching, like Redis, or EhCache depending on your needs. Basically Caching stores objects in their memory, some are passivated/activated upon demand, when memory is exhausted it is stored as a physical IO file, which are also easily activated marshalled by the caching mechanism. It lowers the database connectivity held by your program. There are other caches but here are some of them that I've worked with: