Java File IO vs Local database

前端未结


I am working on a project that involves parsing through a LARGE amount of data rapidly. Currently this data is on disk and broken down into a directory hierarchy:

3 Answers

独厮守ぢ

2021-01-23 06:30

what is the fastest way I can selectively load entries from my filesystem from varying DataSources and Days?

Selectively means filtering, so my answer is a localhost database. Generally speaking, if you filter, sort, paginate or extract distinct records from a large number of records, it's hard to beat a localhost SQL server. You get a query optimizer (nobody does that in Java), a cache (which requires effort in Java, especially the invalidation), database indexes (I have not seen that being done in Java either), etc. It's possible to implement these things manually, but then you are writing a database in Java.

On top of this you gain access to higher-level SQL functions like window aggregates etc., so in most cases there is no need to post-process data in Java.
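
To give a rough sense of what the manual route involves, here is a minimal sketch of just the index part, using only the JDK. The class name `EntryIndex`, the `String` payload and the sample source names are all hypothetical and not taken from the question. It covers lookups by data source and day range only; a query optimizer, cache invalidation and persistence are not addressed at all, which is the part a database gives you for free.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical hand-rolled index: data source -> (day -> entries).
// It plays roughly the role of a composite index on (data_source, day).
final class EntryIndex {
    private final Map<String, NavigableMap<LocalDate, List<String>>> bySource = new HashMap<>();

    // Register one entry (a plain String stands in for the real record type).
    void add(String dataSource, LocalDate day, String entry) {
        bySource.computeIfAbsent(dataSource, k -> new TreeMap<>())
                .computeIfAbsent(day, k -> new ArrayList<>())
                .add(entry);
    }

    // Return all entries of one source whose day lies in [from, to], in day order.
    List<String> query(String dataSource, LocalDate from, LocalDate to) {
        NavigableMap<LocalDate, List<String>> days = bySource.get(dataSource);
        if (days == null || from.isAfter(to)) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        // subMap is a range scan over the sorted day keys, inclusive on both ends.
        for (List<String> entries : days.subMap(from, true, to, true).values()) {
            result.addAll(entries);
        }
        return result;
    }

    public static void main(String[] args) {
        EntryIndex index = new EntryIndex();
        index.add("sourceA", LocalDate.of(2021, 1, 21), "a-21");
        index.add("sourceA", LocalDate.of(2021, 1, 22), "a-22");
        index.add("sourceB", LocalDate.of(2021, 1, 22), "b-22");
        System.out.println(index.query("sourceA", LocalDate.of(2021, 1, 22), LocalDate.of(2021, 1, 23)));
    }
}
```

Even this small piece has to be rebuilt on every start and kept in sync with the files on disk, which illustrates why the answer leans towards a database.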

北海茫月

2021-01-23 06:40
The issue could be solved both ways, but it depends on a few factors.

Go for file IO:
1. if the volume is less than millions of rows
2. if you don't do complicated queries, like Jon Skeet said
3. if your reference for fetching a row is the folder name ("DataSource") used as the key
Go for a DB:
1. if you see your program reading through millions of records
2. if you want to do complicated selections, even of multiple rows, using a single select
3. if you have knowledge of creating a basic table structure for a DB
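
For the file IO case, where the folder name acts as the key, the lookup can be sketched as below. The layout `<root>/<dataSource>/<day>/` is an assumption made for illustration, since the actual hierarchy is not shown in the question; `FolderLookup` and `entriesFor` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class FolderLookup {
    // Assumed (hypothetical) layout: <root>/<dataSource>/<day>/<entry files>.
    // The folder names are the key, so no scan of other sources or days is needed.
    static List<Path> entriesFor(Path root, String dataSource, String day) throws IOException {
        Path dir = root.resolve(dataSource).resolve(day);
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        List<Path> result = new ArrayList<>();
        // Files.list keeps a directory handle open, so close it via try-with-resources.
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(Files::isRegularFile).sorted().forEach(result::add);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway tree so the example runs without any existing data.
        Path root = Files.createTempDirectory("data-root");
        Path dayDir = Files.createDirectories(root.resolve("sourceA").resolve("2021-01-23"));
        Files.write(dayDir.resolve("entry1.dat"), new byte[] {1, 2, 3});
        System.out.println(entriesFor(root, "sourceA", "2021-01-23"));
    }
}
```

This stays cheap only while the key you filter on matches the directory structure. A filter on anything else, such as a field inside the files, means opening every file, and that is where the DB option starts to pay off.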
挽巷

2021-01-23 06:41
Depending on the architecture you are using, you can implement caching in different ways. JBoss has a built-in JBoss Cache, and there is also third-party open-source software that lets you use caching, such as Redis or EhCache, depending on your needs. Basically, a cache stores objects in memory; some are passivated/activated on demand, and when memory is exhausted they are written to a physical file on disk, from which the caching mechanism can easily activate and unmarshal them again. This reduces the database connectivity your program has to hold. There are other caches, but here are some that I've worked with:
- JBoss: http://www.jboss.org/jbosscache/
- Redis: http://redis.io/
- EhCache: http://ehcache.org/
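
The basic idea of keeping recently used objects in memory can be sketched with the JDK alone. This is not the API of JBoss Cache, Redis or EhCache; `LruCache` and its `loader` parameter are hypothetical names. It is also simpler than what is described above: evicted entries are dropped, not passivated to disk.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal least-recently-used cache in front of a slow loader
// (for example a file read or a database query).
final class LruCache<K, V> {
    private final Function<K, V> loader;
    private final LinkedHashMap<K, V> map;
    private int loads;

    LruCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true makes iteration order follow recency of access.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    // Return the cached value, loading and storing it on a miss.
    // A loader that returns null is treated as a miss on every call.
    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            map.put(key, value);
        }
        return value;
    }

    // Number of times the loader was actually invoked.
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<String, Integer>(2, s -> s.length());
        cache.get("a");
        cache.get("a"); // second call is served from memory
        System.out.println("loads = " + cache.loadCount());
    }
}
```

A dedicated cache library adds what this sketch leaves out, such as expiry, overflow to disk and sharing between processes.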