Java: fastest way to do random reads on huge disk file(s)

走了就别回头了 2021-02-04 19:12

I've got a moderately big set of data, about 800 MB or so, that is basically a big precomputed table I need in order to speed up some computation by several orders of magnitude (cr…

4 Answers
  •  囚心锁ツ
    2021-02-04 19:37

    800MB is not that much to load up and store in memory. If you can afford to have multicore machines ripping away at a data set for days on end, you can afford an extra GB or two of RAM, no?

    That said, read up on Java's java.nio.MappedByteBuffer. Your comment "I think I don't want to map the 800 MB in memory" suggests the concept isn't clear to you.

    In a nutshell, a mapped byte buffer lets you access the data programmatically as if it were in memory, while the OS decides what is actually resident in RAM and what stays on disk: Java's MBB is backed by the OS's virtual-memory subsystem. It is also nice and fast. Multiple threads can read a single MBB safely, provided they stick to absolute reads such as get(int index); the relative get methods mutate the buffer's position and are not thread-safe.

    Here are the steps I recommend you take:

    1. Instantiate a MappedByteBuffer that maps your data file. Creation is somewhat expensive, so do it once and keep the buffer around.
    2. In your lookup method...
      1. instantiate a byte[4] array
      2. set the buffer's position to your record's file offset, then call .get(byte[] dst) (note that the offset argument of .get(byte[] dst, int offset, int length) indexes into dst, not into the file)
      3. the byte array now holds your data, which you can turn into a value
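    The steps above can be sketched like this. The TableLookup class name, the file layout (fixed-width records of 4-byte big-endian ints keyed by record index), and the use of an absolute getInt instead of a byte[4] copy are my own assumptions for illustration; absolute reads also avoid the position-mutation issue when several threads share the buffer:

    ```java
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Hypothetical lookup table: fixed-width 4-byte big-endian ints,
    // addressed by record index.
    public class TableLookup {
        private final MappedByteBuffer buffer;

        public TableLookup(Path file) throws IOException {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                // Map the whole file once; mapping is the expensive part,
                // so keep the resulting buffer around for the program's lifetime.
                buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            }
        }

        // Absolute read: does not touch the buffer's position, so multiple
        // threads may call this concurrently on the same buffer.
        public int lookup(long recordIndex) {
            return buffer.getInt((int) (recordIndex * 4));
        }
    }
    ```

    An 800 MB file fits comfortably under the 2 GB limit of a single MappedByteBuffer; a larger file would need to be split across several mappings.
    
    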

    And presto! You have your data!

    I'm a big fan of MBBs and have used them successfully for such tasks in the past.
