How to read/write vector<Chunk*> as memory mapped file(s)?

别说谁变了你拦得住时间么 submitted on 2019-12-13 02:23:46

Question


I have a large set of data chunks (~50GB). In my code I have to be able to do the following things:

  1. Repeatedly iterate over all chunks and do some computations on them.

  2. Repeatedly iterate over all chunks and do some computations on them, where in each iteration the order of visited chunks is (as far as possible) randomized.

So far, I have split the data into 10 binary files (created with boost::serialization) and repeatedly read one after the other and perform the computations. For (2), I read the 10 files in random order and process each one in sequence, which is good enough.

However, reading one of the files (using boost::serialization) takes a long time, and I'd like to speed it up.

Can I use memory mapped files instead of boost::serialization?

In particular, I'd have a vector<Chunk*> in each file. I want to be able to read in such a file very, very quickly.

How can I read/write such a vector<Chunk*> data structure? I have looked at boost::interprocess::file_mapping, but I'm not sure how to do it.

I read this (http://boost.cowic.de/rc/pdf/interprocess.pdf), but it doesn't say much about memory mapped files. I think I'd store the vector<Chunk*> at the start of the mapped memory, followed by the Chunks themselves. The vector<Chunk*> would then actually become an offset_ptr<Chunk>*, i.e., an array of offset_ptr?
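The layout described above could be sketched roughly as follows. Raw Chunk* values stop being valid once the file is mapped at a different address, so this sketch stores plain byte offsets from the start of the buffer instead, which plays a similar role to offset_ptr. Everything here is an assumption made for illustration: Blob stands in for the undefined Chunk type, the names pack_chunks, chunk_count and chunk_at are hypothetical, and the fields are written in host byte order. The buffer is an ordinary std::vector, but the same reader functions would work on the base pointer of a mapped file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the question's Chunk: a variable-length byte blob.
using Blob = std::vector<std::uint8_t>;

// Read-only view of one chunk inside the packed buffer.
struct ChunkView {
  const std::uint8_t* data;
  std::uint64_t size;
};

// Layout: [count][offset_0 .. offset_count][payload bytes ...]
// All fields are host-endian uint64_t; offsets are bytes from the buffer start.
// The extra offset_count entry marks the end of the last chunk.
std::vector<std::uint8_t> pack_chunks(const std::vector<Blob>& chunks) {
  const std::uint64_t count = chunks.size();
  const std::size_t table_bytes = sizeof(std::uint64_t) * (chunks.size() + 2);
  std::size_t total = table_bytes;
  for (const Blob& c : chunks) total += c.size();

  std::vector<std::uint8_t> out(total);
  std::memcpy(out.data(), &count, sizeof count);
  std::uint64_t offset = table_bytes;
  for (std::size_t i = 0; i <= chunks.size(); ++i) {
    // Write the offset table entry, then the payload it points to (if any).
    std::memcpy(out.data() + sizeof(std::uint64_t) * (i + 1), &offset, sizeof offset);
    if (i < chunks.size() && !chunks[i].empty()) {
      std::memcpy(out.data() + offset, chunks[i].data(), chunks[i].size());
      offset += chunks[i].size();
    }
  }
  return out;
}

// memcpy avoids unaligned reads through a casted pointer.
static std::uint64_t read_u64(const std::uint8_t* p) {
  std::uint64_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

std::uint64_t chunk_count(const std::uint8_t* base) { return read_u64(base); }

// Resolve chunk i by looking up two neighbouring offsets; no payload is copied.
ChunkView chunk_at(const std::uint8_t* base, std::uint64_t i) {
  assert(i < chunk_count(base));
  const std::uint64_t begin = read_u64(base + sizeof(std::uint64_t) * (i + 1));
  const std::uint64_t end = read_u64(base + sizeof(std::uint64_t) * (i + 2));
  return ChunkView{base + begin, end - begin};
}
```

With this kind of layout, opening a file costs only the mapping itself; a chunk's bytes are paged in when chunk_at's result is first dereferenced.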


Answer 1:


A memory mapped file is just a chunk of memory. Like any other memory, it can be organized as bytes, little-endian words, bits, or any other data structure. If portability is a concern (e.g. endianness), some care is needed.
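As an illustration of the endianness point, one common approach is to fix the on-disk byte order and decode fields byte by byte instead of casting the mapped pointer. The helpers below are a minimal sketch under that assumption; the names load_le32, load_le64 and store_le32 are made up for this example and are not part of Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a 32-bit little-endian value from raw bytes.
// The result is the same on little- and big-endian hosts.
std::uint32_t load_le32(const unsigned char* p) {
  assert(p != nullptr);
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode a 64-bit little-endian value, most significant byte first.
std::uint64_t load_le64(const unsigned char* p) {
  assert(p != nullptr);
  std::uint64_t v = 0;
  for (int i = 7; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

// Encode a 32-bit value as little-endian bytes.
void store_le32(unsigned char* p, std::uint32_t v) {
  assert(p != nullptr);
  for (std::size_t i = 0; i < 4; ++i) p[i] = static_cast<unsigned char>(v >> (8 * i));
}
```

If the files are only ever read on the machine that wrote them, this extra decoding step is unnecessary and the mapped bytes can be viewed directly as structs.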

The following code may be a good starting point:

#include <cassert>   // assert
#include <cstddef>   // offsetof
#include <cstdint>
#include <iostream>
#include <boost/iostreams/device/mapped_file.hpp>

struct entry {
  std::uint32_t a;
  std::uint64_t b;
} __attribute__((packed)); /* compiler specific, but supported 
                              in other ways by all major compilers */

static_assert(sizeof(entry) == 12, "entry: Struct size mismatch");
static_assert(offsetof(entry, a) == 0, "entry: Invalid offset for a");
static_assert(offsetof(entry, b) == 4, "entry: Invalid offset for b");

int main() {
  // Map the file "map" read-only and view its contents as an array of entry.
  boost::iostreams::mapped_file_source mmap("map");
  assert(mmap.is_open());
  const entry* data_begin = reinterpret_cast<const entry*>(mmap.data());
  // Trailing bytes that do not fill a whole entry are ignored.
  const entry* data_end = data_begin + mmap.size()/sizeof(entry);
  for(const entry* ii=data_begin; ii!=data_end; ++ii)
    std::cout << std::hex << ii->a << " " << ii->b << std::endl;
  return 0;
}

The data_begin and data_end pointers form an iterator range, so they can be passed to most STL algorithms just like any other pair of iterators.
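For instance, a pair of const entry* pointers can be handed to std::accumulate or std::count_if, and a shuffled index vector gives a randomized visiting order like the one the question asks for. This is only a sketch: the helper names sum_b, count_a, shuffled_order and sum_b_shuffled are invented here, the struct is packed with #pragma pack instead of __attribute__((packed)), and the pointers may come from any contiguous array of entry, mapped or not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Same 12-byte layout as the entry struct above, packed via #pragma pack.
#pragma pack(push, 1)
struct entry {
  std::uint32_t a;
  std::uint64_t b;
};
#pragma pack(pop)
static_assert(sizeof(entry) == 12, "entry: Struct size mismatch");

// Sum the b field over the range [first, last).
std::uint64_t sum_b(const entry* first, const entry* last) {
  return std::accumulate(first, last, std::uint64_t{0},
                         [](std::uint64_t acc, const entry& e) { return acc + e.b; });
}

// Count the entries whose a field equals key.
std::ptrdiff_t count_a(const entry* first, const entry* last, std::uint32_t key) {
  return std::count_if(first, last, [key](const entry& e) { return e.a == key; });
}

// Indices 0..n-1 in a pseudo-random order determined by seed.
std::vector<std::size_t> shuffled_order(std::size_t n, std::uint32_t seed) {
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::mt19937 rng(seed);
  std::shuffle(order.begin(), order.end(), rng);
  return order;
}

// Visit every entry exactly once in shuffled order and sum the b fields.
std::uint64_t sum_b_shuffled(const entry* first, const entry* last, std::uint32_t seed) {
  assert(first <= last);
  const std::size_t n = static_cast<std::size_t>(last - first);
  std::uint64_t total = 0;
  for (std::size_t i : shuffled_order(n, seed)) total += first[i].b;
  return total;
}
```

Random access into a large mapping tends to cause many page faults, so for data on the scale of the question it is probably better to shuffle at the granularity of whole chunks than of individual small records.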



Source: https://stackoverflow.com/questions/19531243/how-to-read-write-vectorchunk-as-memory-mapped-files
