istream vs memory mapping a file?

生来不讨喜 2021-02-02 17:31

I am trying to map a file to memory and then parse it line by line. Is istream what I should be using?

Is istream the same as mapping a file to memory on Windows? I have h

3 Answers
  • 2021-02-02 18:14

    Is istream the same as mapping a file to memory on Windows?

    Not exactly. They differ in the same way that a "stream" differs from a "file".

    Think of a file as a stored sequence, and of a stream as the interface to the "channel" (a stream buffer) that the sequence flows through on its way from storage to the receiving variables.

    Think of a memory-mapped file as a "file" that, instead of being stored outside the processing unit, is kept in sync in memory. Its advantage is that it is visible as a raw memory buffer while still being a file. If you want to read it as a stream, the simplest way is probably to use an istringstream that reads from that raw buffer.
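
    As a rough sketch of that last suggestion, assume the mapped region is available as a pointer and a length. The names `data`, `size`, and `lines_from_buffer` are made up here for illustration. Note that std::istringstream works on its own copy of the bytes, so this trades an extra copy for simplicity.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies [data, data + size) into a string, wraps it
// in an istringstream, and splits it into lines with std::getline.
std::vector<std::string> lines_from_buffer(const char* data, std::size_t size) {
    const std::string copy(data, size);  // the stream reads from this copy
    std::istringstream in(copy);

    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

    Passing the address and size of a mapped region to such a helper would yield the lines as separate strings.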

  • 2021-02-02 18:21

    Abstractly speaking, reading a file sequentially will not be sped up by memory-mapping it or by first reading it into memory. Memory-mapped files make sense if reading the file sequentially is not feasible. Pre-caching the file as in the other answer, or simply copying the file into a large string that you then process by other means, again only makes sense if reading the file once in sequence is not feasible and you have the RAM for it. This is because the slowest part of the operation is getting the data off the disk, and that has to happen regardless of whether you copy the file to RAM, let the operating system map the data before you access it, or let std::iostream read it line by line and cache just enough of the file to make this work smoothly.

    In practice, the mapped or cached versions could eliminate some RAM-to-RAM copying by making shallow copies of the buffer ranges. Still, this will not change much, because it is RAM->RAM copying and therefore negligible compared to disk->RAM.
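
    One way to picture such "shallow copies of the buffer ranges" is a list of std::string_view objects that point into the buffer instead of owning their characters. This is only a sketch, assuming C++17; `split_lines` is a hypothetical name.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns one view per line. Each view points into
// `buffer`, so no character data is copied. The views are only valid while
// the underlying memory (for example the mapping) stays alive.
std::vector<std::string_view> split_lines(std::string_view buffer) {
    std::vector<std::string_view> lines;
    std::size_t start = 0;
    while (start < buffer.size()) {
        std::size_t end = buffer.find('\n', start);
        if (end == std::string_view::npos) {
            end = buffer.size();  // last line has no trailing newline
        }
        lines.push_back(buffer.substr(start, end - start));
        start = end + 1;  // skip past the newline
    }
    return lines;
}
```

    The saving is the per-line string copy, which, as argued above, is small next to the cost of getting the data off the disk.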

    The best advice in a situation like yours is therefore not to worry too much and to just use std::iostream.
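
    For reference, the plain stream approach might look roughly like this. It is a minimal sketch using std::ifstream and std::getline; `count_lines` is a made-up name, and a real parser would process each line instead of counting.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads `path` line by line and returns the number of
// lines. Returns 0 if the file cannot be opened, because the first
// std::getline call then fails immediately.
std::size_t count_lines(const std::string& path) {
    std::ifstream in(path);
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++count;  // parse `line` here in real code
    }
    return count;
}
```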

    [This answer is for archival purposes, because the correct answer is buried in the comments]

  • 2021-02-02 18:29

    std::istream does not read from memory by itself; it needs a stream buffer to supply the characters. One approach is to derive from it and pair it with a custom array-backed streambuf:

    #include <cstddef>
    #include <string>
    #include <streambuf>
    #include <istream>
    
    template<typename CharT, typename TraitsT = std::char_traits<CharT>>
    struct basic_membuf : std::basic_streambuf<CharT, TraitsT> {
        basic_membuf(CharT const* const buf, std::size_t const size) {
            CharT* const p = const_cast<CharT*>(buf);
            this->setg(p, p, p + size);
        }
    
        //...
    };
    
    template<typename CharT, typename TraitsT = std::char_traits<CharT>>
    struct basic_imemstream
    : virtual basic_membuf<CharT, TraitsT>, std::basic_istream<CharT, TraitsT> {
        // The base classes depend on the template parameters, so they are
        // named with explicit template arguments in the initializer list.
        basic_imemstream(CharT const* const buf, std::size_t const size)
        : basic_membuf<CharT, TraitsT>(buf, size),
          std::basic_istream<CharT, TraitsT>(
              static_cast<std::basic_streambuf<CharT, TraitsT>*>(this))
        { }
    
        //...
    };
    
    using imemstream = basic_imemstream<char>;
    
    char const* const mmaped_data = /*...*/;
    std::size_t const mmap_size = /*...*/;
    imemstream s(mmaped_data, mmap_size);
    // s now uses the memory mapped data as its underlying buffer.
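
    An alternative sketch that skips the inheritance: std::istream has a public constructor taking a std::streambuf*, so a small buffer class can be paired with a plain std::istream. The names `membuf` and `read_lines` are illustrative and not part of the code above.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Read-only stream buffer over an existing character array (no copy).
struct membuf : std::streambuf {
    membuf(const char* data, std::size_t size) {
        // setg takes non-const pointers. No put area is set up, so the
        // reads performed below never write to the array.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);
    }
};

// Hypothetical helper: parses the array line by line through std::istream.
std::vector<std::string> read_lines(const char* data, std::size_t size) {
    membuf buf(data, size);
    std::istream in(&buf);  // the stream reads directly from `data`

    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

    The buffer object must outlive the stream, and the memory it points at must outlive both.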
    

    As for the memory-mapping itself, I recommend using Boost.Interprocess for this purpose:

    #include <cstddef>
    #include <string>
    #include <boost/interprocess/file_mapping.hpp>
    #include <boost/interprocess/mapped_region.hpp>
    
    namespace bip = boost::interprocess;
    
    //...
    
    std::string filename = /*...*/;
    bip::file_mapping mapping(filename.c_str(), bip::read_only);
    bip::mapped_region mapped_rgn(mapping, bip::read_only);
    char const* const mmaped_data = static_cast<char*>(mapped_rgn.get_address());
    std::size_t const mmap_size = mapped_rgn.get_size();
    

    Code for imemstream taken from this answer by Dietmar Kühl.
