How do I write a range pipeline that uses temporary containers?

予麋鹿 2020-11-30 02:09

I have a third-party function with this signature:

std::vector<T> f(T t);

I also have an existing potentially infinite range (of the

6 answers
  • 2020-11-30 02:43

    range-v3 forbids views over temporary containers to help us avoid the creation of dangling iterators. Your example demonstrates exactly why this rule is necessary in view compositions:

    auto rng = src | view::transform(f) | view::join;
    

    If view::join were to store the begin and end iterators of the temporary vector returned by f, they would be invalidated before ever being used.

    "That's all great, Casey, but why don't range-v3 views store temporary ranges like this internally?"

    Because performance. Much like how the performance of the STL algorithms is predicated on the requirement that iterator operations are O(1), the performance of view compositions is predicated on the requirement that view operations are O(1). If views were to store temporary ranges in internal containers "behind your back" then the complexity of view operations - and hence compositions - would become unpredictable.

    "Ok, fine. Given that I understand all of this wonderful design, how do I MAKE THIS WORK?!??"

    Since the view composition won't store the temporary ranges for you, you need to dump them into some kind of storage yourself, e.g.:

    #include <iostream>
    #include <utility>
    #include <vector>
    #include <range/v3/range_for.hpp>
    #include <range/v3/utility/functional.hpp>
    #include <range/v3/view/iota.hpp>
    #include <range/v3/view/join.hpp>
    #include <range/v3/view/transform.hpp>
    
    using T = int;
    
    std::vector<T> f(T t) { return std::vector<T>(2, t); }
    
    int main() {
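        // buffer owns the most recent result of f, so the lambda can hand
        // view::join an lvalue instead of a temporary.
        std::vector<T> buffer;
        auto store = [&buffer](std::vector<T> data) -> std::vector<T>& {
            return buffer = std::move(data);
        };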
    
        auto rng = ranges::view::ints
            | ranges::view::transform(ranges::compose(store, f))
            | ranges::view::join;
    
        unsigned count = 0;
        RANGES_FOR(auto&& i, rng) {
            if (count) std::cout << ' ';
            else std::cout << '\n';
            count = (count + 1) % 8;
            std::cout << i << ',';
        }
    }
    

    Note that the correctness of this approach depends on the fact that view::join is an input range and therefore single-pass.
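
    To make the single-pass requirement concrete, here is a minimal sketch that uses only the standard library, not range-v3. The class FlattenOnce and the helper take are hypothetical names introduced for illustration, and f is a stand-in for the third-party function. The sketch assumes the source is simply 0, 1, 2, ... and keeps exactly one inner vector alive at a time, just as the buffer above does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the third-party function from the question.
std::vector<int> f(int t) { return std::vector<int>(2, t); }

// Hypothetical hand-rolled analogue of transform(compose(store, f)) | join.
// It owns exactly one inner vector at a time, so it can only be walked once.
class FlattenOnce {
    int next_outer_;           // next value to feed to f (source is 0, 1, 2, ...)
    std::vector<int> buffer_;  // the single stored temporary
    std::size_t pos_ = 0;      // read position inside buffer_
public:
    explicit FlattenOnce(int first = 0) : next_outer_(first) {}

    // Returns the next flattened element. Refilling buffer_ destroys the
    // previous inner vector, which is why no second pass is possible.
    int next() {
        while (pos_ == buffer_.size()) {  // also skips empty results of f
            buffer_ = f(next_outer_++);
            pos_ = 0;
        }
        return buffer_[pos_++];
    }
};

// Convenience helper: pull the next n elements of the infinite sequence.
std::vector<int> take(FlattenOnce& it, std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    while (out.size() < n) out.push_back(it.next());
    return out;
}
```

    Calling take twice on the same object continues where the first call stopped. Nothing can rewind it, because the earlier inner vectors no longer exist. A forward range would have to allow exactly that kind of second pass.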

    "This isn't novice-friendly. Heck, it isn't expert-friendly. Why isn't there some kind of support for 'temporary storage materialization™' in range-v3?"

    Because we haven't gotten around to it - patches welcome ;)

  • 2020-11-30 02:43

    Edited

    Apparently, the code below violates the rule that views cannot own data they refer to. (However, I don't know if it's strictly forbidden to write something like this.)

    I use ranges::view_facade to create a custom view. It holds one vector returned by f at a time and presents it as a range. This makes it possible to use view::join on a range of such ranges. Admittedly, we lose random and bidirectional access to the elements (but view::join itself degrades a range to an input range anyway), and we cannot assign to them.

    I copied struct MyRange from Eric Niebler's repository, modifying it slightly.

    #include <cstddef>
    #include <iostream>
    #include <vector>
    #include <range/v3/all.hpp>
    
    using namespace ranges;
    
    std::vector<int> f(int i) {
        return std::vector<int>(static_cast<size_t>(i), i);
    }
    
    template<typename T>
    struct MyRange: ranges::view_facade<MyRange<T>> {
    private:
        friend struct ranges::range_access;
        std::vector<T> data;
        struct cursor {
        private:
            typename std::vector<T>::const_iterator iter;
        public:
            cursor() = default;
            cursor(typename std::vector<T>::const_iterator it) : iter(it) {}
            T const & get() const { return *iter; }
            bool equal(cursor const &that) const { return iter == that.iter; }
            void next() { ++iter; }
            // Don't need those for an InputRange:
            // void prev() { --iter; }
            // std::ptrdiff_t distance_to(cursor const &that) const { return that.iter - iter; }
            // void advance(std::ptrdiff_t n) { iter += n; }
        };
        cursor begin_cursor() const { return {data.begin()}; }
        cursor   end_cursor() const { return {data.end()}; }
    public:
        MyRange() = default;
        explicit MyRange(const std::vector<T>& v) : data(v) {}
        explicit MyRange(std::vector<T>&& v) noexcept : data (std::move(v)) {}
    };
    
    template <typename T>
    MyRange<T> to_MyRange(std::vector<T> && v) {
        return MyRange<T>(std::move(v));
    }
    
    
    int main() {
        auto src = view::ints(1);        // infinite list
    
        auto rng = src | view::transform(f) | view::transform(to_MyRange<int>) | view::join;
    
        for_each(rng | view::take(42), [](int i) {
            std::cout << i << ' ';
        });
    }
    
    // Output:
    // 1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 9 9 9 9 9 9 
    

    Compiled with gcc 5.3.0.

  • 2020-11-30 02:48

    The problem here of course is the whole idea of a view - a non-storing layered lazy evaluator. To keep up with this contract, views have to pass around references to range elements, and in general they can handle both rvalue and lvalue references.

    Unfortunately in this specific case view::transform can only provide an rvalue reference as your function f(T t) returns a container by value, and view::join expects an lvalue as it tries to bind views (view::all) to inner containers.

    Possible solutions will all introduce some kind of temporary storage somewhere into the pipeline. Here are the options I came up with:

    • Create a version of view::all that can internally store a container passed by an rvalue reference (as suggested by Barry). From my point of view, this violates the "non-storing view" concept and also requires some painful template coding, so I would advise against this option.
    • Use a temporary container for the whole intermediate state after the view::transform step. Can be done either by hand:

      auto rng1 = src | view::transform(f);
      std::vector<std::vector<T>> temp = rng1;
      auto rng = temp | view::join;
      

      Or using action::join. This would result in "premature evaluation", will not work with an infinite src, will waste some memory, and overall has completely different semantics from your original intention, so it is hardly a solution at all, but at least it complies with the view class contracts.

    • Wrap temporary storage around the function you pass into view::transform. The simplest example is

      const std::vector<T>& f_store(const T& t)
      {
        static std::vector<T> temp;
        temp = f(t);
        return temp;
      }
      

      and then pass f_store to the view::transform. As f_store returns an lvalue reference, view::join will not complain now.

      This of course is somewhat of a hack and will only work if you then stream the whole range straight into some sink, such as an output container. I believe it will withstand some straightforward transformations, like view::replace or more view::transforms, but anything more complex may try to access this temp storage in a non-straightforward order.

      In that case other types of storage can be used, e.g. std::map will fix that problem and will still allow infinite src and lazy evaluation at the expense of some memory:

      const std::vector<T>& fc(const T& t)
      {
          static std::map<T, std::vector<T>> smap;
          smap[t] = f(t);
          return smap[t];
      }
      

      If your f function is stateless, this std::map can also be used to potentially save some calls. This approach can possibly be improved further if there is a way to guarantee that an element will no longer be required and remove it from the std::map to conserve memory. This however depends on further steps of the pipeline and the evaluation.

    As these 3 solutions pretty much cover all the places to introduce temporary storage between view::transform and view::join, I think these are all the options you have. I would suggest going with #3 as it will allow you to keep the overall semantics intact and it is quite simple to implement.
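
    The remark about saving calls when f is stateless can be sketched as follows. This is a minimal illustration under the assumption that f is pure, so a cached result can be reused. The name f_cached and the counter calls_to_f are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <map>
#include <vector>

int calls_to_f = 0;  // instrumentation for this sketch only

// Stand-in for the third-party function; assumed to be stateless.
std::vector<int> f(int t) {
    ++calls_to_f;
    return std::vector<int>(2, t);
}

// Hypothetical memoizing variant of fc: f(t) is computed only on a cache miss.
// References to elements of a std::map stay valid when other keys are
// inserted, so a returned reference survives later calls with other arguments.
const std::vector<int>& f_cached(int t) {
    static std::map<int, std::vector<int>> cache;
    auto it = cache.find(t);
    if (it == cache.end())
        it = cache.emplace(t, f(t)).first;
    return it->second;
}
```

    Unlike fc above, this version never overwrites an existing entry, so references handed out earlier keep pointing at unchanged data. The price is that the map grows with every distinct argument.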

  • 2020-11-30 02:52

    UPDATE

    range-v3 now has views::cache1, a view that caches the most recent element in the view object itself, and returns a reference to that object. That is how this problem is cleanly and efficiently solved today, as pointed out by user @bradgonesurfing in his answer.

    Old, out-of-date answer below, preserved for historical curiosity.


    This is another solution that doesn't require much fancy hacking. It comes at the cost of a call to std::make_shared at each call to f. But you're allocating and populating a container in f anyway, so maybe this is an acceptable cost.

    #include <range/v3/core.hpp>
    #include <range/v3/view/iota.hpp>
    #include <range/v3/view/transform.hpp>
    #include <range/v3/view/join.hpp>
    #include <vector>
    #include <iostream>
    #include <memory>
    
    std::vector<int> f(int i) {
        return std::vector<int>(3u, i);
    }
    
    template <class Container>
    struct shared_view : ranges::view_interface<shared_view<Container>> {
    private:
        std::shared_ptr<Container const> ptr_;
    public:
        shared_view() = default;
        explicit shared_view(Container &&c)
        : ptr_(std::make_shared<Container const>(std::move(c)))
        {}
        ranges::range_iterator_t<Container const> begin() const {
            return ranges::begin(*ptr_);
        }
        ranges::range_iterator_t<Container const> end() const {
            return ranges::end(*ptr_);
        }
    };
    
    struct make_shared_view_fn {
        template <class Container,
            CONCEPT_REQUIRES_(ranges::BoundedRange<Container>())>
        shared_view<std::decay_t<Container>> operator()(Container &&c) const {
            return shared_view<std::decay_t<Container>>{std::forward<Container>(c)};
        }
    };
    
    constexpr make_shared_view_fn make_shared_view{};
    
    int main() {
        using namespace ranges;
        auto rng = view::ints | view::transform(compose(make_shared_view, f)) | view::join;
        RANGES_FOR( int i, rng ) {
            std::cout << i << '\n';
        }
    }
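
    The ownership idea behind shared_view can be shown without range-v3. The following is a minimal sketch using only the standard library. SharedInts and sum_via_copy are hypothetical names for illustration and are not part of any library.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stripped-down analogue of shared_view, without range-v3.
// Copies share one heap-allocated container, so copying is O(1) and the
// iterators stay valid as long as any copy is alive.
class SharedInts {
    std::shared_ptr<const std::vector<int>> ptr_;
public:
    explicit SharedInts(std::vector<int>&& v)
        : ptr_(std::make_shared<const std::vector<int>>(std::move(v))) {}

    std::vector<int>::const_iterator begin() const { return ptr_->begin(); }
    std::vector<int>::const_iterator end() const { return ptr_->end(); }
    long owners() const { return ptr_.use_count(); }
};

// Sums the elements through a copy after the original handle is gone.
int sum_via_copy() {
    SharedInts copy = [] {
        SharedInts original(std::vector<int>(3, 5));  // temporary moved in
        return SharedInts(original);                  // copy shares storage
    }();                                              // original destroyed here
    int total = 0;
    for (int x : copy) total += x;
    return total;
}
```

    Each copy only bumps a reference count, which keeps the view operations O(1). The one allocation per container is the make_shared cost mentioned above.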
    
  • 2020-11-30 02:53

    It looks like there are now test cases in the range-v3 library that show how to do this correctly. It is necessary to add the views::cache1 operator into the pipeline:

    auto rng = views::iota(0,4)
            | views::transform([](int i) {return std::string(i, char('a'+i));})
            | views::cache1
            | views::join('-');
    check_equal(rng, {'-','b','-','c','c','-','d','d','d'});
    CPP_assert(input_range<decltype(rng)>);
    CPP_assert(!range<const decltype(rng)>);
    CPP_assert(!forward_range<decltype(rng)>);
    CPP_assert(!common_range<decltype(rng)>);
    

    so the solution to the OP's question would be to write

    auto rng = src | views::transform(f) | views::cache1 | views::join;
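
    To see why caching one element is enough, here is a rough sketch of the idea using only the standard library. It is not how views::cache1 is implemented. CachedCursor, letters and join_first are hypothetical names, and the source is assumed to be the integers starting at a given value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical illustration of the idea behind views::cache1: the result for
// the current position is computed once, stored inside the cursor, and
// handed out by reference until the cursor advances.
class CachedCursor {
    std::function<std::string(int)> fn_;
    int index_;
    std::string cached_;
public:
    CachedCursor(std::function<std::string(int)> fn, int start)
        : fn_(std::move(fn)), index_(start), cached_(fn_(index_)) {}

    const std::string& get() const { return cached_; }  // stable lvalue
    void next() { cached_ = fn_(++index_); }            // replaces the cache
};

// Same transformation as in the test case above.
std::string letters(int i) {
    return std::string(static_cast<std::size_t>(i), char('a' + i));
}

// Joins the first n cached strings with '-'.
std::string join_first(int n) {
    CachedCursor cur(letters, 0);
    std::string out;
    for (int k = 0; k < n; ++k, cur.next()) {
        if (k) out += '-';
        out += cur.get();  // safe to read: the string lives inside cur
    }
    return out;
}
```

    With n = 4, join_first produces the same character sequence that check_equal expects above. Because the cached value is overwritten on every advance, the result is again single-pass, which matches the input_range assertions in the test case.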
    
  • 2020-11-30 02:53

    I suspect it just can't. None of the views have any machinery to store temporaries anywhere - that's explicitly against the concept of view from the docs:

    A view is a lightweight wrapper that presents a view of an underlying sequence of elements in some custom way without mutating or copying it. Views are cheap to create and copy, and have non-owning reference semantics.

    So in order for that join to work and outlive the expression, something somewhere has to hold onto those temporaries. That something could be an action. This would work (demo):

    auto rng = src | view::transform(f) | action::join;
    

    except obviously not for src being infinite, and even for finite src probably adds too much overhead for you to want to use anyway.
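
    What the eager approach amounts to can be sketched with the standard library alone. This is a hypothetical equivalent, not the range-v3 implementation of action::join. The name flatten_eagerly is made up for illustration, and f is a stand-in for the third-party function.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the third-party function from the question.
std::vector<int> f(int t) { return std::vector<int>(2, t); }

// Hypothetical eager equivalent of transform(f) followed by action::join:
// every inner vector is copied into one owning result before returning.
std::vector<int> flatten_eagerly(const std::vector<int>& src) {
    std::vector<int> out;
    for (int t : src) {
        std::vector<int> inner = f(t);  // the temporary lives for one iteration
        out.insert(out.end(), inner.begin(), inner.end());
    }
    return out;  // terminates only because src is finite
}
```

    The result owns all elements, so nothing dangles, but the whole source is consumed up front and the memory use grows with the total number of elements.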

    You would probably have to copy/rewrite view::join to use some subtly modified version of view::all (required here) that, instead of requiring an lvalue container (and returning an iterator pair into it), accepts an rvalue container, stores it internally, and returns an iterator pair into that stored copy. But that's several hundred lines' worth of copied code, so it seems pretty unsatisfactory, even if it works.
