Using an iterator to Divide an Array into Parts with Unequal Size

I have an array which I need to divide up into 3-element sub-arrays. I wanted to do this with iterators, but I end up iterating past the end of the array and segfaulting even though I don't dereference the iterator. given: auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; I'm doing:

auto bar = cbegin(foo);

for (auto it = next(bar, 3); it < foo.end(); bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << endl; });

Now I can solve this by defining a finish iterator:

auto bar = cbegin(foo);
auto finish = next(cend(foo), -(size(foo) % 3));

for (auto it = next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, finish, [](const auto& i) { cout << i << endl; });
for_each(finish, cend(foo), [](const auto& i) { cout << i << endl; });

But this seems unnecessary when I don't dereference the iterator. Why can't I do the first version?

Ben Voigt

The reason this is prohibited is covered well at your other question Are iterators past the "one past-the-end" iterator undefined behavior? so I'll just address improved solutions.

For random-access iterators (which you must have if you are using <), there's no need whatsoever for the expensive modulo operation.

The salient points are that:

it + stride fails when it nears the end
end() - stride fails if the container contains too few elements
end() - it is always legal

From there, it's simple algebraic manipulation to change it + stride < end() into a legal form (subtract it from both sides).

The final result, which I have used many times:

for( auto it = c.cbegin(), end = c.cend(); end - it >= stride; it += stride )

The compiler is free to optimize that back to comparison to a precomputed end - stride * sizeof(*it) if the memory model is flat -- the limitations of C++ behavior don't apply to the primitive operations which the compiler translates C++ into.

You may of course use std::distance(it, end) if you prefer to use the named functions instead of operators, but that will only be efficient for random-access iterators.

For use with forward iterators, you should use something that combines the increment and termination conditions like

struct less_preferred { size_t value; less_preferred(size_t v) : value(v){} };

template<typename Iterator>
bool try_advance( Iterator& it, less_preferred step, Iterator end )
{
     while (step.value--) {
         if (it == end) return false;
         ++it;
     }
     return true;
}

With this additional overload, you'll get efficient behavior for random-access iterators:

template<typename RandomIterator>
auto try_advance( RandomIterator& it, size_t stride, RandomIterator end )
     -> decltype(end - it < stride) // SFINAE
{
     if (end - it < stride) return false;
     it += stride;
     return true;
}

Jonathan Mee

The segfault you are seeing is coming from next checking the range for you is an assertion in your Debug implementation to check against undefined behavior. The behavior of iterators and pointers is not defined beyond the their allocated range, and the "one past-the-end" element: Are iterators past the "one past-the-end" iterator undefined behavior?

This means that incrementing past the "one past-the-end" element is undefined behavior independent of the iterator's subsequent use. In order to have defined behavior you must use a solution like your Integer Modulo algorithm or similar, but you will have to change auto it = next(bar, 3) to something that conditionalizes based on the availability of at least the size of your sub-array, so something like: auto it = size(foo) <= 3 ? finish : next(bar, 3).

Where available the best solution here is going to cause the least redundant iteration is to track the size remaining in the container as an integer which does not suffer from undefined behavior when it falls outside the range and "one past-the-end". This can be accomplished by:

auto bar = cbegin(foo);

for (auto i = size(foo); i > STEP; i -= STEP) {
    for(auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
    cout << endl;
}

for(auto i = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
cout << endl;

EDIT: I had previously suggested using pointers which are not Debug conditioned, this is undefined behavior.

The problem is that next is checking the range for you. We use pointers outside of allocated memory all the time, for example nullptr and end, and that's all it here is. If you just use C-style pointer arithmetic here you'll be fine:

auto bar = cbegin(foo);

for (auto it = bar + 3; it < cend(foo); bar = it, it = bar + 3) {
    for_each(bar, it, [](const auto& i) { cout << i << endl; });
}

for_each(bar, cend(foo), [](const auto& i) { cout << '\t' << i << endl; });

Live Example

Alternatively, if you run in Release configuration the range checks should be removed, so you will be able to use the first version of your code.

Jonathan Mee

There is some disagreement about the most effective way to accomplish this iteration through array partitions.

First the one time integer modulo method, this must define auto size in addition to the changes in my answer because gcc does not yet support size:

auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };  
auto size = distance(cbegin(foo), cend(foo));
auto bar = cbegin(foo);
auto finish = prev(cend(foo), size % 3);

for(auto it = size <= 3 ? cend(foo) : next(bar, 3); it != finish; bar = it, it = next(bar, 3)) {
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
cout << endl;
for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

This creates 112 lines of assembly, most notably the conditional it != finish generates these instructions:

cmpq    %r12, %r13
je      .L19
movq    %r12, %rbx
jmp     .L10

Second the repeated iterator subtraction using Ben Voigt's try_advance but only with the random access specialization because there is a compiler conflict for random access iterators:

auto foo = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };  
auto bar = cbegin(foo);

for (auto it = cbegin(foo), end = cend(foo); try_advance(it, 3, end); bar = it) {
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

This creates 119 lines of assembly, most notably the conditional in try_advance: if (end - it < stride) return false; incurs a per iteration generating the code:

movq    %r12, %rax
subq    %rbp, %rax
cmpq    $11, %rax
ja      .L3

Upon learning that cmpq is really just a subtract and compare operation I have written some bench-marking code: http://coliru.stacked-crooked.com/a/ad869f69c8dbd96f I needed to use Coliru to be able to turn on optimization, but it keeps giving me bogus increments of my test count for times, I'm not sure what's going on there. ~~What I can say is locally, the repeated iterator subtraction is always faster, sometimes significantly so. Upon learning this I believe that Ben Voigt's answer should be marked as the correct one.~~

EDIT:

I've made an interesting discovery. It's the algorithm that goes first that always looses. I've rewriten the code to swap the first algorithm on each pass. When this is done the integer modulo method always beats the iterator subtraction method as would be suspected by looking at the assembly, again something fishy is going on with Coliru, but you can take this code and run it locally: http://coliru.stacked-crooked.com/a/eb3e0c70cc138ecf

The next issue is that both of these algorithms are lazy; in the event that size(foo) is a multiple of 3 they allocate an empty vector at the end of the vector. That requires significant branching for the integer modulo algorithm to remedy, but only the simplest of changes for the repeated iterator subtraction algorithm. The resulting algorithms exhibit effectively equal benchmark numbers but the edge goes to the repeated iterator subtraction for simplicity:

Integer modulo algorithm:

auto bar = cbegin(foo);
const auto size = distance(bar, cend(foo));

if (size <= 3) {
    for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}
else {
    auto finish = prev(cend(testValues), (size - 1) % 3 + 1);

    for (auto it = next(bar, 3); it != finish; bar = it, advance(it, 3)) {
        for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
        cout << endl;
    }

    for_each(bar, finish, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
    for_each(finish, cend(foo), [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

Repeated iterator subtraction algorithm:

auto bar = cbegin(foo);

for (auto it = cbegin(foo); distance(it, cend(foo)) > 3; bar = it) {
    advance(it, 3);
    for_each(bar, it, [](const auto& i) { cout << i << '\t'; });
    cout << endl;
}

for_each(bar, cend(foo), [](const auto& i) { cout << i << '\t'; });
cout << endl;

EDIT: Throwing the Remaining Size Algorithm into the hat

Both the Integer Modulo and Repeated Subtraction Algorithms above suffer from iterating over the input sequence more than once, other than being slower this isn't that serious because currently we're using a Bidirectional Iterator, but should our input iterator fail to qualify for Bidirectional Iterator this would be excessively expensive. Independent of iterator type the Remaining Size Algorithm beats all challengers every time at 10,000,000+ testbench iterations:

auto bar = cbegin(foo);

for (auto i = size(foo); i > STEP; i -= STEP) {
    for(auto j = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
    cout << endl;
}

for(auto i = 0; j < STEP; ++j, ++bar) cout << *bar << '\t';
cout << endl;

I've again copied my local testing to Coliru, which gives weird results but you can verify locally: http://coliru.stacked-crooked.com/a/361f238216cdbace

来源：https://stackoverflow.com/questions/36425393/using-an-iterator-to-divide-an-array-into-parts-with-unequal-size

标签

c++

iterator

modulo

termination

data-partitioning