What does enabling STL iterator debugging really do?

日久生厌 2021-01-04 18:51

I've enabled iterator debugging in an application by defining

_HAS_ITERATOR_DEBUGGING = 1

I was expecting this to really just check vector

  •  鱼传尺愫
    2021-01-04 19:28

    There are a number of operations on iterators that lead to undefined behavior; the goal of this switch is to activate runtime checks (using asserts) that prevent it from occurring.

    The issue

    The obvious one is using an invalid iterator, but this invalidity may arise for various reasons:

    • Uninitialized iterator
    • Iterator to an element that has been erased
    • Iterator to an element whose physical location has changed (reallocation for a vector)
    • Iterator outside of [begin, end)
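    As a small illustration of the reallocation case, the sketch below (the helper name push_back_reallocates is hypothetical) compares the address of a std::vector's buffer before and after a push_back. Once size() has reached capacity(), the vector must move its elements to a new buffer, so every iterator obtained earlier becomes invalid. The address is kept as an integer so the old, possibly freed, storage is never touched again.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: pushes `value` and reports whether the vector had to
// move its elements to a new buffer. If it returns true, every iterator,
// pointer and reference obtained before the call is now invalid.
bool push_back_reallocates(std::vector<int>& v, int value) {
    // Keep only the numeric address: push_back may free the old buffer.
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    const bool was_full = v.size() == v.capacity();
    v.push_back(value);
    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(v.data());
    // Growing past capacity reallocates; otherwise the buffer stays in place.
    assert(was_full == (before != after));
    return before != after;
}
```

    For example, after v.reserve(4), the first v.capacity() calls return false and the next one returns true.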

    The standard specifies in excruciating detail, for each container, which operation invalidates which iterators.
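    For instance, std::list::erase invalidates only the iterators to the erased element, whereas std::vector::erase invalidates every iterator at or after the point of erasure. The sketch below (the function name is hypothetical) keeps iterators to a list's neighbouring elements across an erase and shows that they remain usable.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <utility>

// Hypothetical demo: for std::list, erase() invalidates only iterators to the
// erased element. Returns the values read through two iterators that were
// obtained before the erase and are still valid afterwards.
std::pair<int, int> erase_middle_keeps_neighbours() {
    std::list<int> values{1, 2, 3};
    auto first = values.begin();        // -> 1
    auto middle = std::next(first);     // -> 2
    auto last = std::next(middle);      // -> 3
    auto after = values.erase(middle);  // only `middle` is invalidated
    assert(after == last);              // erase returns the following element
    return {*first, *last};             // both still valid
}
```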

    There is also a somewhat less obvious reason that people tend to forget: mixing iterators to different containers:

    std::vector<int> cats, dogs;

    std::for_each(cats.begin(), dogs.end(), [](int&) { /* ... */ }); // obvious bug
    

    This pertains to a more general issue: the validity of ranges passed to the algorithms.

    • [cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
    • [cats.end(), cats.begin()) is invalid (unless cats is empty, in which case both iterators are equal and the range is simply empty)

    The solution

    The solution consists of adding information to the iterators so that their validity, and the validity of the ranges they define, can be asserted during execution, thus preventing undefined behavior from occurring.

    The _HAS_ITERATOR_DEBUGGING symbol is the switch for this capability, which is opt-in because it unfortunately slows the program down. It's quite simple in theory: each iterator is made an Observer of the container it was issued from, and is thus notified of its modifications.

    In Dinkumware's implementation, this is achieved with two additions:

    • Each iterator carries a pointer to its related container
    • Each container holds a linked list of the iterators it created
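    To make the two additions concrete, here is a much simplified sketch, not Dinkumware's actual code. TrackedVector and TrackedIterator are hypothetical names; the container keeps a std::list of pointers to its live iterators (one heap node per iterator), copying the container is disabled, and only clear() and a reallocating push_back() notify iterators. clear() returns the number of iterators it notified, to make the cost proportional to the number of live iterators visible.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

class TrackedVector;

// Hypothetical iterator. Addition 1: `owner_`, a pointer to its container.
class TrackedIterator {
public:
    TrackedIterator() = default;  // uninitialized: no owner
    TrackedIterator(TrackedVector* owner, std::size_t pos) : pos_(pos) { attach(owner); }
    TrackedIterator(const TrackedIterator& other) : pos_(other.pos_) { attach(other.owner_); }
    TrackedIterator& operator=(const TrackedIterator& other) {
        if (this != &other) {
            detach();
            pos_ = other.pos_;
            attach(other.owner_);
        }
        return *this;
    }
    ~TrackedIterator() { detach(); }

    bool valid() const { return owner_ != nullptr; }
    int& operator*() const;

private:
    friend class TrackedVector;
    void attach(TrackedVector* owner);  // register in the owner's list
    void detach();                      // unregister, if still registered

    TrackedVector* owner_ = nullptr;
    std::size_t pos_ = 0;
    std::list<TrackedIterator*>::iterator node_;  // our node in the owner's list
};

// Hypothetical container. Addition 2: `live_`, the list of issued iterators.
class TrackedVector {
public:
    TrackedVector() = default;
    TrackedVector(const TrackedVector&) = delete;  // copying not modelled
    TrackedVector& operator=(const TrackedVector&) = delete;
    ~TrackedVector() { invalidate_all(); }

    TrackedIterator begin() { return TrackedIterator(this, 0); }

    void push_back(int value) {
        const std::size_t old_capacity = data_.capacity();
        data_.push_back(value);
        if (data_.capacity() != old_capacity) invalidate_all();  // storage moved
    }

    // Invalidates every iterator; returns how many had to be notified.
    std::size_t clear() {
        data_.clear();
        return invalidate_all();
    }

private:
    friend class TrackedIterator;

    // Notification walks the whole list: O(number of live iterators).
    std::size_t invalidate_all() {
        const std::size_t notified = live_.size();
        for (TrackedIterator* it : live_) it->owner_ = nullptr;
        live_.clear();
        return notified;
    }

    std::vector<int> data_;
    std::list<TrackedIterator*> live_;  // one heap node per live iterator
};

inline void TrackedIterator::attach(TrackedVector* owner) {
    owner_ = owner;
    if (owner_ != nullptr) node_ = owner_->live_.insert(owner_->live_.end(), this);
}

inline void TrackedIterator::detach() {
    if (owner_ != nullptr) owner_->live_.erase(node_);
    owner_ = nullptr;
}

inline int& TrackedIterator::operator*() const {
    assert(owner_ != nullptr && "iterator is uninitialized or was invalidated");
    assert(pos_ < owner_->data_.size() && "iterator is out of range");
    return owner_->data_[pos_];
}
```

    Any iterator created before clear() reports valid() == false afterwards, and dereferencing it trips the first assertion.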

    And this neatly solves our problems:

    • An uninitialized iterator does not have a parent container, so most operations on it (apart from assignment and destruction) trigger an assertion
    • An iterator to an erased or moved element has been notified (thanks to the list) and knows it is invalid
    • When an iterator is incremented or decremented, it can check that it stays within bounds
    • Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
    • Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (a linear operation for containers that are not random-access, i.e. most of them)
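    Three of these checks can be sketched with a hypothetical CheckedIter over std::vector<int> that carries only a parent pointer and a position (no notification list). A debug library would assert; this sketch throws std::logic_error instead, purely so that a failed check can be observed in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked iterator: parent pointer + position, nothing else.
class CheckedIter {
public:
    CheckedIter() = default;  // uninitialized: no parent container
    CheckedIter(const std::vector<int>* parent, std::size_t pos) : parent_(parent), pos_(pos) {}

    const int& operator*() const {
        require(parent_ != nullptr, "dereferencing an uninitialized iterator");
        require(pos_ < parent_->size(), "dereferencing end() or beyond");
        return (*parent_)[pos_];
    }
    CheckedIter& operator++() {  // bounds check on increment
        require(parent_ != nullptr, "incrementing an uninitialized iterator");
        require(pos_ < parent_->size(), "incrementing past end()");
        ++pos_;
        return *this;
    }
    CheckedIter& operator--() {  // bounds check on decrement
        require(parent_ != nullptr, "decrementing an uninitialized iterator");
        require(pos_ > 0, "decrementing before begin()");
        --pos_;
        return *this;
    }
    friend bool operator==(const CheckedIter& a, const CheckedIter& b) {
        // Same-container check: just compare the parent pointers.
        require(a.parent_ != nullptr && a.parent_ == b.parent_,
                "comparing iterators from different containers");
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const CheckedIter& a, const CheckedIter& b) { return !(a == b); }

private:
    static void require(bool ok, const char* what) {
        if (!ok) throw std::logic_error(what);  // stand-in for an assertion
    }
    const std::vector<int>* parent_ = nullptr;
    std::size_t pos_ = 0;
};

inline CheckedIter checked_begin(const std::vector<int>& v) { return CheckedIter(&v, 0); }
inline CheckedIter checked_end(const std::vector<int>& v) { return CheckedIter(&v, v.size()); }
```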

    The cost

    The cost is heavy, but does correctness have a price? We can break down the cost:

    • extra memory allocation (the extra list of iterators maintained): O(NbIterators)
    • notification process on mutating operations: O(NbIterators) (note that push_back or insert do not necessarily invalidate all iterators, whereas erase always invalidates at least the iterators to the erased elements)
    • range validity check: O(min(last - first, container.end() - first))
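    The range-check term can be illustrated with a hypothetical check_range for std::list, which walks forward from first until it meets either last or end(). The returned step count is min(last - first, container.end() - first), where last - first is treated as unbounded when last comes before first. Like the library's own check, it assumes both iterators belong to the container, since comparing iterators from different containers is outside what the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct RangeCheck {
    bool valid;         // true if `last` is reachable from `first`
    std::size_t steps;  // increments performed while checking
};

// Hypothetical helper: validates [first, last) for a non-random-access
// container by walking forward. It stops at `last` (valid) or at
// container.end() (invalid), so the walk is linear in the distance covered.
RangeCheck check_range(const std::list<int>& container,
                       std::list<int>::const_iterator first,
                       std::list<int>::const_iterator last) {
    std::size_t steps = 0;
    for (auto it = first;; ++it, ++steps) {
        if (it == last) return {true, steps};               // reached end of range
        if (it == container.end()) return {false, steps};   // ran off the container
    }
}
```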

    Most of the library algorithms have, of course, been implemented for maximum efficiency: typically the check is done once and for all at the beginning of the algorithm, then an unchecked version is run. Yet the program may still slow down severely, especially with hand-written loops:

    for (std::vector<int>::iterator it = vec.begin();
         it != vec.end();              // Oops
         ++it)
    {
        // body
    }
    

    We know the Oops line is in bad taste, but here it's even worse: on each pass through the loop we create a new iterator and then destroy it, which means allocating and deallocating a node for vec's list of iterators. Do I have to underline the cost of allocating and deallocating memory in a tight loop?

    Of course, a for_each would not run into this issue, which is yet another compelling case for using STL algorithms instead of hand-coded loops.
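    For comparison, here are three ways to write the same loop that obtain end() only once: std::for_each (end() is evaluated once, when the arguments are passed), a hand-written loop with end() hoisted into a local, and a range-based for, which the language defines to call end() once before the first iteration. The function names are hypothetical; the point is only where end() is evaluated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each loop obtains vec.end() exactly once, instead of creating (and, under
// iterator debugging, registering) a fresh end() iterator on every pass.
long sum_for_each(const std::vector<int>& vec) {
    long total = 0;
    std::for_each(vec.begin(), vec.end(), [&total](int x) { total += x; });
    return total;
}

long sum_hoisted(const std::vector<int>& vec) {
    long total = 0;
    for (auto it = vec.begin(), end = vec.end(); it != end; ++it)  // end() hoisted
        total += *it;
    return total;
}

long sum_range_for(const std::vector<int>& vec) {
    long total = 0;
    for (int x : vec)  // range-for calls end() once, before the first pass
        total += x;
    return total;
}
```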
