Trouble implementing a “rope” data structure in C++

前端 未结 2 1968
一整个雨季
一整个雨季 2021-01-20 05:38

I\'m trying to make a rope data structure. It\'s a type of binary tree, i.e. a recursive data structure.

The purpose of a rope is that splitting and concatenation sh

相关标签:
2条回答
  • 2021-01-20 06:03

    In practice, though, it involves a heap allocation for (almost?) every modification of the strings.

    If you want to avoid frequent heap allocation performance issues, then I'd suggest using a memory pool for your class that allocates a chunk of memory, and only needs to request a new allocation from the OS when it's full, which should occur pretty infrequently. You can then optimize accesses to the memory pool for allocating small-block data-types like char, etc.

    Andrei Alexandrescu has a great example of a small-block memory allocator in his "Modern C++ Design" book.

    0 讨论(0)
  • 2021-01-20 06:24

    The traditional approach would be copy-on-write: for this, you need to refcount each allocated node.

    If the modified node has a refcount of 1 (no-one else refers to it), you don't need to duplicate it.

    The practical problem with this is usefully segregating mutating from non-mutating operations:

        char& rope::operator[] (std::string::pos)
    

    may mutate the referenced char, but there's no trivial way to force selection of the const overload when it actually won't. This means you either have to assume mutation will happen, and possibly trigger an unnecessary copy, or return instead some proxy which overloads char conversion and assignment.

    This approach was tried for early implementations of std::string (where a string is equivalent to a single-node rope) iirc, and fell out of favour; partly because of the mutation problem, and partly because COW and the required refcounting become increasingly expensive if you have to worry about multi-threading.


    As you say, the rope has an additional problem if your node contains two independent types of state: its own string and references to its child nodes (since this causes refcount/child reference mutations to propogate up the tree).

    If instead the characters are stored only at the leaf nodes, and you do a full copy of the non-leaf nodes (so each rope has its own "directory structure") you still avoid copying the characters, and the refcounted shared state is much simpler.

    Does this get the logarithmic-time concatenation you want? Perhaps not:

    • you have to copy all the non-leaf nodes (and add a new root), and the number of these is log the number of leaves
    • you also have to increment the leaf refcounts though, which is linear

    Whether it looks closer to linear or logarithmic time depends on the relative cost of incrementing a refcount versus copying the directory tree.

    Without this though, you get fast concatenations but arbitrary character accesses may (unpredictably) degenerate to logarithmic time if they have to propagate a COW operation up the tree.

    I guess, if it suits your use case, you could implement move concatenations: this would give possibly constant-time addition and you still avoid the extra COW complexity.

    0 讨论(0)
提交回复
热议问题