Red-Black Trees

前端 未结 12 2203
逝去的感伤
逝去的感伤 2020-12-07 10:15

I\'ve seen binary trees and binary searching mentioned in several books I\'ve read lately, but as I\'m still at the beginning of my studies in Computer Science, I\'ve yet to

相关标签:
12条回答
  • 2020-12-07 11:08

    If you would like to see how a Red-Black tree is supposed to look graphically, I have coded an implementation of a Red-Black tree that you can download here

    0 讨论(0)
  • 2020-12-07 11:10

    Red Black Trees and B-trees are used in all sorts of persistent storage; because the trees are balanced the performance of breadth and depth traversals are mitigated.

    Nearly all modern database systems use trees for data storage.

    0 讨论(0)
  • 2020-12-07 11:12

    Red-black trees stay balanced, so you don't have to traverse deep to get items out. The time saved makes RB trees O(log()n)) in the WORST case, whereas unlucky binary trees can get into a lop sided configuration and cause retrievals in O(n) a bad case. This does happen in practice or on random data. So if you need time critical code (database retrievals, network server etc.) you use RB trees to support ordered or unordered lists/sets .

    But RBTrees are for noobs! If you are doing AI and you need to perform a search, you find you fork the state information alot. You can use a persistent red-black to fork new states in O(log(n)). A persistent red black tree keeps a copy of the tree before and after a morphological operation (insert/delete), but without copying the entire tree (normally and O(log(n)) operation). I have open sourced a persistent red-black tree for java

    http://edinburghhacklab.com/2011/07/a-java-implementation-of-persistent-red-black-trees-open-sourced/

    0 讨论(0)
  • 2020-12-07 11:13

    Red Black trees are good for creating well-balanced trees. The major problem with binary search trees is that you can make them unbalanced very easily. Imagine your first number is a 15. Then all the numbers after that are increasingly smaller than 15. You'll have a tree that is very heavy on the left side and has nothing on the right side.

    Red Black trees solve that by forcing your tree to be balanced whenever you insert or delete. It accomplishes this through a series of rotations between ancestor nodes and child nodes. The algorithm is actually pretty straightforward, although it is a bit long. I'd suggest picking up the CLRS (Cormen, Lieserson, Rivest and Stein) textbook, "Introduction to Algorithms" and reading up on RB Trees.

    The implementation is also not really so short so it's probably not really best to include it here. Nevertheless, trees are used extensively for high performance apps that need access to lots of data. They provide a very efficient way of finding nodes, with a relatively small overhead of insertion/deletion. Again, I'd suggest looking at CLRS to read up on how they're used.

    While BSTs may not be used explicitly - one example of the use of trees in general are in almost every single modern RDBMS. Similarly, your file system is almost certainly represented as some sort of tree structure, and files are likewise indexed that way. Trees power Google. Trees power just about every website on the internet.

    0 讨论(0)
  • 2020-12-07 11:16

    The best description of red-black trees I have seen is the one in Cormen, Leisersen and Rivest's 'Introduction to Algorithms'. I could even understand it enough to partially implement one (insertion only). There are also quite a few applets such as This One on various web pages that animate the process and allow you to watch and step through a graphical representation of the algorithm building a tree structure.

    0 讨论(0)
  • 2020-12-07 11:16

    Lots and lots of heat here, but not much light, so lets see if we can provide some.

    First, a RB tree is an associative data structure, unlike, say an array, which cannot take a key and return an associated value, well, unless that's an integer "key" in a 0% sparse index of contiguous integers. An array cannot grow in size either (yes, I know about realloc() too, but under the covers that requires a new array and then a memcpy()), so if you have either of these requirements, an array won't do. An array's memory efficiency is perfect. Zero waste, but not very smart, or flexible - realloc() not withstanding.

    Second, in contrast to a bsearch() on an array of elements, which IS an associative data structure, a RB tree can grow (AND shrink) itself in size dynamically. The bsearch() works fine for indexing a data structure of a known size, which will remain that size. So if you don't know the size of your data in advance, or new elements need to be added, or deleted, a bsearch() is out. Bsearch() and qsort() are both well supported in classic C, and have good memory efficiency, but are not dynamic enough for many applications. They are my personal favorite though because they're quick, easy, and if you're not dealing with real-time apps, quite often are flexible enough. In addition, in C/C++ you can sort an array of pointers to data records, pointing to the struc{} member, for example, you wish to compare, and then rearranging the pointer in the pointer array such that reading the pointers in order at the end of the pointer sort yields your data in sorted order. Using this with memory-mapped data files is extremely memory efficient, fast, and fairly easy. All you need to do is add a few "*"s to your compare function/s.

    Third, in contrast to a hashtable, which also must be a fixed size, and cannot be grown once filled, a RB tree will automagically grow itself and balance itself to maintain its O(log(n)) performance guarantee. Especially if the RB tree's key is an int, it can be faster than a hash, because even though a hashtable's complexity is O(1), that 1 can be a very expensive hash calculation. A tree's multiple 1-clock integer compares often outperform 100-clock+ hash calculations, to say nothing of rehashing, and malloc()ing space for hash collisions and rehashes. Finally, if you want ISAM access, as well as key access to your data, a hash is ruled out, as there is no ordering of the data inherent in the hashtable, in contrast to the natural ordering of data in any tree implementation. The classic use for a hash table is to provide keyed access to a table of reserved words for a compiler. It's memory efficiency is excellent.

    Fourth, and very low on any list, is the linked, or doubly-linked list, which, in contrast to an array, naturally supports element insertions and deletions, and as that implies, resizing. It's the slowest of all the data structures, as each element only knows how to get to the next element, so you have to search, on average, (element_knt/2) links to find your datum. It is mostly used where insertions and deletions somewhere in the middle of the list are common, and especially, where the list is circular and feeds an expensive process which makes the time to read the links relatively small. My general RX is to use an arbitrarily large array instead of a linked list if your only requirement is that it be able to increase in size. If you run out of size with an array, you can realloc() a larger array. The STL does this for you "under the covers" when you use a vector. Crude, but potentially 1,000s of times faster if you don't need insertions, deletions or keyed lookups. It's memory efficiency is poor, especially for doubly-linked lists. In fact, a doubly-linked list, requiring two pointers, is exactly as memory inefficient as a red-black tree while having NONE of its appealing fast, ordered retrieval characteristics.

    Fifth, trees support many additional operations on their sorted data than any other data structure. For example, many database queries make use of the fact that a range of leaf values can be easily specified by specifying their common parent, and then focusing subsequent processing on the part of the tree that parent "owns". The potential for multi-threading offered by this approach should be obvious, as only a small region of the tree needs to be locked - namely, only the nodes the parent owns, and the parent itself.

    In short, trees are the Cadillac of data structures. You pay a high price in terms of memory used, but you get a completely self-maintaining data structure. This is why, as pointed out in other replies here, transaction databases use trees almost exclusively.

    0 讨论(0)
提交回复
热议问题