Why aren't FingerTrees used enough to have a stable implementation?

萌比男神i 2021-02-07 11:31

A while ago, I ran across an article on FingerTrees (see also an accompanying Stack Overflow question) and filed the idea away. I have finally found a reason to make use of the

3 answers
  •  春和景丽
    2021-02-07 12:24

    In addition to John Lato's answer, I'll add some specific details about the performance of finger trees, since I spent some time looking at that in the past.

    The broad summary is:

    • Data.Sequence has great constant factors and asymptotics: it is almost as fast as [] when accessing the front of the list (where both data structures have O(1) asymptotics), and much faster elsewhere in the list (where Data.Sequence's logarithmic asymptotics trounce []'s linear asymptotics).

    • Data.FingerTree has the same asymptotics as Data.Sequence, but is about an order of magnitude slower.
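
To make the asymptotic claims above concrete, here is a minimal sketch using Data.Sequence from the containers package. The helper names mkSeq and middle are made up for illustration; the point is only that positional lookup and concatenation are logarithmic, where the list equivalents (!!) and (++) are linear.

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (><))

-- Hypothetical helper: the sequence 0, 1, ..., n-1.
mkSeq :: Int -> Seq Int
mkSeq n = Seq.fromList [0 .. n - 1]

-- Positional lookup is O(log(min(i, n-i))); on a list, (!!) is O(i).
middle :: Seq Int -> Maybe Int
middle s = Seq.lookup (Seq.length s `div` 2) s

main :: IO ()
main = do
  let s = mkSeq 100000
  print (Seq.lookup 0 s)        -- front access, O(1): Just 0
  print (middle s)              -- Just 50000
  print (Seq.length (s >< s))   -- logarithmic concatenation: 200000
```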

    Just like lists, finger trees have high per-element memory overheads, so they should be combined with chunking for better memory and cache use. Indeed, a few packages do this (yi, trifecta, rope). If Data.FingerTree could be brought close to Data.Sequence in performance, I would hope to see a Data.Text.Sequence type, which implemented a finger tree of Data.Text values. Such a type would lose the streaming behaviour of Data.Text.Lazy, but benefit from improved random access and concatenation performance. (Similarly, I would want to see Data.ByteString.Sequence and Data.Vector.Sequence.)
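
As a rough illustration of the chunking idea, here is a sketch of a finger tree of Text chunks built on the generic Data.FingerTree (fingertree package) together with the text package. The names Size, Chunk, TextSeq, fromChunks, lengthTS and indexTS are assumptions made up for this example; this is not the hoped-for Data.Text.Sequence, and it inherits the constant factors of Data.FingerTree.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.FingerTree (FingerTree, Measured (..), split)
import qualified Data.FingerTree as FT
import qualified Data.Text as T

-- Measure: total number of characters stored in a subtree.
newtype Size = Size Int deriving (Eq, Ord, Show)

instance Semigroup Size where
  Size a <> Size b = Size (a + b)

instance Monoid Size where
  mempty = Size 0

-- Each element of the tree is a whole strict Text chunk.
newtype Chunk = Chunk T.Text deriving Show

instance Measured Size Chunk where
  measure (Chunk t) = Size (T.length t)

type TextSeq = FingerTree Size Chunk

-- Build a chunked sequence, dropping empty chunks.
fromChunks :: [T.Text] -> TextSeq
fromChunks = FT.fromList . map Chunk . filter (not . T.null)

-- Total character count, read from the cached measure.
lengthTS :: TextSeq -> Int
lengthTS t = let Size n = measure t in n

-- Character at index i: split at the chunk containing i,
-- then index into that chunk.
indexTS :: Int -> TextSeq -> Maybe Char
indexTS i t
  | i < 0 || i >= lengthTS t = Nothing
  | otherwise =
      case FT.viewl right of
        Chunk c FT.:< _ -> Just (T.index c (i - lengthTS left))
        FT.EmptyL -> Nothing
  where
    (left, right) = split (\(Size n) -> n > i) t

main :: IO ()
main = do
  let ts = fromChunks (map T.pack ["finger", "tree", "s"])
  print (lengthTS ts)    -- 11
  print (indexTS 6 ts)   -- Just 't'
```

The per-element overhead of the tree is now paid once per chunk rather than once per character, which is the motivation for chunking given above.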

    The obstacle to implementing these now is that no efficient and generic implementation of finger trees exists (see below where I discuss this further). To produce efficient implementations of Data.Text.Sequence one would have to completely reimplement finger trees, specialised to Text - just as Data.Text.Lazy completely reimplements lists, specialised to Text. Unfortunately, finger trees are much more complex than lists (especially concatenation!), so this is a considerable amount of work.

    So as I see it the answer is:

    • specialised finger trees are great, but a lot of work to implement
    • chunked finger trees (e.g. Data.Text.Sequence) would be great, but at present the poor performance of Data.FingerTree means they are not a viable alternative to chunked lists in the common case
    • builders and chunked lists achieve many of the benefits of chunked finger trees, and so they suffice for the common case
    • in the uncommon case where builders and chunked lists don't suffice, we grit our teeth and put up with the poor constant factors of chunked finger trees (e.g. in yi and trifecta).

    Obstacles to an efficient and generic finger tree

    Much of the performance gap between Data.Sequence and Data.FingerTree is due to two optimisations in Data.Sequence:

    • The measure type is specialised to Int, so measure manipulations will compile down to efficient integer arithmetic rather

    • The measure type is unpacked into the Deep constructor, which saves pointer dereferences in the inner loops of the tree operations.
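
The two optimisations can be seen side by side in simplified data declarations. The sketch below is modelled loosely on the shapes of the two libraries' internal types, but the constructor and function names (the G and S suffixes, deepMeasureG, deepSizeS) are made up here and the real definitions differ in detail.

```haskell
import Data.Monoid (Sum (..))

-- Up to four elements at each end of the tree (shared by both variants).
data Digit a = One a | Two a a | Three a a a | Four a a a a

-- Generic variant: the cached measure has an arbitrary type v.
-- The field is strict, but GHC cannot unpack a polymorphic field,
-- so it stays a pointer to a heap object.
data NodeG v a = Node2G !v a a | Node3G !v a a a

data FingerTreeG v a
  = EmptyG
  | SingleG a
  | DeepG !v !(Digit a) (FingerTreeG v (NodeG v a)) !(Digit a)

-- Specialised variant: the measure is fixed to Int and stored
-- unboxed inside the constructor itself.
data NodeS a
  = Node2S {-# UNPACK #-} !Int a a
  | Node3S {-# UNPACK #-} !Int a a a

data SeqS a
  = EmptyS
  | SingleS a
  | DeepS {-# UNPACK #-} !Int !(Digit a) (SeqS (NodeS a)) !(Digit a)

-- Reading the cached measure of a Deep node in each variant.
deepMeasureG :: FingerTreeG v a -> Maybe v
deepMeasureG (DeepG v _ _ _) = Just v
deepMeasureG _ = Nothing

deepSizeS :: SeqS a -> Maybe Int
deepSizeS (DeepS n _ _ _) = Just n
deepSizeS _ = Nothing

main :: IO ()
main = do
  print (getSum <$> deepMeasureG (DeepG (Sum (2 :: Int)) (One 'a') EmptyG (One 'b')))
  print (deepSizeS (DeepS 2 (One 'a') EmptyS (One 'b')))
```

Both calls yield the same cached size, but in the specialised variant the Int lives directly in the Deep constructor, so no extra pointer has to be followed to read it.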

    It is possible to apply these optimisations in the general case of Data.FingerTree by using data families for generic unpacking and by exploiting GHC's inliner and specialiser - see my fingertree-unboxed package, which brings generic finger tree performance almost up to that of Data.Sequence. Unfortunately, these techniques have some significant problems:

    • Using data families for generic unpacking is unpleasant for the user, because they have to define lots of instances. There is no clear solution to this problem.

    • finger trees use polymorphic recursion, which GHC's specialiser doesn't handle well (1, 2). This means that, to get sufficient specialisation on the measure type, we need lots of INLINE pragmas, which causes GHC to generate huge amounts of code.
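
The polymorphic recursion mentioned in the second point can be illustrated with a much smaller nested type than a real finger tree. In this sketch the names Nested, Sized and total are made up; the class Sized merely stands in for a measure class.

```haskell
-- A nested (non-regular) type: each level stores pairs of the
-- previous level's elements, much like the spine of a finger tree
-- stores nodes of nodes.
data Nested a = Nil | Cons a (Nested (a, a))

-- Stand-in for a measure class.
class Sized a where
  size :: a -> Int

instance Sized Char where
  size _ = 1

instance (Sized a, Sized b) => Sized (a, b) where
  size (x, y) = size x + size y

-- The recursive call is at type Nested (a, a), not Nested a, so the
-- type signature is mandatory and a new Sized dictionary is built at
-- every level.
total :: Sized a => Nested a -> Int
total Nil = 0
total (Cons x rest) = size x + total rest

main :: IO ()
main =
  print (total (Cons 'a' (Cons ('b', 'c') (Cons (('d', 'e'), ('f', 'g')) Nil))))
  -- 1 + 2 + 4 = 7
```

A specialisation of total at a = Char would still contain a call at (Char, Char), which in turn calls the function at ((Char, Char), (Char, Char)), and so on, so no finite set of specialised copies covers every level. That is one way to see why the measure operations end up needing INLINE pragmas instead.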

    Due to these problems, I never released the package on Hackage.
