A while ago, I ran across an article on FingerTrees (See Also an accompanying Stack Overflow Question) and filed the idea away. I have finally found a reason to make use of the
In addition to John Lato's answer, I'll add some specific details about the performance of finger trees, since I spent some time looking at that in the past.
The broad summary is:
Data.Sequence has great constant factors and asymptotics: it is almost as fast as [] when accessing the front of the list (where both data structures have O(1) asymptotics), and much faster elsewhere in the list (where Data.Sequence's logarithmic asymptotics trounce []'s linear asymptotics).
Data.FingerTree has the same asymptotics as Data.Sequence, but is about an order of magnitude slower.
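To make the comparison concrete, here is a small sketch using Data.Sequence from the containers package. The sizes and values are arbitrary choices for illustration, and it only exercises the operations discussed above; it is not a benchmark.

```haskell
module Main (main) where

import qualified Data.Sequence as Seq
import Data.Sequence ((<|), (|>), (><))

main :: IO ()
main = do
  let xs = Seq.fromList [1 .. 100000 :: Int]   -- a Seq of 100000 elements
      ys = [1 .. 100000 :: Int]                -- the same data as a plain list

  -- Front access: O(1) for both Seq and [].
  print (Seq.index (0 <| xs) 0)   -- 0
  print (head (0 : ys))           -- 0

  -- Access in the middle: O(log(min(i, n - i))) for Seq, O(i) for [].
  print (Seq.index xs 50000)      -- 50001
  print (ys !! 50000)             -- 50001

  -- Appending at the back is O(1) for Seq; concatenation is
  -- O(log(min(n1, n2))). Both are linear in the left operand for [].
  print (Seq.length (xs |> 0))    -- 100001
  print (Seq.length (xs >< xs))   -- 200000
```

The list versions of the last two operations would be ys ++ [0] and ys ++ ys, each of which walks the whole left-hand list.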
Just like lists, finger trees have high per-element memory overheads, so they should be combined with chunking for better memory and cache use. Indeed, a few packages do this (yi, trifecta, rope). If Data.FingerTree could be brought close to Data.Sequence in performance, I would hope to see a Data.Text.Sequence type, which implemented a finger tree of Data.Text values. Such a type would lose the streaming behaviour of Data.Text.Lazy, but benefit from improved random access and concatenation performance. (Similarly, I would want to see Data.ByteString.Sequence and Data.Vector.Sequence.)
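As a rough illustration of what a finger tree of Text chunks could look like, here is a minimal sketch built on the existing generic Data.FingerTree (from the fingertree package), so it inherits the performance problems described in this answer. The names Chunk, TextSeq, fromChunks, lengthTS and indexTS are made up for this example; no Data.Text.Sequence module exists.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances #-}
module Main (main) where

import Data.FingerTree
  (FingerTree, Measured (..), ViewL (..), fromList, split, viewl, (><))
import Data.Monoid (Sum (..))
import qualified Data.Text as T

-- Hypothetical chunk type: one strict Text per tree element.
newtype Chunk = Chunk T.Text

-- The measure of a chunk is its length in characters.
-- (T.length is linear in the chunk, so chunks should stay small.)
instance Measured (Sum Int) Chunk where
  measure (Chunk t) = Sum (T.length t)

type TextSeq = FingerTree (Sum Int) Chunk

fromChunks :: [T.Text] -> TextSeq
fromChunks = fromList . map Chunk

-- Total length is the cached measure at the root of the tree.
lengthTS :: TextSeq -> Int
lengthTS = getSum . measure

-- Random access: split at the first chunk whose cumulative length
-- exceeds i, then index into that chunk.
indexTS :: TextSeq -> Int -> Maybe Char
indexTS ts i
  | i < 0 = Nothing
  | otherwise =
      case viewl right of
        Chunk t :< _ -> Just (T.index t (i - getSum (measure left)))
        EmptyL       -> Nothing
  where
    (left, right) = split (\(Sum n) -> n > i) ts

main :: IO ()
main = do
  let a  = fromChunks (map T.pack ["hello ", "finger "])
      b  = fromChunks [T.pack "trees"]
      ts = a >< b              -- concatenation without copying any chunk
  print (lengthTS ts)          -- 18
  print (indexTS ts 6)         -- Just 'f'
  print (indexTS ts 13)        -- Just 't'
  print (indexTS ts 18)        -- Nothing
```

Unlike a lazy list of chunks, the whole spine is a tree, so nothing here streams; in exchange, lookup and concatenation avoid walking every chunk.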
The obstacle to implementing these now is that no efficient and generic implementation of finger trees exists (see below where I discuss this further). To produce an efficient implementation of Data.Text.Sequence, one would have to completely reimplement finger trees, specialised to Text, just as Data.Text.Lazy completely reimplements lists, specialised to Text. Unfortunately, finger trees are much more complex than lists (especially concatenation!), so this is a considerable amount of work.
So as I see it the answer is:
Types such as Data.Text.Sequence would be great, but at present the poor performance of Data.FingerTree means they are not a viable alternative to chunked lists in the common case.
Much of the performance gap between Data.Sequence and Data.FingerTree is due to two optimisations in Data.Sequence:
The measure type is specialised to Int, so measure manipulations will compile down to efficient integer arithmetic.
The measure type is unpacked into the Deep constructor, which saves pointer dereferences in the inner loops of the tree operations.
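The two optimisations can be sketched on a much simpler structure. The code below uses a plain binary tree with a cached measure rather than a real finger tree, and all of the type and function names are invented for illustration; it is not the actual definition used by Data.Sequence or Data.FingerTree.

```haskell
module Main (main) where

import Data.Monoid (Sum (..))

-- Generic version: the cached measure has an arbitrary monoid type v.
-- A strict field of polymorphic type cannot be unpacked, so the value
-- sits behind a pointer, and (<>) goes through the Monoid dictionary
-- unless GHC manages to specialise the code.
data GTree v a
  = GLeaf a
  | GNode !v (GTree v a) (GTree v a)

gMeasure :: (a -> v) -> GTree v a -> v
gMeasure f (GLeaf x)     = f x
gMeasure _ (GNode v _ _) = v

gNode :: Monoid v => (a -> v) -> GTree v a -> GTree v a -> GTree v a
gNode f l r = GNode (gMeasure f l <> gMeasure f r) l r

-- Specialised version: the measure is fixed to Int and unpacked into
-- the constructor, so combining sizes is plain integer addition and
-- reading a size needs no extra pointer dereference.
data STree a
  = SLeaf a
  | SNode {-# UNPACK #-} !Int (STree a) (STree a)

size :: STree a -> Int
size (SLeaf _)     = 1
size (SNode n _ _) = n

sNode :: STree a -> STree a -> STree a
sNode l r = SNode (size l + size r) l r

-- Every element counts as one unit of measure.
one :: a -> Sum Int
one _ = Sum 1

main :: IO ()
main = do
  let g = gNode one (gNode one (GLeaf 'a') (GLeaf 'b')) (GLeaf 'c')
      s = sNode (sNode (SLeaf 'a') (SLeaf 'b')) (SLeaf 'c')
  print (getSum (gMeasure one g))  -- 3
  print (size s)                   -- 3
```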
It is possible to apply these optimisations in the general case of Data.FingerTree by using data families for generic unpacking and by exploiting GHC's inliner and specialiser; see my fingertree-unboxed package, which brings generic finger tree performance almost up to that of Data.Sequence. Unfortunately, these techniques have some significant problems:
Using data families for generic unpacking is unpleasant for the user, because they have to define lots of instances. There is no clear solution to this problem.
Finger trees use polymorphic recursion, which GHC's specialiser doesn't handle well (1, 2). This means that, to get sufficient specialisation on the measure type, we need lots of INLINE pragmas, which cause GHC to generate huge amounts of code.
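To give a feel for the first problem, here is a hedged sketch of unpacking via an associated data family. The class UnpackedMeasure and its methods are hypothetical names invented for this example and are not the API of fingertree-unboxed; the point is only that every measure type needs its own hand-written instance.

```haskell
{-# LANGUAGE TypeFamilies, FlexibleInstances #-}
module Main (main) where

import Data.Monoid (Sum (..))

-- Hypothetical class: each measure type v picks its own representation
-- for "a value of type a with a cached measure of type v".
class Monoid v => UnpackedMeasure v where
  data Cached v a
  cache   :: v -> a -> Cached v a
  uncache :: Cached v a -> (v, a)

-- One instance per measure type, each with its own unpacked field.
instance UnpackedMeasure (Sum Int) where
  data Cached (Sum Int) a = CachedInt {-# UNPACK #-} !Int a
  cache (Sum n) x = CachedInt n x
  uncache (CachedInt n x) = (Sum n, x)

instance UnpackedMeasure (Sum Double) where
  data Cached (Sum Double) a = CachedDouble {-# UNPACK #-} !Double a
  cache (Sum d) x = CachedDouble d x
  uncache (CachedDouble d x) = (Sum d, x)

-- Generic code is written once against the class.
combine :: UnpackedMeasure v => Cached v a -> Cached v b -> Cached v (a, b)
combine l r = cache (vl <> vr) (x, y)
  where
    (vl, x) = uncache l
    (vr, y) = uncache r

main :: IO ()
main = do
  let c = combine (cache (Sum (2 :: Int)) "ab") (cache (Sum 3) "cde")
  print (getSum (fst (uncache c)))  -- 5
```

A user who wants a new measure type (a pair of counters, say) has to write another instance like the two above, which is the boilerplate burden described in the first point.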
Due to these problems, I never released the package on Hackage.
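For readers unfamiliar with the polymorphic recursion issue mentioned above, the following sketch shows its shape on a toy nested data type. The types Node, Nested and Elem and the function total are invented for this example, and Measured here is a local class that mirrors the one in Data.FingerTree rather than importing it.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}
module Main (main) where

import Data.Monoid (Sum (..))

-- Local stand-in for the Measured class.
class Monoid v => Measured v a | a -> v where
  measure :: a -> v

-- A node caches the measure of the two things below it.
data Node v a = Node !v a a

instance Monoid v => Measured v (Node v a) where
  measure (Node v _ _) = v

-- Nested type: each level stores elements one Node deeper than the last,
-- as the middle spine of a finger tree does.
data Nested v a = Nil | Cons a (Nested v (Node v a))

-- Polymorphic recursion: the recursive call is at element type Node v a,
-- not a, so it needs a different Measured dictionary at every level.
-- The type signature is mandatory; it cannot be inferred. A copy of
-- total specialised to one element type does not cover the recursive
-- call, which is at a different type.
total :: Measured v a => Nested v a -> v
total Nil           = mempty
total (Cons x rest) = measure x <> total rest

newtype Elem = Elem Int

instance Measured (Sum Int) Elem where
  measure (Elem n) = Sum n

example :: Nested (Sum Int) Elem
example = Cons (Elem 1) (Cons (Node (Sum 5) (Elem 2) (Elem 3)) Nil)

main :: IO ()
main = print (getSum (total example))  -- 6
```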