A while ago, I ran across an article on finger trees (see also an accompanying Stack Overflow question) and filed the idea away. I have finally found a reason to make use of the
To answer your question about finger trees in particular, I think the problem is that they have relatively high constant costs compared to arrays, and are more complex than other ways of achieving efficient concatenation. Builders have a more efficient interface for just appending chunks, and they're usually readily available (see the links in @informatikr's answer). Suppose that Data.Text.Lazy is implemented with a linked list of chunks, and you're creating a Data.Text.Lazy from a builder. Unless you have a lot of chunks (probably more than 50), or are accessing data near the end of the list repeatedly, the high constant cost of a finger tree probably isn't worth it.
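For the simple append-only case, a minimal sketch of the builder approach might look like the following. It assumes the text package; buildDocument and chunkCount are illustrative names, not part of any library.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- Append every strict chunk to a Builder, then materialise the lazy Text once.
buildDocument :: [T.Text] -> TL.Text
buildDocument chunks = B.toLazyText (foldMap B.fromText chunks)

-- A lazy Text can be viewed as a list of strict chunks; count them.
chunkCount :: TL.Text -> Int
chunkCount = length . TL.toChunks
```

The point of the design is that all the appends happen on the Builder and the conversion to Text happens once at the end.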
The Data.Sequence implementation is specialized for performance reasons, and isn't as general as the full interface provided by the fingertree package. That's why it isn't exported; it's not really possible to use it for anything other than a Sequence.
I also suspect that many programmers are at a loss as to how to actually use the monoidal annotation, as it's behind a fairly significant abstraction barrier. So many people wouldn't use it because they don't see how it can be useful compared to other data types.
I didn't really get it until I read Chung-chieh Shan's blog series on word numbers (part2, part3, part4). That's proof that the idea can definitely be used in practical code.
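To make the idea of a monoidal annotation a little more concrete, here is a small sketch. It deliberately uses a plain binary tree rather than a real finger tree, and every name in it (Tree, annotation, leaf, branch, search, indexTree) is made up for illustration. Each node caches the combined measure of its subtree, and a monotonic predicate on that measure steers the search.

```haskell
import Data.Monoid (Sum (..))

-- A binary tree in which every node caches the measure of its subtree.
data Tree v a
  = Leaf v a
  | Branch v (Tree v a) (Tree v a)

-- Read the cached annotation of a node.
annotation :: Tree v a -> v
annotation (Leaf v _) = v
annotation (Branch v _ _) = v

-- Build a leaf, measuring the element with the supplied function.
leaf :: (a -> v) -> a -> Tree v a
leaf measure x = Leaf (measure x) x

-- Build a branch whose annotation combines those of its children.
branch :: Monoid v => Tree v a -> Tree v a -> Tree v a
branch l r = Branch (annotation l <> annotation r) l r

-- Find the first element at which the accumulated measure satisfies p.
-- p is assumed to be monotonic: once it becomes True it stays True.
search :: Monoid v => (v -> Bool) -> Tree v a -> Maybe a
search p t
  | p (annotation t) = Just (go mempty t)
  | otherwise = Nothing
  where
    go _ (Leaf _ x) = x
    go acc (Branch _ l r)
      | p (acc <> annotation l) = go acc l
      | otherwise = go (acc <> annotation l) r

-- Counting one per element turns search into positional indexing.
indexTree :: Int -> Tree (Sum Int) a -> Maybe a
indexTree i = search (\(Sum n) -> n > i)
```

Swapping the monoid changes what the structure does: with a size measure you get indexing, as here, while other measures give other kinds of lookup over the same tree code.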
In your case, if you need to both inspect partial results and have efficient appends, using a finger tree may be better than a builder. Depending on the builder's implementation, you may end up doing a lot of repeated work as you convert to Text, add more stuff to the builder, convert to Text again, etc. It would depend on your usage pattern though.
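As a rough illustration of that usage pattern, the sketch below keeps the chunks in a Data.Sequence (itself a finger tree) so that appending and peeking at the latest chunk are both cheap, and flattening only happens when the full Text is actually needed. Chunks, appendChunk, lastChunk and render are hypothetical names chosen for this example.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, ViewR (..), (|>))
import qualified Data.Sequence as Seq
import qualified Data.Text as T

-- A document under construction, kept as a sequence of strict chunks.
type Chunks = Seq T.Text

-- Append a chunk at the right end (constant time for Data.Sequence).
appendChunk :: Chunks -> T.Text -> Chunks
appendChunk = (|>)

-- Inspect the most recently appended chunk without flattening anything.
lastChunk :: Chunks -> Maybe T.Text
lastChunk cs = case Seq.viewr cs of
  _ :> c -> Just c
  EmptyR -> Nothing

-- Flatten to a single strict Text only when the whole result is needed.
render :: Chunks -> T.Text
render = T.concat . toList
```

Whether this beats a builder depends on how often partial results are inspected compared with how often chunks are appended.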
You might be interested in my splaytree package, which provides splay trees with monoidal annotations, and several different structures built upon them. Other than the splay tree itself, the Set and RangeSet modules have more-or-less complete APIs; the Sequence module is mostly a skeleton I used for testing. It's not a "batteries included" solution to what you're looking for (again, @informatikr's answer provides those), but if you want to experiment with monoidal annotations it may be more useful than Data.FingerTree. Be aware that a splay tree can get unbalanced if you traverse all the elements in sequence (or continually snoc onto the end, or similar), but if appends and lookups are interleaved, performance can be excellent.
Ignoring your Finger Tree question and only responding to your further explanation: did you look into Data.Text.Lazy.Builder or, specifically for building HTML, blaze-html?
Both allow fast concatenation. For slicing, if that is important for solving your problem, they might not have ideal performance.
In addition to John Lato's answer, I'll add some specific details about the performance of finger trees, since I spent some time looking at that in the past.
The broad summary is:
Data.Sequence has great constant factors and asymptotics: it is almost as fast as [] when accessing the front of the list (where both data structures have O(1) asymptotics), and much faster elsewhere in the list (where Data.Sequence's logarithmic asymptotics trounce []'s linear asymptotics).
Data.FingerTree has the same asymptotics as Data.Sequence, but is about an order of magnitude slower.
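For reference, the Data.Sequence operations those asymptotics refer to look like this in use. This is only a small sketch assuming the containers package; the binding names are illustrative.

```haskell
import Data.Sequence (Seq, (<|), (><), (|>))
import qualified Data.Sequence as Seq

-- Ten thousand elements, 0 through 9999.
numbers :: Seq Int
numbers = Seq.fromList [0 .. 9999]

-- Lookup in the middle is logarithmic, where a list would walk 5000 cells.
middle :: Maybe Int
middle = Seq.lookup 5000 numbers

-- Adding an element at either end is cheap.
extended :: Seq Int
extended = (-1) <| (numbers |> 10000)

-- Concatenation is logarithmic in the size of the smaller sequence.
doubled :: Seq Int
doubled = numbers >< numbers
```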
Just like lists, finger trees have high per-element memory overheads, so they should be combined with chunking for better memory and cache use. Indeed, a few packages do this (yi, trifecta, rope). If Data.FingerTree could be brought close to Data.Sequence in performance, I would hope to see a Data.Text.Sequence type, which implemented a finger tree of Data.Text values. Such a type would lose the streaming behaviour of Data.Text.Lazy, but benefit from improved random access and concatenation performance. (Similarly, I would want to see Data.ByteString.Sequence and Data.Vector.Sequence.)
The obstacle to implementing these now is that no efficient and generic implementation of finger trees exists (see below where I discuss this further). To produce an efficient implementation of Data.Text.Sequence, one would have to completely reimplement finger trees, specialised to Text, just as Data.Text.Lazy completely reimplements lists, specialised to Text. Unfortunately, finger trees are much more complex than lists (especially concatenation!), so this is a considerable amount of work.
So as I see it the answer is:
Chunked finger trees (such as Data.Text.Sequence) would be great, but at present the poor performance of Data.FingerTree means they are not a viable alternative to chunked lists in the common case.
Much of the performance gap between Data.Sequence and Data.FingerTree is due to two optimisations in Data.Sequence:
The measure type is specialised to Int, so measure manipulations will compile down to efficient integer arithmetic rather
The measure type is unpacked into the Deep constructor, which saves pointer dereferences in the inner loops of the tree operations.
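The difference between the two layouts can be sketched roughly as follows. The type and constructor names are invented for this example and simplified from what the real libraries define, so treat it as an illustration of the idea rather than either package's actual source.

```haskell
-- Shared building block: one to four elements at each end of the tree.
data Digit a = One a | Two a a | Three a a a | Four a a a a

-- Generic shape: the measure v is an ordinary boxed field behind a pointer.
data Node v a = Node2 v a a | Node3 v a a a

data GenericTree v a
  = GEmpty
  | GSingle a
  | GDeep v (Digit a) (GenericTree v (Node v a)) (Digit a)

-- Specialised shape: the measure is a strict Int stored directly in the
-- constructor (the UNPACK pragma takes effect when optimisation is on).
data SizedNode a
  = SNode2 {-# UNPACK #-} !Int a a
  | SNode3 {-# UNPACK #-} !Int a a a

data SizedTree a
  = SEmpty
  | SSingle a
  | SDeep {-# UNPACK #-} !Int !(Digit a) (SizedTree (SizedNode a)) !(Digit a)

-- Things that know their own size.
class Sized a where
  size :: a -> Int

-- Wrapper for user elements, each of which counts as one.
newtype Elem a = Elem a

instance Sized (Elem a) where
  size _ = 1

instance Sized (SizedNode a) where
  size (SNode2 n _ _) = n
  size (SNode3 n _ _ _) = n

-- Reading the size of a deep tree is a single field access.
instance Sized a => Sized (SizedTree a) where
  size SEmpty = 0
  size (SSingle x) = size x
  size (SDeep n _ _ _) = n
```

Note that both tree types recurse at a different element type (a tree of nodes of a), which is the polymorphic recursion that finger trees rely on.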
It is possible to apply these optimisations in the general case of Data.FingerTree by using data families for generic unpacking and by exploiting GHC's inliner and specialiser - see my fingertree-unboxed package, which brings generic finger tree performance almost up to that of Data.Sequence. Unfortunately, these techniques have some significant problems:
Using data families for generic unpacking is unpleasant for the user, who has to define lots of instances. There is no clear solution to this problem.
Finger trees use polymorphic recursion, which GHC's specialiser doesn't handle well (1, 2). This means that, to get sufficient specialisation on the measure type, we need lots of INLINE pragmas, which cause GHC to generate huge amounts of code.
Due to these problems, I never released the package on Hackage.
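To give a feel for the data family technique and why it is a burden on users, here is a minimal sketch. It is not the API of fingertree-unboxed; UnpackedMeasure, Node, node, nodeMeasure and combine are hypothetical names. Each measure type has to supply its own node representation with the measure stored unpacked.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Monoid (Sum (..))

-- Every measure type picks its own node layout through a data family.
class Monoid v => UnpackedMeasure v where
  data Node v a
  node :: v -> a -> a -> Node v a
  nodeMeasure :: Node v a -> v

-- The instance a user would have to write for a size measure: the Int
-- lives directly inside the constructor instead of behind a pointer.
instance UnpackedMeasure (Sum Int) where
  data Node (Sum Int) a = SumNode {-# UNPACK #-} !Int a a
  node (Sum n) x y = SumNode n x y
  nodeMeasure (SumNode n _ _) = Sum n

-- Generic code works against the class and never sees the layout.
combine :: UnpackedMeasure v => Node v a -> Node v a -> v
combine l r = nodeMeasure l <> nodeMeasure r
```

Every new measure type needs another instance like the one for Sum Int, which is the boilerplate referred to above.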