问题
I'm looking at Data.STM.LinkedList implementation for a high performance linked list. Looking at the documentation, the length function run in O(n) - why is that ? Was there any real issue to implement it in O(1) ?
Here is the source code https://hackage.haskell.org/package/stm-linkedlist-0.1.0.0/docs/src/Data-STM-LinkedList-Internal.html#length
Is it possible implement it in O(1) ? i'm new to Haskell so I'm not sure if holding some metadata about the list is problematic.
Thanks!
回答1:
To a first approximation, Haskell is a sufficiently expressive language that any algorithm implemented in another general purpose language can also be implemented in Haskell while preserving the asymptotic performance characteristics. (This is a pretty low bar. Most general-purpose languages are this expressive.)
In particular, though Haskell most naturally supports immutable data structures, it has sufficient support for mutable data that mutable data structures and their algorithms can usually be fairly directly translated into Haskell code. There may be some overhead (often substantial overhead), and mutable data structures may be significantly more awkward to use than their immutable cousins, but it's still possible.
As a practical matter, though, matching the actual (as opposed to asymptotic) performance of a C++ implementation of a mutable data structure is likely to prove extremely difficult if not impossible. It may be reasonable to get within 2-3 times the performance of C++, and getting within 5-10 times is pretty easy (see below). However, if you need to match C++ performance, you would be probably better off writing the high performance mutating code in C++ and using the FFI (foreign function interface) to interface to that code.
Anyway, a "moderate performance" doubly linked-list with O(1) length
is certainly possible, and there's no fundamental difficulty with maintaining mutable list-wide metadata. The reason that stm-linkedlist
does not provide an O(1) length
is probably the same reason that C++ guaranteed only O(n) std::list<>::size
performance before C++11. Namely, many practical uses of doubly-linked lists don't ever need to call length
/size
, and providing O(1) performance comes with an additional bookkeeping cost.
As a proof of concept, the following data types are sufficient to implement a fully mutable doubly-linked list with an O(1) length function. Here, types and identifiers ending in underscores are for internal use only. The list is strict in its pointers (so no infinite lists!) but lazy in its values.
data List a = List
{ headNode_ :: !(IORef (Node_ a))
, length_ :: !(IORef Int) }
data Node_ a = Node_
{ prev_ :: !(IORef (Node_ a))
, next_ :: !(IORef (Node_ a))
, value_ :: a }
The List
type contains a pointer (i.e., IORef
) to an incomplete headNode
that points to the start and end of the list (or to itself for an empty list) but has an undefined value field. That makes this an unsafe node value, so it should never be directly accessible to the end-user. The List
also contains a pointer to the list length value.
An additional type Node
(no underscore) is used to decorate a node pointer with its corresponding list (like the "iterator" from the comments), to make the list metadata available to functions that need it:
data Node a = Node
{ node_ :: !(IORef (Node_ a))
, list_ :: !(List a) }
Note that List
and Node
are the user-facing data types for working with lists.
You create an empty
list like so:
empty :: IO (List a)
empty = mdo
n <- newIORef (Node_ n n undefined)
List n <$> newIORef 0
Insertion before and after a given node works as follows. Here's where the unsafe head node representation pays off, since the algorithm can treat insertion at the beginning and end of the list as special cases of insertion between the head node and an actual list node.
insertBefore :: a -> Node a -> IO (Node a)
insertBefore x Node{node_=rnode2, list_} = do
Node_{prev_=rnode1} <- readIORef rnode2
insertBetween_ x list_ rnode1 rnode2
insertAfter :: a -> Node a -> IO (Node a)
insertAfter x Node{node_=rnode1, list_} = do
Node_{next_=rnode2} <- readIORef rnode1
insertBetween_ x list_ rnode1 rnode2
insertBetween_ :: a -> List a -> IORef (Node_ a) -> IORef (Node_ a) -> IO (Node a)
insertBetween_ x l rnode1 rnode2 = do
modifyIORef' (length_ l) succ
newnode <- newIORef (Node_ rnode1 rnode2 x)
modifyIORef' rnode1 (\n -> n{next_=newnode})
modifyIORef' rnode2 (\n -> n{prev_=newnode})
return $ Node newnode l
Since a user isn't allowed to "have" a head node, we need additional user-facing functions to insert at the beginning and end of a list:
prepend :: a -> List a -> IO (Node a)
prepend x l = insertAfter x (Node (headNode_ l) l)
append :: a -> List a -> IO (Node a)
append x l = insertBefore x (Node (headNode_ l) l)
Observe that all insertions go through insertBetween_
which is responsible for increasing the length value.
Deletion is straightforward and uniform whether it's an internal node or one at the start or end. All deletions go through this delete
function which is responsible for decreasing the length value.
delete :: Node a -> IO ()
delete Node{node_,list_} = do
modifyIORef' (length_ list_) pred
Node_{next_, prev_} <- readIORef node_
modifyIORef' prev_ (\n -> n{next_=next_})
modifyIORef' next_ (\n -> n{prev_=prev_})
Deletion of the head node would be a disaster, but users aren't allowed to have such a Node
, so we're safe.
If a user has a Node
, she can move back and forth through the list:
prev :: Node a -> IO (Maybe (Node a))
prev Node{node_, list_} = do
Node_{prev_} <- readIORef node_
return $ maybeNode_ prev_ list_
next :: Node a -> IO (Maybe (Node a))
next Node{node_, list_} = do
Node_{next_} <- readIORef node_
return $ maybeNode_ next_ list_
maybeNode_ :: IORef (Node_ a) -> List a -> Maybe (Node a)
maybeNode_ n l =
if n == headNode_ l
then Nothing
else Just (Node n l)
Note that we must take care never to give the user the head node, so maybeNode_
here checks for it and returns Nothing
instead.
To get started, the user can get the start or end of a List
using the following functions (which use prev
or next
on the forbidden head node):
start :: List a -> IO (Maybe (Node a))
start l = next $ Node (headNode_ l) l
end :: List a -> IO (Maybe (Node a))
end l = prev $ Node (headNode_ l) l
All that's missing are a few miscellaneous query functions:
value :: Node a -> IO a
value = fmap value_ . readIORef . node_
null :: List a -> IO Bool
null l = (==0) <$> length l
length :: List a -> IO Int
length = readIORef . length_
some utilities to convert to plain lists:
toList :: List a -> IO [a]
toList = toList_ next_
toListRev :: List a -> IO [a]
toListRev = toList_ prev_
toList_ :: (Node_ a -> IORef (Node_ a)) -> List a -> IO [a]
toList_ dir l = go =<< readIORef h
where h = headNode_ l
go n = do
if dir n == h then return []
else do
n' <- readIORef (dir n)
(value_ n':) <$> go n'
and a Show
instance for debugging:
instance (Show a) => Show (List a) where
showsPrec d lst = showParen (d > 10) $ showString "fromList " . showsPrec 11 (unsafePerformIO $ toList lst)
WARNING: This Show
instance is unsafe if the list is mutated before the generated string is fully evaluated, so it should only be used for debugging (and probably removed from a production version).
Also, while it's not strictly necessary since we can delete and re-insert, no self-respecting mutable structure would be complete without in-place modification of elements:
modify :: (a -> a) -> Node a -> IO ()
modify f Node{node_} = modifyIORef' node_ (\n -> n { value_ = f (value_ n) })
Here's the full code. (See the definition ex1
for example usage.) You're welcome to use it as a starting point for your own implementation. It's untested and unbenchmarked, except that a couple of quick tests show that it's probably about 5-10x slower than a C++ implementation.
{-# LANGUAGE NamedFieldPuns, RecursiveDo #-}
module LinkedList
( List, Node
, value, null, length
, empty, prepend, append, insertBefore, insertAfter, delete, modify
, prev, next, start, end
, toList, toListRev
) where
import System.IO.Unsafe
import Control.Monad
import Prelude hiding (null, length)
import Data.IORef
data List a = List
{ headNode_ :: !(IORef (Node_ a))
, length_ :: !(IORef Int) }
data Node a = Node
{ node_ :: !(IORef (Node_ a))
, list_ :: !(List a) }
data Node_ a = Node_
{ prev_ :: !(IORef (Node_ a))
, next_ :: !(IORef (Node_ a))
, value_ :: a }
-- unsafe show instance: remove from production version
instance (Show a) => Show (List a) where
showsPrec d lst = showParen (d > 10) $ showString "fromList " . showsPrec 11 (unsafePerformIO $ toList lst)
value :: Node a -> IO a
value = fmap value_ . readIORef . node_
null :: List a -> IO Bool
null l = (==0) <$> length l
length :: List a -> IO Int
length = readIORef . length_
empty :: IO (List a)
empty = mdo
n <- newIORef (Node_ n n undefined)
List n <$> newIORef 0
prepend :: a -> List a -> IO (Node a)
prepend x l = insertAfter x (Node (headNode_ l) l)
append :: a -> List a -> IO (Node a)
append x l = insertBefore x (Node (headNode_ l) l)
insertBefore :: a -> Node a -> IO (Node a)
insertBefore x Node{node_=rnode2, list_} = do
Node_{prev_=rnode1} <- readIORef rnode2
insertBetween_ x list_ rnode1 rnode2
insertAfter :: a -> Node a -> IO (Node a)
insertAfter x Node{node_=rnode1, list_} = do
Node_{next_=rnode2} <- readIORef rnode1
insertBetween_ x list_ rnode1 rnode2
insertBetween_ :: a -> List a -> IORef (Node_ a) -> IORef (Node_ a) -> IO (Node a)
insertBetween_ x l rnode1 rnode2 = do
modifyIORef' (length_ l) succ
newnode <- newIORef (Node_ rnode1 rnode2 x)
modifyIORef' rnode1 (\n -> n{next_=newnode})
modifyIORef' rnode2 (\n -> n{prev_=newnode})
return $ Node newnode l
delete :: Node a -> IO ()
delete Node{node_,list_} = do
modifyIORef' (length_ list_) pred
Node_{next_, prev_} <- readIORef node_
modifyIORef' prev_ (\n -> n{next_=next_})
modifyIORef' next_ (\n -> n{prev_=prev_})
modify :: (a -> a) -> Node a -> IO ()
modify f Node{node_} = modifyIORef' node_ (\n -> n { value_ = f (value_ n) })
prev :: Node a -> IO (Maybe (Node a))
prev Node{node_, list_} = do
Node_{prev_} <- readIORef node_
return $ maybeNode_ prev_ list_
next :: Node a -> IO (Maybe (Node a))
next Node{node_, list_} = do
Node_{next_} <- readIORef node_
return $ maybeNode_ next_ list_
maybeNode_ :: IORef (Node_ a) -> List a -> Maybe (Node a)
maybeNode_ n l =
if n == headNode_ l
then Nothing
else Just (Node n l)
start :: List a -> IO (Maybe (Node a))
start l = next $ Node (headNode_ l) l
end :: List a -> IO (Maybe (Node a))
end l = prev $ Node (headNode_ l) l
toList :: List a -> IO [a]
toList = toList_ next_
toListRev :: List a -> IO [a]
toListRev = toList_ prev_
toList_ :: (Node_ a -> IORef (Node_ a)) -> List a -> IO [a]
toList_ dir l = go =<< readIORef h
where h = headNode_ l
go n = do
if dir n == h then return []
else do
n' <- readIORef (dir n)
(value_ n':) <$> go n'
ex1 :: IO (List Int)
ex1 = do
t <- empty
mapM_ (flip prepend t) [10,9..1]
mapM_ (flip append t) [11..20]
return t
来源:https://stackoverflow.com/questions/61550527/data-stm-linkedlist-implementation