I find it very common to want to model relational data in my functional programs. For example, when developing a web-site I may want to have the following data structure to stor
The ixset library will help you with this. It's the library that backs the relational part of acid-state, which also handles versioned serialization of your data and/or concurrency guarantees, in case you need it.
The thing about ixset
is that it manages "keys" for your data entries automatically.
For your example, one would create one-to-many relationships for your data types like this:
data User =
User
{ name :: String
, birthDate :: Date
} deriving (Ord, Typeable)
data Message =
Message
{ user :: User
, timestamp :: Date
, content :: String
} deriving (Ord, Typeable)
instance Indexable Message where
empty = ixSet [ ixGen (Proxy :: Proxy User) ]
You can then find the message of a particular user. If you have built up an IxSet
like this:
user1 = User "John Doe" undefined
user2 = User "John Smith" undefined
messageSet =
foldr insert empty
[ Message user1 undefined "bla"
, Message user2 undefined "blu"
]
... you can then find messages by user1
with:
user1Messages = toList $ messageSet @= user1
If you need to find the user of a message, just use the user
function like normal. This models a one-to-many relationship.
Now, for many-to-many relations, with a situation like this:
data User =
User
{ name :: String
, birthDate :: Date
, messages :: [Message]
} deriving (Ord, Typeable)
data Message =
Message
{ users :: [User]
, timestamp :: Date
, content :: String
} deriving (Ord, Typeable)
... you create an index with ixFun
, which can be used with lists of indexes. Like so:
instance Indexable Message where
empty = ixSet [ ixFun users ]
instance Indexable User where
empty = ixSet [ ixFun messages ]
To find all the messages by an user, you still use the same function:
user1Messages = toList $ messageSet @= user1
Additionally, provided that you have an index of users:
userSet =
foldr insert empty
[ User "John Doe" undefined [ messageFoo, messageBar ]
, User "John Smith" undefined [ messageBar ]
]
... you can find all the users for a message:
messageFooUsers = toList $ userSet @= messageFoo
If you don't want to have to update the users of a message or the messages of a user when adding a new user/message, you should instead create an intermediary data type that models the relation between users and messages, just like in SQL (and remove the users
and messages
fields):
data UserMessage = UserMessage { umUser :: User, umMessage :: Message }
instance Indexable UserMessage where
empty = ixSet [ ixGen (Proxy :: Proxy User), ixGen (Proxy :: Proxy Message) ]
Creating a set of these relations would then let you query for users by messages and messages for users without having to update anything.
The library has a very simple interface considering what it does!
EDIT: Regarding your "costly data that needs to be compared": ixset
only compares the fields that you specify in your index (so to find all the messages by a user in the first example, it compares "the whole user").
You regulate which parts of the indexed field it compares by altering the Ord
instance. So, if comparing users is costly for you, you can add an userId
field and modify the instance Ord User
to only compare this field, for example.
This can also be used to solve the chicken-and-egg problem: what if you have an id, but neither a User
, nor a Message
?
You could then simply create an explicit index for the id, find the user by that id (with userSet @= (12423 :: Id)
) and then do the search.
IxSet is the ticket. To help others who might stumble on this post here's a more fully expressed example,
{-# LANGUAGE OverloadedStrings, DeriveDataTypeable, TypeFamilies, TemplateHaskell #-}
module Main (main) where
import Data.Int
import Data.Data
import Data.IxSet
import Data.Typeable
-- use newtype for everything on which you want to query;
-- IxSet only distinguishes indexes by type
data User = User
{ userId :: UserId
, userName :: UserName }
deriving (Eq, Typeable, Show, Data)
newtype UserId = UserId Int64
deriving (Eq, Ord, Typeable, Show, Data)
newtype UserName = UserName String
deriving (Eq, Ord, Typeable, Show, Data)
-- define the indexes, each of a distinct type
instance Indexable User where
empty = ixSet
[ ixFun $ \ u -> [userId u]
, ixFun $ \ u -> [userName u]
]
-- this effectively defines userId as the PK
instance Ord User where
compare p q = compare (userId p) (userId q)
-- make a user set
userSet :: IxSet User
userSet = foldr insert empty $ fmap (\ (i,n) -> User (UserId i) (UserName n)) $
zip [1..] ["Bob", "Carol", "Ted", "Alice"]
main :: IO ()
main = do
-- Here, it's obvious why IxSet needs distinct types.
showMe "user 1" $ userSet @= (UserId 1)
showMe "user Carol" $ userSet @= (UserName "Carol")
showMe "users with ids > 2" $ userSet @> (UserId 2)
where
showMe :: (Show a, Ord a) => String -> IxSet a -> IO ()
showMe msg items = do
putStr $ "-- " ++ msg
let xs = toList items
putStrLn $ " [" ++ (show $ length xs) ++ "]"
sequence_ $ fmap (putStrLn . show) xs
Another radically different approach to representing relational data is used by the database package haskelldb. It doesn't work quite like the types you describe in your example, but it is designed to allow a type-safe interface to SQL queries. It has tools for generating data types from a database schema and vice versa. Data types such as the ones you describe work well if you always want to work with whole rows. But they don't work in situations where you want to optimize your queries by only selecting certain columns. This is where the HaskellDB approach can be useful.
I don't have a complete solution, but I suggest taking a look at the ixset package; it provides a set type with an arbitrary number of indices that lookups can be performed with. (It's intended to be used with acid-state for persistence.)
You do still need to manually maintain a "primary key" for each table, but you could make it significantly easier in a few ways:
Adding a type parameter to Id
, so that, for instance, a User
contains an Id User
rather than just an Id
. This ensures you don't mix up Id
s for separate types.
Making the Id
type abstract, and offering a safe interface to generating new ones in some context (like a State
monad that keeps track of the relevant IxSet
and the current highest Id
).
Writing wrapper functions that let you, for example, supply a User
where an Id User
is expected in queries, and that enforce invariants (for example, if every Message
holds a key to a valid User
, it could allow you to look up the corresponding User
without handling a Maybe
value; the "unsafety" is contained within this helper function).
As an additional note, you don't actually need a tree structure for regular data types to work, since they can represent arbitrary graphs; however, this makes simple operations like updating a user's name impossible.
I've been asked to write an answer using Opaleye. In fact there's not an awful lot to say, as the Opaleye code is fairly standard once you have a database schema. Anyway, here it is, assuming there is a user_table
with columns user_id
, name
and birthdate
, and a message_table
with columns user_id
, time_stamp
and content
.
This sort of design is explained in more detail in the Opaleye Basic Tutorial.
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE Arrows #-}
import Opaleye
import Data.Profunctor.Product (p2, p3)
import Data.Profunctor.Product.TH (makeAdaptorAndInstance)
import Control.Arrow (returnA)
data UserId a = UserId { unUserId :: a }
$(makeAdaptorAndInstance "pUserId" ''UserId)
data User' a b c = User { userId :: a
, name :: b
, birthDate :: c }
$(makeAdaptorAndInstance "pUser" ''User')
type User = User' (UserId (Column PGInt4))
(Column PGText)
(Column PGDate)
data Message' a b c = Message { user :: a
, timestamp :: b
, content :: c }
$(makeAdaptorAndInstance "pMessage" ''Message')
type Message = Message' (UserId (Column PGInt4))
(Column PGDate)
(Column PGText)
userTable :: Table User User
userTable = Table "user_table" (pUser User
{ userId = pUserId (UserId (required "user_id"))
, name = required "name"
, birthDate = required "birthdate" })
messageTable :: Table Message Message
messageTable = Table "message_table" (pMessage Message
{ user = pUserId (UserId (required "user_id"))
, timestamp = required "timestamp"
, content = required "content" })
An example query which joins the user table to the message table on the user_id
field:
usersJoinMessages :: Query (User, Message)
usersJoinMessages = proc () -> do
aUser <- queryTable userTable -< ()
aMessage <- queryTable messageTable -< ()
restrict -< unUserId (userId aUser) .== unUserId (user aMessage)
returnA -< (aUser, aMessage)