edit distance algorithm in Haskell - performance tuning

陌清茗 2020-12-28 09:18

I'm trying to implement the Levenshtein distance (or edit distance) in Haskell, but its performance degrades rapidly as the string length increases.
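
For context, this kind of slowdown is what one would expect from a definition that follows the edit-distance recurrence directly. The sketch below is only a hypothetical illustration, not the code from this question; the name `naiveLev` is made up. Every mismatch branches three ways and the same suffix pairs are recomputed many times, so the running time can grow exponentially with the string lengths.

```haskell
-- Naive Levenshtein distance, written straight from the recurrence.
-- Hypothetical illustration only; not the asker's original code.
naiveLev :: Eq a => [a] -> [a] -> Int
naiveLev [] ys = length ys            -- insert everything that is left
naiveLev xs [] = length xs            -- delete everything that is left
naiveLev (x:xs) (y:ys)
  | x == y    = naiveLev xs ys        -- heads match, no cost
  | otherwise = 1 + minimum
      [ naiveLev xs (y:ys)            -- delete x
      , naiveLev (x:xs) ys            -- insert y
      , naiveLev xs ys                -- substitute x with y
      ]
```

Dynamic-programming answers avoid the blow-up by computing each pair of suffixes (or prefixes) only once.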

I'm still q

6 answers
  •  礼貌的吻别
    2020-12-28 09:48

    This version is much quicker than those memoised versions, but I would still love to make it even faster. It works fine with strings hundreds of characters long. It was written with other distances in mind (change the init function and the cost function) and uses the classical dynamic-programming array trick. The long line could be converted into a separate function with a top-level 'do', but I like it this way.

    import Data.Array.IO
    import System.IO.Unsafe
    
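    -- explicit signature; without it the monomorphism restriction can reject this binding
    editDistance :: Eq a => [a] -> [a] -> Int
    editDistance = dist ini med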
    
    dist :: (Int -> Int -> Int) -> (a -> a -> Int ) -> [a] -> [a] -> Int
    dist i f a b  = unsafePerformIO $ distM i f a b
    
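    -- easy to create other distances: swap ini (table initialisation) and med (substitution cost)
    ini i 0 = i
    ini 0 j = j
    ini _ _ = 0
    -- a substitution costs 2 here (same as a delete plus an insert); use 1 for the classic Levenshtein distance
    med a b = if a == b then 0 else 2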
    
    
    distM :: (Int -> Int -> Int) -> (a -> a -> Int) -> [a] -> [a] -> IO Int
    distM ini f a b = do
            let la = length a
            let lb = length b
    
            arr <- newListArray ((0,0),(la,lb)) [ini i j | i<- [0..la], j<-[0..lb]] :: IO (IOArray (Int,Int) Int)
    
    -- all on one line; note that (!!) walks the list, so each character lookup is linear, not constant time
            mapM_ (\(i,j) -> readArray arr (i-1,j-1) >>= \ld -> readArray arr (i-1,j) >>= \l -> readArray arr (i,j-1) >>= \d-> writeArray arr (i,j) $ minimum [l+1,d+1, ld + (f (a !! (i-1) ) (b !! (j-1))) ] ) [(i,j)| i<-[1..la], j<-[1..lb]]
    
            readArray arr (la,lb)
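
As a possible next step for the "even quicker" wish above, here is a minimal sketch of the same table-filling idea under a few assumptions that are not in the answer: it is specialised to `String`, it uses unit costs (so it computes the classic Levenshtein distance rather than the cost-2 variant), and it swaps `IOArray` plus `unsafePerformIO` for an unboxed `STUArray` run with `runSTUArray`. The name `levenshteinST` is hypothetical. The inputs are copied into unboxed arrays first so that each character lookup is constant time instead of a list traversal with `(!!)`. Whether it is actually faster should be measured rather than assumed.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, listArray, (!))

-- Hypothetical name; Levenshtein distance with unit costs.
levenshteinST :: String -> String -> Int
levenshteinST a b = table ! (la, lb)
  where
    la = length a
    lb = length b
    -- Unboxed copies of the inputs give O(1) indexing, unlike (!!) on lists.
    ua, ub :: UArray Int Char
    ua = listArray (1, la) a
    ub = listArray (1, lb) b
    -- The full (la+1) x (lb+1) table, built in ST and frozen without copying.
    table :: UArray (Int, Int) Int
    table = runSTUArray $ do
      arr <- newArray ((0, 0), (la, lb)) 0
      -- First column and first row: distance to or from the empty prefix.
      forM_ [0 .. la] $ \i -> writeArray arr (i, 0) i
      forM_ [0 .. lb] $ \j -> writeArray arr (0, j) j
      forM_ [1 .. la] $ \i ->
        forM_ [1 .. lb] $ \j -> do
          diag <- readArray arr (i - 1, j - 1)   -- substitute or match
          up   <- readArray arr (i - 1, j)       -- delete from a
          left <- readArray arr (i, j - 1)       -- insert from b
          let cost = if ua ! i == ub ! j then 0 else 1
          writeArray arr (i, j) (minimum [up + 1, left + 1, diag + cost])
      return arr
```

For example, `levenshteinST "kitten" "sitting"` evaluates to 3. Memory use is still proportional to the product of the two lengths; keeping only two rows would reduce that, at the cost of slightly more bookkeeping.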
    
