Question:
Is there any way to implement hash tables efficiently in a purely functional language? It seems like any change to the hash table would require creating a copy of the original hash table. I must be missing something. Hash tables are pretty darn important data structures, and a programming language would be limited without them.
Answer 1:
Is there any way to implement hash tables efficiently in a purely functional language?
Hash tables are a concrete implementation of the abstract "dictionary" or "associative array" data structure. So I think you really want to ask about the efficiency of purely functional dictionaries compared to imperative hash tables.
It seems like any change to the hash table would require creating a copy of the original hash table.
Yes, hash tables are inherently imperative and there is no direct purely functional equivalent. Perhaps the most similar purely functional dictionary type is the hash trie but they are significantly slower than hash tables due to allocations and indirections.
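To make the "allocations and indirections" point concrete, here is a toy persistent hash trie sketch. This is not how Clojure's PersistentHashMap actually works (real HAMTs hash the key and branch 32 ways per level); it is a minimal binary trie over Int keys, with the names Trie, insert, and find being my own illustration:

```haskell
import Data.Bits (testBit)

-- A toy persistent hash trie: branches on successive bits of the key.
-- (Real HAMTs hash the key and branch 32 ways per level.)
data Trie v = Empty
            | Leaf !Int v
            | Node (Trie v) (Trie v)

-- Inserting copies only the path from the root down to the affected
-- leaf; every other subtree is shared with the old version. Those
-- per-level node allocations are exactly the overhead hash tables avoid.
insert :: Int -> v -> Trie v -> Trie v
insert k v = go 0
  where
    go _ Empty = Leaf k v
    go d (Leaf k' v')
      | k == k'   = Leaf k v              -- overwrite existing key
      | otherwise = go d (split d k' v')  -- push the old leaf down a level
    go d (Node l r)
      | testBit k d = Node l (go (d + 1) r)
      | otherwise   = Node (go (d + 1) l) r
    split d k' v'
      | testBit k' d = Node Empty (Leaf k' v')
      | otherwise    = Node (Leaf k' v') Empty

-- Lookup follows one bit of the key per level: an indirection per node.
find :: Int -> Trie v -> Maybe v
find k = go 0
  where
    go _ Empty        = Nothing
    go _ (Leaf k' v') = if k == k' then Just v' else Nothing
    go d (Node l r)   = go (d + 1) (if testBit k d then r else l)
```

Because insert returns a new trie and leaves the old one intact, both "versions" stay usable — the persistence a mutable hash table cannot offer — but every update pays for fresh node allocations along the path.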
I must be missing something. Hash tables are pretty darn important data structures, and a programming language would be limited without them.
Dictionaries are a very important data structure (although it's worth noting that they were rare in the mainstream until Perl made them popular in the 1990s, so people coded things for decades without the benefit of dictionaries). I agree that hash tables are also important because they are often by far the most efficient dictionaries.
There are many purely functional dictionaries:
- Balanced trees (red-black, AVL, weight-balanced, finger trees, etc.), e.g. Map in OCaml and F#, and Data.Map in Haskell.
- Hash tries, e.g. PersistentHashMap in Clojure.
But these purely functional dictionaries are all much slower than a decent hash table (e.g. the .NET Dictionary).
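For instance, Haskell's Data.Map (a size-balanced binary tree from the containers package, which ships with GHC) gives O(log n) insert and lookup, and — unlike a hash table — keeps every old version of the map alive cheaply via structural sharing:

```haskell
import qualified Data.Map as Map

main :: IO ()
main = do
    -- Each insert returns a NEW map; the old one is untouched.
    let m0 = Map.fromList [(1, "one"), (2, "two")]
        m1 = Map.insert 3 "three" m0
    print (Map.lookup 3 m0)  -- Nothing: m0 is unchanged
    print (Map.lookup 3 m1)  -- Just "three"
```

The cost of that persistence is the pointer-chasing and allocation the answer describes: every insert allocates O(log n) fresh tree nodes, where a hash table mutates one bucket in place.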
Beware Haskell benchmarks comparing hash tables to purely functional dictionaries claiming that purely functional dictionaries are competitively performant. The correct conclusion is that Haskell's hash tables are so inefficient that they are almost as slow as purely functional dictionaries. If you compare with .NET, for example, you find that a .NET Dictionary can be 26× faster than Haskell's hash table!
I think to really conclude what you're trying to conclude about Haskell's performance you would need to test more operations, use a non-ridiculous key type (doubles as keys, what?), not use -N8 for no reason, and compare to a third language that also boxes its parametric types, like Java (as Java has acceptable performance in most cases), to see if it's a common problem of boxing or some more serious fault of the GHC runtime. These benchmarks are along those lines (and ~2× as fast as the current hashtable implementation).
This is exactly the kind of misinformation I was referring to. Pay no attention to Haskell's hash tables in this context, just look at the performance of the fastest hash tables (i.e. not Haskell) and the fastest purely functional dictionaries.
Answer 2:
Hash tables can be implemented with something like the ST monad in Haskell, which wraps imperative, destructive updates in a purely functional interface. It does so by forcing the updates to be performed sequentially and preventing the mutable state from escaping, so it maintains referential transparency: you can't access the old "version" of the hash table.
See: hackage.haskell.org/package/hashtables
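The hashtables package exposes this through modules like Data.HashTable.ST.Basic. As a minimal sketch of the underlying runST pattern — using only GHC's bundled array package rather than hashtables, and with histogram and its bucket scheme being my own illustration — a pure function can be built on internal mutation like so:

```haskell
import Data.Array (Array, elems)
import Data.Array.ST (newArray, readArray, writeArray, runSTArray)

-- A pure function that uses a mutable array internally. runSTArray's
-- rank-2 type guarantees the mutable state cannot escape, so callers
-- only ever see the final, frozen result — never an intermediate version.
histogram :: Int -> [Int] -> Array Int Int
histogram nBuckets xs = runSTArray $ do
    arr <- newArray (0, nBuckets - 1) 0      -- mutable, invisible outside
    mapM_ (\x -> do
              let b = x `mod` nBuckets       -- pick a bucket
              c <- readArray arr b
              writeArray arr b (c + 1))      -- destructive update
          xs
    return arr

main :: IO ()
main = print (elems (histogram 4 [0 .. 11]))  -- [3,3,3,3]
```

The hashtables ST interface follows the same shape: new, insert, and lookup all run in ST s, and the table is used and discarded inside one runST computation.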
Answer 3:
The existing answers all have good points to share, and I thought I would just add one more piece of data to the equation: comparing performance of a few different associative data structures.
The test consists of sequentially inserting, then looking up and summing, the elements of an array. This test isn't incredibly rigorous and shouldn't be taken as such; it's just an indication of what to expect.
First, in Java, using HashMap, the unsynchronized Map implementation:
import java.util.Map;
import java.util.HashMap;

class HashTest {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        int n = Integer.parseInt(args[0]);
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        System.out.println("" + sum);
    }
}
Then a Haskell implementation using the recent hashtable work done by Gregory Collins (it's in the hashtables package). This can be either pure (through the ST monad) or impure (through IO); I'm using the IO version here:
{-# LANGUAGE ScopedTypeVariables, BangPatterns #-}
module Main where

import Control.Monad
import qualified Data.HashTable.IO as HashTable
import System.Environment

main :: IO ()
main = do
    n <- read `fmap` head `fmap` getArgs
    ht :: HashTable.BasicHashTable Int Int <- HashTable.new
    mapM_ (\v -> HashTable.insert ht v v) [0 .. n - 1]
    x <- foldM (\ !s i -> HashTable.lookup ht i >>=
                          maybe undefined (return . (s +)))
               (0 :: Int) [0 .. n - 1]
    print x
Lastly, one using the immutable HashMap implementation from Hackage (from the hashmap package):
module Main where

import Data.List (foldl')
import qualified Data.HashMap as HashMap
import System.Environment

main :: IO ()
main = do
    n <- read `fmap` head `fmap` getArgs
    let hashmap =
            foldl' (\ht v -> HashMap.insert v v ht)
                   HashMap.empty [0 :: Int .. n - 1]
    let x = foldl' (\s i -> hashmap HashMap.! i + s) 0 [0 .. n - 1]
    print x
Examining the performance for n = 10,000,000, I find the total running time is the following:
- Java HashMap -- 24.387s
- Haskell HashTable -- 7.705s, 41% time in GC
- Haskell HashMap -- 9.368s, 62% time in GC
Knocking it down to n=1,000,000, we get:
- Java HashMap -- 0.700s
- Haskell HashTable -- 0.723s
- Haskell HashMap -- 0.789s
This is interesting for two reasons:
- The performance is generally pretty close (except where Java diverges above 1M entries)
- A huge amount of time is spent in garbage collection! (killing Java in the case of n=10,000,000)
This would seem to indicate that languages like Haskell and Java, which box the map's keys, take a big hit from that boxing. Languages that either do not need to box, or can unbox, the keys and values would likely see performance several times better.
Clearly these implementations are not the fastest, but I would say that, using Java as a baseline, they are at least acceptable/usable for many purposes (though perhaps someone more familiar with Java could say whether HashMap is considered reasonable).
I would note that the Haskell HashMap takes up a lot of space compared to the HashTable.
The Haskell programs were compiled with GHC 7.0.3 and -O2 -threaded, and run with only the +RTS -s flag for runtime GC statistics. Java was compiled with OpenJDK 1.7.
Source: https://stackoverflow.com/questions/6793259/how-does-one-implement-hash-tables-in-a-functional-language