问题
I wrote quantification functions exists
, forall
, and none
for Haskell's build-in []
list data type. On multiple occasions, these seemed to prove much more efficient than Prelude
/Data.List
s any
and all
. I naively suspect that this performance is due to any
and all
being implemented using Θ(n) folds. Since I am relatively new to Haskell, I think I must be mistaken, or that there would be a good reason for this phenomenon.
From Data.Foldable
:
-- | Determines whether any element of the structure satisfies the predicate.
any :: Foldable t => (a -> Bool) -> t a -> Bool
any p = getAny #. foldMap (Any #. p)
-- | Determines whether all elements of the structure satisfy the predicate.
all :: Foldable t => (a -> Bool) -> t a -> Bool
all p = getAll #. foldMap (All #. p)
My implementations:
exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs) | pred x = True
| otherwise = exists pred xs
and
forall pred = not . exists (not . pred)
none pred = not . exists pred = forall (not . pred)
Eliminating boolean inversions:
forall, none :: (a -> Bool) -> [a] -> Bool
forall _ [] = True
forall pred (x : xs) | pred x = forall pred xs
| otherwise = False
none _ [] = True
none pred (x : xs) | pred x = False
| otherwise = none pred xs
all
:
time 327.8 μs (322.4 μs .. 333.0 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 328.7 μs (324.1 μs .. 334.2 μs)
std dev 16.95 μs (14.63 μs .. 22.02 μs)
and forall
:
time 113.2 μs (111.2 μs .. 115.0 μs)
0.997 R² (0.996 R² .. 0.998 R²)
mean 112.0 μs (110.0 μs .. 113.9 μs)
std dev 6.333 μs (5.127 μs .. 7.896 μs)
Performance measured using criterion's nf
.
As anticipated, I did not reinvent the fold, but had underestimated compiler flags, and naively did not expect -O2
to make such drastic overall difference compared to default optimization level performance, nor the optimization effectiveness disparity between the individual custom-wrote and library formulations. Many highly effective specialized standard function optimizations evidently kick in only when explicitly enabled.
The "performance" section of the Haskell tag info emphasizes importance of optimization level compiler flags when testing code efficiency. It is generally advised to trust the sophistication of library function implementations, and, rather than rewire RULES
pragmas or reformulate basic forms, to try exploiting already cultivated optimization potential.
回答1:
I find it instructive to re-implement any
in the various ways:
import Prelude hiding (any)
import Criterion.Main
import Data.Foldable (foldMap)
import Data.Monoid
Your exists
:
exists :: (a -> Bool) -> [a] -> Bool
exists _ [] = False
exists pred (x : xs)
= if pred x
then True
else exists pred xs
A version using (||)
:
existsOr :: (a -> Bool) -> [a] -> Bool
existsOr _ [] = False
existsOr pred (x : xs) = pred x || existsOr pred xs
Using foldr
:
any :: (a -> Bool) -> [a] -> Bool
any pred = foldr ((||) . pred) False
Using foldr
and Any
:
anyF :: (a -> Bool) -> [a] -> Bool
anyF pred = getAny . foldr (mappend . (Any . pred)) mempty
Using foldMap
and Any
:
anyFM :: (a -> Bool) -> [a] -> Bool
anyFM pred = getAny . foldMap (Any . pred)
Benchmarks with ghc -O0
:
benchmarking exists
time 1.552 μs (1.504 μs .. 1.593 μs)
0.989 R² (0.983 R² .. 0.993 R²)
mean 1.482 μs (1.427 μs .. 1.545 μs)
std dev 196.1 ns (168.8 ns .. 229.2 ns)
variance introduced by outliers: 93% (severely inflated)
benchmarking existsOr
time 2.699 μs (2.616 μs .. 2.768 μs)
0.992 R² (0.988 R² .. 0.995 R²)
mean 2.629 μs (2.554 μs .. 2.704 μs)
std dev 277.8 ns (235.8 ns .. 351.1 ns)
variance introduced by outliers: 89% (severely inflated)
benchmarking any
time 5.551 μs (5.354 μs .. 5.777 μs)
0.990 R² (0.986 R² .. 0.995 R²)
mean 5.553 μs (5.395 μs .. 5.750 μs)
std dev 584.2 ns (447.5 ns .. 835.5 ns)
variance introduced by outliers: 88% (severely inflated)
benchmarking anyF
time 7.330 μs (7.081 μs .. 7.612 μs)
0.988 R² (0.982 R² .. 0.994 R²)
mean 7.502 μs (7.272 μs .. 7.762 μs)
std dev 848.2 ns (712.6 ns .. 1.022 μs)
variance introduced by outliers: 89% (severely inflated)
benchmarking anyFM
time 5.668 μs (5.451 μs .. 6.008 μs)
0.987 R² (0.975 R² .. 0.996 R²)
mean 5.807 μs (5.659 μs .. 5.975 μs)
std dev 542.5 ns (446.4 ns .. 721.8 ns)
variance introduced by outliers: 86% (severely inflated)
Your version (exists
) is indeed the fastest, and the foldr
versions are rather slow.
With ghc -O2
, your version (exists
) is the slowest, and all other functions are nearly equally fast to each other:
benchmarking exists
time 753.5 ns (725.4 ns .. 779.9 ns)
0.990 R² (0.986 R² .. 0.995 R²)
mean 762.4 ns (737.0 ns .. 787.0 ns)
std dev 82.47 ns (66.79 ns .. 105.1 ns)
variance introduced by outliers: 91% (severely inflated)
benchmarking existsOr
time 491.5 ns (478.2 ns .. 503.2 ns)
0.994 R² (0.992 R² .. 0.996 R²)
mean 494.5 ns (481.1 ns .. 512.9 ns)
std dev 54.97 ns (42.54 ns .. 80.34 ns)
variance introduced by outliers: 92% (severely inflated)
benchmarking any
time 461.2 ns (442.0 ns .. 479.7 ns)
0.989 R² (0.985 R² .. 0.993 R²)
mean 456.0 ns (439.3 ns .. 476.3 ns)
std dev 60.04 ns (47.27 ns .. 89.47 ns)
variance introduced by outliers: 94% (severely inflated)
benchmarking anyF
time 436.9 ns (415.8 ns .. 461.0 ns)
0.978 R² (0.967 R² .. 0.988 R²)
mean 450.8 ns (430.1 ns .. 472.6 ns)
std dev 70.64 ns (57.04 ns .. 85.92 ns)
variance introduced by outliers: 96% (severely inflated)
benchmarking anyFM
time 438.9 ns (426.9 ns .. 449.5 ns)
0.993 R² (0.989 R² .. 0.996 R²)
mean 435.8 ns (421.4 ns .. 447.6 ns)
std dev 45.32 ns (36.73 ns .. 58.74 ns)
variance introduced by outliers: 90% (severely inflated)
If one looks into the simplified Core code (ghc -O2 -ddump-simpl
), one sees that there are no foldr
s anymore (with -O0
, everything is still in there, fold
s included).
I would thus venture to say that your code is faster (in the un-optimized version, -O0
) than the library code, because it is simpler (for the potential price of being less general). The optimized library code is faster than your version, since it is written in a way that its optimization potential is recognized by the compiler. (admittedly, this is a bit of guess work)
来源:https://stackoverflow.com/questions/42248592/haskell-ghc-performance-of-any-all