I saw an answer to this question, which, in it’s first revision, had code similar to this:
let numbers = Array(0 ..< 50)
let result = numbers.lazy
.f
The first culprit here is the slicing of the lazy filter collection through the use of prefix(_:)
– which, in this case, returns a BidirectionalSlice
of the LazyFilterBidirectionalCollection
.
In general, the slicing of a Collection
entails the storing of the base collection, along with the range of indices that are valid for the slice to ‘view’. Therefore, in order to create a slice of a LazyFilterBidirectionalCollection
to view the first 5 elements, the range of indices stored must be startIndex ..< indexAfterTheFifthElement
.
In order to get indexAfterTheFifthElement
, the LazyFilterBidirectionalCollection
must iterate through the base collection (numbers
) in order to find the 6th element that meets the predicate (you can see the exact implementation of the indexing here).
Therefore all the elements in the range 0...15 from the above example need to be checked against the predicate simply in order to create a slice of the lazy filter collection.
The second culprit is Array
’s init(_:)
, which accepts a Sequence
of elements of the same type as the array’s Element
type. The implementation of this initialiser calls _copyToContiguousArray()
on the sequence, which, for most sequences, forwards the call to this function:
internal func _copySequenceToContiguousArray<S : Sequence>
(_ source: S) -> ContiguousArray<S.Iterator.Element> {
let initialCapacity = source.underestimatedCount // <- problem here
var builder =
_UnsafePartiallyInitializedContiguousArrayBuffer<S.Iterator.Element>(
initialCapacity: initialCapacity)
var iterator = source.makeIterator()
// FIXME(performance): use _copyContents(initializing:).
// Add elements up to the initial capacity without checking for regrowth.
for _ in 0..<initialCapacity {
builder.addWithExistingCapacity(iterator.next()!)
}
// Add remaining elements, if any.
while let element = iterator.next() {
builder.add(element)
}
return builder.finish()
}
The problem here is underestimatedCount
. For plain sequences, this just has a default implementation that returns 0 – however, for collections, it has a default implementation which gets the count
of the collection (I go into this in more detail here).
The default implementation for the count
of a Collection
(which BidirectionalSlice
will use here) is simply:
public var count: IndexDistance {
return distance(from: startIndex, to: endIndex)
}
Which, for our slice, will walk through the indices up to indexAfterTheFifthElement
, thus re-evaluating the elements in the range 0...15, again.
Finally, an iterator of the slice is made, and is iterated through, adding each element in the sequence to the new array’s buffer. For a BidirectionalSlice
, this will use an IndexingIterator
, which simply works by advancing indices and outputting the element for each index.
The reason why this walk over the indices doesn’t re-evaluate the elements up to the first element of the result (note in the question’s example, 0 is evaluated one time less) is due to the fact that it doesn’t directly access the startIndex
of the LazyFilterBidirectionalCollection
, which has to evaluate all elements up to the first element in the result. Instead, the iterator can work from the start index of the slice itself.
The simple solution is to avoid slicing the lazy filter collection in order to get its prefix, but instead apply the prefixing lazily.
There are actually two implementations of prefix(_:)
. One is provided by Collection, and returns a SubSequence
(which is some form of slice for most standard library collections).
The other is provided by Sequence, that returns an AnySequence
– which, under the hood uses a base sequence of _PrefixSequence, which simply takes an iterator and allows iteration through it until a given number of elements have been iterated over – then just stops returning elements.
For lazy collections, this implementation of prefix(_:)
is great, as it doesn’t require any indexing – it just lazily applies the prefixing.
Therefore, if you say:
let result : AnySequence = numbers.lazy
.filter {
// gets called 1x per element :)
print("Calling filter for: \($0)")
return $0 % 3 == 0
}
.prefix(5)
The elements of numbers
(up to the 5th match) will only be evaluated once by filter(_:)
’s predicate when passed to Array
's initialiser, as you’re forcing Swift to use Sequence
’s default implementation of prefix(_:)
.
The foolproof way of preventing all indexing operations on a given lazy filter collection is to simply use a lazy filter sequence instead – this can be done by just wrapping the collection you wish to perform lazy operations on in an AnySequence
:
let numbers = Array(0 ..< 50)
let result = AnySequence(numbers).lazy
.filter {
// gets called 1x per element :)
print("Calling filter for: \($0)")
return $0 % 3 == 0
}
.dropFirst(5) // neither of these will do indexing,
.prefix(5) // but instead return a lazily evaluated AnySequence.
print(Array(result)) // [15, 18, 21, 24, 27]
However note that for bidirectional collections, this may have an adverse effect for operations on the end of the collection – as then the entire sequence will have to be iterated through in order to reach the end.
For such operations as suffix(_:)
and dropLast(_:)
, it may well be more efficient to work with a lazy collection over a sequence (at least for small inputs), as they can simply index from the end of the collection.
Although, as with all performance related concerns, you should first check to see if this is even a problem in the first place, and then secondly run your own tests to see which method is better for your implementation.
So, after all of that – the lesson to learn here is that you should be wary of the fact that slicing a lazy filter collection can re-evaluate every element of the base collection up to the end index which the slice can ‘view’.
Often it’s more desirable to treat a lazy filter collection as a sequence instead, which cannot be indexed, therefore meaning that lazy operations cannot evaluate any elements (doing so would risk destructively iterating over them) until an eager operation happens.
However, you should then be wary of the fact that you’re potentially sacrificing being able to index the collection from the end, which is important for operations such as suffix(_:)
.
Finally, it’s worth noting that this isn't a problem with lazy views such as LazyMapCollection
, as their elements don’t depend on the ‘results’ of previous elements – therefore they can be indexed in constant time if their base collection is a RandomAccessCollection
.