问题
Using C#'s Vector<T>
, how can we most efficiently vectorize the operation of finding the index of a particular element in a set?
As constraints, the set will always be a Span<T>
of an integer primitive, and it will contain at most 1 matching element.
I have come up with a solution that seems alright, but I'm curious if we can do better. Here is the approach:
- Create a
Vector<T>
consisting only of the target element, in each slot. - Use
Vector.Equals()
between the input set vector and the vector from the previous step, to get a mask containing a 1 in the single matching slot (or only 0s if there is no match). - Using a pre-initialized vector containing 1-based indexes (1, 2, 3, 4, ...), call
Vector.Dot()
between that vector and the mask from the previous step. Each index will be multiplied by 0, except the potential matching index, which will be multiplied by 1. What we get back is the sum of those multiplications, which is either 0, or the 1-based index of the matching element. If the result was 0, return -1 for no match. Otherwise, subtract one from the result to make it 0-based, and return that.
// One-time initialized vector containing { 1, 2, 3, 4, ... } Vector<ushort> indexes = MemoryMarshal.Cast<ushort, Vector<ushort>>(Enumerable.Range(1, Vector<ushort>.Count).Select(index => (ushort)index).ToArray())[0]; // The input set and the element to search for Span<ushort> set = stackalloc ushort[]{ 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 }; ushort element = 22; // Interpret input set as a sequence of vectors (set is assumed to have length power of two for brevity) var setVectors = MemoryMarshal.Cast<ushort, Vector<ushort>>(set); // Create a vector that contains the target element in each slot var elementVector = new Vector<ushort>(element); // Loop per vector rather than per element foreach (var vector in setVectors) { // Get a mask that has a 1 in the single matching slot, or only 0s var mask = Vector.Equals(vector, elementVector); // Get the dot product of the mask and the indexes // This will multiple each index by 0, or by 1 if it is the matching one, and return their sum, i.e. the matching index or 0 // Note that the indexes are deliberately 1-based, to distinguished from 0 (no match) var index = Vector.Dot(indexes, mask); // Either return 0 for no match, or reduce the index by 1 to get the 0-based index return index == 0 ? -1 : index - 1; }
回答1:
The x86 asm you want the compiler to generate is compare-for-equal (pcmpeqb
), pmovmskb
or movmskps (vector to bitmask with 1-byte or 4-byte elements) and then if the mask is non-zero, bit-scan for first set bit (bsf
or tzcnt
).
That'll be more efficient than an integer dot product!!
You already have the compare-for-equal, and I think I've seen other C# Q&As with an intrinsic for vector->bitmap. If someone wants to edit this answer or post their own with C# that compiles / JITs to this asm, please do so. I don't know C#, I'm just here for the x86 SIMD.
回答2:
As I can see the simple Span<char>.IndexOf already is using Intrinsics for searching a simple value. You don't even need to cast to char to use it, since MemoryExtensions.IndexOf
only cares about size and Unsafe.SizeOf<ushort>() == sizeof(char)
!
Also in JsonReaderHelper.IndexOfOrLessThan you will find a more complex Vectorization example for searching. It is using byte searching, but I am sure that you can adapt it to your needs if Span<ushort>.IndexOf
is not suitable.
来源:https://stackoverflow.com/questions/56955433/use-c-sharp-vectort-simd-to-find-index-of-matching-element