Explaining the different types in Metal and SIMD

别那么骄傲 2021-02-08 08:33

When working with Metal, I find there's a bewildering number of types and it's not always clear to me which type I should be using in which context.

In Apple's Metal

2 Answers
  •  失恋的感觉
    2021-02-08 08:53

    which type should you use to represent two floating values (x/y)

    If you can avoid it, don't use a single SIMD vector to represent a single geometry x,y vector if you're using CPU SIMD.

    CPU SIMD works best when each SIMD vector holds many values of the same kind. The vectors are stored in 16-byte or 32-byte registers, where "vertical" operations between two vectors (packed add or multiply) are cheap, but "horizontal" operations within one vector can mostly only be done with a shuffle + a vertical operation.
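
    To make the vertical / horizontal distinction concrete, here is a rough scalar model in plain C. It does not use real intrinsics; `LANES`, `add_vertical` and `sum_horizontal` are names made up for illustration, and the 4-lane width just assumes a 16-byte register of floats.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed: four 32-bit floats in one 16-byte register */

/* "Vertical": lane i of a only ever meets lane i of b.
   On SIMD hardware this is typically a single packed add. */
void add_vertical(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (int i = 0; i < LANES; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* "Horizontal": lanes of the same vector are combined with each other.
   Each step below stands in for roughly one shuffle plus one vertical add. */
float sum_horizontal(const float *v)
{
    assert(v != NULL);
    float lo = v[0] + v[2];  /* fold the upper half onto the lower half */
    float hi = v[1] + v[3];
    return lo + hi;          /* fold the remaining two partial sums */
}
```

    The vertical version does the same amount of work per lane no matter how wide the vector is, while the horizontal version needs extra data movement for every folding step.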

    For example a vector of 4 x values and another vector of 4 y values lets you do 4 dot-products or 4 cross-products in parallel with no shuffling, so the overall throughput is significantly more dot-products per clock cycle than if you had a vector of [x1, y1, x2, y2].
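
    A minimal sketch of that layout in portable C, assuming a struct-of-arrays type I've called `Vec2SoA` (a hypothetical name, not a Metal or simd type). Whether the loop actually gets vectorized depends on your compiler and flags, so treat it as an illustration of the data layout rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>

/* Struct-of-arrays: all x values are contiguous, all y values are contiguous.
   Vec2SoA is a hypothetical type used only for this example. */
typedef struct {
    const float *x;
    const float *y;
    size_t count;
} Vec2SoA;

/* Computes out[i] = dot(a[i], b[i]) for every pair of 2D vectors. */
void dot2_soa(Vec2SoA a, Vec2SoA b, float *out)
{
    assert(a.count == b.count);
    for (size_t i = 0; i < a.count; ++i) {
        /* Every array is read at the same index i, so 4 consecutive
           iterations map onto packed multiplies and adds with no shuffles. */
        out[i] = a.x[i] * b.x[i] + a.y[i] * b.y[i];
    }
}
```

    With an interleaved [x1, y1, x2, y2] layout, the same computation would need x and y from the same vector to be added together, which is a horizontal operation.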

    See https://stackoverflow.com/tags/sse/info, and especially these slides: SIMD at Insomniac Games (GDC 2015) for more about planning your data layout and program design for doing many similar operations in parallel instead of trying to accelerate single operations.


    The one exception to this rule is if you're only adding / subtracting to translate coordinates: that's still purely a vertical operation even with an array-of-structs layout, so it's fine for CPU short-vector SIMD based on 16-byte vectors. (The 2nd element in one vector only interacts with the 2nd element in the other vector, so no shuffling is needed.)
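
    As a sketch of that exception, here is a translation over interleaved x, y data in plain C. The function name `translate_interleaved` is made up for illustration; the point is only that each element is combined with a constant in the same position.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved (array-of-structs) layout: x0, y0, x1, y1, ...
   Adds (dx, dy) to every point in place. */
void translate_interleaved(float *xy, size_t point_count, float dx, float dy)
{
    assert(xy != NULL || point_count == 0);
    for (size_t i = 0; i < point_count; ++i) {
        /* In SIMD terms: [x0, y0, x1, y1] + [dx, dy, dx, dy].
           Each lane only meets the same lane of the offset vector. */
        xy[2 * i]     += dx;
        xy[2 * i + 1] += dy;
    }
}
```

    No x is ever combined with a y here, which is why the interleaved layout costs nothing extra for this particular operation.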


    GPU SIMD is different and, I think, has no problem with interleaved data, but I'm not a GPU expert.

    (I don't use Objective-C or Metal, so I can't help you with the details of their type names, just with what the underlying CPU hardware is good at. That's basically the same for x86 SSE/AVX, ARM NEON / AArch64 SIMD, or PowerPC AltiVec: horizontal operations are slower.)
