ARM NEON: How to implement a 256-byte lookup table


I am porting some code I wrote to NEON using inline assembly.

One of the things I need is to convert byte values ranging [0..128] to other byte values in a table which t

1 answer
  •  一个人的身影
    2021-02-04 12:04

    The proper sequence uses vtbl for the first 32 table entries and vtbx for every following block of 32:

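    vtbl.8  d0, {d2, d3, d4, d5}, d1      // entries 0..31; lanes with index >= 32 become 0
    vsub.i8 d1, d1, d31                   // index -= 32 (d31 must hold 32 in every byte lane)
    vtbx.8  d0, {d6, d7, d8, d9}, d1      // entries 32..63; out-of-range lanes keep their value
    vsub.i8 d1, d1, d31                   // index -= 32
    vtbx.8  d0, {d10, d11, d12, d13}, d1  // entries 64..95 (d10..d13 = q5, q6)
    vsub.i8 d1, d1, d31                   // index -= 32
    vtbx.8  d0, {d14, d15, d16, d17}, d1  // entries 96..127 (d14..d17 = q7, q8)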
    

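    The difference between vtbl and vtbx is that vtbl writes zero to every lane of d0 whose index in d1 is out of range (>= 32 for a four-register table), whereas vtbx leaves that lane of d0 intact. After each subtraction, the indices that were already resolved wrap around to values above 31, so the later vtbx steps do not touch those lanes. Thus there's no need for the trickery mentioned in my comment and no need to merge the partial values. Note that the four steps shown cover 128 table entries.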
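
    To check the index arithmetic without NEON hardware, the sequence can be modelled in plain C. This is only a scalar sketch of the vtbl/vtbx semantics described above, not NEON code: the function names (vtbl4_model, vtbx4_model, lut128_lookup, lut128_selfcheck) are made up for illustration, and it assumes a 128-byte table and a decrement of 32 per step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of VTBL.8 with a four-register (32-byte) table:
   lanes whose index is out of range (>= 32) are written as 0. */
static void vtbl4_model(uint8_t dst[8], const uint8_t table[32],
                        const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (idx[i] < 32) ? table[idx[i]] : 0;
}

/* Scalar model of VTBX.8 with a four-register table:
   lanes whose index is out of range keep their previous value. */
static void vtbx4_model(uint8_t dst[8], const uint8_t table[32],
                        const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        if (idx[i] < 32)
            dst[i] = table[idx[i]];
}

/* Hypothetical helper: look up 8 indices in a 128-byte table using
   one vtbl step followed by three (subtract 32, vtbx) steps. */
static void lut128_lookup(uint8_t dst[8], const uint8_t lut[128],
                          const uint8_t idx_in[8])
{
    uint8_t idx[8];
    memcpy(idx, idx_in, sizeof idx);

    vtbl4_model(dst, lut, idx);                  /* entries 0..31 */
    for (int chunk = 1; chunk < 4; chunk++) {
        for (int i = 0; i < 8; i++)
            idx[i] = (uint8_t)(idx[i] - 32);     /* wraps modulo 256, like vsub.i8 */
        vtbx4_model(dst, lut + 32 * chunk, idx); /* entries 32*chunk .. 32*chunk+31 */
    }
}

/* Compare the modelled sequence with direct indexing for all 256 index
   values. Indices >= 128 are expected to yield 0 in this model. */
static void lut128_selfcheck(void)
{
    uint8_t lut[128];
    for (int i = 0; i < 128; i++)
        lut[i] = (uint8_t)(255 - i);             /* arbitrary non-zero test data */

    for (int base = 0; base < 256; base += 8) {
        uint8_t idx[8], out[8];
        for (int i = 0; i < 8; i++)
            idx[i] = (uint8_t)(base + i);
        lut128_lookup(out, lut, idx);
        for (int i = 0; i < 8; i++) {
            uint8_t expected = (idx[i] < 128) ? lut[idx[i]] : 0;
            assert(out[i] == expected);
        }
    }
}
```

    Calling lut128_selfcheck() from a test program exercises every index value. In this model an index of 128 or more never lands in the range 0..31 during the three subtractions, so such lanes keep the zero written by the initial vtbl step.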
