I am porting some code I wrote to NEON using inline assembly.
One of the things I need is to convert byte values ranging [0..128] to other byte values in a table which t
The proper sequence is one vtbl for the first 32 table entries, followed by a vtbx for each further block of 32:
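vtbl.8 d0, { d2,d3,d4,d5 }, d1 // first 32 entries; lanes with index >= 32 become 0
vsub.i8 d1, d1, d31 // subtract 32 from every index (d31 = 32 in each byte)
vtbx.8 d0, { d6,d7,d8,d9 }, d1 // entries 32..63; out-of-range lanes keep their value
vsub.i8 d1, d1, d31 // subtract 32 again
vtbx.8 d0, { d10,d11,d12,d13 }, d1 // entries 64..95 (d10..d13 = q5,q6)
vsub.i8 d1, d1, d31
vtbx.8 d0, { d14,d15,d16,d17 }, d1 // entries 96..127 (d14..d17 = q7,q8)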
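The difference between vtbl and vtbx is that vtbl zeroes an element of d0 when the corresponding index in d1 is >= 32, whereas vtbx leaves the original value in d0 intact. Indices that were already in range in an earlier step wrap around to 160 or more after the byte-wise subtraction, so later vtbx steps do not touch those lanes. Thus there's no need for the trickery described in my comment, and no need to merge the partial values.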
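
To convince yourself that the sequence does the right thing, here is a minimal scalar sketch in plain C that models the behaviour described above. It is not NEON code: the helper names (vtbl32, vtbx32, vsub_i8, lookup128, lookup128_one) are made up for this illustration, and the table is assumed to be 128 contiguous bytes, matching d2..d17.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of VTBL.8 with a 4-register (32-byte) table:
   out-of-range indices produce 0. */
void vtbl32(uint8_t dst[8], const uint8_t table[32], const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (idx[i] < 32) ? table[idx[i]] : 0;
}

/* Scalar model of VTBX.8 with a 4-register (32-byte) table:
   out-of-range indices leave the destination lane unchanged. */
void vtbx32(uint8_t dst[8], const uint8_t table[32], const uint8_t idx[8])
{
    for (int i = 0; i < 8; i++)
        if (idx[i] < 32)
            dst[i] = table[idx[i]];
}

/* Scalar model of VSUB.I8 against a constant: wraps modulo 256. */
void vsub_i8(uint8_t v[8], uint8_t k)
{
    for (int i = 0; i < 8; i++)
        v[i] = (uint8_t)(v[i] - k);
}

/* 128-entry lookup built from one vtbl and three vsub/vtbx pairs,
   mirroring the assembly sequence. */
void lookup128(uint8_t dst[8], const uint8_t table[128], const uint8_t idx_in[8])
{
    uint8_t idx[8];
    memcpy(idx, idx_in, sizeof idx);   /* keep the caller's indices intact */

    vtbl32(dst, table, idx);           /* entries 0..31 */
    for (int block = 1; block < 4; block++) {
        vsub_i8(idx, 32);              /* move to the next block of 32 */
        vtbx32(dst, table + 32 * block, idx);
    }
}

/* Convenience wrapper (hypothetical): look up one index in all lanes
   and return lane 0. */
uint8_t lookup128_one(const uint8_t table[128], uint8_t index)
{
    uint8_t idx[8], dst[8];
    memset(idx, index, sizeof idx);
    lookup128(dst, table, idx);
    return dst[0];
}
```

Calling lookup128_one(table, j) for every j in 0..127 should return table[j]. Any index of 128 or above comes back as 0 in this model, because the initial vtbl zeroes the lane and none of the later vtbx steps see an in-range index.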