I am thinking on how to implement the conversion of an integer (4byte, unsigned) to string with SSE instructions. The usual routine is to divide the number and store it in a loc
The first step to optimizing your code is getting rid of the arbitrary base support. This is because dividing by a constant is almost surely multiplication, but dividing by base
is division, and because '0'+n
is faster than "0123456789abcdef"[n]
(no memory involved in the former).
If you need to go beyond that, you could make lookup tables for each byte in the base you care about (e.g. 10), then vector-add the (e.g. decimal) results for each byte. As in:
00 02 00 80 (input)
0000000000 (place3[0x00])
+0000131072 (place2[0x02])
+0000000000 (place1[0x00])
+0000000128 (place0[0x80])
==========
0000131200 (result)