I've lately been using the SSE intrinsic int _mm_extract_epi8 (__m128i src, const int ndx)
that, according to the reference "extracts an integer byte from a p
Just to summarize and close the question.
We discussed three options to extract the byte at index i in [0..15] from an __m128i value sse
when i cannot be reduced to a literal at compile time (_mm_extract_epi8 needs its index as an immediate):
1) Switch & _mm_extract_epi8: have a switch over i with one case for each value in [0..15]; each case calls _mm_extract_epi8(sse,i) with its own index written out. This works because the index in every case is now a compile-time literal.
2) Union hack: have a union SSE128i { __m128i sse; char array[16]; }, initialize it as SSE128i sse = { _mm_loadu_si128(...) } and access the byte at index i with sse.array[i].
3) Shuffle the ith element to position 0 and _mm_extract_epi8: use _mm_shuffle_epi8(sse,_mm_set1_epi8(i)) to move the ith element to position 0 (it actually broadcasts it to every position), then call _mm_extract_epi8 with index 0 on the shuffled result.
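For illustration, the three options could look roughly like the sketch below. This is not the benchmarked code: the function names are made up, unsigned char is used in the union instead of char so that all three variants return the same value in 0..255, and the target attribute is only an assumption that you are on GCC or Clang and want to compile without -msse4.1 (with -march=native it is redundant).

```cpp
#include <cassert>
#include <immintrin.h>  // SSE2, SSSE3 (_mm_shuffle_epi8), SSE4.1 (_mm_extract_epi8)

// Assumption: GCC/Clang. Enables SSE4.1 (and SSSE3) for a single function.
#define TARGET_SSE41 __attribute__((target("sse4.1")))

// Option 1: switch, so that every _mm_extract_epi8 call sees a literal index.
TARGET_SSE41
int extract_switch(__m128i v, int i) {
    switch (i) {
        case 0:  return _mm_extract_epi8(v, 0);
        case 1:  return _mm_extract_epi8(v, 1);
        case 2:  return _mm_extract_epi8(v, 2);
        case 3:  return _mm_extract_epi8(v, 3);
        case 4:  return _mm_extract_epi8(v, 4);
        case 5:  return _mm_extract_epi8(v, 5);
        case 6:  return _mm_extract_epi8(v, 6);
        case 7:  return _mm_extract_epi8(v, 7);
        case 8:  return _mm_extract_epi8(v, 8);
        case 9:  return _mm_extract_epi8(v, 9);
        case 10: return _mm_extract_epi8(v, 10);
        case 11: return _mm_extract_epi8(v, 11);
        case 12: return _mm_extract_epi8(v, 12);
        case 13: return _mm_extract_epi8(v, 13);
        case 14: return _mm_extract_epi8(v, 14);
        case 15: return _mm_extract_epi8(v, 15);
    }
    assert(false && "index must be in [0..15]");
    return 0;
}

// Option 2: union. Reading a member other than the one last written relies on
// the compiler supporting type punning through unions (GCC documents this).
union SSE128i {
    __m128i sse;
    unsigned char array[16];
};

int extract_union(__m128i v, int i) {
    assert(0 <= i && i < 16);
    SSE128i u;
    u.sse = v;
    return u.array[i];
}

// Option 3: broadcast byte i to every position, then extract position 0.
TARGET_SSE41
int extract_shuffle(__m128i v, int i) {
    assert(0 <= i && i < 16);
    const __m128i control = _mm_set1_epi8(static_cast<char>(i));
    const __m128i moved = _mm_shuffle_epi8(v, control);
    return _mm_extract_epi8(moved, 0);
}
```

All three take the index as an ordinary run-time int, which is the whole point; which one is fastest depends on the compiler and the surrounding code.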
Evaluation: I benchmarked the three options on an Intel Sandy Bridge and an AMD Bulldozer architecture. The switch option won by a small margin. If someone's interested I can post more detailed numbers and the benchmark setup.
Update: Evaluation
Benchmark setup: parse each byte of a 1GB file. For certain special bytes, increment a counter. Use _mm_cmpistri to find the index of a special byte; then "extract" the byte using one of the three methods mentioned and do a case distinction in which the counters are incremented. Code was compiled using GCC 4.6 with -std=c++0x -O3 -march=native.
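Since the benchmark sources are not available, here is only a rough sketch of what such a loop might look like. Everything specific in it is an assumption: the special bytes ('<', '>', '&'), the Counters struct and the function name are hypothetical, the input is assumed to contain no NUL bytes (_mm_cmpistri treats a zero byte as end of string), and the extraction step uses the shuffle variant. The target attribute again assumes GCC or Clang.

```cpp
#include <cstddef>
#include <cstring>
#include <immintrin.h>  // includes the SSE4.2 string intrinsics (_mm_cmpistri)

// Assumption: GCC/Clang. SSE4.2 implies SSE4.1 and SSSE3.
#define TARGET_SSE42 __attribute__((target("sse4.2")))

// Hypothetical counters, one per special byte.
struct Counters {
    std::size_t lt = 0;   // '<'
    std::size_t gt = 0;   // '>'
    std::size_t amp = 0;  // '&'
};

TARGET_SSE42
Counters count_special(const char* data, std::size_t len) {
    // Set of special bytes; the trailing zeros end the implicit-length string.
    const __m128i special =
        _mm_setr_epi8('<', '>', '&', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    Counters c;
    std::size_t pos = 0;
    while (pos < len) {
        // Copy up to 16 bytes into a zero-padded block so a short tail is
        // terminated and we never read past the end of the input.
        char block[16] = {0};
        const std::size_t n = (len - pos < 16) ? (len - pos) : 16;
        std::memcpy(block, data + pos, n);
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));

        // Index of the first byte in chunk that equals any byte in special,
        // or 16 if there is none.
        const int idx = _mm_cmpistri(
            special, chunk,
            _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT);
        if (idx == 16) {
            pos += n;
            continue;
        }

        // idx is a run-time value, so use the shuffle variant to extract.
        const __m128i moved =
            _mm_shuffle_epi8(chunk, _mm_set1_epi8(static_cast<char>(idx)));
        switch (_mm_extract_epi8(moved, 0)) {
            case '<': ++c.lt;  break;
            case '>': ++c.gt;  break;
            case '&': ++c.amp; break;
        }
        pos += static_cast<std::size_t>(idx) + 1;  // resume after the match
    }
    return c;
}
```

Copying every chunk into a temporary block keeps the sketch simple and safe at the end of the buffer; a real benchmark would more likely load directly from the input and handle the tail separately.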
For each method, the benchmark was run 25 times on a Sandy Bridge machine. Results (mean and std. dev. of running time in seconds):
Switch and extract: Mean: 1071.45 Standard deviation: 2.72006
Union hack: Mean: 1078.61 Standard deviation: 2.87131
Shuffle and extract from position 0: Mean: 1079.32 Standard deviation: 2.69808
The differences are marginal: the slowest method is less than 1% behind the fastest. I haven't had a chance to look at the generated asm yet, though it might be interesting to see how the three variants differ. For now I can't release the full code of the benchmark as it contains non-public sources. If I have time I'll extract these and post the sources.