char[] to hex string exercise

前端未结

关注

 16  746

Below is my current char* to hex string function. I wrote it as an exercise in bit manipulation. It takes ~7ms on a AMD Athlon MP 2800+ to hexify a 10 million byte array. Is

相关标签:

16条回答

长发绾君心

2021-01-12 19:47

I assume this is Windows+IA32.
Try to use short int instead of the two hexadecimal letters.

short int hex_table[256] = {'0'*256+'0', '1'*256+'0', '2'*256+'0', ..., 'E'*256+'F', 'F'*256+'F'};
unsigned short int* pszHex = &str[0];

stick = clock();

for (const unsigned char* pChar = _pArray; pChar != pEnd; pChar++) 
    *pszHex++ = hex_table[*pChar];

etick = clock();

0 讨论(0)

南笙

2021-01-12 19:49
At the cost of more memory you can create a full 256-entry table of the hex codes:
```
static const char _hex2asciiU_value[256][2] =
    { {'0','0'}, {'0','1'}, /* ..., */ {'F','E'},{'F','F'} };
```
Then direct index into the table, no bit fiddling required.
```
const char *pHexVal = pHex[*pChar];
pszHex[0] = pHexVal[0];
pszHex[1] = pHexVal[1];
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
无人及你

2021-01-12 19:50

I have found that using an index into an array, rather than a pointer, can speed things up a tick. It all depends on how your compiler chooses to optimize. The key is that the processor has instructions to do complex things like [i*2+1] in a single instruction.

0 讨论(0)
发布评论:

提交评论
- 加载中...

星月不相逢

2021-01-12 19:50

Consistently getting ~4ms on my Athlon 64 4200+ (~7ms with original code)

for( const unsigned char* pChar = _pArray; pChar != pEnd; pChar++) {
    const char* pchars = _hex2asciiU_value[*pChar];
    *pszHex++ = *pchars++;
    *pszHex++ = *pchars;
}

0 讨论(0)

死守一世寂寞

2021-01-12 19:53
If you're rather obsessive about speed here, you can do the following:

Each character is one byte, representing two hex values. Thus, each character is really two four-bit values.

So, you can do the following:
1. Unpack the four-bit values to 8-bit values using a multiplication or similar instruction.
2. Use pshufb, the SSSE3 instruction (Core2-only though). It takes an array of 16 8-bit input values and shuffles them based on the 16 8-bit indices in a second vector. Since you have only 16 possible characters, this fits perfectly; the input array is a vector of 0 through F characters, and the index array is your unpacked array of 4-bit values.
Thus, in a single instruction, you will have performed 16 table lookups in fewer clocks than it normally takes to do just one (pshufb is 1 clock latency on Penryn).

So, in computational steps:
1. A B C D E F G H I J K L M N O P (64-bit vector of input values, "Vector A") -> 0A 0B 0C 0D 0E 0F 0G 0H 0I 0J 0K 0L 0M 0N 0O 0P (128-bit vector of indices, "Vector B"). The easiest way is probably two 64-bit multiplies.
2. pshub [0123456789ABCDEF], Vector B
0 讨论(0)
发布评论:

提交评论
- 加载中...

情书的邮戳

2021-01-12 19:54

This assembly function (based off my previous post here, but I had to modify the concept a bit to get it to actually work) processes 3.3 billion input characters per second (6.6 billion output characters) on one core of a Core 2 Conroe 3Ghz. Penryn is probably faster.

%include "x86inc.asm"

SECTION_RODATA
pb_f0: times 16 db 0xf0
pb_0f: times 16 db 0x0f
pb_hex: db 48,49,50,51,52,53,54,55,56,57,65,66,67,68,69,70

SECTION .text

; int convert_string_to_hex( char *input, char *output, int len )

cglobal _convert_string_to_hex,3,3
    movdqa xmm6, [pb_f0 GLOBAL]
    movdqa xmm7, [pb_0f GLOBAL]
.loop:
    movdqa xmm5, [pb_hex GLOBAL]
    movdqa xmm4, [pb_hex GLOBAL]
    movq   xmm0, [r0+r2-8]
    movq   xmm2, [r0+r2-16]
    movq   xmm1, xmm0
    movq   xmm3, xmm2
    pand   xmm0, xmm6 ;high bits
    pand   xmm2, xmm6
    psrlq  xmm0, 4
    psrlq  xmm2, 4
    pand   xmm1, xmm7 ;low bits
    pand   xmm3, xmm7
    punpcklbw xmm0, xmm1
    punpcklbw xmm2, xmm3
    pshufb xmm4, xmm0
    pshufb xmm5, xmm2
    movdqa [r1+r2*2-16], xmm4
    movdqa [r1+r2*2-32], xmm5
    sub r2, 16
    jg .loop
    REP_RET

Note it uses x264 assembly syntax, which makes it more portable (to 32-bit vs 64-bit, etc). To convert this into the syntax of your choice is trivial: r0, r1, r2 are the three arguments to the functions in registers. Its a bit like pseudocode. Or you can just get common/x86/x86inc.asm from the x264 tree and include that to run it natively.

P.S. Stack Overflow, am I wrong for wasting time on such a trivial thing? Or is this awesome?

0 讨论(0)

1 2 3 下一页