I'm writing a routine to convert between BCD (4 bits per decimal digit) and Densely Packed Decimal (DPD) (10 bits per 3 decimal digits). DPD is further documented (with the suggestion that software use lookup tables) on Mike Cowlishaw's web site.

This routine only ever requires the lower 16 bits of the registers it uses, yet for shorter instruction encoding I have used 32-bit instructions wherever possible. Is a speed penalty associated with code like:

    mov data,%eax       # high 16 bits of data are cleared
    ...
    shl %al
    shr %eax

or

    and $0x888,%edi     # = 0000 a000 e000 i000
    imul $0x0490,%di    # = aei0
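To make the intent of the second snippet concrete, here is a minimal C model of what the `and`/`imul` pair is meant to compute. It is only a sketch of the arithmetic, not of the timing: the names `gather_aei` and `check_gather_aei` are made up for illustration, and the model returns just the 16-bit value left in `%di` (the upper half of `%edi` is not modelled).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of:
 *     and  $0x888,%edi
 *     imul $0x0490,%di
 * Returns the 16-bit result that would be left in %di. */
static uint16_t gather_aei(uint32_t edi)
{
    edi &= 0x888u;                    /* keep bits 11, 7, 3: 0000 a000 e000 i000 */
    /* 0x0490 = 2^10 + 2^7 + 2^4, so the product holds copies of a, e, i
     * shifted by 10, 7 and 4; no two copies land on the same bit, so there
     * are no carries. Truncating to 16 bits mimics the 16-bit imul. */
    return (uint16_t)(edi * 0x0490u);
}

/* Checks that the top nibble is "aei0" for all eight combinations. */
static void check_gather_aei(void)
{
    for (unsigned bits = 0; bits < 8; bits++) {
        unsigned a = (bits >> 2) & 1u;
        unsigned e = (bits >> 1) & 1u;
        unsigned i = bits & 1u;
        /* 0x777 sets every bit that the mask is supposed to discard */
        uint32_t in = (a << 11) | (e << 7) | (i << 3) | 0x777u;
        unsigned top = (unsigned)gather_aei(in) >> 12;
        assert(top == ((a << 3) | (e << 2) | (i << 1)));
    }
}
```

For instance, with a = 1, e = 0, i = 1 the masked input is `0x808` and the 16-bit product is `0xA480`, whose top nibble is `1010`, i.e. `aei0`. The bits below the top nibble hold leftover copies of e and i and would still need to be masked or shifted out.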