I have a problem in which eight elements can each hold 0, 1, or 2. I can easily represent this in 16 bits (two bits per element), but for SIMD efficiency reasons, I need it to occupy 13
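
For illustration only, and assuming the "13" above means 13 bits: eight values in {0, 1, 2} can be read as the digits of a single base-3 number, and 3^8 = 6561 is below 2^13 = 8192, so the result fits. The function names `pack_ternary8` and `unpack_ternary8` are hypothetical, not from any library, and this is a minimal scalar sketch rather than a SIMD-ready solution.

```c
#include <stdint.h>

/* Hypothetical helper: pack eight ternary digits (each 0, 1, or 2) into one
   base-3 number, with digits[0] as the least significant digit.
   The largest possible result is 3^8 - 1 = 6560, which fits in 13 bits. */
static uint16_t pack_ternary8(const uint8_t digits[8])
{
    uint16_t packed = 0;
    for (int i = 7; i >= 0; --i)
        packed = (uint16_t)(packed * 3 + digits[i]);
    return packed;
}

/* Hypothetical helper: recover the eight digits by repeated division by 3. */
static void unpack_ternary8(uint16_t packed, uint8_t digits[8])
{
    for (int i = 0; i < 8; ++i) {
        digits[i] = (uint8_t)(packed % 3);
        packed /= 3;
    }
}
```

This only shows that the values fit in 13 bits; whether the multiply and divide steps suit a particular SIMD layout is a separate question.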