Suppose I have an array:
uint8_t arr[256];
and an element
__m128i x
containing 16 bytes,
This kind of capability in SIMD architectures is known as load/store scatter/gather. Unfortunately SSE does not have it. Future SIMD architectures from Intel may have this - the ill-fated Larrabee processor was one case in point. For now though you will just need to design your data structures in such a way that this kind of functionality is not needed.
Note that you can achieve the equivalent effect by using e.g. _mm_set_epi8:
y = _mm_set_epi8(arr[x_16], arr[x_15], arr[x_14], ..., arr[x_1]);
although of course this will just generate a bunch of scalar code to load your y vector. This is fine if you are doing this kind of operation outside any performance-critical loops, e.g. as part of initialisation prior to looping, but inside a loop it is likely to be a performance-killer.