Left-shift (of float32 array) with AVX2 and filling up with a zero

Question

I have been using the following "trick" in C code with SSE2 for single precision floats for a while now:

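#include <emmintrin.h>  /* SSE2 intrinsics */

static inline __m128 SSEI_m128shift(__m128 data)
{
    /* Byte-shift the 128-bit register right by 4 bytes (one float); zeros enter at the top. */
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(data), 4));
}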

For data like [1.0, 2.0, 3.0, 4.0], it results in [2.0, 3.0, 4.0, 0.0], i.e. it shifts every element one position to the left and fills the vacated last slot with a zero. If I remember correctly, the above inline function compiles down to a single instruction (PSRLDQ, with gcc at least).
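To make the behaviour concrete, here is a minimal sketch that runs the shift on an array in memory. It repeats the function so that the block compiles on its own; `shift4_demo` is a hypothetical helper name introduced only for this illustration, and unaligned loads and stores are assumed so that any `float[4]` works.

```c
#include <emmintrin.h>  /* SSE2: _mm_srli_si128 and the cast intrinsics */

/* Shift the four floats down by one lane; the top lane becomes 0.0f. */
static inline __m128 SSEI_m128shift(__m128 data)
{
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(data), 4));
}

/* Hypothetical helper: reads in[0..3], writes the shifted result to out[0..3]. */
static void shift4_demo(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);            /* in[0] lands in the lowest lane */
    _mm_storeu_ps(out, SSEI_m128shift(v));  /* out[i] = in[i + 1], out[3] = 0 */
}
```

Calling `shift4_demo` on {1.0f, 2.0f, 3.0f, 4.0f} should give {2.0f, 3.0f, 4.0f, 0.0f}, matching the description above.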

I cannot work out how to do the same with AVX2, i.e. for a 256-bit vector holding eight floats. How can I achieve this efficiently?
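To pin down the result I am after, here is a plain scalar sketch of the intended semantics for eight floats. It is only a reference for checking a vectorised version, not an efficient implementation; `shift8_ref` is a hypothetical name, and it assumes the same "shift by one element, zero the last slot" behaviour as the SSE2 version.

```c
#include <stddef.h>

/* Hypothetical scalar reference: out[i] = in[i + 1], last element zeroed. */
static void shift8_ref(const float in[8], float out[8])
{
    for (size_t i = 0; i + 1 < 8; ++i) {
        out[i] = in[i + 1];
    }
    out[7] = 0.0f;  /* the vacated slot is filled with a zero */
}
```

For input [1.0, 2.0, ..., 8.0] this yields [2.0, 3.0, ..., 8.0, 0.0], which is what an AVX2 version should reproduce.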