After reviewing the Intel Digital Random Number Generator (DRNG) Software Implementation Guide, I have a few questions about what happens to the internal state of the genera
Don't read anything into the 4 x 128-bit FIFO in the DRNG output. It is certainly there (I put it there), but it has no software-visible effect. The logic behind the DRNG doesn't produce data smoothly. It sometimes schedules other things, like reseeding or conditioning, as per the SP800-90 spec, so the flow of data under load is irregular.
The buffer depth of 4 was chosen because, at 800 MByte/s (the speed of the locally attached bus), 4 entries are deep enough to prevent underflow when data is pulled at the maximum rate, even across the worst-case scheduling excursion. The result is a constant, smooth 800 MByte/s supply with no interruption in the output.
If the attached bus were slower, the buffer would be shorter, because a shorter buffer would be sufficient to prevent underflow.
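
To put rough numbers on the sizing argument, here is a small back-of-the-envelope sketch in Python. It only uses the figures quoted above (4 entries of 128 bits, an 800 MByte/s bus) and assumes 1 MByte = 10^6 bytes. The function names are made up for illustration, and the 80 ns stall is simply the figure that a full 4-entry FIFO works out to, not a published worst-case excursion for the hardware.

```python
def fifo_cover_time_ns(depth_entries, entry_bits, bus_bytes_per_s):
    """Time in ns that a full FIFO can feed the bus with no refill."""
    total_bytes = depth_entries * entry_bits // 8
    return total_bytes * 10**9 / bus_bytes_per_s


def min_depth(stall_ns, entry_bits, bus_bytes_per_s):
    """Smallest FIFO depth (in entries) that rides out a producer stall."""
    entry_bytes = entry_bits // 8
    drained = stall_ns * bus_bytes_per_s   # bytes drained, scaled by 1e9
    per_entry = entry_bytes * 10**9        # bytes per entry, same scaling
    return -(-drained // per_entry)        # integer ceiling division


# A full 4 x 128-bit FIFO holds 64 bytes; at 800 MByte/s that lasts 80 ns.
print(fifo_cover_time_ns(4, 128, 800_000_000))

# Hypothetical 80 ns stall: depth needed at full speed vs. a bus half as fast.
print(min_depth(80, 128, 800_000_000))
print(min_depth(80, 128, 400_000_000))
```

Under these assumptions the same stall needs 4 entries at 800 MByte/s but only 2 at 400 MByte/s, which is the sense in which a slower bus would get by with a shorter buffer.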