What are the exhaustion characteristics of RDRAND on Ivy Bridge?

After reviewing the Intel Digital Random Number Generator (DRNG) Software Implementation Guide, I have a few questions about what happens to the internal state of the genera

3 Answers

伪装坚强ぢ

2021-01-04 09:41

Don't read anything into the 4*128 bit FIFO in the DRNG output. It is certainly there (I put it there) but it isn't something that has a software visible effect. The logic behind the DRNG doesn't produce data smoothly. It sometimes schedules other things, like reseeding or conditioning, as per the SP800-90 spec. So the flow of data under load is irregular.

The buffer length of 4 was chosen because at 800MBytes/s (the speed of the locally attached bus) 4 is deep enough to prevent underflow when pulling at the maximum rate, given the worst case scheduling excursion, so there is a constant, smooth 800MByte/s supply with no interruption in the output.

If the attached bus was slower, the buffer would be shorter because a shorter buffer would be sufficient to prevent underflow.

野趣味

2021-01-04 09:57

Regarding 2: http://download.intel.com/products/processor/manual/253665.pdf, 7.3.17

A cleared carry flag (CF=0) indicates that the demand for random data exceeds the throughput of the DRNG.

Regarding 1:

If performance is what you are concerned about, why not read a 64-bit random value from the DRNG? You can then take 2 bits from it 32 times before you need to call the instruction again. You don't have to invoke a new RDRAND every time you need two bits.
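The idea of slicing one 64-bit word into 32 two-bit values could be sketched in C roughly as follows. This is only an illustration, not code from Intel's guide: the names `bit_pool`, `bit_pool_take2`, `word_source` and `demo_source` are made up for this example, and `demo_source` is a fixed-pattern stand-in (not random) so the behaviour can be checked without the hardware. In real use the refill callback would wrap RDRAND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: fills *out with a 64-bit word and
   returns true on success. In real use it would wrap RDRAND. */
typedef bool (*word_source)(uint64_t *out);

typedef struct {
    uint64_t    word;       /* unused bits, consumed from the low end */
    unsigned    bits_left;  /* how many unused bits remain in word */
    word_source refill;     /* called only when the pool runs dry */
} bit_pool;

void bit_pool_init(bit_pool *p, word_source refill) {
    assert(p != NULL && refill != NULL);
    p->word = 0;
    p->bits_left = 0;
    p->refill = refill;
}

/* Hand out two bits (a value in 0..3). Refills once every 32 calls.
   Returns false if the refill callback reports failure. */
bool bit_pool_take2(bit_pool *p, unsigned *out) {
    if (p->bits_left < 2) {
        if (!p->refill(&p->word)) {
            return false;
        }
        p->bits_left = 64;
    }
    *out = (unsigned)(p->word & 0x3u);
    p->word >>= 2;
    p->bits_left -= 2;
    return true;
}

/* Deterministic stand-in for illustration only; NOT random.
   0xE4 is binary 11 10 01 00, so the low pairs read 0, 1, 2, 3. */
unsigned demo_refills = 0;

bool demo_source(uint64_t *out) {
    demo_refills++;
    *out = 0xE4E4E4E4E4E4E4E4ULL;
    return true;
}
```

With this layout the source is asked for a new word once per 32 two-bit draws, which is the saving the answer describes.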

深忆病人

2021-01-04 10:04

Part 1. Does it make a difference pulling 16, 32 or 64 bits?

No.

On Ivy Bridge, the CPU cores pull 64 bits over the internal communication links to the DRNG, regardless of the size of the destination register. So if you read 32 bits, it pulls 64 bits and throws away the top half. If you read 16 bits, it pulls 64 and throws away the top 3/4.

This is not described in the instruction documentation because it may not continue to be true in future products. A chip might be designed which stashes and uses the unused parts of the 64 bit word. However there isn't a significant performance imperative to do this today.

For the highest throughput, the most effective strategy is to pull from parallel threads. This is because there is parallelism in the bus hierarchy on chip. Most of the time for the instruction is transit time across the buses. Performing that transit in parallel is going to yield a linear increase in throughput with the number of threads, up to the maximum of 800MBytes/s. The second thing is to use 64-bit RdRands, because they get more data per instruction.

Part 2. What does CF=0 mean really?

It means 'random data not available'. This is because the details of why it can't get a number are not available to the CPU core without it going off and reading more registers, which it isn't going to do because there is nothing it can do with the information.

If you sucked the output buffer of the DRNG dry, you would get an underflow (CF=0) but you could expect the next RdRand to succeed, because the DRNG is fast.

If the DRNG failed (e.g. a transistor popped in the entropy source and it no longer was random) then the online health tests would detect this and shut down the DRNG. Then all your RdRand invocations would yield CF=0.

However, on Ivy Bridge you will not be able to underflow the buffer. The DRNG is a little faster than the bus to which it is attached. The effect of pulling more data per unit time (with parallel threads) will be to increase the execution time of each individual RdRand, as contention on the bus causes the instructions to wait in line at the DRNG's local bus. You can never pull so fast that the DRNG will underflow. You will asymptotically reach 800 MBytes/s.

This also is not described in the documentation because it may not continue to be true in future products. We can envisage products where the buses are faster and the cores faster and the DRNG would be able to be underflowed. These things are not known yet, so we can't make claims about them.

What will remain true is that the basic loop (try up to 10 times, then report a failure up the stack) given in the software implementors guide will continue to work in future products, because we've made the claim that it will and so we will engineer all future products to meet this.
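For reference, a minimal sketch of that "try up to 10 times, then report failure" loop in C might look like the following. It is written from the description above, not copied from the guide. The names `rdrand64_retry`, `rdrand_step_fn`, `hw_rdrand64_step` and `stub_step` are assumptions made for this example. The hardware wrapper uses the `_rdrand64_step` intrinsic from `<immintrin.h>` and is only compiled when the compiler has RDRAND support enabled (for example with `-mrdrnd` on GCC or Clang); the stub exists purely so the retry logic can be exercised without the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RDRAND_RETRY_LIMIT 10  /* "try up to 10 times" */

/* One attempt: returns true and fills *out when CF=1, false when CF=0. */
typedef bool (*rdrand_step_fn)(uint64_t *out);

#ifdef __RDRND__
#include <immintrin.h>
/* Thin wrapper over the real instruction; only built with RDRAND enabled. */
bool hw_rdrand64_step(uint64_t *out) {
    unsigned long long v = 0;
    if (!_rdrand64_step(&v)) {
        return false;           /* CF=0: no random data available */
    }
    *out = (uint64_t)v;
    return true;
}
#endif

/* Attempt up to RDRAND_RETRY_LIMIT times, then report failure to the caller. */
bool rdrand64_retry(rdrand_step_fn step, uint64_t *out) {
    assert(step != NULL && out != NULL);
    for (int attempt = 0; attempt < RDRAND_RETRY_LIMIT; attempt++) {
        if (step(out)) {
            return true;
        }
    }
    return false;               /* let the caller decide what to do */
}

/* Hypothetical stub for illustration: fails a set number of times,
   then returns a fixed (non-random) value. */
int stub_failures_left = 0;
int stub_calls = 0;

bool stub_step(uint64_t *out) {
    stub_calls++;
    if (stub_failures_left > 0) {
        stub_failures_left--;
        return false;
    }
    *out = 0x0123456789ABCDEFULL;
    return true;
}
```

On a machine with RDRAND you would call `rdrand64_retry(hw_rdrand64_step, &value)` and treat a `false` return as a hard failure to pass up the stack rather than something to spin on.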

So no, CF=0 cannot occur because "the buffers happen to be (transiently) empty when RDRAND is invoked" on Ivy Bridge, but it might occur on future silicon, so design your software to cope.
