Question
I've heard about people using light sensors, Geiger counters, and other physical sensors to generate random numbers, but I'm skeptical. Is there really a way to generate random numbers from taking measurements of the physical world (using an Arduino or any other microcontroller)? If so, would these numbers ever be really random?
To clarify: the question is about the feasibility of using microcontroller-gathered data to generate random numbers that could be applied soundly to cryptography, as an alternative to relying on a device's own entropy sources.
Answer 1:
Taking analog "real world" measurements is usually a good source of entropy (i.e. real random data). Analog sources always have some unpredictable noise superimposed which can be "harvested". However, as stated before, the measured data is rarely unbiased.
Analog electrical measurements may also be more or less susceptible to uncontrollable influence or even attacks from outside, e.g. by causing saturation of the sensor(s). EMI is also likely to interfere with the measurement; a regular cell phone placed reasonably close to the circuit during a call will most likely wreak havoc on any analog signals.
Bias
Unbiased, uniformly distributed numbers with high entropy are usually what one wants, because their statistical properties (not their values) are normalized and can therefore be predicted more reliably.
When measuring analog input with, say, 10-bit resolution, ideally the range of numbers gathered from the measurement will cover all values from 0 to 1023, and each value will occur with the same frequency (or probability) as any other value from that range.
In reality, those values will often be (more or less) normally distributed (Gaussian) around some average value with some characteristic standard deviation, for example around 500 for 10-bit samples.
De-biasing
So, in order to generate random values with the desired properties (see above), some de-biasing needs to be done: A randomness extractor of some kind is needed.
Using cryptographic functions, like (one-way) hash functions or cipher algorithms, usually produces the desired result easily as well; this comes at a cost, though: the "mixing" performed by those functions is by design extremely strong, which makes it impossible to determine the quality (= randomness) of the source data after the transformation. Even the simplest deterministic sequences of values, like {1, 2, 3, 4, 5, ...}, will, when hashed, produce data that most likely passes any and all statistical randomness tests, although it is not "random" at all.
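For illustration, below is a minimal sketch of the classic von Neumann extractor in C. It assumes the raw samples are independent but possibly biased bits (the extractor removes bias, not correlation between samples), and get_raw_bit() is a hypothetical placeholder for whatever routine delivers one raw bit from the measurement source.

```c
/* Minimal von Neumann extractor sketch. Assumes raw bits are independent but
 * possibly biased; get_raw_bit() is a placeholder for the actual bit source. */
#include <stdint.h>
#include <stddef.h>

extern uint8_t get_raw_bit(void);   /* hypothetical: returns one raw bit (0 or 1) */

/* Fill 'out' with 'len' de-biased bytes; returns how many raw bits were consumed. */
uint32_t vn_extract(uint8_t *out, size_t len)
{
    uint32_t consumed = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = 0;
        uint8_t bits = 0;
        while (bits < 8) {
            uint8_t a = get_raw_bit() & 1;
            uint8_t b = get_raw_bit() & 1;
            consumed += 2;
            if (a != b) {             /* 01 -> 0, 10 -> 1; 00 and 11 are dropped */
                byte = (uint8_t)((byte << 1) | a);
                bits++;
            }
        }
        out[i] = byte;
    }
    return consumed;
}
```

Note that for a heavily biased source the extractor discards most of its input, so the output bandwidth is correspondingly low.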
Generating entropy on a µC
Timing
In microcontroller environments a good source of entropy that is seldom thought of is the timing of events. Using a high-speed timer, true entropy can be gathered by reading the timer value in response to some asynchronous event. Such an event, which is uncorrelated with the running timer, may be the push of a button by a user, the start of communication initiated by another (sub-)system or IC, or basically any other event not triggered by the µC (or any synchronous subsystem) itself.
In fact, entropy can even be harvested from just two independent clock sources; for instance by timing cycles of one clock via the other clock. This opens a couple of very interesting possibilities on Atmel AVR µCs (which are used in the Arduino) depending on the µC's capabilities:
- Most AVRs have internal EEPROM memory. Write operations to this memory are timed by a dedicated timer which is independent of the main system clock (reportedly there are some chips, not types, where measurements indicated that this may not be the case; edit: note that in some AVRs, the ATtiny25/45/85 for example, the EEPROM timing is derived from the internal RC oscillator, so that no entropy can be gathered when this oscillator is also selected as the system clock source). Whether this works may thus depend on the main clock source (internal RC vs. external crystal/resonator). Where it does, there is some truly random jitter to be expected in the time it takes to write to the EEPROM with respect to the main system clock, which again can be measured by a high-speed timer/counter.
- Newer AVRs can let the watchdog timer generate a software interrupt instead of a hardware reset. The watchdog timer is by design controlled by its own independent clock source, which yields relative jitter that can be measured.
- Many AVRs can have a dedicated timer/counter clocked from an external 32kHz crystal for improved accuracy of real-time measurements. This external crystal is another source of events uncorrelated with the main clock. (Otherwise there would be no use for the extra crystal in the first place.)
The last of these options seems promising for its potentially high bandwidth: when timing each 32kHz clock cycle with a system timer running significantly faster (a factor of 600+ can be achieved on current AVRs @ 20 MHz!), and conservatively assuming only 1 bit of entropy per measurement, this results in 32000+ bits of entropy per second - far more than a µC will ever consume by itself.
EDIT: Meanwhile, I have conducted some simple tests of the 32kHz timer approach, and the short-term results seem to be of pretty low quality. The upper bound on the generated entropy per sample seems to be really low, and I have not even tested the samples for non-obvious patterns originating from more or less regular phase shifts. This result may be due to the fact that my DUT had its main clock driven by an external crystal, which may be expected to be (within the precision of the measurements) as stable in frequency as the 32kHz quartz when observed over a limited time range. Extending the time between taking two samples (minutes?) would probably return good entropy, yet at a very low bandwidth. (N.b.: The jitter measured may also be partly due to varying interrupt latency depending on the machine instruction executed right at the time the interrupt is triggered.)
EDIT #2: It appears that the internal RC oscillator of my DUT (ATmega1284) exhibits significant frequency jitter (several kHz per second); running on this oscillator does indeed seem to produce quite a lot of entropy (kbits/s) when timed against the external 32kHz crystal.
In a little experiment I recently investigated the former two methods. On my DUT the EEPROM timing would generally be advantageous over the WDT:
Timing the EEPROM write produced about 4.82 bits of entropy per write operation, while the watchdog timer seems more stable frequency-wise, yielding about 3.92 bits per watchdog timeout. Additionally, the EEPROM's write times seem much more smoothly Gaussian-distributed, whereas the WDT's distribution seems somewhat asymmetric, with a lot of aberrations.
N.b.: Aggregating multiple "random" events into a single entropy measurement may actually degrade the entropy obtained: fast, random fluctuations in the source may partially compensate each other, yielding result values with lower deviation from the mean. So, instead of timing, for instance, one real-time second (32k cycles of the RTC crystal), much more entropy can be expected from taking 32k timings (one for each cycle of the crystal) during the same time.
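To make the watchdog-interrupt timing idea above more concrete, here is a rough sketch in C for an ATmega328P-class AVR using avr-libc. The register setup follows the usual timed sequence for enabling the watchdog in interrupt-only mode; the rotate/XOR mixing, the pool size, and the number of samples collected are illustrative assumptions, and the resulting pool would still need to be entropy-estimated and run through an extractor before use.

```c
/* Sketch: harvest timing jitter between the watchdog oscillator and the main
 * clock on an ATmega328P-class AVR. Mixing and sample counts are illustrative. */
#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

static volatile uint32_t pool;      /* small entropy pool                       */
static volatile uint8_t  samples;   /* number of watchdog ticks mixed in so far */

ISR(WDT_vect)
{
    uint8_t t = TCNT1L;                    /* low byte of fast timer at WDT tick */
    pool = (pool << 5) | (pool >> 27);     /* rotate the pool                    */
    pool ^= t;                             /* fold in the jittery sample         */
    samples++;
}

void wdt_entropy_start(void)
{
    cli();
    MCUSR &= ~(1 << WDRF);                 /* clear watchdog reset flag          */
    WDTCSR = (1 << WDCE) | (1 << WDE);     /* timed sequence: allow changes      */
    WDTCSR = (1 << WDIE);                  /* interrupt-only mode, ~16 ms period */
    TCCR1A = 0;
    TCCR1B = (1 << CS10);                  /* Timer1 free-running at F_CPU       */
    sei();
}

/* Block until a number of watchdog periods have been mixed in, then return the
 * 32-bit pool. Its real entropy content is well below 32 bits and must be
 * measured for the specific chip; access is not made atomic here for brevity. */
uint32_t wdt_entropy_read(void)
{
    samples = 0;
    while (samples < 16)
        ;                                  /* roughly 16 * 16 ms of gathering    */
    return pool;
}
```

A similar watchdog-jitter approach is used by the Arduino library mentioned in Answer 3 below, which conditions its samples further before output.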
Uninitialized RAM
avr-gcc-compiled applications usually have the whole on-chip RAM cleared to 0x00 before executing user code, i.e. main(). Putting code into an early .init section provides access to the raw uninitialized RAM content before it is overwritten by gcc's initialization routines.
Due to minuscule variances in the RAM's physical storage cells (bits), and depending on some truly random thermal noise (and other effects), not every cell will initialize itself to the same known state when power is (re-)applied to the chip. Combining the contents of the chip's RAM right after power-up with some function can yield significant amounts of entropy to be used later. The downside is that this only works reliably when power has been turned off for some time and is then turned on again. A normal chip reset, whether by hardware, software, or an external signal, preserves the RAM's previous content and is therefore not (always) a good source of entropy. However, since the state of the whole system (RAM) at the time of the reset can hardly be predicted in a reasonably complex application, some entropy may be gathered immediately after a reset anyway.
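A minimal sketch of this trick for avr-gcc is shown below. It hooks the .init3 section (which runs after the stack pointer is set up in .init2 but before .data/.bss are initialized in .init4) and folds the raw SRAM contents into a word kept in .noinit. The fallback RAMSTART value, the naked-function idiom, and the simple rotate/XOR fold are assumptions; a real design would feed the RAM image into a proper extractor or hash.

```c
/* Sketch: capture raw, uninitialized SRAM before avr-gcc's startup code
 * overwrites it. The fold is illustrative only; condition the result properly. */
#include <avr/io.h>
#include <stdint.h>

#ifndef RAMSTART
#define RAMSTART 0x0100        /* assumed SRAM start; adjust for the actual part */
#endif

/* Kept in .noinit so the .init4 startup code does not zero it again. */
uint32_t ram_seed __attribute__((section(".noinit")));

void collect_ram_seed(void) __attribute__((naked, used, section(".init3")));

void collect_ram_seed(void)
{
    uint32_t acc = 0;
    for (const uint8_t *p = (const uint8_t *)RAMSTART;
         p <= (const uint8_t *)RAMEND; p++) {
        acc = (acc << 7) | (acc >> 25);    /* rotate the accumulator   */
        acc ^= *p;                         /* fold in one raw RAM byte */
    }
    ram_seed = acc;                        /* read later, e.g. to seed a PRNG */
}
```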
Bandwidth requirements
The quality of an entropy source has to be seen in relation to its bandwidth and to the bandwidth at which the application consumes entropy. Some methods of entropy gathering may not yield more than one bit of entropy over a span of several seconds, while others (not really on µCs...) may produce 100 kbit/s or more.
It must be noted that one cannot algorithmically "create" new entropy from existing entropy! - One bit of entropy cannot be computationally transformed to two bits of entropy.
Thus, one cannot (on average) consume more (real) entropy per unit of time than what is gathered from the entropy source(s) in the same time.
Proposal
When in need of strong random numbers, it is not uncommon to combine one or more sources of real entropy with a strong PRNG, using the entropy gathered to (re-)seed the PRNG each time new entropy is available.
The PRNG can then be used to generate far more practically unpredictable data than the entropy source alone would provide in the same time.
As a special kind of PRNG, cryptographic cipher functions can be used, where entropy is used to initialize and update the cipher's key.
Linux's /dev/urandom is commonly implemented this way.
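As an illustration of that structure, here is a sketch of a cipher-based PRNG in C that encrypts an incrementing counter and is re-seeded by mixing fresh entropy into its key. XTEA is used only because it is tiny; the counter construction, the XOR re-seeding policy, and all names are assumptions for illustration, not a reviewed CSPRNG design.

```c
/* Illustrative cipher-based PRNG: encrypt a counter under a key that is
 * periodically refreshed with harvested entropy. Not a vetted CSPRNG. */
#include <stdint.h>

static uint32_t key[4];     /* cipher key, refreshed with new entropy     */
static uint32_t counter[2]; /* block counter, encrypted to produce output */

/* Standard XTEA encryption of one 64-bit block (32 rounds). */
static void xtea_encrypt(uint32_t v[2], const uint32_t k[4])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0, delta = 0x9E3779B9UL;
    for (uint8_t i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + k[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + k[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Mix freshly gathered entropy into the key whenever it becomes available. */
void prng_reseed(const uint32_t entropy[4])
{
    for (uint8_t i = 0; i < 4; i++)
        key[i] ^= entropy[i];
}

/* Produce 8 pseudo-random bytes by encrypting the incremented counter. */
void prng_read_block(uint8_t out[8])
{
    uint32_t block[2] = { counter[0], counter[1] };
    if (++counter[0] == 0)
        counter[1]++;
    xtea_encrypt(block, key);
    for (uint8_t i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(block[0] >> (8 * i));
        out[i + 4] = (uint8_t)(block[1] >> (8 * i));
    }
}
```

In a real system, prng_reseed() would be called whenever the entropy-gathering routines (such as the watchdog-jitter sampler sketched above) have accumulated enough fresh bits.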
Conclusion
As discussed above, it is quite feasible to generate truly random numbers on a common microcontroller. As with all other sources of entropy, one needs to analyze the raw numbers provided by the source(s) for the amount of real entropy they contain and for the amount of entropy generated per unit of time, in order to determine whether the source is suitable for the use case.
The combination of a true entropy source and a strong PRNG is the approach that is commonly implemented and which should be used on a microcontroller as well.
EDIT:
The PRNG approach may not be the best choice for cryptographic key generation. For that, one should only use truly random bits to produce a secure key. Gathering that amount of entropy may take some time (seconds, maybe), but since key generation is usually not performed very frequently on a µC, this may well be acceptable. (On a heavily loaded server with hundreds or more SSL (HTTPS) connections per second, this is quite another issue...)
To produce high-quality, high-entropy bitstreams suitable for key generation, a randomness extractor as mentioned above should be employed.
(On the other hand, if the amount of entropy in the source's output can be measured or estimated, one may simply scale the key length by a factor of 1/(entropy per bit sampled) and then use the raw low-entropy data from the entropy source directly to generate this longer key. For example, at an estimated 0.25 bits of entropy per raw bit, 512 raw bits are needed to match a fully random 128-bit key; the longer key then has the same overall entropy as a fully random key of the original length. Whether this really does the trick depends, however, on how the cipher handles keys of different lengths.)
Answer 2:
It depends on the range, sampling frequency and sensitivity of the sensor. You can consider the sensor measurements as bit strings or as floats; it doesn't really matter. The point is, the most significant bits / the leading decimals are not very random; they might hardly change at all. Likewise, the least significant bits are unreliable sources of randomness, since they might show a step effect due to limited sensor sensitivity, and depending on the sensor they might be strongly correlated in time (e.g., temperature or voltage will tend to change gradually). The middle bits/figures, however, might very well be a source of true random values.
Suppose you have a sensor that outputs values in the range 0 to 200, with a precision of 0.01. Let's say it's a pressure meter, or maybe a decibel meter. You'd need to test this extensively and look at the distribution of values for your specific sensor and environment, but I think that the figures at the 10^0 and 10^-1 positions might very well be distributed uniformly and without discernible order.
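As a toy illustration of keeping only the middle figures, the following C snippet extracts the 10^0 and 10^-1 digits from such a hypothetical 0 to 200, 0.01-resolution reading. Two decimal digits carry at most about 6.6 bits and will still be somewhat biased, so in practice they would be fed into an extractor rather than used directly.

```c
/* Toy example: keep only the units and tenths digits of a sensor reading,
 * e.g. 137.42 -> 74. The digits still need de-biasing before real use. */
#include <stdint.h>

uint8_t middle_digits(float reading)
{
    int32_t scaled = (int32_t)(reading * 100.0f);   /* 137.42 -> 13742 */
    return (uint8_t)((scaled / 10) % 100);          /* -> 74           */
}
```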
Best suited are sensors that can make very precise measurements but have to deal with a high level of noise anyway. This might pose a problem since most sensors aren't designed to provide an unnecessary/unreliable level of precision. Also, measurements that are roughly the same always and everywhere (except for the noise) are to be preferred of course. Cosmic background radiation is a good example of this.
Answer 3:
I have been experimenting with hardware sources for random number generation, and so far they seem to provide an unpredictable source. If whitening techniques are used, any underlying biases are reduced/eliminated to allow for them to produce a uniform sequence of random integers.
If you would like to experiment, I have implemented an Arduino-compatible library that uses the jitter associated with the watchdog timer to generate random numbers. Preliminary results indicate it is producing results suitable for cryptographic purposes.
It can be found at the Google Code library repository.
Answer 4:
Google: True Random Number Generator
That is how you do it: you create noise and sample it. You cannot create something truly random from something deterministic like a software algorithm.
Source: https://stackoverflow.com/questions/10864668/is-it-possible-to-generate-random-numbers-using-physical-sensors