How can I determine the statistical randomness of a binary string?

前端 未结 4 1837
忘了有多久
忘了有多久 2020-12-30 16:04

How can I determine the statistical randomness of a binary string?

Ergo, how can I code my own test, and return a single value that corresponds to the statistical ra

相关标签:
4条回答
  • 2020-12-30 16:26

    Some time ago, I developed the a simple heuristic that worked for my purposes.

    You simply calculate the "even-ness" of 0s and 1s not only in the string itself, but also on derivatives of the string. For example, the first derivative of 01010101 is 11111111, because every bit changes, and the second derivative is 00000000, because no bit in the first derivative changes. Then you simply have to weigh these "even-nesses" according to your taste.

    Here is an example:

    #include <string>
    #include <algorithm>
    
    float variance(const std::string& x)
    {
        int zeroes = std::count(x.begin(), x.end(), '0');
        float total = x.length();
        float deviation = zeroes / total - 0.5f;
        return deviation * deviation;
    }
    
    void derive(std::string& x)
    {
        char last = *x.rbegin();
        for (std::string::iterator it = x.begin(); it != x.end(); ++it)
        {
            char current = *it;
            *it = '0' + (current != last);
            last = current;
        }
    }
    
    float randomness(std::string x)
    {
        float sum = variance(x);
        float weight = 1.0f;
        for (int i = 1; i < 5; ++i)
        {
            derive(x);
            weight *= 2.0f;
            sum += variance(x) * weight;
        }
        return 1.0f / sum;
    }
    
    int main()
    {
        std::cout << randomness("00000000") << std::endl;
        std::cout << randomness("01010101") << std::endl;
        std::cout << randomness("00000101") << std::endl;
    }
    

    Your example inputs yield a "randomness" of 0.129032, 0.133333 and 3.2 respectively.

    On a side note, you can get cool fractal graphics by deriving strings ;)

    int main()
    {
        std::string x = "0000000000000001";
        for (int i = 0; i < 16; ++i)
        {
            std::cout << x << std::endl;
            derive(x);
        }
    }
    
    0000000000000001
    1000000000000001
    0100000000000001
    1110000000000001
    0001000000000001
    1001100000000001
    0101010000000001
    1111111000000001
    0000000100000001
    1000000110000001
    0100000101000001
    1110000111100001
    0001000100010001
    1001100110011001
    0101010101010101
    1111111111111111
    
    0 讨论(0)
  • 2020-12-30 16:30

    You might try a compression algorithm on the string. The more repetition (less randomness) there is, the more the string can be compressed.

    0 讨论(0)
  • 2020-12-30 16:39

    You seem to be asking for a way to find the Kolmogorov complexity of a binary string. Sadly, this is incomputable. The size of your string after running it through a compression algorithm will give you an idea of how random it is, in that more random strings are less compressible.

    0 讨论(0)
  • 2020-12-30 16:42

    This will give you an entropy count from 0 to 1.0:

    You might want to try looking into the Shannon Entropy, which is a measure of entropy as applied to data and information. In fact, it is actually almost a direct analogue of Physical formula for entropy as defined by the most accepted interpretations of Thermodynamics.

    More specifically, in your case, with a binary string, you can see the Binary Entropy Function, which is a special case involving randomness in binary bits of data.

    This is calculated by

    H(p) = -p*log(p) - (1-p)*log(1-p)
    

    (logarithms in base 2; assume 0*log(0) is 0)

    Where p is your percentage of 1's (or of 0's; the graph is symmetrical, so your answer is the same either way)

    Here is what the function yields:

    Binary Entropy Function

    As you can see, if p is 0.5 (same amount of 1's as 0's), your entropy is at the maximum (1.0). If p is 0 or 1.0, the entropy is 0.

    This appears to be just what you want, right?

    The only exception is your Size 1 cases, which could just be put as an exception. However, 100% 0's and 100% 1's don't seem too entropic to me. But implement them as you'd like.

    Also, this does not take into account any "ordering" of the bits. Only the sum total of them. So, repetition/palindromes won't get any boost. You might want to add an extra heuristic for this.

    Here are your other case examples:

    00:   -0*log(0) - (1-0)*log(1-0)               = 0.0
    01:   -0.5*log(0.5) - (1-0.5)*log(1-0.5)       = 1.0
    010:  -(1/3)*log(1/3) - (2/3)*log(2/3)         = 0.92
    0100: -0.25*log(0.25) - (1-0.25)*log(1-0.25)   = 0.81
    
    0 讨论(0)
提交回复
热议问题