An interview question: About Probability

清酒与你 2021-01-29 21:14

An interview question:

Given a function f(x) that returns 0 one quarter of the time and 1 three quarters of the time, write a function g(x), using f(x), that returns 0 half the time and 1 half the time.

10 answers
  •  天涯浪人
    2021-01-29 21:31

    A refinement of the same approach used in btilly's answer, achieving an average of ~1.85 calls to f() per g() result (a further refinement documented below achieves ~1.75; btilly's takes ~2.6 and Jim Lewis's accepted answer ~5.33). The code appears lower in the answer.

    Basically, I generate random integers in the range 0 to 3 with even probability: the caller can then test bit 0 for the first 50/50 value, and bit 1 for a second. Reason: the f() probabilities of 1/4 and 3/4 map onto quarters much more cleanly than halves.


    Description of algorithm

    btilly explained the algorithm, but I'll do so in my own way too...

    The algorithm basically generates a random real number x between 0 and 1, then returns a result depending on which "result bucket" that number falls in:

    result bucket      result
             x < 0.25     0
     0.25 <= x < 0.5      1
     0.5  <= x < 0.75     2
     0.75 <= x            3
    

    But generating a random real number given only f() is difficult. We have to start with the knowledge that our x value should be in the range 0..1 - which we'll call our initial "possible x" space. We then home in on an actual value for x:

    • each time we call f():
      • if f() returns 0 (probability 1 in 4), we consider x to be in the lower quarter of the "possible x" space, and eliminate the upper three quarters from that space
      • if f() returns 1 (probability 3 in 4), we consider x to be in the upper three-quarters of the "possible x" space, and eliminate the lower quarter from that space
      • when the "possible x" space is completely contained by a single result bucket, that means we've narrowed x down to the point where we know which result value it should map to and have no need to get a more specific value for x.
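
    The narrowing steps above can be sketched as two small helpers. This is only an illustration under stated assumptions: the names `Range`, `narrow` and `resolved_bucket` are hypothetical (they do not appear in the answer's code), and the outcome of f() is passed in as an argument so the arithmetic can be followed without a real f().

```cpp
#include <cassert>

// Hypothetical helper type: the current "possible x" space [low, high).
struct Range {
    double low;
    double high;
};

// Apply one f() outcome: 0 keeps the lower quarter of the range,
// 1 keeps the upper three quarters.
Range narrow(Range r, int f_result) {
    assert(f_result == 0 || f_result == 1);
    double cutoff = r.low + 0.25 * (r.high - r.low);
    if (f_result == 0) return Range{r.low, cutoff};
    return Range{cutoff, r.high};
}

// Return the result bucket (0..3) that fully contains r, or -1 if r still
// straddles one of the cut-offs 0.25, 0.5, 0.75.
int resolved_bucket(Range r) {
    for (int b = 0; b < 4; ++b) {
        if (r.low >= 0.25 * b && r.high <= 0.25 * (b + 1)) return b;
    }
    return -1;
}
```

    Starting from `Range{0.0, 1.0}`, a single f() result of 0 gives [0, 0.25), which resolves to bucket 0, while two results of 1 give [0.4375, 1), which still straddles 0.5 and 0.75 and so needs more calls.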

    It may or may not help to consider this diagram :-):

        "result bucket" cut-offs 0,.25,.5,.75,1
    
        0=========0.25=========0.5==========0.75=========1 "possible x" 0..1
        |           |           .             .          | f() chooses x < vs >= 0.25
        |  result 0 |------0.4375-------------+----------| "possible x" .25..1
        |           | result 1| .             .          | f() chooses x < vs >= 0.4375
        |           |         | .  ~0.58      .          | "possible x" .4375..1
        |           |         | .    |        .          | f() chooses < vs >= ~.58
        |           |         ||.    |    |   .          | 4 distinct "possible x" ranges
    

    Code

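    int f(); // provided: returns 0 with probability 1/4, 1 with probability 3/4

    int g() // return 0, 1, 2, or 3, each with probability 1/4
    {
        if (f() == 0) return 0;   // x in [0, 0.25): bucket 0
        if (f() == 0) return 1;   // x in [0.25, 0.4375): bucket 1
        // x is now known to be in [0.4375, 1)
        double low = 0.25 + 0.25 * (1.0 - 0.25);
        double high = 1.0;

        while (true)
        {
            // f() == 0 keeps the lower quarter of [low, high), f() == 1 the rest
            double cutoff = low + 0.25 * (high - low);
            if (f() == 0)
                high = cutoff;
            else
                low = cutoff;

            // stop once [low, high) lies inside a single result bucket
            if (high < 0.50) return 1;
            if (low >= 0.75) return 3;
            if (low >= 0.50 && high < 0.75) return 2;
        }
    }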
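
    To check the distribution and the call count empirically, a small harness along the following lines could be used. It is a sketch, not part of the original answer: the simulated `f()` (built on `std::mt19937` with an arbitrary fixed seed), the `f_calls` counter, and the `Stats` and `measure` names are all assumptions introduced here, and `g()` is repeated only so the block compiles on its own.

```cpp
#include <cassert>
#include <random>

// Assumed stand-in for the interviewer's f(): returns 0 with probability 1/4
// and 1 with probability 3/4, and counts how often it is called.
static std::mt19937 rng(12345);   // fixed seed, chosen arbitrarily
static long f_calls = 0;

int f() {
    ++f_calls;
    return (rng() & 3u) == 0 ? 0 : 1;   // the two low bits are both zero 1/4 of the time
}

// Same logic as the g() in this answer: returns 0, 1, 2 or 3.
int g() {
    if (f() == 0) return 0;
    if (f() == 0) return 1;
    double low = 0.25 + 0.25 * (1.0 - 0.25);
    double high = 1.0;
    while (true) {
        double cutoff = low + 0.25 * (high - low);
        if (f() == 0)
            high = cutoff;
        else
            low = cutoff;
        if (high < 0.50) return 1;
        if (low >= 0.75) return 3;
        if (low >= 0.50 && high < 0.75) return 2;
    }
}

struct Stats {
    double freq[4];       // observed frequency of each g() result
    double calls_per_g;   // average number of f() calls per g() call
};

// Hypothetical harness: call g() `trials` times and summarise the outcome.
Stats measure(long trials) {
    assert(trials > 0);
    long counts[4] = {0, 0, 0, 0};
    f_calls = 0;
    for (long t = 0; t < trials; ++t) ++counts[g()];
    Stats s;
    for (int k = 0; k < 4; ++k) s.freq[k] = static_cast<double>(counts[k]) / trials;
    s.calls_per_g = static_cast<double>(f_calls) / trials;
    return s;
}
```

    Each g() result carries two 50/50 bits, so `calls_per_g / 2` is the number to compare with the per-bit averages quoted at the top of this answer.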
    

    If helpful, an intermediary to feed out 50/50 results one at a time:

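    int h()
    {
        static int i;            // 0 when empty, else a cached g() result with bit 2 set
        if (!i)
        {
            int x = g();
            i = x | 4;           // bit 2 marks the cache as full even when x == 0
            return x & 1;        // hand out bit 0 first
        }
        else
        {
            int x = i & 2;       // hand out bit 1 on the following call
            i = 0;
            return x ? 1 : 0;
        }
    }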
    

    NOTE: This can be further tweaked by having the algorithm switch, on selected calls, from treating an f()==0 result as homing in on the lower quarter to treating it as homing in on the upper quarter instead, based on which on average resolves to a result bucket more quickly. Superficially, this seemed useful on the third call to f(), when an upper-quarter result would indicate an immediate result of 3, while a lower-quarter result still spans probability point 0.5 and hence results 1 and 2. When I tried it, the results were actually worse. A more complex tuning was needed to see actual benefits, and I ended up writing a brute-force comparison of lower vs upper cutoff for the second through eleventh calls to f(). The best result I found was an average of ~1.75, resulting from the 1st, 2nd, 5th and 8th calls to f() seeking low (i.e. setting low = cutoff).
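
    One way to compare such tunings without running simulations is to compute the expected number of f() calls directly, since the probability of reaching any sub-range equals its width. The sketch below is an assumption about how such a comparison could be set up, not the brute-force program mentioned in the note; `Span`, `unresolved`, `expected_calls` and the policy encoding are hypothetical names, and no attempt is made here to reproduce the ~1.75 figure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interval type: part of the "possible x" space, [low, high).
struct Span {
    double low;
    double high;
};

// True if a bucket cut-off (0.25, 0.5 or 0.75) lies strictly inside the span,
// meaning another f() call is needed before a result can be returned.
bool unresolved(Span s) {
    const double cuts[] = {0.25, 0.5, 0.75};
    for (double cut : cuts) {
        if (s.low < cut && cut < s.high) return true;
    }
    return false;
}

// Expected number of f() calls per 0..3 result for a given policy.
// policy[k] == true  : on call k+1, f()==0 selects the lower quarter (as in g())
// policy[k] == false : on call k+1, f()==0 selects the upper quarter instead
// Calls beyond policy.size() use the lower quarter. The sum is truncated after
// max_calls calls, so it slightly underestimates the true average.
double expected_calls(const std::vector<bool>& policy, int max_calls) {
    assert(max_calls > 0);
    std::vector<Span> open = {Span{0.0, 1.0}};
    double expected = 0.0;
    for (int k = 0; k < max_calls && !open.empty(); ++k) {
        bool lower = static_cast<std::size_t>(k) >= policy.size() || policy[k];
        std::vector<Span> next;
        for (Span s : open) {
            // Every still-open span costs one more call, weighted by the
            // probability of being in it, which is its width.
            expected += s.high - s.low;
            double cutoff = s.low + (lower ? 0.25 : 0.75) * (s.high - s.low);
            Span parts[2] = {Span{s.low, cutoff}, Span{cutoff, s.high}};
            for (Span p : parts) {
                if (unresolved(p)) next.push_back(p);
            }
        }
        open = next;
    }
    return expected;
}
```

    Passing an empty policy evaluates the g() shown above. Each 0..3 result yields two 50/50 bits, so the returned value should be halved before comparing it with the per-bit averages quoted in this answer.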
