Interview Question: Find Median From Mega Number Of Integers

暖寄归人 2020-12-12 14:02

There is a file that contains 10G (1000000000) integers; please find the median of these integers. You are given 2G of memory to do this. Can anyone come up with an re

9 answers
  •  醉梦人生
    2020-12-12 14:22

    My best guess is that a probabilistic median of medians would be the fastest approach. Recipe:

    1. Take the next set of N integers (N should be big enough, say 1000 or 10000 elements).
    2. Calculate the median of these integers and assign it to the variable X_new.
    3. If this is not the first iteration, combine the two medians:

      X_global = (X_global + X_new) / 2

    4. When X_global stops fluctuating much, you have found an approximate median of the data.
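
    Steps 1 to 3 could be sketched in C roughly as follows. This is only an illustration under assumptions: an in-memory array stands in for the file, and the names cmp_int, chunk_median and approx_median are made up for this sketch. The convergence check from step 4 is left out, so the loop simply runs over all chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for ints; avoids the overflow of plain subtraction. */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Step 2: median of one chunk. Sorts the chunk in place and returns
 * the middle element (the upper middle one when n is even). */
static int chunk_median(int *chunk, size_t n) {
    assert(n > 0);
    qsort(chunk, n, sizeof chunk[0], cmp_int);
    return chunk[n / 2];
}

/* Steps 1 and 3: walk the data chunk by chunk and fold each chunk
 * median into the running estimate with
 *     X_global = (X_global + X_new) / 2
 * The input array is reordered as a side effect. */
static double approx_median(int *data, size_t total, size_t chunk_size) {
    assert(total > 0 && chunk_size > 0);
    double x_global = 0.0;
    for (size_t i = 0; i < total; i += chunk_size) {
        size_t n = (total - i < chunk_size) ? total - i : chunk_size;
        double x_new = chunk_median(data + i, n);
        x_global = (i == 0) ? x_new : (x_global + x_new) / 2.0;
    }
    return x_global;
}
```

    For the nine values {5, 1, 3, 9, 7, 8, 2, 6, 4} with chunks of 3, the chunk medians are 3, 8 and 4, so the estimate goes 3, then 5.5, then 4.75, while the true median is 5. The result is approximate and depends on the order of the chunks, because later chunks carry more weight.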

    But there are some caveats:

    • The question is whether an error in the median is acceptable or not.
    • The integers must be randomly and uniformly distributed through the file for this to work.

    EDIT: I've played with this algorithm a bit and changed the idea slightly. In each iteration we add X_new with a decreasing weight:

    X_global = k*X_global + (1.-k)*X_new

    where k is in [0.5 .. 1.] and increases with each iteration.

    The point is to make the estimate converge quickly to some number in a very small number of iterations, so that a very approximate median (with a big error) is found among 100000000 array elements in only 252 iterations. Check this C experiment:

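    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ARRAY_SIZE 100000000
    #define RANGE_SIZE 1000

    // probabilistic median of medians method
    // should print about 5000 as the data average
    // from ARRAY_SIZE elements
    int main(void) {
        int iter = 0;
        int X_global = 0;
        int X_new = 0;
        int i = 0;
        float dk = 0.002f;
        float k = 0.5f;
        srand((unsigned)time(NULL));

        // Assumed stopping rule: all data consumed, or k has reached 1
        // (from then on X_new no longer changes X_global).
        while (i < ARRAY_SIZE && k < 1.0f) {
            // Assumption: the data is simulated as uniform random values
            // in [0, 10000), so each chunk average is close to 5000.
            X_new = 0;
            for (int j = 0; j < RANGE_SIZE; j++) {
                X_new += rand() % 10000;
            }
            X_new /= RANGE_SIZE;

            if (iter > 0) {
                // Shift weight from the new chunk to the running estimate.
                k += dk;
                k = (k > 1.0f) ? 1.0f : k;
                X_global = k * X_global + (1.0f - k) * X_new;
            }
            else {
                X_global = X_new;
            }

            i += RANGE_SIZE + 1;
            iter++;
            printf("iter %d, median = %d \n", iter, X_global);
        }

        return 0;
    }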
    

    Oops, it seems I'm talking about the mean, not the median. If so, and you need exactly the median rather than the mean, ignore my post. In any case, mean and median are closely related concepts.
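
    To see how far apart the two can be, here is a small C sketch. The helper names cmp_int_asc, mean_of and median_of are made up for this illustration. On symmetric data the two values coincide, but a single outlier moves the mean while leaving the median in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for ints in ascending order. */
static int cmp_int_asc(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Arithmetic mean of n values. */
static double mean_of(const int *v, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Exact median of n values: sorts v in place, then takes the middle
 * element, or the average of the two middle elements when n is even. */
static double median_of(int *v, size_t n) {
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_int_asc);
    if (n % 2 == 1) {
        return v[n / 2];
    }
    return ((double)v[n / 2 - 1] + (double)v[n / 2]) / 2.0;
}
```

    For {1, 2, 3, 4, 5} both functions return 3. For {1, 2, 3, 4, 100} the mean is 22 while the median is still 3, so averaging only approximates the median when the data is roughly symmetric.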

    Good luck.
