Find max/min occurrence in integer array

Submitted by 柔情痞子 on 2019-12-08 02:45:25

Question


I just finished writing an algorithm that finds the values with the maximum and minimum number of occurrences in an input integer array. My idea is to sort the array (so that all occurrences of a value are adjacent) and use a <value:occurrences> pair to store, for every value, its number of occurrences.

It should be O(nlogn) complexity but I think that there are some constant multipliers. What can I do to improve performance?

#include <stdio.h>
#include <stdlib.h>
#include "e7_8.h"

#define N 20
/*Structure for <value, frequencies_count> pair*/
typedef struct {
    int value;
    int freq;
} VAL_FREQ;


void  get_freq(int *v, int n, int *most_freq, int *less_freq) {

    int v_i, vf_i, current_value, current_freq;

    VAL_FREQ* sp = malloc(n*sizeof(VAL_FREQ));
    if(sp == NULL) exit(EXIT_FAILURE);

    mergesort(v,n);

    vf_i = 0;
    current_value = v[0];
    current_freq = 1;
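    for(v_i=1; v_i<=n; v_i++) {
        /*v_i<n is tested first so v[n] is never read; v_i==n only flushes the last run*/
        if(v_i<n && v[v_i] == current_value) current_freq++;
        else{
            sp[vf_i].value = current_value;
            sp[vf_i++].freq = current_freq;
            if(v_i<n) current_value = v[v_i];
            current_freq = 1;
        }
    }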
    /*Finding max,min frequency*/
    int i, max_freq_val, max_freq, min_freq_val, min_freq;

    max_freq = sp[0].freq;
    max_freq_val = sp[0].value;
    min_freq = sp[0].freq;
    min_freq_val = sp[0].value;
    for(i=1; i<vf_i; i++) {
        if(sp[i].freq > max_freq) {
            max_freq = sp[i].freq;
            max_freq_val = sp[i].value;
        }
        if(sp[i].freq < min_freq) {
            min_freq = sp[i].freq;
            min_freq_val = sp[i].value;
        }
    }

    *most_freq = max_freq_val;
    *less_freq = min_freq_val;

    free(sp);
}

Answer 1:


Let's start from the fact that your algorithm is already O(n*log(n)): every step is O(n) apart from the sorting, which is O(n*log(n)). Whether it can be significantly improved depends on the kind of input you expect. Edit: if, as appears to be the case, you are not required to have the values sorted at the end of the process (sorted by value, that is, not by number of occurrences), do not miss Oli Charlesworth's answer.

Two factors matter: the first is how many samples you are going to get (n); the second is how concentrated their values are, i.e. how narrow or wide the range they can fall in is (w = MAX_VALUE - MIN_VALUE).

If n is smaller than w (so your values are sparse), then your approach is already optimal and has little room for improvement.
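For that sparse case, one way to trim the constant factors is to find the longest and shortest runs during the scan itself, so no auxiliary pair array is needed. The following is only a minimal sketch, assuming the standard qsort stands in for the question's mergesort; get_freq_sorted and cmp_int are hypothetical names, not part of the original code:

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order, written to avoid overflow */
static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts v in place, then tracks the longest and the
   shortest run of equal values in a single pass.
   Returns 0 on success, -1 if n <= 0. */
int get_freq_sorted(int *v, int n, int *most_freq, int *less_freq) {
    if (n <= 0) return -1;
    qsort(v, (size_t)n, sizeof *v, cmp_int);

    int max_freq = 0, min_freq = 0;   /* min_freq == 0 means "not set yet" */
    int run_start = 0;
    for (int i = 1; i <= n; i++) {
        /* A run ends at the end of the array or when the value changes */
        if (i == n || v[i] != v[run_start]) {
            int freq = i - run_start;
            if (freq > max_freq) { max_freq = freq; *most_freq = v[run_start]; }
            if (min_freq == 0 || freq < min_freq) { min_freq = freq; *less_freq = v[run_start]; }
            run_start = i;
        }
    }
    return 0;
}
```

The complexity is still dominated by the sort; the sketch only removes the malloc and the second loop. On ties, the smallest value wins because runs are visited in ascending order.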

But if w is small and n is big, you have much to gain with the following method.

Let's say you know you cannot get any value less than MIN_VALUE or greater than MAX_VALUE. Then you can use the value as an index into an array where you collect your frequencies. In this way you skip the sorting step, which is O(n*log(n)), and you compute your frequencies in O(n) (plus O(w) to reset and later scan the array).

int buffer_frequencies[MAX_VALUE - MIN_VALUE + 1];

//Reset all counters to zero (memset is declared in <string.h>)
memset(buffer_frequencies, 0, sizeof buffer_frequencies);

int* value_frequencies = buffer_frequencies;
value_frequencies -= MIN_VALUE; //Shift the beginning of the array, so that 
                                //you can use the value directly as the array index
//You are allowed to use negative indexes
//Note: when MIN_VALUE > 0 the shifted pointer lies before the array, which ISO C
//leaves undefined; buffer_frequencies[v[v_i] - MIN_VALUE] is the portable form
for(v_i=0; v_i < n; v_i++) {
    value_frequencies[v[v_i]]++;
}

Or even (a possibly slightly faster version of the for loop, although a good compiler will usually generate the most efficient version already):

int* p_v = v;
int* end_p_v = v+n;
for(; p_v < end_p_v; p_v++) {
    value_frequencies[*p_v]++;
}

Be careful: this method (both versions) is very sensitive to the input values, i.e. you will access memory out of bounds if you get a value below MIN_VALUE or above MAX_VALUE.

Then the second part of the algorithm:

//Values that never occur have frequency 0 and are skipped,
//otherwise one of them would be reported as the least frequent
int i;
max_freq = 0;
max_freq_val = MIN_VALUE;
min_freq = 0;              //0 means "not set yet"
min_freq_val = MIN_VALUE;
for(i = MIN_VALUE; i <= MAX_VALUE; i++) {   //<= so that MAX_VALUE itself is checked
    int f = value_frequencies[i];
    if(f == 0) continue;
    if(f > max_freq) { max_freq = f; max_freq_val = i; }
    if(min_freq == 0 || f < min_freq) { min_freq = f; min_freq_val = i; }
}
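Putting the two parts together, the counting approach might look roughly like the sketch below. It is not the answerer's exact code: get_freq_counting is a hypothetical name, the bounds are passed as parameters (min_value, max_value) instead of macros, the table is heap-allocated with calloc, and out-of-range input is rejected rather than trusted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper illustrating the counting approach.
   min_value and max_value are bounds assumed to be known by the caller.
   Returns 0 on success, -1 on bad arguments, out-of-range input,
   or allocation failure. */
int get_freq_counting(const int *v, int n, int min_value, int max_value,
                      int *most_freq, int *less_freq) {
    if (n <= 0 || min_value > max_value) return -1;

    /* Width computed in long long to avoid int overflow for wide ranges */
    long long w = (long long)max_value - min_value + 1;
    if ((unsigned long long)w > SIZE_MAX / sizeof(int)) return -1;

    int *count = calloc((size_t)w, sizeof *count);   /* zero-initialized */
    if (count == NULL) return -1;

    for (int i = 0; i < n; i++) {
        /* Reject the value instead of writing outside the table */
        if (v[i] < min_value || v[i] > max_value) {
            free(count);
            return -1;
        }
        count[(long long)v[i] - min_value]++;
    }

    int max_freq = 0, min_freq = 0;   /* min_freq == 0 means "not set yet" */
    for (long long k = 0; k < w; k++) {
        int c = count[k];
        if (c == 0) continue;         /* this value never occurs */
        int value = (int)(k + min_value);
        if (c > max_freq) { max_freq = c; *most_freq = value; }
        if (min_freq == 0 || c < min_freq) { min_freq = c; *less_freq = value; }
    }

    free(count);
    return 0;
}
```

The cost is O(n + w) time and O(w) memory, so this only pays off when the range is small compared to the number of samples. The range check adds two comparisons per element, which is the price for not corrupting memory on unexpected input.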



Answer 2:


Use a hash-table to implement a key-value map? That should give you O(n) expected time.*


* However, note that it's O(n^2) in the worst case. This only occurs when all entries hash to the same bucket, and you effectively end up searching a linked list on every iteration! For a decent hash-table implementation, the probability of this occurring is very low indeed.

Source: https://stackoverflow.com/questions/17307097/find-max-min-occurrence-in-integer-array
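Since C has no hash table in its standard library, here is a minimal sketch of what such a map could look like for this problem. Note that it uses open addressing with linear probing rather than the linked-list buckets mentioned above; the names (Slot, hash_int, get_freq_hashed) and the hash constant are illustrative assumptions, not an established API.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int key;
    int count;   /* 0 marks an empty slot */
} Slot;

/* Multiplicative hash; the constant is an illustrative choice */
static size_t hash_int(int key, size_t mask) {
    uint32_t h = (uint32_t)key * 2654435761u;
    return (size_t)(h ^ (h >> 16)) & mask;
}

/* Hypothetical helper: counts occurrences in a hash table, then scans it.
   Returns 0 on success, -1 if n <= 0 or allocation fails. */
int get_freq_hashed(const int *v, int n, int *most_freq, int *less_freq) {
    if (n <= 0) return -1;

    /* Power-of-two capacity of at least 2n keeps the load factor <= 0.5,
       so probing always reaches an empty slot */
    size_t cap = 16;
    while (cap < 2 * (size_t)n) cap *= 2;
    Slot *table = calloc(cap, sizeof *table);
    if (table == NULL) return -1;

    for (int i = 0; i < n; i++) {
        size_t s = hash_int(v[i], cap - 1);
        /* Linear probing: advance until the key or an empty slot is found */
        while (table[s].count != 0 && table[s].key != v[i])
            s = (s + 1) & (cap - 1);
        table[s].key = v[i];
        table[s].count++;
    }

    int max_freq = 0, min_freq = 0;   /* min_freq == 0 means "not set yet" */
    for (size_t s = 0; s < cap; s++) {
        int c = table[s].count;
        if (c == 0) continue;         /* empty slot */
        if (c > max_freq) { max_freq = c; *most_freq = table[s].key; }
        if (min_freq == 0 || c < min_freq) { min_freq = c; *less_freq = table[s].key; }
    }

    free(table);
    return 0;
}
```

Unlike the sorted approach, the input array is left untouched, and when several values tie for most or least frequent, which one is reported depends on the table layout.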
