How to efficiently remove duplicates from an array without using Set

情深已故 2020-11-22 07:29

I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But when I tested it with 1,000,000 elements, it took a very long time.

  • 2020-11-22 07:50

    This doesn't use Set, Map, List, or any other collection, only two arrays:

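    package arrays.duplicates;
    
    import java.lang.reflect.Array;
    import java.util.Arrays;
    import java.util.Objects;
    
    public class ArrayDuplicatesRemover {
    
        // Keeps the first occurrence of each value, preserving input order.
        // Note: O(n^2) overall, because every element is searched in the output
        // and the output array is re-copied on every insertion.
        @SuppressWarnings("unchecked")
        public static <T> T[] removeDuplicates(T[] input, Class<T> clazz) {
            T[] output = (T[]) Array.newInstance(clazz, 0);
            for (T t : input) {
                if (!inArray(t, output)) {
                    // Grow the output by one slot and append the new value
                    output = Arrays.copyOf(output, output.length + 1);
                    output[output.length - 1] = t;
                }
            }
            return output;
        }
    
        // Linear search; Objects.equals also handles null elements safely
        private static <T> boolean inArray(T search, T[] array) {
            for (T element : array) {
                if (Objects.equals(element, search)) {
                    return true;
                }
            }
            return false;
        }
    
    }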
    

    And the main class to test it:

    package arrays.duplicates;
    
    import java.util.Arrays;
    
    public class TestArrayDuplicates {
    
        public static void main(String[] args) {
            Integer[] array = {1, 1, 2, 2, 3, 3, 3, 3, 4};
            testArrayDuplicatesRemover(array);
        }
    
        private static void testArrayDuplicatesRemover(Integer[] array) {
            final Integer[] expectedResult = {1, 2, 3, 4};
            Integer[] arrayWithoutDuplicates = ArrayDuplicatesRemover.removeDuplicates(array, Integer.class);
            System.out.println("Array without duplicates is supposed to be: " + Arrays.toString(expectedResult));
            System.out.println("Array without duplicates currently is: " + Arrays.toString(arrayWithoutDuplicates));
            System.out.println("Is test passed ok?: " + (Arrays.equals(arrayWithoutDuplicates, expectedResult) ? "YES" : "NO"));
        }
    
    }
    

    And the output:

    Array without duplicates is supposed to be: [1, 2, 3, 4]
    Array without duplicates currently is: [1, 2, 3, 4]
    Is test passed ok?: YES
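    Most of the cost above comes from calling Arrays.copyOf on every insertion. A minimal sketch of one refinement, assuming order must be preserved and the input may contain nulls: allocate the buffer once (Arrays.copyOf(input, input.length) keeps the runtime component type, so the Class<T> parameter is no longer needed) and trim it once at the end. The class name SingleBufferDedup is hypothetical, and the search is still O(n^2).

```java
import java.util.Arrays;
import java.util.Objects;

public class SingleBufferDedup {

    // Keeps the first occurrence of each value, preserving order.
    // The buffer is allocated once and trimmed once at the end.
    public static <T> T[] removeDuplicates(T[] input) {
        // Same runtime component type as input (e.g. Integer[]); contents get overwritten
        T[] buffer = Arrays.copyOf(input, input.length);
        int count = 0;
        for (T t : input) {
            boolean seen = false;
            // Only the prefix buffer[0..count) holds values kept so far
            for (int i = 0; i < count; i++) {
                if (Objects.equals(buffer[i], t)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                buffer[count++] = t;
            }
        }
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) {
        Integer[] array = {1, 1, 2, 2, 3, 3, 3, 3, 4};
        System.out.println(Arrays.toString(removeDuplicates(array)));
    }
}
```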
    
  • 2020-11-22 07:53
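    import java.util.Arrays;
    
    public class Practice {
    
        public static void main(String[] args) {
            int[] a = { 1, 3, 3, 4, 2, 1, 5, 6, 7, 7, 8, 10 };
            if (a.length == 0) {
                return; // nothing to deduplicate (a[a.length - 1] would throw)
            }
            // Sorting puts equal values next to each other (O(n log n))
            Arrays.sort(a);
            // j = number of distinct values compacted to the front of the array
            int j = 0;
            for (int i = 0; i < a.length - 1; i++) {
                // Keep a[i] only if it is the last of its run of equal values
                if (a[i] != a[i + 1]) {
                    a[j] = a[i];
                    j++;
                }
            }
            // The last element always ends a run, so it is always kept
            a[j] = a[a.length - 1];
            // a[0..j] now holds the distinct values in ascending order
            for (int i = 0; i <= j; i++) {
                System.out.println(a[i]);
            }
        }
    }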
    **This is the simplest way.**
    
  • 2020-11-22 07:53

    For a sorted array, duplicates are adjacent, so each element only needs to be compared with the last value kept:

    // Input must be sorted so that duplicates are adjacent!
    public static int[] distinct(int[] arr) {
        int[] temp = new int[arr.length];
    
        int count = 0;
        for (int i = 0; i < arr.length; i++) {
            int current = arr[i];
    
            // Skip a value equal to the last one kept
            if (count > 0 && temp[count - 1] == current)
                continue;
    
            temp[count] = current;
            count++;
        }
    
        // Trim the buffer down to the number of distinct values
        int[] whitelist = new int[count];
        System.arraycopy(temp, 0, whitelist, 0, count);
    
        return whitelist;
    }
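    If the input is not sorted yet and may be modified, the sort-based ideas in the two answers above can be combined into one method: sort, compact in place, then trim once. This is a minimal sketch under that assumption (the input array is overwritten); the class name SortThenCompact is hypothetical. It runs in O(n log n) and needs no second buffer.

```java
import java.util.Arrays;

public class SortThenCompact {

    // Returns the distinct values of arr in ascending order.
    // Sorts arr in place (O(n log n)), then compacts it in one O(n) pass.
    public static int[] distinct(int[] arr) {
        if (arr.length == 0) {
            return arr;
        }
        Arrays.sort(arr);
        int count = 1; // arr[0] is always kept
        for (int i = 1; i < arr.length; i++) {
            // After sorting, a new value differs from the last kept one
            if (arr[i] != arr[count - 1]) {
                arr[count++] = arr[i];
            }
        }
        return Arrays.copyOf(arr, count);
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 3, 4, 2, 1, 5, 6, 7, 7, 8, 10};
        System.out.println(Arrays.toString(distinct(a)));
    }
}
```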
    
  • 2020-11-22 07:53

    Okay, so you cannot use Set or other collections. One solution I haven't seen here so far is based on a Bloom filter, which is essentially an array of bits, so perhaps it meets your requirements.

    The Bloom filter is a lovely and very handy technique, fast and space-efficient, that can be used to quickly check whether an element is in a set without storing the set itself or its elements. It has a (typically small) false positive rate, but no false negatives. In other words, for your question, if a Bloom filter tells you that an element hasn't been seen so far, you can be sure it hasn't. But if it says that an element has been seen, you actually need to check. This still saves a lot of time if there aren't too many duplicates in your list: for elements seen for the first time there is no looping to do, except in the low-probability case of a false positive. You typically choose this rate based on how much space you are willing to give to the Bloom filter (rule of thumb: fewer than 10 bits per unique element for a false positive rate of 1%).
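    The rule of thumb follows from the standard sizing approximations, assuming n unique elements, m bits, k hash functions, a target false positive rate p, and the optimal choice of k:

```latex
\frac{m}{n} = -\frac{\ln p}{(\ln 2)^2},
\qquad
k = \frac{m}{n}\,\ln 2
\qquad\Rightarrow\qquad
p = 0.01:\;
\frac{m}{n} = \frac{4.605}{0.4805} \approx 9.6,\;
k \approx 6.6 \to 7
```

    So about 9.6 bits per unique element and 7 hash functions give roughly a 1% false positive rate.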

    There are many implementations of Bloom filters (see e.g. here or here), so I won't repeat one in this answer. Let us just assume the API described in that last reference, in particular the description of put(E e):

    true if the Bloom filter's bits changed as a result of this operation. If the bits changed, this is definitely the first time object has been added to the filter. If the bits haven't changed, this might be the first time object has been added to the filter. (...)

    An implementation using such a Bloom filter would then be:

    // Requires: import java.util.ArrayList;
    // BloomFilter<E> is the API assumed above: put(e) returns true
    // only if e is definitely being added for the first time.
    public static int[] removeDuplicates(int[] arr) {
        ArrayList<Integer> out = new ArrayList<>();
        BloomFilter<Integer> bf = new BloomFilter<>(...);  // decide how many bits and how many hash functions to use (compromise between space and false positive rate)
    
        for (int e : arr) {
            boolean mightContain = !bf.put(e);
            boolean found = false;
            if (mightContain) {
                // Possibly seen before: scan the output to rule out a false positive
                for (int u : out) {
                    if (u == e) {
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                out.add(e);
            }
        }
        return out.stream().mapToInt(i -> i).toArray();
    }
    

    Obviously, if you can alter the incoming array in place, then there is no need for an ArrayList: write each unique element to the front of the array and, at the end, when you know the actual number of unique elements, just System.arraycopy() them into a new array of that size.
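    A minimal, self-contained sketch of that in-place variant. Instead of a library filter it uses a hypothetical IntBloomFilter (a java.util.BitSet plus double hashing over a MurmurHash3-style integer mixer) whose put() follows the semantics quoted above; the sizing (10 bits per element, 7 hashes) is an assumption taken from the rule of thumb. The result is exact either way: a false positive only costs an extra scan of the kept prefix.

```java
import java.util.Arrays;
import java.util.BitSet;

public class BloomDedup {

    // Hypothetical minimal Bloom filter for ints (not a library API).
    // put() returns true if any bit changed, i.e. the value is definitely new.
    static final class IntBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashes;

        IntBloomFilter(int numBits, int numHashes) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashes = numHashes;
        }

        boolean put(int value) {
            // Double hashing: derive k bit indices from two mixed hashes
            int h1 = mix(value);
            int h2 = mix(h1 ^ 0x9E3779B9) | 1; // odd step
            boolean changed = false;
            for (int i = 0; i < numHashes; i++) {
                int index = Math.floorMod(h1 + i * h2, numBits);
                if (!bits.get(index)) {
                    bits.set(index);
                    changed = true;
                }
            }
            return changed;
        }

        // Integer bit mixer (the MurmurHash3 32-bit finalizer)
        private static int mix(int h) {
            h ^= h >>> 16;
            h *= 0x85EBCA6B;
            h ^= h >>> 13;
            h *= 0xC2B2AE35;
            h ^= h >>> 16;
            return h;
        }
    }

    // Compacts distinct values to the front of arr (first occurrence wins),
    // then copies them out once; arr itself is overwritten.
    public static int[] removeDuplicates(int[] arr) {
        // ~10 bits per element and 7 hashes, aiming at roughly 1% false positives
        IntBloomFilter bf = new IntBloomFilter(Math.max(64, arr.length * 10), 7);
        int count = 0;
        for (int i = 0; i < arr.length; i++) {
            int e = arr[i];
            boolean found = false;
            if (!bf.put(e)) {
                // Possibly seen: confirm by scanning the kept prefix arr[0..count)
                for (int j = 0; j < count; j++) {
                    if (arr[j] == e) {
                        found = true;
                        break;
                    }
                }
            }
            if (!found) {
                arr[count++] = e; // count <= i, so unread elements are never overwritten
            }
        }
        int[] out = new int[count];
        System.arraycopy(arr, 0, out, 0, count);
        return out;
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 5, 1, 3, 3, 9};
        System.out.println(Arrays.toString(removeDuplicates(a)));
    }
}
```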

  • 2020-11-22 07:54
    // Assumes temparray is sorted and holds nsprtable*2 values (nsprtable can be any number),
    // and finalarray is at least as long as temparray
    int length = nsprtable * 2;
    int tempvar = 0;    // number of values written to finalarray (no duplicates)
    int whilecount = 0; // current index into temparray
    while (whilecount < length)
    {
        // Copy the current value once...
        int value = temparray[whilecount];
        finalarray[tempvar] = value;
        tempvar++;
        // ...then skip all identical neighbours, so runs of any length collapse to one
        // (and the last element is handled too)
        while (whilecount < length && temparray[whilecount] == value)
        {
            whilecount++;
        }
    }
    

    Hope this helps or serves the purpose.

  • 2020-11-22 07:54

    Here is my solution. Its time complexity is O(n^2).

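    public String removeDuplicates(char[] arr) {
            if (arr == null)
                return null;
            StringBuilder sb = new StringBuilder();
            int len = arr.length;
    
            // Note: mutates arr, using '\0' as a "removed" marker,
            // so the input itself must not contain '\0'
            for (int i = 0; i < len; i++) {
                if (arr[i] == 0)
                    continue; // already removed as a duplicate
    
                // Blank out every later copy of arr[i]
                for (int j = i + 1; j < len; j++) {
                    if (arr[i] == arr[j]) {
                        arr[j] = 0;
                    }
                }
                sb.append(arr[i]);
            }
    
            // No trim(): it would also strip legitimate leading/trailing spaces
            return sb.toString();
        }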
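    Because a char has only 65,536 possible values, the inner loop can be swapped for a different technique: a direct-address "seen" table (a plain boolean array, not a collection). A minimal sketch, assuming the memory for that 64 KB-ish table is acceptable; the class name CharDedup is hypothetical. It runs in O(n), keeps the first occurrence of each character, and leaves the input untouched.

```java
public class CharDedup {

    // One flag per possible char value (Character.MAX_VALUE + 1 = 65,536),
    // so each lookup is O(1) and the whole pass is O(n). Input is not modified.
    public static String removeDuplicates(char[] arr) {
        if (arr == null) {
            return null;
        }
        boolean[] seen = new boolean[Character.MAX_VALUE + 1];
        StringBuilder sb = new StringBuilder(arr.length);
        for (char c : arr) {
            if (!seen[c]) {
                seen[c] = true;
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("programming".toCharArray()));
    }
}
```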
    