Best algorithm for deleting duplicates in an array of strings

一个人的身影 2021-02-09 02:44

Today at school the teacher asked us to implement a duplicate-deletion algorithm. It's not that difficult, and everyone came up with the following solution (pseudocode):

7 answers
  • 2021-02-09 02:54

    You can often use a space-time tradeoff and invest more space to reduce time.

    In this case you could use a hash table to determine the unique words.
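
    A minimal sketch of that idea in Python, assuming the words are ordinary strings and that the first occurrence of each word should be kept. The name `unique_words` is only illustrative; the built-in `dict` serves as the hash table and keeps insertion order (Python 3.7+).

```python
def unique_words(words):
    """Return the distinct words in first-seen order.

    The dict is the extra space being traded for time: each insert or
    lookup is O(1) on average, so the whole pass is O(n) on average.
    """
    return list(dict.fromkeys(words))


print(unique_words(["a", "ab", "b", "ab", "a", "c", "cd", "cd"]))
# ['a', 'ab', 'b', 'c', 'cd']
```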

  • 2021-02-09 02:54

    This is the shortest algorithm that worked for me. arrNames and arrScores are parallel arrays, and when a name is duplicated the highest score is kept.

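    // iCount is the number of elements in use (the arrays are 1-based).
    // while loops are used because iCount shrinks as duplicates are removed;
    // a for loop would keep running to the original upper bound.
    I := 1;
    while I < iCount do
    begin
      J := I + 1;
      while J <= iCount do
      begin
        if arrNames[I] = arrNames[J] then
        begin
          // keep the higher of the two scores
          if arrScores[I] < arrScores[J] then
            arrScores[I] := arrScores[J];

          // overwrite the duplicate with the last element and clear the last slot
          arrScores[J] := arrScores[iCount];
          arrNames[J] := arrNames[iCount];
          arrScores[iCount] := 0;
          arrNames[iCount] := '';
          Dec(iCount);
          // J is not advanced: the element just moved into slot J is still unchecked
        end
        else
          Inc(J);
      end;
      Inc(I);
    end;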
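
    For comparison, the same result (one entry per name, with the highest score kept) can be sketched with a hash table instead of nested loops. This is a Python sketch under my own assumptions, not a translation of the Pascal code: `merge_scores` and its parameter names are hypothetical, and unlike the swap-with-last approach it keeps names in first-seen order.

```python
def merge_scores(names, scores):
    """Collapse duplicate names, keeping the highest score for each.

    names and scores are parallel lists. Returns two new parallel lists
    with the names in the order they were first seen.
    """
    best = {}  # name -> highest score seen so far
    for name, score in zip(names, scores):
        if name not in best or score > best[name]:
            best[name] = score
    return list(best), list(best.values())


print(merge_scores(["ann", "bob", "ann"], [3, 5, 7]))
# (['ann', 'bob'], [7, 5])
```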
    
  • 2021-02-09 03:11
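    def dedup(l):
        # ht: open-addressing hash table of (hash, index into et) pairs
        # et: the unique elements, in first-seen order
        ht, et = [(None, None) for _ in range(len(l))], []
        for e in l:
            # h must be assigned before it is used to compute the slot
            h = hash(e)
            n = h % len(ht)
            while True:
                if ht[n][0] is None:
                    # empty slot: e has not been seen yet, so record it
                    et.append(e)
                    ht[n] = h, len(et) - 1
                if ht[n][0] == h and et[ht[n][1]] == e:
                    break
                # linear probing, wrapping around at the end of the table
                if (n := n + 1) == len(ht):
                    n = 0
        return et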
    
  • 2021-02-09 03:12

    add is O(n), so your CC calculation is wrong. Your algorithm is O(n^2).

    Moreover, how would remove be implemented? It also looks like it would be O(n) - so the initial algorithm would be O(n^3).

  • 2021-02-09 03:13

    The easiest solution is to simply sort the array. That takes O(n log n) with a standard library sort, if you are allowed to use one; otherwise consider writing a simple randomized quicksort (the code is even on Wikipedia).

    Afterwards, scan the array one more time and eliminate consecutive identical elements.
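
    A short Python sketch of the sort-then-scan approach, assuming it is acceptable to return a new sorted list rather than keep the original order. `dedup_sorted` is an illustrative name, and the built-in `sorted` stands in for whatever sort you are allowed to use.

```python
def dedup_sorted(words):
    """Sort a copy of words, then keep each element that differs from the last kept one."""
    result = []
    for w in sorted(words):  # O(n log n)
        # After sorting, equal strings are adjacent, so comparing with the
        # most recently kept element is enough to detect a duplicate.
        if not result or result[-1] != w:
            result.append(w)
    return result


print(dedup_sorted(["b", "a", "b", "c", "a"]))
# ['a', 'b', 'c']
```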

    If you want to do it in O(n), you can also use a HashSet with elements you have already seen. Just iterate once over your array, for each element check if it is in your HashSet.

    If it isn't in there, add it. If it is in there, remove it from the array.

    Note that this takes some additional memory, and hashing adds a constant factor to the runtime. Although the time complexity is better, in practice it will only be faster once the array exceeds a certain size.
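
    The HashSet variant could look roughly like this in Python, with the built-in `set` playing the role of the HashSet. One assumption on my part: instead of removing each duplicate from the array individually (each such removal shifts the tail and costs O(n)), the sketch compacts the kept elements toward the front and truncates once at the end. `dedup_in_place` is a hypothetical name.

```python
def dedup_in_place(words):
    """Remove duplicates from the list in place, keeping first occurrences."""
    seen = set()
    write = 0  # next slot for an element we keep
    for w in words:
        if w not in seen:
            seen.add(w)
            words[write] = w  # write never runs ahead of the read position
            write += 1
    del words[write:]  # drop the leftover tail in a single step
    return words


names = ["a", "ab", "b", "ab", "a", "c"]
dedup_in_place(names)
print(names)
# ['a', 'ab', 'b', 'c']
```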

  • 2021-02-09 03:14

    If the order of the final solution is irrelevant, you could break the array into smaller arrays based on length of the strings, and then remove duplicates from those arrays. Example:

    // You have 
    {"a", "ab", "b", "ab", "a", "c", "cd", "cd"}, 
    
    // you break it into 
    {"a", "b", "a", "c"} and {"ab", "ab", "cd", "cd"}, 
    
    // remove duplicates from those arrays using the merge method that others have mentioned, 
    // and then combine the arrays back together into 
    {"a", "b", "c", "ab", "cd"}
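
    The example above can be sketched in Python as follows. The per-bucket step here uses a dict as a stand-in for whichever duplicate-removal method you prefer, so treat that part as an assumption; `dedup_by_length` is an illustrative name.

```python
from collections import defaultdict


def dedup_by_length(words):
    """Group words by length, deduplicate each group, then concatenate the groups."""
    buckets = defaultdict(list)  # length -> words of that length
    for w in words:
        buckets[len(w)].append(w)

    result = []
    for length in sorted(buckets):  # shortest strings first
        # Strings of different lengths can never be equal, so each bucket
        # can be deduplicated independently of the others.
        result.extend(dict.fromkeys(buckets[length]))
    return result


print(dedup_by_length(["a", "ab", "b", "ab", "a", "c", "cd", "cd"]))
# ['a', 'b', 'c', 'ab', 'cd']
```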
    