std::vector differences

前端 未结 2 554
有刺的猬
有刺的猬 2021-02-13 00:57

How does one determine what the differences of 2 vectors are?

I have vector v1 and vector v2;

What I am looking

2条回答
  •  被撕碎了的回忆
    2021-02-13 01:03

    Here is the complete and correct answer. Before the set_symmetric_difference algorithm can be used, the source ranges must be ordered:

      using namespace std; // For brevity, don't do this in your own code...
    
      vector v1;
      vector v2;
    
      // ... Populate v1 and v2
    
      // For the set_symmetric_difference algorithm to work, 
      // the source ranges must be ordered!    
      vector sortedV1(v1);
      vector sortedV2(v2);
    
      sort(sortedV1.begin(),sortedV1.end());
      sort(sortedV2.begin(),sortedV2.end());
    
      // Now that we have sorted ranges (i.e., containers), find the differences    
      vector vDifferences;
    
      set_symmetric_difference(
        sortedV1.begin(),
        sortedV1.end(),
        sortedV2.begin(),
        sortedV2.end(),
        back_inserter(vDifferences));
    
      // ... do something with the differences
    

    It should be noted that sorting is an expensive operation (i.e., O(n log n) for common STL implementations). Especially for the case of one or both of the containers being very large (i.e., millions of integers or more), a different algorithm using hash tables may be preferable based on algorithmic complexity. Here is a high level description of this algorithm:

    1. Load each container into a hash table.
    2. If the two containers differ in size, the hash table corresponding to the smaller one will be used for traversal in Step 3. Otherwise, the first of the two hash tables will be used.
    3. Traverse the hash table chosen in Step 2, checking to see if each item is present in both hash tables. If it is, remove it from both of them. The reason that the smaller hash table is preferred for traversal is because hash table lookups are on the average O(1) regardless of container size. Therefore, the time to traverse is a linear function of n (i.e., O(n)), where n is the size of the hash table being traversed.
    4. Take the union of the remaining items in the hash tables and store the result in a difference container.

    C++11 offers us some capability for such a solution by standardizing the unordered_multiset container. I also employed the new usage of the auto keyword for explicit initializations to make the following hash table based solution more concise:

    using namespace std; // For brevity, don't do this in your own code...
    
    // The remove_common_items function template removes some and / or all of the
    // items that appear in both of the multisets that are passed to it. It uses the
    // items in the first multiset as the criteria for the multi-presence test.
    template 
    void remove_common_items(unordered_multiset &ms1, 
                             unordered_multiset &ms2)
    {
      // Go through the first hash table
      for (auto cims1=ms1.cbegin();cims1!=ms1.cend();)
      {
        // Find the current item in the second hash table
        auto cims2=ms2.find(*cims1);
    
        // Is it present?
        if (cims2!=ms2.end())
        {
          // If so, remove it from both hash tables
          cims1=ms1.erase(cims1);
          ms2.erase(cims2);
        }
        else // If not
          ++cims1; // Move on to the next item
      }
    }
    
    int main()
    {
      vector v1;
      vector v2;
    
      // ... Populate v1 and v2
    
      // Create two hash tables that contain the values
      // from their respective initial containers    
      unordered_multiset ms1(v1.begin(),v1.end());
      unordered_multiset ms2(v2.begin(),v2.end());
    
      // Remove common items from both containers based on the smallest
      if (v1.size()<=v2.size)
        remove_common_items(ms1,ms2);
      else
        remove_common_items(ms2,ms1);
    
      // Create a vector of the union of the remaining items
      vector vDifferences(ms1.begin(),ms1.end());
    
      vDifferences.insert(vDifferences.end(),ms2.begin(),ms2.end());
    
      // ... do something with the differences
    }
    

    In order to determine which solution is better for a particular situation, profiling both algorithms would be a smart course of action. Although the hash table based solution is in O(n), it requires more code and does more work per duplicate found (i.e., hash table removals). It also (sadly) uses a custom differencing function rather than a standard STL algorithm.

    It should be noted that both solutions present the differences in an order that is most likely quite different from the order that the elements appeared in the original containers. There is a way around this by using a variant of the hash table solution. A high level description follows (which only differs in Step 4 from the preceding solution):

    1. Load each container into a hash table.
    2. If the two containers differ in size, the smaller hash table will be used for traversal in Step 3. Otherwise, the first of the two will be used.
    3. Traverse the hash table chosen in Step 2, checking to see if each item is present in both hash tables. If it is, remove it from both of them.
    4. To form the difference container, traverse the original containers in order (i.e., the first container before the second). Look up each item from each container in its respective hash table. If it is found, the item is to be added to the difference container and removed from its hash table. Items not present in the respective hash tables will be skipped. Thus, only the items that are present in the hash tables will wind up in the difference container and their order of appearance will remain the same as it was in the original containers, because those containers dictate the order of the final traversal.

    In order to maintain original order, Step 4 has become more expensive than in the previous solution, especially if the number of items removed is high. This is because:

    1. All items will be tested a second time for eligibility to appear in the difference container, via a presence test in their respective hash tables.
    2. The hash tables will have the remainder of their items removed one at a time when the difference container is formed, as part of the present in the difference testing of Item 1.

提交回复
热议问题