Efficient code to determine whether a set is a subset of another set

悲哀的现实 2021-02-20 04:27

I am looking for an efficient way to determine if a set is a subset of another set in Matlab or Mathematica.

Example:

    Set A = [1 2 3 4]
    Set B = [4 3]
    Set C = [3 4 1]
    Set

4 answers
  • 2021-02-20 04:50

    You will likely want to take a look at the built-in set operation functions in MATLAB. Why reinvent the wheel if you don't have to? ;)

    HINT: The ISMEMBER function may be of particular interest to you.
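    For comparison, here is a minimal Python sketch of the same membership test (a translation for illustration, not MATLAB code). `is_subset` is a hypothetical helper name; it checks that every element of the candidate occurs in the other set, which is what `all(ismember(B, A))` expresses in MATLAB.

```python
def is_subset(candidate, container):
    """Return True if every element of `candidate` occurs in `container`.

    Hypothetical helper mirroring MATLAB's all(ismember(candidate, container));
    element order and repeated elements do not matter.
    """
    lookup = set(container)  # average O(1) membership checks
    return all(x in lookup for x in candidate)  # stops at the first missing element


A = [1, 2, 3, 4]
B = [4, 3]
C = [3, 4, 1]

print(is_subset(B, A))  # True
print(is_subset(C, A))  # True
print(is_subset(A, B))  # False
```

    Python's built-in `set(candidate) <= set(container)` gives the same answer; the generator form simply makes the early exit explicit.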

    EDIT:

    Here's one way to approach this problem using nested loops, arranged to reduce the number of iterations needed. First, following the suggestion in Marc's comment, we sort the list of sets by their number of elements so that they are arranged from largest to smallest:

    setList = {[1 2 3 4],...  %# All your sets, stored in one cell array
               [4 3],...
               [3 4 1],...
               [4 3 2 1]};
    nSets = numel(setList);                       %# Get the number of sets
    setSizes = cellfun(@numel,setList);           %# Get the size of each set
    [~,sortIndex] = sort(setSizes,'descend');     %# Get the sort index
    setList = setList(sortIndex);                 %# Sort the sets
    

    Now we can set up our loops to start with the smallest sets at the end of the list and compare them first to the largest sets at the start of the list to increase the odds we will find a superset quickly (i.e. we're banking on larger sets being more likely to contain smaller sets). When a superset is found, we remove the subset from the list and break the inner loop:

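    for outerLoop = nSets:-1:2                  %# Start from the smallest set
      for innerLoop = 1:(outerLoop-1)           %# Compare against the larger sets first
        if all(ismember(setList{outerLoop},setList{innerLoop}))
          setList(outerLoop) = [];              %# Remove the subset (or duplicate)
          break;                                %# Go on to the next outer set
        end
      end
    end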
    

    After running the above code, setList no longer contains any set that is a subset (or a duplicate) of a set preceding it in the list.
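    The same sort-then-prune idea can be sketched in Python as follows (a translation for illustration only; `remove_contained_sets` is a hypothetical name). Python's `sorted` with `reverse=True` is stable, like the descending sort above, and deleting the current outer element while walking downward only shifts positions that have already been processed.

```python
def remove_contained_sets(set_list):
    """Python sketch of the sort-then-prune approach from the MATLAB answer.

    Sets are sorted largest first; then, walking from the end of the list,
    each set is deleted as soon as an earlier (larger or equal) set contains it.
    """
    # Stable sort by size, largest first (like sort(..., 'descend')).
    sets = sorted(set_list, key=len, reverse=True)
    # Walk from the last (smallest) set down to the second one.
    for outer in range(len(sets) - 1, 0, -1):
        members = set(sets[outer])
        for inner in range(outer):
            if members <= set(sets[inner]):  # all(ismember(...)) in MATLAB
                del sets[outer]              # drop the subset or duplicate
                break
    return sets


print(remove_contained_sets([[1, 2, 3, 4], [4, 3], [3, 4, 1], [4, 3, 2, 1]]))
# [[1, 2, 3, 4]]
```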

    In the best case scenario (e.g. the sample data in your question) the inner loop breaks after the first iteration every time, performing only nSets-1 set comparisons using ISMEMBER. In the worst case scenario the inner loop never breaks and it will perform (nSets-1)*nSets/2 set comparisons.
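    To check the best- and worst-case counts, the following Python sketch (hypothetical helper `count_comparisons`, a translation of the loop above) counts how many subset tests are made. The sample data hits the best case; four mutually disjoint sets, where no set contains another, hit the worst case.

```python
def count_comparisons(set_list):
    """Count the subset tests made by the sort-then-prune loop (Python sketch)."""
    sets = sorted(set_list, key=len, reverse=True)
    comparisons = 0
    for outer in range(len(sets) - 1, 0, -1):
        for inner in range(outer):
            comparisons += 1
            if set(sets[outer]) <= set(sets[inner]):
                del sets[outer]
                break
    return comparisons


n = 4
best = count_comparisons([[1, 2, 3, 4], [4, 3], [3, 4, 1], [4, 3, 2, 1]])
worst = count_comparisons([[1], [2], [3], [4]])  # no set contains another
print(best, n - 1)              # 3 3
print(worst, (n - 1) * n // 2)  # 6 6
```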

  • 2021-02-20 04:53

    I'll assume that if no set is a superset of all the supplied sets, you want the empty set returned (i.e., "nothing").

    So you want to take the union of all the sets, then find the first set in your list with that many elements. This isn't too hard in Mathematica (skipping the reformatting of the input into internal list form):

        topSet[a_List] := Module[{pool, biggest, lenBig, i, lenI},
            pool = DeleteDuplicates[Flatten[a]];    (* union of all the sets *)
            biggest = {}; lenBig = 0;
            For[i = 1, i <= Length[a], i++,         (* find the first longest set *)
                lenI = Length[a[[i]]];
                If[lenI > lenBig, lenBig = lenI; biggest = a[[i]]];
                ];
            If[lenBig == Length[pool], biggest, {}] (* superset only if as large as the union *)
        ]
    

    For instance:

        topSet[{{1,2,3,4},{4,3},{3,4,1},{4,3,2,1}}]  
          {1,2,3,4}  
        topSet[{{4, 3, 2, 1}, {1, 2, 3, 4}, {4, 3}, {3, 4, 1}}]  
          {4,3,2,1}  
        topSet[{{1, 2}, {3, 4}}]  
          {}  
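    A minimal Python translation of topSet, for illustration only (`top_set` is a hypothetical name). Like the Mathematica version, it assumes no individual set contains repeated elements, since the final length comparison relies on that.

```python
def top_set(sets):
    """Return the first longest set if it is a superset of all the others, else [].

    Assumes no individual set contains repeated elements.
    """
    pool = set()
    for s in sets:
        pool.update(s)              # union of all sets
    biggest = []
    for s in sets:
        if len(s) > len(biggest):   # strict '>' keeps the first longest set
            biggest = s
    return biggest if len(biggest) == len(pool) else []


print(top_set([[1, 2, 3, 4], [4, 3], [3, 4, 1], [4, 3, 2, 1]]))  # [1, 2, 3, 4]
print(top_set([[4, 3, 2, 1], [1, 2, 3, 4], [4, 3], [3, 4, 1]]))  # [4, 3, 2, 1]
print(top_set([[1, 2], [3, 4]]))                                  # []
```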
    

    As a large test:

        <<Combinatorica`  
        Timing[Short[topSet[Table[RandomSubset[Range[10^3]], {10^3}]]]]  
          {14.64, {}}  
    

    That is, a list of 1000 randomly selected subsets of the range [1, 1000] was analyzed in 14.64 seconds (and, unsurprisingly, none of them happened to be a superset of all the others).

    -- Edit -- Escaped a less-than sign that was hiding a few lines of the implementation. Also:

    Run time analysis: Let L be the number of lists and N the total number of elements in all the lists (including duplicates). The pool assignment takes O(N) for the flattening (every element is copied once) and O(N) for the deletion of duplicates. In the For loop, all L assignments to lenI cumulatively require at most O(N) time, and all L conditionals require O(L) time. The rest is O(1). Since L ≤ N when no list is empty, the total run time, O(N)+O(N)+O(N)+O(L)+O(1), is O(N). That is, the superset, if it exists, can be found in time proportional to the length of the input (the sum of the lengths of the individual sets), and the constant hidden behind the big-O isn't large.

    Proof of correctness: Assume no individual set contains repeated elements. A superset, if it exists, (1) contains itself, (2) contains any permutation of itself, (3) contains every element present in any (other) set, and (4) is at least as long as any other set in the collection. Consequences: a superset (if present) is a longest set in the collection, any other set of equal length is a permutation of it, and it contains a copy of every element contained in any set, so its length equals the size of the union. Conversely, a set whose length equals the size of the union must contain every element of the union. Therefore, a superset exists if and only if some set is as large as the union of the collection of sets.
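    The equivalence can be spot-checked empirically. This Python sketch (hypothetical helper names, seeded random inputs) compares the length criterion against a direct brute-force superset check on random collections of sets with distinct elements, then shows one input with a repeated element where the criterion breaks, which is why the no-repeats assumption matters.

```python
import random


def has_superset_by_length(sets):
    """Length criterion: some set is exactly as large as the union."""
    union = set().union(*map(set, sets))
    return any(len(s) == len(union) for s in sets)


def has_superset_brute_force(sets):
    """Direct check: some set contains every element of every set."""
    return any(all(set(t) <= set(s) for t in sets) for s in sets)


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(2000):
    # Random collections of 1..5 sets, each with distinct elements drawn from 1..6.
    sets = [rng.sample(range(1, 7), rng.randint(1, 6)) for _ in range(rng.randint(1, 5))]
    assert has_superset_by_length(sets) == has_superset_brute_force(sets)

# Without the distinct-elements assumption the criterion can fail:
print(has_superset_by_length([[1, 1, 2], [3]]), has_superset_brute_force([[1, 1, 2], [3]]))
# True False
```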

  • 2021-02-20 04:57

    I think the question you mean to ask is "given a list of sets, pick out the set that contains all of the others". There are a number of edge cases where I don't know what output you would want (e.g. A = {1, 2} and B = {3, 4}), so the question needs a lot more clarification.

    However, to answer the question you did ask, about set containment, you can use set difference (equivalently, the complement with respect to another set). In Mathematica, something like this:

    setA = {1, 2, 3, 4};
    setB = {4, 3};
    setC = {3, 4, 1};
    setD = {4, 3, 2, 1};
    Complement[setD, setA] == {}
     True
    

    indicates setD is a subset of setA.
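    The same set-difference test in Python, as an illustrative translation: the difference `set_d - set_a` plays the role of `Complement[setD, setA]`, and an empty result means every element of `set_d` is in `set_a`.

```python
set_a = {1, 2, 3, 4}
set_b = {4, 3}
set_c = {3, 4, 1}
set_d = {4, 3, 2, 1}

# Like Complement[setD, setA]: elements of set_d that are missing from set_a.
print(set_d - set_a == set())  # True, so set_d is a subset of set_a
print(set_a - set_b == set())  # False: 1 and 2 are not in set_b
```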

  • 2021-02-20 05:06

    In Mathematica I propose using Alternatives for this.

    For example, if we have the set {1, 2, 3, 4} and wish to test whether a set x is a subset of it, we could use MatchQ[x, {(1|2|3|4) ..}]. The advantage of this construct is that as soon as an element that does not belong is found, the test stops and returns False.
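    The early-exit behaviour can be illustrated with a Python analogy (not a model of Mathematica's pattern matcher). The hypothetical helper `is_subset_counting` also reports how many elements it examined, so you can see that it stops at the first element outside the allowed set.

```python
def is_subset_counting(candidate, allowed):
    """Return (is_subset, number_of_elements_examined).

    Stops at the first element not in `allowed`, analogous to the pattern
    {(1|2|3|4) ..} failing as soon as one element does not match.
    """
    lookup = set(allowed)
    examined = 0
    for x in candidate:
        examined += 1
        if x not in lookup:
            return False, examined
    return True, examined


print(is_subset_counting([3, 4, 1], [1, 2, 3, 4]))     # (True, 3)
print(is_subset_counting([5, 1, 2, 3], [1, 2, 3, 4]))  # (False, 1)
```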

    We can package this method as follows:

    maximal[sets_] :=
      Module[{f},
        f[x__] := (f[Union @ Alternatives @ x ..] = Sequence[]; {x});
        f @@@ sets
      ]
    
    maximal @ {{1, 2, 3, 4}, {4, 3}, {5, 1}, {3, 4, 1}, {4, 3, 2, 1}}
    
    {{1, 2, 3, 4}, {5, 1}}
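    Here is a Python sketch of what `maximal` does, as I read it (the name is reused for comparison; this is not the Mathematica code). Each set is kept unless an earlier kept set already contains it. One consequence worth noting: a subset that appears before its superset is not removed, so sorting the input by length in descending order first (as in the MATLAB answer) makes the result contain only maximal sets.

```python
def maximal(sets):
    """Keep each set unless an earlier *kept* set already contains it."""
    kept = []
    for s in sets:
        members = set(s)
        if any(members <= set(k) for k in kept):
            continue  # contained in an earlier kept set: drop it
        kept.append(s)
    return kept


print(maximal([[1, 2, 3, 4], [4, 3], [5, 1], [3, 4, 1], [4, 3, 2, 1]]))
# [[1, 2, 3, 4], [5, 1]]
print(maximal([[4, 3], [1, 2, 3, 4]]))
# [[4, 3], [1, 2, 3, 4]]  (a subset listed before its superset survives)
```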
    