I am looking for an efficient way to determine if a set is a subset of another set in Matlab or Mathematica.
Example: Set A = [1 2 3 4] Set B = [4 3] Set C = [3 4 1] Set
You will likely want to take a look at the built-in set operation functions in MATLAB. Why reinvent the wheel if you don't have to? ;)
HINT: The ISMEMBER function may be of particular interest to you.
EDIT:
Here's one way you can approach this problem using nested loops, but setting them up to try and reduce the number of potential iterations. First, we can use the suggestion in Marc's comment to sort the list of sets by their number of elements so that they are arranged largest to smallest:
setList = {[1 2 3 4],... %# All your sets, stored in one cell array
[4 3],...
[3 4 1],...
[4 3 2 1]};
nSets = numel(setList); %# Get the number of sets
setSizes = cellfun(@numel,setList); %# Get the size of each set
[temp,sortIndex] = sort(setSizes,'descend'); %# Get the sort index
setList = setList(sortIndex); %# Sort the sets
Now we can set up our loops to start with the smallest sets at the end of the list and compare them first to the largest sets at the start of the list to increase the odds we will find a superset quickly (i.e. we're banking on larger sets being more likely to contain smaller sets). When a superset is found, we remove the subset from the list and break the inner loop:
for outerLoop = nSets:-1:2
for innerLoop = 1:(outerLoop-1)
if all(ismember(setList{outerLoop},setList{innerLoop}))
setList(outerLoop) = [];
break;
end
end
end
After running the above code, setList
will have all sets removed from it that are either subsets or duplicates of other sets preceding them in the list.
In the best case scenario (e.g. the sample data in your question) the inner loop breaks after the first iteration every time, performing only nSets-1
set comparisons using ISMEMBER. In the worst case scenario the inner loop never breaks and it will perform (nSets-1)*nSets/2
set comparisons.