What is an efficient algorithm for determining the generating sets whose product contains all required permutations?

问题

Consider a list of permutations (order-relevant combinations) of the form :

(1 2 3)
(1 2 4)
(5 2 3)
(5 2 4)

I need to find the smallest number of generating sets for this permutation group. For example, given the permutations above,

(1 2 3,4)
(5 2 3,4)

Is not an optimal solution. The optimal solution is (1,5 2 3,4).

You will notice that this solution contains sets A={1, 5} B={2} and C={3,4} The original list of permutations is the ordered Cartesian product of these sets: A X B X C.

I would like to partition the list of permutations into the fewest possible groups expressed as sets A, B, and C whose product contains all permutations in the list. The final answer is usually, but not always, a list of a list of sets, since it is not always possible to reduce the list of permutations down to a single list of generating sets. That is, it is usually the case that the product of sets A, B, and C account for some of the permutations in the list, but sets D, E, and F are required to account for other permutations in the list and so on.

My crude attempt at solving this problem involved checking to see if I had a match on any of the two permutation slots in the list and recursively merging them. If so, I merged those two combinations, i.e.

(1 2 3)
(1 2 4)

produce

(1 2 3,4)

Unfortunately, the merge order of such combinations is not associative. An ideal solution would merge these combinations in such a way that the final list of sets would subsume the largest possible number of permutations.

To demonstrate the problem with associativity, take this example, which cannot reduce to fewer than two lists of generating sets:

(1 2 4) 
(1 2 3)
(1 2 5) 
(5 2 3) 
(5 2 4)

Suppose you were to recursively merge these according to the following rule, “If any two columns match, I merge by preserving those columns as-is. I then merge the two sets in the third column to produce the new set. After the merger of these two rows, I throw out the two original rows, so they are not re-merged or double-counted.” The order of these merges is significant. Given the list of permutations above, if I merge (1 2 3) and (1 2 4), I get (1 2 3,4). Now, how do I conduct the next merge to optimize the generating set? Suppose I look at (1 2 5) and see that it matches (1 2 3,4) on two columns, I perform the merge and get (1 2 3,4,5). All appears to be well. However, I then merge (5 2 3) and (5 2 4) which results in (5 2 3,4). I compare (5 2 3,4) and (1 2 3,4,5). I do not have a two-column match, so I stop merging. I would have reduced the output to (5 2 3,4) and (1 2 3,4,5).

Now suppose that I merged in a different order. I merge (1 2 3) and (1 2 4) to produce (1 2 3,4). Then I merge (5 2 3) and (5 2 4) to produce (5 2 3,4). I see that I have a match between these two products. I then merge (1 2 3,4) and (5 2 3,4) to produce (1,5 2 3,4). The final list of generating sets is (1,5 2 3,4) and (1 2 5). So you see that the merge order has produced two different answers: (1,5 2 3,4) and (1 2 5) vs. (5 2 3,4) and (1 2 3,4,5).

In this case I would probably settle for either answer, since the same number (2) of lists of generating sets occurs in each answer. However, (1,5 2 3,4) and (1 2 5) is slightly preferable, because (1,5 2 3,4) subsumes the largest possible number of combinations. Suppose, however, that we have a list of 900 combinations. The merge order of a bottom-up solution to the problem will cause you to reduce the problem in a non-optimal way where optimization is the smallest possible count on the list of the list of sets. It's hard to know what the merge order is without looking ahead through every possible merge path, which is not any more efficient than the brute force method of finding matches, which I have also tried.

I have also attempted the brute force method. Why is the efficiency of the brute force method unacceptable? That method first finds a list of unique integers across all columns. It then generates the power set of every possible combination of these integers. It does so for column/set A, column/set B, and column/set C. It then orders these sets by size (biggest to smallest), computes the Cartesian product of each set across every other set in other columns, then it loops through these Cartesian products, which are keyed by their generating sets to check if the Cartesian product matches a list of permutations from the input. This is roughly O(n^3) where n=the number of combinations in the input list. If I could do this in O(n^2) even, it would be a win compared to where it is now.

As far as the memory considerations go. Let’s assume that the domain for each slot is the integers 1-12. The number of distinct combinations across three slots is 12!/3! You’re looking at over 79 million raw combinations. That’s before they are even partitioned into sets by Google’s Guava Collection APIs (which I’d highly recommend BTW). We could probably somehow generate the set lazily, and I get the sense that Google does, but the memory constraints are still large.

I'm really looking for an algorithm to solve this problem. However, if there's a Java library or method that will take care of this with minimal pain, I'm up for that too. Thanks.

回答1:

Thanks everyone for your insights and answers. I'd like to credit this answer to John Kolen (http://www.johnfkolen.com) who has presented a feasible solution as follows:

A Greedy Algorithm for Maximal Coverage of Triples

Input: A set, with subsets A x B x C, of triples, where A, B, and C are sets of integers.

Output: A set of n triples (A_i, B_i, C_i), where A_i in A, B_i in B, and C_i in C, and Union_i A_i x B_i x C_i = X and Intersection_i A_i x B_i x C_i = empty.

Algorithm

The algorithm can be broken into three parts: finding pair covers, finding triple covers, and finally constructing a total cover.

Finding covers of pairs from a subset of B x C involves building a map from elements of B to sets of C. Essentially a small prefix tree, or trie, this data structure makes it easy to find the maximal cover a set of prefixes. For instance, if b_1->C_1 and b_2->C_2, then the maximal cover involving b_1 and b_2 will be <[b_1,b_2],C_1 intersection C_2>.

Once we can find maximal pairs, then it's possible to construct maximal triples. This time, however, the prefix (A) maps to a set of pairs (BxC). By using the previous method it's possible to find examine all subsets of A and their matching pair coverage over the suffixes.

The greedy approach finds the locally best cover and adds it to the solution set. The triples captured by the current cover are removed from consideration, and the process continues until the locally best cover is a singleton. The remaining triples are added to the solution.

The relevant code is attached.

import java.io.IOException;
import java.io.FileReader;
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;
import java.util.Arrays;

class Triples {
    private ArrayList<int[]> _data;
    private HashMap<Integer,HashSet<Integer>> _tree;

    /**
     * Instantiates a container of triples read from the provided file.
     * <p>
     * File should contain lines of the form '[#,#,#]', where # is an
     * integer. Spaces are ignored.
     *
     * @param filename a path to a file containing triples
     */
    public Triples(String filename) {
    _data = new ArrayList<int[]>();
    _tree = new HashMap<Integer, HashSet<Integer>>();
    readfile(filename);
    }

    /**
     * Returns the number of triples.
     *
     * @return the number of triples.
     */
    int size() {
    return _data.size();
    }

    /**
     * Returns a set of encoded pairs (b,c) where (first, b, c) is in the
     * set of triples. The pairs are encoded as integers using b * 255 + c.
     *
     * @param  first  the first value of a triple
     * @return        a set of pair encondig
     */
    HashSet<Integer> tree(int first) {
    return _tree.get(first);
    }

    public class PairCover 
    {
    public ArrayList<Integer> a;
    public ArrayList<Integer> b;
    PairCover() 
    {
        a = b = null;
    }
    PairCover(ArrayList<Integer> ax, ArrayList<Integer> bx) 
    {
        a = ax;
        b = bx;
    }
    /**
     * Returns the number of pairs covered by this cover.
     *
     * @return the number of pairs covered by this cover.
     */
    public int score()
    {
        if (a != null && b != null)
        return a.size() * b.size();
        else
        return 0;
    }
    public String toString() {
        return "<"+a+","+b+">";
    }
    }

    /**
     * Compute the maximal pair covering (B,C) for a set of suffix pairs.
     *
     * @return pair covering where |BxC| is maximized.
     */
    public PairCover maxPairCover(HashSet<Integer> set) {
    // unpack the pairs
    HashMap<Integer,NSet> pairs = new HashMap<Integer,NSet>();
    for(Integer value : set) {
        Integer a = value / 256;
        Integer b = value & 255;
        if (!pairs.containsKey(a))
        pairs.put(a, new NSet());
        pairs.get(a).add(b);
    }

    _mpc_best = new PairCover();
    Integer[] remain_ary = pairs.keySet().toArray(new Integer[0]);

    // recursively examine all subsets pair first values and find matching
    // second value groups.
    maxPairCoverRec(pairs,
            new ArrayList<Integer>(),
            new ArrayList<Integer>(Arrays.asList(remain_ary)),
            null);

    return _mpc_best;
    }

    /**
     * Recursively compute the maximal pair covering (B,C) for a set of 
     * suffix pairs from a set of active prefixes by combining all possible
     * subsets of the remaining prefixes.
     * <p>
     * Stores the best pair cover in _mpc_best. This "global" variable makes it
     * easy to prune search.
     *
     * @param pairs tree-compressed set of pairs
     * @param active currently active pair prefixes
     * @param remain set of pair prefixes to explore
     * @param set the pair suffixes shared by all the active prefixes
     * @return pair covering where |BxC| is maximized.
     */
    void maxPairCoverRec(HashMap<Integer,NSet> pairs,
                 ArrayList<Integer> active,
                 ArrayList<Integer> remain,
                 NSet set) 
    {
    if (set != null) {
        // Prune search if suffix set is empty or that there is no way
        // to generate a better pair cover than the current cover.
        if (set.isEmpty() ||
        (active.size() + remain.size()) * set.size()
        <= _mpc_best.score())
        return ;
    }
    if (remain.isEmpty()) {
        if (active.size() > 0) {
        // Save the current pair cover if it's better than the current
        // cover.
        if (_mpc_best.score() < set.size() * active.size()) {
            _mpc_best.a = (ArrayList<Integer>)(active.clone());
            _mpc_best.b = (ArrayList<Integer>)(set.to_array_list());
        }
        }
        return;
    }

    ArrayList<Integer> active_r = (ArrayList<Integer>)(active.clone());
    ArrayList<Integer> remain_r = (ArrayList<Integer>)(remain.clone());
    Integer a = remain_r.remove(0);
    // Explore active sets without element a.
    maxPairCoverRec(pairs, active_r, remain_r, set);

    // Explore active sets with element a. Perform intersection of 
    // current suffix set with suffixes of a.
    if (set == null) {
        set = pairs.get(a);
    }
    else {
        set = set.intersection(pairs.get(a));
    }
    active_r.add(a);
    maxPairCoverRec(pairs, active_r, remain_r, set);
    }

    public class TripleCover 
    {
    public ArrayList<Integer> a;
    public ArrayList<Integer> b;
    public ArrayList<Integer> c;
    TripleCover() 
    {
        a = b = c = null;
    }
    TripleCover(ArrayList<Integer> ax,
            ArrayList<Integer> bx,
            ArrayList<Integer> cx) 
    {
        a = ax;
        b = bx;
        c = cx;
    }
    TripleCover(int ax,
            int bx,
            int cx) 
    {
        a = new ArrayList<Integer>();
        a.add(ax);
        b = new ArrayList<Integer>();
        b.add(bx);
        c = new ArrayList<Integer>();
        c.add(cx);
    }
    /**
     * Returns the number of pairs covered by this cover.
     *
     * @return the number of pairs covered by this cover.
     */
    public int score()
    {
        if (a != null && b != null && c != null)
        return a.size() * b.size() * c.size();
        else
        return 0;
    }
    public String toString() {
        return "<"+a+","+b+","+c+">";
    }
    }

    /**
     * Compute the maximal triple cover for the pairs currently stored
     * in _tree.
     *
     * @return a triple cover with |AxBxC| maximized
     */
    TripleCover maxTripleCover() {
    _mtc_best = new TripleCover();
    Integer[] remain_ary = _tree.keySet().toArray(new Integer[0]);
    ArrayList<Integer> remain = 
        new ArrayList<Integer>(Arrays.asList(remain_ary));
    maxTripleCoverRec(new ArrayList<Integer>(),
                 remain,
                 null);
    return _mtc_best;
    }

    /**
     * Recursively explore all subsets of values in first position and
     * find the largest cover for the second and third positions.
     * <p>
     * Stores the best triple cover in _mtc_best. This "global" variable 
     * makes it easy to prune search.
     *
     * @param active the active values of the first position
     * @param remain the additional values of the first position to explore
     * @param set the current set of second/third position pairs that are shared by the active values
     */
    void maxTripleCoverRec(ArrayList<Integer> active,
               ArrayList<Integer> remain,
               HashSet<Integer> set) {
    if (set != null &&
        (set.isEmpty() ||
         (remain.size()>0 && 
          (active.size() + remain.size()) * set.size()
          <= _mtc_best.score()))){
        return;
    }
    if (remain.isEmpty()) {
        if (active.size() > 0) {
        PairCover mpc = maxPairCover(set);
        if (_mtc_best.score() < mpc.score() * active.size()) {
            _mtc_best.a = (ArrayList<Integer>)(active.clone());
            _mtc_best.b = mpc.a;
            _mtc_best.c = mpc.b;
        }
        }
        return;
    }

    ArrayList<Integer> active_r = (ArrayList<Integer>)(active.clone());
    ArrayList<Integer> remain_r = (ArrayList<Integer>)(remain.clone());
    Integer a = remain_r.remove(0);
    maxTripleCoverRec(active_r, remain_r, set);
    if (set == null) {
        set = (HashSet<Integer>)(_tree.get(a).clone());
    }
    else {
        set = (HashSet<Integer>)(set.clone());
        set.retainAll(_tree.get(a));
    }
    active_r.add(a);
    maxTripleCoverRec(active_r, remain_r, set);

    }

    /**
     * Finds a covering set for the triples using a greedy approach.
     * The largest cover of the current triples is found, recorded, and
     * the affected triples are removed from consideration. This process
     * continues until singleton covers are left.
     *
     * @return a list of covers
     */
    ArrayList<TripleCover> greedy() {
    // Since the prefix tree is modified as new covers are found,
    // we make a copy
    HashMap<Integer,HashSet<Integer>> old_tree = _tree;
    _tree = (HashMap<Integer,HashSet<Integer>>)(_tree.clone());

    ArrayList<TripleCover> solution = new ArrayList<TripleCover>();
    TripleCover cover;
    do {
        cover = maxTripleCover();  // find best
        solution.add(cover);  // record it
        remove(cover);  // remove covered triples from tree
        System.out.println("" + cover + " ==> " + cover.score());
    } while (cover.score() > 1);

    // Nothing left but singletons, so add them to solution
    for(Integer a : _tree.keySet()) {
        for(Integer pair : _tree.get(a)) {
        int b = pair / 256;
        int c = pair & 255;
        solution.add(new TripleCover(a,b,c));
        }
    }

    // restore prefix tree
    _tree = old_tree; 

    return solution;
    }

    //************************************************************

    private PairCover _mpc_best;
    private TripleCover _mtc_best;


    /**
     * Reads triples from the specified file. Will exit if file does not
     * exist.
     *
     * @param filename a path to a file containing triples
     */
    private void readfile(String filename) {
    try {
        FileReader fr = new FileReader(filename);
        BufferedReader reader = new BufferedReader(fr);
        String line = null;
        while ((line = reader.readLine()) != null) {
        line = line.replace("[","").replace("]","").replace(" ","");
        String[] svalues = line.split(",");
        int[] values = new int[3];
        values[0] = Integer.parseInt(svalues[0]);
        values[1] = Integer.parseInt(svalues[1]);
        values[2] = Integer.parseInt(svalues[2]);
        _data.add(values);
        Integer key = new Integer(values[0]);
        if (!_tree.containsKey(key)) {
            _tree.put(key, new HashSet<Integer>());
        }
        _tree.get(key).add(values[1] * 256 + values[2]);
        }
    } catch (IOException e) {
        System.out.println("Could not open " + filename);
        System.exit(1);
    }
    }

    /**
     * Removes covered triples from the prefix tree
     *
     * @param tc a covering
     */
    private void remove(TripleCover tc) {
    for(Integer a : tc.a) {
        HashSet<Integer> set = _tree.get(a);
        for(Integer b : tc.b) {
        for(Integer c : tc.c) {
            set.remove(b * 256 + c);
        }
        }
        if (set.isEmpty()) {
        _tree.remove(a);
        }
    }
    }

}

来源：https://stackoverflow.com/questions/20890802/what-is-an-efficient-algorithm-for-determining-the-generating-sets-whose-product

标签

java

algorithm

merge

permutation

set-theory