Dependency Algorithm - find a minimum set of packages to install

别那么骄傲 2021-02-02 17:23

I'm working on an algorithm whose goal is to find a minimum set of packages that must be installed in order to install package "X".

I'll explain better with an example:



        
10 Answers
  • 2021-02-02 17:27

    My code is here.

    Scenario:

    Represent the constraints.

    X : A&(E|C)
    A : E&(Y|N)
    E : B&(Z|Y)
    C : A|K
    

    Prepare two variables target and result. Add the node X to target.

    target = X, result=[]
    

    Add single node X to the result. Replace node X with its dependencies in the target.

    target = A&(E|C), result=[X]
    

    Add single node A to the result. Replace node A with its dependencies in the target.

    target = E&(Y|N)&(E|C), result=[X, A]
    

    Single node E must be true. So (E|C) is always true. Remove it from the target.

    target = E&(Y|N), result=[X, A]
    

    Add single node E to the result. Replace node E with its dependencies in the target.

    target = B&(Z|Y)&(Y|N), result=[X, A, E]
    

    Add single node B to the result. B has no dependencies, so it is simply removed from the target.

    target = (Z|Y)&(Y|N), result=[X, A, E, B]
    

    No single nodes remain, so expand the target expression into a disjunction of terms.

    target = Z&Y|Z&N|Y&Y|Y&N, result=[X, A, E, B]
    

    Replace Y&Y with Y.

    target = Z&Y|Z&N|Y|Y&N, result=[X, A, E, B]
    

    Choose the term with the fewest nodes (here Y) and add all of its nodes to the result.

    target = , result=[X, A, E, B, Y]
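
    A minimal Python sketch of the steps above, not the answerer's linked code. The `DEPS` encoding (each package maps to a list of OR-groups that must all hold) and the names `propagate`, `solve`, and `minimal_install` are assumptions made for illustration. Where the walkthrough picks the smallest expanded term directly, this sketch also resolves the dependencies of each candidate term before comparing sizes, which turns the last step into an exhaustive search.

```python
from itertools import product

# Hypothetical encoding of the constraints: an AND of OR-groups per package.
DEPS = {
    "X": [{"A"}, {"E", "C"}],
    "A": [{"E"}, {"Y", "N"}],
    "E": [{"B"}, {"Z", "Y"}],
    "C": [{"A", "K"}],
}


def propagate(target, result):
    """Move every forced package (single-node clause) from target to result."""
    target = [set(c) for c in target]
    result = list(result)
    while True:
        # A clause containing an already chosen package is satisfied: drop it.
        target = [c for c in target if not c & set(result)]
        unit = next((c for c in target if len(c) == 1), None)
        if unit is None:
            return target, result
        (node,) = unit
        result.append(node)
        # Replace the node with its own dependencies.
        target.extend(set(c) for c in DEPS.get(node, []))


def solve(target, result):
    """Return the shortest result list that satisfies every clause in target."""
    target, result = propagate(target, result)
    if not target:
        return result
    best = None
    # Expanding the target: each term picks one node from every clause.
    for choice in product(*[sorted(c) for c in target]):
        term = sorted(set(choice))  # Y&Y collapses to Y
        candidate = solve([{n} for n in term], result)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


def minimal_install(package):
    return solve([{package}], [])


print(minimal_install("X"))
```

    For the constraints listed above, this prints `['X', 'A', 'E', 'B', 'Y']`, matching the final result of the walkthrough.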
    
  • 2021-02-02 17:30

    I actually think graphs are the appropriate structure for this problem. Note that A and (E or C) <==> (A and E) or (A and C). Thus, we can represent X = A and (E or C) with the following set of directed edges:

    A <- K1
    E <- K1
    A <- K2
    C <- K2
    K1 <- X
    K2 <- X
    

    Essentially, we're just decomposing the logic of the statement and using "dummy" nodes to represent the ANDs.
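
    A short Python sketch of this decomposition, assuming a package's requirement is given as a list of OR-groups that must all hold. The function name `decompose`, the `start` parameter, and the `(source, target)` tuple representation of an edge are illustrative choices, not part of the answer.

```python
from itertools import product


def decompose(package, clauses, start=1):
    """Turn an AND of OR-groups into edges through dummy AND nodes K1, K2, ...

    Distributing AND over OR yields one term per dummy node. Each edge is a
    (source, target) pair, so ("K1", "A") corresponds to "A <- K1".
    """
    edges = []
    for i, term in enumerate(product(*clauses), start=start):
        dummy = f"K{i}"
        # dict.fromkeys removes duplicates while keeping order.
        for child in dict.fromkeys(term):
            edges.append((dummy, child))
        edges.append((package, dummy))
    return edges


# X = A and (E or C)
for source, target in decompose("X", [["A"], ["E", "C"]]):
    print(f"{target} <- {source}")
```

    Running this on X = A and (E or C) produces the same six edges as the list above, grouped by dummy node.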

    Suppose we decompose all the logical statements in this fashion (dummy Ki nodes for ANDS and directed edges otherwise). Then, we can represent the input as a DAG and recursively traverse the DAG. I think the following recursive algorithm could solve the problem:

    Definitions:
    Node u - Current Node.
    S - The visited set of nodes.
    children(x) - Returns the out neighbors of x.

    Algorithm:

    shortestPath u S = 
    if (u has no children) {
        add u to S
        return 1
    } else if (u is a dummy node) {
      (a,b) = children(u)
      if (a and b are in S) {
        return 0
      } else if (b is in S) { 
        x = shortestPath a S
        add a to S
        return x
      } else if (a in S) {
        y = shortestPath b S
        add b to S
        return y
      } else {
        x = shortestPath a S
        add a to S
        if (b in S) return x
        else {
            y = shortestPath b S
            add b to S
            return x + y
        }
      }
    } else {
      min = Int.Max
      min_node = null
      for (x in children(u)){
        if (x is not in S) {
          S_1 = S
          k = shortestPath x S_1
          if (k < min) min = k, min_node = x
        } else {
          min = 1
          min_node = x
        }
      }
      return 1 + min
    }
    

    Analysis: This is an entirely sequential algorithm that (I think) traverses each edge at most once.

  • 2021-02-02 17:35

    Since the graph consists of two different types of edges (AND and OR relationships), we can split the algorithm into two parts: search for all nodes that are required successors of a node (AND), and search through the groups of nodes from which exactly one node has to be selected (OR).

    Each node holds a package, a list of nodes that must be successors of this node (AND), a list of lists of nodes from which one successor per list must be chosen (OR), and a flag that records at which step of the algorithm the node was visited.

    define node: package p , list required , listlist optional , 
                 int visited[default=MAX_VALUE]
    

    The main-routine translates the input into a graph and starts traversal at the starting node.

    define searchMinimumP:
        input: package start , string[] constraints
        output: list
    
        //generate a graph from the given constraint
        //and save the node holding start as starting point
        node r = getNode(generateGraph(constraints) , start)
    
        //list all required nodes
        return requiredNodes(r , 0)
    

    requiredNodes searches for all nodes that are required successors of a node (that are connected to n via AND-relation over 1 or multiple edges).

    define requiredNodes:
        input: node n , int step
        output: list
    
        //generate a list of all nodes that MUST be part of the solution
        list rNodes
        list todo
    
        add(todo , n)
    
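        while NOT isEmpty(todo)
            node next = remove(0 , todo)
            if NOT contains(rNodes , next) AND next.visited > step
                add(rNodes , next)
                next.visited = step
                //follow the AND-edges so that transitively required nodes are collected too
                addAll(todo , next.required)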
    
        addAll(rNodes , optionalMin(rNodes , step + 1))
    
        for node r in rNodes
            r.visited = step
    
        return rNodes
    

    optionalMin searches for the shortest solution among all possible solutions for optional neighbours (OR). This algorithm is brute-force: every possible selection of neighbours is inspected.

    define optionalMin:
        input: list nodes , int step
        output: list
    
        //find all possible combinations for selectable packages
        listlist optSeq
        for node n in nodes
            if NOT n.visited < step
                for list opt in n.optional
                    add(optSeq , opt)
    
        //iterate over all possible combinations of selectable packages
        //for the given list of nodes and find the shortest solution
        list shortest
        int curLen = MAX_VALUE
    
        //search through all possible solutions (combinations of nodes)
        for list seq in sequences(optSeq)
            list subseq
    
            for node n in distinct(seq)
                addAll(subseq , requiredNodes(n , step + 1))
    
            if length(subseq) < curLen
                //mark all nodes of the old solution as unvisited
                for node n in shortest
                    n.visited = MAX_VALUE
    
                curLen = length(subseq)
                shortest = subseq
            else
                //mark all nodes in this possible solution as unvisited
                //since they aren't used in the final solution (not at this place)
                for node n in subseq
                    n.visited = MAX_VALUE
    
        for node n in shortest
            n.visited = step
    
        return shortest
    

    The basic idea would be the following: Start from the starting node and search for all nodes that must be part of the solution (nodes that can be reached from the starting node by only traversing AND-relationships). Now for all of these nodes, the algorithm searches for the combination of optional nodes (OR) with the fewest nodes required.
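
    The pseudocode above calls a helper `sequences(optSeq)` that the answer does not define. Assuming it enumerates every way of picking one node from each OR-group, it is the Cartesian product of the groups. A minimal Python sketch of that assumed helper (the name `sequences` follows the pseudocode, the implementation is a guess):

```python
from itertools import product


def sequences(opt_seq):
    """Yield every selection that takes exactly one node from each OR-group."""
    return [list(choice) for choice in product(*opt_seq)]


# Two OR-groups: (E or C) and (Y or N)
for seq in sequences([["E", "C"], ["Y", "N"]]):
    print(seq)
```

    The number of selections is the product of the group sizes, which is why the note below describes the approach as close to brute force.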

    NOTE: so far this algorithm isn't much better than brute force. I'll update as soon as I've found a better approach.

  • 2021-02-02 17:37

    A lot of the answers here focus on how this is a theoretically hard problem due to its NP-hard status. While this means you will experience asymptotically poor performance exactly solving the problem (given current solution techniques), you may still be able to solve it quickly (enough) for your particular problem data. For instance, we are able to exactly solve enormous traveling salesman problem instances despite the fact that the problem is theoretically challenging.

    In your case, one way to solve the problem is to formulate it as a mixed integer linear program with a binary variable x_i for each package i. A requirement such as "A requires (B or C or D) and (E or F) and (G)" becomes the constraints x_A <= x_B + x_C + x_D ; x_A <= x_E + x_F ; x_A <= x_G, and you can require that a package P be included in the final solution with x_P = 1. Solving such a model exactly is relatively straightforward; for instance, you can use the pulp package in Python:

    import pulp
    
    # Each package maps to a list of OR-groups; every group must be satisfied.
    # Note the trailing commas: ("A") is just the string "A", not a tuple.
    deps = {"X": [("A",), ("E", "C")],
            "A": [("E",), ("H", "Y")],
            "E": [("B",), ("Z", "Y")],
            "C": [("A", "K")],
            "H": [],
            "B": [],
            "Y": [],
            "Z": [],
            "K": []}
    required = ["X"]
    
    # Variables: x[k] == 1 means package k is installed
    x = pulp.LpVariable.dicts("x", deps.keys(), lowBound=0, upBound=1, cat=pulp.LpInteger)
    
    mod = pulp.LpProblem("Package_Optimization", pulp.LpMinimize)
    
    # Objective: minimize the number of installed packages
    mod += pulp.lpSum(x[k] for k in deps)
    
    # Dependencies: installing k requires at least one package from each group
    for k in deps:
        for dep in deps[k]:
            mod += x[k] <= pulp.lpSum(x[d] for d in dep)
    
    # Include required variables
    for r in required:
        mod += x[r] == 1
    
    # Solve
    mod.solve()
    for k in deps:
        print("Package", k, "used:", x[k].value())
    

    This outputs the minimal set of packages:

    Package X used: 1.0
    Package A used: 1.0
    Package E used: 1.0
    Package C used: 0.0
    Package H used: 0.0
    Package B used: 1.0
    Package Y used: 1.0
    Package Z used: 0.0
    Package K used: 0.0
    

    For very large problem instances, this might take too long to solve. You could either accept a potentially sub-optimal solution by using a timeout (see here), or you could move from the default open-source solver to a commercial solver such as Gurobi or CPLEX, which will likely be much faster.

  • 2021-02-02 17:42

    Unfortunately, there is little hope of finding an algorithm that is much better than brute force, because the problem is NP-hard. (As stated, it is an optimization problem rather than a decision problem, so it is NP-hard but not NP-complete.)

    A proof of NP-hardness: the minimum vertex cover problem, a well-known NP-hard optimization problem, reduces easily to it.

    Given a graph, create a package Pv for each vertex v. Also create a package X that "and"-requires (Pu or Pv) for each edge (u, v) of the graph. Find a minimum set of packages that must be installed in order to satisfy X. Then v is in the resulting minimum vertex cover of the graph iff the corresponding package Pv is in the installation set.
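
    The reduction can be sketched in a few lines of Python. The function names `vertex_cover_to_packages` and `minimum_install` are hypothetical, and the brute-force search is only there to check the construction on tiny graphs; it is not an efficient solver.

```python
from itertools import combinations


def vertex_cover_to_packages(edges):
    """Package X and-requires (P_u or P_v) for every edge (u, v)."""
    return {"X": [(f"P{u}", f"P{v}") for u, v in edges]}


def minimum_install(deps, root):
    """Brute force: try installation sets of increasing size until one works."""
    names = set(deps)
    for clauses in deps.values():
        for clause in clauses:
            names.update(clause)
    others = sorted(names - {root})
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            chosen = {root, *extra}
            # Every OR-group of every chosen package needs one chosen member.
            satisfied = all(
                any(p in chosen for p in clause)
                for pkg in chosen
                for clause in deps.get(pkg, [])
            )
            if satisfied:
                return chosen
    return None


# Path graph 1 - 2 - 3: the minimum vertex cover is {2}.
deps = vertex_cover_to_packages([(1, 2), (2, 3)])
print(sorted(minimum_install(deps, "X")))
```

    For the path graph 1 - 2 - 3 this prints `['P2', 'X']`: dropping X from the installation set leaves P2, which corresponds to the minimum vertex cover {2}.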

  • 2021-02-02 17:42

    This is an example of a Constraint Satisfaction Problem. There are Constraint Solvers for many languages, even some that can run on generic 3SAT engines, and thus be run on GPGPU.
