Alpha-beta pruning for Minimax

Submitted by 可紊 on 2019-11-30 08:47:29

To understand alpha-beta, consider the following situation. It's White's turn; White is trying to maximize the score, and Black is trying to minimize it.

White evaluates moves A, B, and C and finds that the best score is 20, with C. Now consider what happens when evaluating move D:

If White selects move D, we need to consider counter-moves by Black. Early on, we find that Black can capture the white queen, and that subtree gets a MIN score of 5 due to the lost queen. However, we have not considered all of Black's counter-moves. Is it worth checking the rest? No.

We don't care whether Black can get a score lower than 5, because White's move C already guarantees a score of 20. Black will not choose a counter-move with a score higher than 5, because he is trying to MINimize the score and has already found a move with a score of 5. For White, move C is preferred over move D as soon as the MIN for D (5 so far) goes below that of C (20 for sure). So we "prune" the rest of the tree there, pop back up a level, and evaluate White's moves E, F, G, H, ... to the end.
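The scenario above can be sketched in a few lines of Python. This is only an illustration: the reply scores in `replies` are made up (only C's 20 and D's 5 come from the text), and `best_white_move` is a hypothetical helper that looks exactly two plies deep.

```python
import math

# Hypothetical data: for each White move, the scores of Black's replies,
# in the order they would be searched. Only C=20 and D's first reply=5
# come from the explanation above; everything else is invented.
replies = {
    "A": [12, 15],
    "B": [18, 14],
    "C": [25, 20],
    "D": [5, 30, 40],  # first reply: Black captures the queen
    "E": [22, 8],
}

def best_white_move(replies):
    """Return (move, score, examined) for a two-ply search with pruning."""
    best_move, best_score = None, -math.inf
    examined = []  # every (move, reply score) that was actually looked at
    for move, scores in replies.items():
        worst = math.inf  # Black's best (lowest) reply found so far
        for score in scores:
            examined.append((move, score))
            worst = min(worst, score)
            if worst <= best_score:
                break  # this move is already no better than the best one: prune
        if worst > best_score:
            best_move, best_score = move, worst
    return best_move, best_score, examined

move, score, examined = best_white_move(replies)
print(move, score)  # C 20
print(examined)     # ("D", 30) and ("D", 40) never appear
```

With these numbers, move D is abandoned after its first reply, because 5 is already below the 20 that C guarantees.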

Hope that helps.

You don't need to evaluate the entire subtree of a node to decide its value. Alpha-beta pruning uses two dynamically computed bounds, alpha and beta, to bound the values that nodes can take.

Alpha is the minimum value that the max player is guaranteed (regardless of what the min player does) through another path through the game tree. This value is used to perform cutoffs (pruning) at the minimizing levels. When the min player has discovered that the score of a min node would necessarily be less than alpha, it need not evaluate any more choices from that node because the max player already has a better move (the one which has value alpha).

Beta is the maximum value that the min player is guaranteed and is used to perform cutoffs at the maximizing levels. When the max player has discovered that the score of a max node would necessarily be greater than beta, it can stop evaluating any more choices from that node because the min player would not allow it to take this path since the min player already has a path that guarantees a value of beta.
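As a minimal sketch of how the two bounds are threaded through the search, assume the game tree is given as nested Python lists whose leaves are plain numbers (static scores). That representation and the name `alphabeta` are assumptions for illustration, not something from the linked article.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping branches that cannot matter.

    alpha: best value the max player is already guaranteed elsewhere.
    beta:  best value the min player is already guaranteed elsewhere.
    """
    if not isinstance(node, list):
        return node  # leaf: its number stands in for a static evaluation
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the min player will avoid this node
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: the max player has a better option
    return value

# Hypothetical tree: the root is a max node with three min-node children.
tree = [[3, 5], [2, 9], [6, [7, 1]]]
print(alphabeta(tree, True))  # 6
```

In this tree, the leaf 9 is skipped by an alpha cutoff (the second child can be worth at most 2, while alpha is already 3), and the leaf 1 is skipped by a beta cutoff.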

I've written a detailed explanation of Alpha Beta Pruning, its pseudocode and several improvements: http://kartikkukreja.wordpress.com/2014/06/29/alphabetasearch/

(Very) short explanation for minimax:

  • You (the evaluator of a board position) have the choice of playing n moves. You try all of them and give the board positions to the (opponent) evaluator.

    • The opponent evaluates the new board positions (for him, the opponent side) by doing essentially the same thing: recursively calling (his opponent) evaluator, unless the maximum depth or some other condition has been reached, in which case a static evaluator is called. For each position, he selects the maximum evaluation and sends it back to you.
  • You select the move that has the minimum of those evaluations. That evaluation is the evaluation of the board you had to evaluate at the beginning.


(Very) short explanation for α-β-pruning:

  • You (the evaluator of a board position) have the choice of playing n moves. You try all of them one by one and give the board positions to the (opponent) evaluator - but you also pass along your current evaluation (of your board).

    • The opponent evaluates the new board position (for him, the opponent side) and sends the evaluation back to you. But how does he do that? He has the choice of playing m moves. He tries all of them and gives the new board positions (one by one) to (his opponent) evaluator and then chooses the maximum one.
    • Crucial step: if any of the evaluations he gets back is bigger than the minimum you gave him, it is certain that he will eventually return an evaluation at least that large (because he wants to maximize). You are sure to ignore that value (because you want to minimize), so he stops working on the boards he hasn't yet evaluated.
  • You select the move that has the minimum of those evaluations. That evaluation is the evaluation of the board you had to evaluate at the beginning.
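Taken literally, the description above passes only one bound down the tree: each evaluator hands its current best to the evaluator it calls. A rough sketch of that single-bound variant follows, again assuming nested lists with numeric leaves; the `visited` list and the name `evaluate` are hypothetical, added only to show which boards are skipped. Full alpha-beta keeps two bounds and can prune more than this.

```python
import math

def evaluate(board, maximizing, parent_best, visited):
    """Evaluate `board`, stopping early once the caller is sure to ignore us."""
    if not isinstance(board, list):
        visited.append(board)  # record every leaf that is actually evaluated
        return board
    best = -math.inf if maximizing else math.inf
    for child in board:
        # Pass along our own current best so the child can stop early.
        score = evaluate(child, not maximizing, best, visited)
        if maximizing:
            best = max(best, score)
            if best >= parent_best:
                break  # the minimizing caller already has something better
        else:
            best = min(best, score)
            if best <= parent_best:
                break  # the maximizing caller already has something better
    return best

visited = []
tree = [[3, 5], [6, 9], [1, 2]]
# You minimize at the root; -inf means nobody above you can cut you off.
value = evaluate(tree, False, -math.inf, visited)
print(value)    # 2
print(visited)  # [3, 5, 6, 1, 2]
```

On the second board the opponent sees a 6, which is already bigger than the 5 you hold, so he never looks at the 9.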

Here's a short answer -- you can know the value of a node without computing the precise value of all its children.

As soon as we know that a child node cannot be better, from the perspective of the parent-node player, than the previously evaluated sibling nodes, we can stop evaluating the child subtree. It's at least this bad.

sehe

I think your question hints at a misunderstanding of the evaluation function.

if you can work out the score of a node, you will need to know the score of all nodes on a layer lower than the node (in my understanding of minimax)

I'm not completely sure what you meant there, but it sounds wrong. The evaluation function (EF) is usually a very fast, static position evaluation. This means that it needs to look at only a single position and reach a 'verdict' from that. (IOW, you don't always evaluate a branch to n plies.)

Now many times, the evaluation truly is static, which means that the position evaluation function is completely deterministic. This is also the reason why the evaluation results are easily cacheable (since they will be the same each time a position is evaluated).
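To illustrate why a deterministic evaluation is easy to cache, here is a toy sketch using Python's `functools.lru_cache`. The position encoding (a string of piece letters, uppercase for White) and the material values are assumptions made up for the example; real engines typically use hashed positions and a transposition table.

```python
from functools import lru_cache

# Hypothetical material values; the king is left out and counts as 0.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

@lru_cache(maxsize=None)
def evaluate(position: str) -> int:
    """Toy static evaluation: material balance from White's point of view.

    `position` is an invented encoding: one letter per piece on the board,
    uppercase for White and lowercase for Black.
    """
    score = 0
    for piece in position:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

print(evaluate("QRPpp"))      # 13, computed
print(evaluate("QRPpp"))      # 13, served from the cache
print(evaluate.cache_info())  # shows one hit and one miss
```

Because the function depends only on its argument, returning a cached result is always safe. That stops being true as soon as the score depends on context that is not part of the key.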


Now, for e.g. chess, there is usually quite a bit of overt/covert deviation from the above:

  • a position might be evaluated differently depending on game context (e.g. whether the exact position occurred earlier in the game, how many moves have passed without a pawn move or capture, and en passant and castling opportunities). The most common 'trick' to tackle this is to incorporate that state into the 'position' itself (see footnote 1)

  • a different EF is usually selected for the different phases of the game (opening, middle, ending); this has some design impact (how to deal with cached evaluations when changing the EF? How to do alpha/beta pruning when the EF is different for different plies?)

To be honest, I'm not aware of how common chess engines solve the latter (I simply avoided it for my toy engine).

I'd refer to online resources like:


1. Just like the 'check'/'stalemate' conditions, if they are not special-cased outside the evaluation function anyway.
