Programmer Puzzle: Encoding a chess board state throughout a game

闹比i 2021-01-29 17:16

Not strictly a question, more of a puzzle...

Over the years, I've been involved in a few technical interviews of new employees. Other than asking the standard "do you

30 answers
  • 2021-01-29 17:32

    At each position, get the number of all possible moves.

    The next move is decoded as

    index_current_move = n % num_of_moves;  // this gives the best space efficiency
    n = n / num_of_moves;                   // integer division


    This is provably the best space efficiency for storing a randomly generated game. It needs approximately 5 bits per move on average, because there are usually 30-40 possible moves (log2 30 ≈ 4.9, log2 40 ≈ 5.3). Assembling the stored number is just generating n in reverse order.

    Storing the position is harder to crack because of its great redundancy. (There can be up to 9 queens on the board for one side, but in that case there are no pawns; and bishops, if on the board, are on opposite-colored squares.) In general, though, it is like storing combinations of identical pieces over the remaining squares.

    EDIT:

    The point of saving moves this way is to store only the index of each move. Instead of storing Kc1-c2 and trying to compress that, we add only the index of the move in the list produced by a deterministic move generator, movegenerator(position).

    At each move we add information of size

    num_of_moves = get_number_of_possible_moves(position);


    to the pool, and this number cannot be reduced.

    The information pool is built as

    n = n * num_of_moves + index_current_move;
    

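    Extra

    A forced move (only one move available) multiplies n by 1 and adds index 0, so it leaves no trace in n. If there is only one move available in the final position, save the number of previously made forced moves separately. Example: if the starting position has 1 forced move for each side (2 moves) and we want to save this as a one-move game, store 1 in the pool n.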

    Example of storing into the information pool

    Let's suppose that we have a known starting position and we make 3 moves.

    In the first move there are 5 available moves, and we take move index 4. In the second move there are 6 available moves, and we take move index 3. In the 3rd move there are 7 moves available for that side, and it picks move index 2.

    Vector form: index=[4,3,2], n_moves=[5,6,7]

    We encode this information backwards, so n = 4 + 5*(3 + 6*(2)) = 79 (no multiplication by 7 needed).

    How do we unroll this? First we have the position, and we find out that there are 5 moves available. So

    index = 79 % 5 = 4
    n = 79 / 5 = 15   // integer division; the remainder 4 was the index
    

    We take move index 4 and examine the position again, and from this point we find out that there are 6 possible moves.

    index=15%6=3
    n=15/6=2
    

    And we take move index 3 which gets us to a position with 7 possible moves.

    index=2%7=2
    n=2/7=0
    

    We make the last move, index 2, and reach the final position.
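    Below is a minimal Python sketch of the encode/decode loop above. It assumes the per-ply move counts are already known. In a real decoder, the hypothetical list `n_moves` would be replaced by calls to the move generator on the replayed position; here it only stands in for those counts.

```python
def encode(indices, n_moves):
    """Pack move indices into one integer, starting from the last move."""
    n = 0
    for idx, k in zip(reversed(indices), reversed(n_moves)):
        assert 0 <= idx < k, "index must be a legal move number"
        n = n * k + idx
    return n


def decode(n, n_moves):
    """Recover the move indices in play order.

    n_moves[i] stands in for get_number_of_possible_moves(position) at ply i.
    """
    indices = []
    for k in n_moves:
        indices.append(n % k)  # index of the move played
        n //= k                # integer division, as in the text
    return indices


n = encode([4, 3, 2], [5, 6, 7])
print(n)                     # 79
print(decode(n, [5, 6, 7]))  # [4, 3, 2]
```

    The decoder also needs to know how many plies to read (here `len(n_moves)`), because the integer alone does not mark where the game ends.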

    As you can see, the time complexity is O(n) and the space complexity is O(n). Edit: the time complexity is actually O(n^2), because the number being multiplied keeps growing, but there should be no problem storing games of up to 10,000 moves.
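    To get a feel for the size claim, the sketch below encodes a synthetic 10,000-ply game with Python's arbitrary-precision integers. The move counts (uniform in 30-40) and the random choices are made up purely for illustration, and no real move generator is involved. Each step multiplies an ever-growing integer, which is where the quadratic running time comes from.

```python
import math
import random


def encode(indices, n_moves):
    n = 0
    for idx, k in zip(reversed(indices), reversed(n_moves)):
        n = n * k + idx
    return n


def decode(n, n_moves):
    out = []
    for k in n_moves:
        out.append(n % k)
        n //= k
    return out


random.seed(1)
plies = 10_000
# Synthetic game: 30-40 legal moves per position, each choice uniform.
n_moves = [random.randint(30, 40) for _ in range(plies)]
indices = [random.randrange(k) for k in n_moves]

n = encode(indices, n_moves)
ideal = sum(math.log2(k) for k in n_moves)  # information content in bits

print(n.bit_length(), round(ideal))
print(n.bit_length() / plies)  # roughly 5.1 bits per move
assert decode(n, n_moves) == indices
```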


    Saving the position

    This can be done close to the optimum.

    Having covered how information is stored, let me say more about it. The general idea is to decrease redundancy (I will talk about that later). Let's presume that there were no promotions and no captures, so there are 8 pawns, 2 rooks, 2 knights, 2 bishops, 1 king and 1 queen per side.

    What we have to save: 1. the position of each piece, 2. castling possibilities, 3. en passant possibilities, 4. the side to move.

    Let's suppose that every piece can stand anywhere, but no 2 pieces on the same square. The number of ways 8 pawns of the same color can be arranged on the board is C(64,8) (binomial), which is about 32 bits. Then come 2 rooks, 2R -> C(56,2), 2B -> C(54,2), 2N -> C(52,2), 1Q -> C(50,1), 1K -> C(49,1), and the same for the other side, but starting with 8P -> C(48,8) and so on.

    Multiplying these together for both sides gives 4634726695587809641192045982323285670400000, which is approximately 142 bits. We have to add a factor of 8 (3 bits) for one possible en passant (the en passant pawn can be on one of 8 files), 16 (4 bits) for castling rights, and one bit for the side to move. We end up with 142+3+4+1=150 bits.
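    The counting argument above can be reproduced with `math.comb` (Python 3.8+). This sketch places each side's pieces in the same order as the text (pawns, rooks, bishops, knights, queen, king) on the squares still free. It then checks the product against the equivalent multinomial coefficient.

```python
import math
from math import comb

# Pieces per side, in the order used above: pawns, rooks, bishops,
# knights, queen, king (no captures, no promotions).
PIECE_COUNTS = (8, 2, 2, 2, 1, 1)


def place_side(free_squares):
    """Ways to place one side's pieces on the free squares; returns the
    count and the number of squares still free afterwards."""
    ways = 1
    for count in PIECE_COUNTS:
        ways *= comb(free_squares, count)
        free_squares -= count
    return ways, free_squares


white, free = place_side(64)    # C(64,8) * C(56,2) * ... * C(49,1)
black, free = place_side(free)  # C(48,8) * C(40,2) * ... * C(33,1)
placements = white * black

print(placements)
print(math.log2(placements))    # just under 142 bits
# Add en passant file (x8), castling rights (x16) and side to move (x2).
print(math.log2(placements * 8 * 16 * 2))
```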

    But now let's go hunting for redundancy on a board with 32 pieces and no captures.

    1. Pawns: without captures, every file holds one white and one black pawn facing each other. Pawns cannot pass each other, so the white pawn is always below the black one: the white pawn is at most on the 6th rank and the black pawn at least on the 3rd. Each file therefore has C(6,2) = 15 layouts. That gives C(6,2)^8 (about 31.3 bits) instead of C(64,8)*C(48,8) (about 60.5 bits), which decreases the information by roughly 29 bits.

    2. Castling rights are also redundant. If a rook is not on its starting square, there is no castling with that rook. So we can imagine adding 4 extra squares to the board (2 per side), where a rook that can still castle is placed, and remove the 4 castling bits. Instead of C(56,2)*C(40,2)*16 we have C(58,2)*C(42,2), and we save 3.76 bits (almost all of the 4 bits).

    3. En passant: when we store one of the 8 en passant possibilities, we know the position of the black pawn, which removes redundancy. For example, if it is White's move and en passant is possible on the 3rd file, the black pawn is on c5 and the white pawn on that file is on c2, c3 or c4. So instead of C(6,2) we have 3, and we save 2.3 bits. We can remove more redundancy by also storing, together with the en passant file, the side from which the capture can be made (3 possibilities: left, right or both). Then we know the position of the pawn that can capture en passant. In the previous example, with the black pawn on c5, the capturing white pawn can be on b5, on d5, or on both. If it is on one side, we have 2*3 instead of C(6,2): 3 for storing the possibilities and 2 possible squares for the black pawn on that file, on the 7th or 6th rank. That saves 1.3 bits; if it is on both sides, we save 4.2 bits. This way we save at least 2.3+1.3=3.6 bits per en passant.

    4. Bishops: each side's two bishops stand on opposite-colored squares, which removes about 1 bit per side.

    If we sum up, we need about 150-29.3-3.8-3.6-2 ≈ 111 bits to store a chess position with no captures.
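    A short sketch that recomputes the pawn and castling counts from points 1 and 2, using the same binomial terms as the text. It only checks the arithmetic; it does not build an actual position encoder.

```python
import math
from math import comb

# Point 1: pawns placed freely vs. one white/black pair per file.
baseline_pawns = comb(64, 8) * comb(48, 8)
file_layouts = comb(6, 2)            # white below black, ranks 2-7
constrained_pawns = file_layouts ** 8
pawn_saving = math.log2(baseline_pawns) - math.log2(constrained_pawns)

# Point 2: 4 castling bits vs. 2 virtual "can still castle" squares per side.
with_bits = comb(56, 2) * comb(40, 2) * 16
virtual_squares = comb(58, 2) * comb(42, 2)
castling_saving = math.log2(with_bits) - math.log2(virtual_squares)

print(f"pawns: {math.log2(baseline_pawns):.1f} -> "
      f"{math.log2(constrained_pawns):.1f} bits, saves {pawn_saving:.1f}")
print(f"castling saves {castling_saving:.2f} bits")
```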

    And probably not much more is needed when captures and promotions are taken into account (but I will write about that later if somebody finds this long post useful).

  • 2021-01-29 17:33

    It'd add interest to optimize for average-case size for typical games played by humans, instead of the worst case. (The problem statement doesn't say which; most responses assume worst-case.)

    For the move sequence, have a good chess engine generate moves from each position; it'll produce a list of k possible moves, ordered by its ranking of their quality. People generally pick good moves more often than random moves, so we need to learn a mapping from each rank in the list to the probability that people pick the move at that rank. Using these probabilities (based on a corpus of games from some internet chess database), encode the moves with arithmetic coding. (The decoder must use the same chess engine and mapping.)

    For the starting position, ralu's approach would work. We could refine it with arithmetic coding there as well, if we had some way to weight the choices by probability — e.g. pieces often appear in configurations defending each other, not at random. It's harder to see an easy way to incorporate that knowledge. One idea: fall back on the above move encoding instead, starting from the standard opening position and finding a sequence that ends in the desired board. (You might try A* search with a heuristic distance equaling the sum of the distances of pieces from their final positions, or something along those lines.) This trades some inefficiency from overspecifying the move sequence vs. efficiency from taking advantage of chess-playing knowledge. (You can claw back some of the inefficiency by eliminating move choices that would lead to a previously-explored position in the A* search: these can get weight 0 in the arithmetic code.)

    It's also kind of hard to estimate how much savings this would buy you in average-case complexity, without gathering some statistics from an actual corpus. But the starting point with all moves equally probable I think would already beat most of the proposals here: the arithmetic coding doesn't need an integer number of bits per move.
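    The cost an arithmetic coder approaches is the sum of -log2(p) over the chosen moves. The sketch below computes that ideal cost for a few plies. The rank probabilities in `RANK_PROBS` and the sample game are invented placeholders, not statistics from any corpus. No actual arithmetic coder or chess engine is included.

```python
import math

# Invented placeholder model: probability that a player picks the engine's
# 1st, 2nd, 3rd or 4th ranked move. Real values would come from a corpus.
RANK_PROBS = [0.40, 0.20, 0.10, 0.05]


def move_distribution(k):
    """Probability of each of k engine-ranked moves under the model."""
    head = RANK_PROBS[:k]
    tail = k - len(head)
    if tail == 0:
        total = sum(head)
        return [p / total for p in head]    # renormalise short lists
    leftover = 1.0 - sum(head)
    return head + [leftover / tail] * tail  # lower ranks share the rest


def ideal_bits(ranks, num_legal):
    """Bits an ideal arithmetic coder would spend: sum of -log2(p)."""
    return sum(-math.log2(move_distribution(k)[r])
               for r, k in zip(ranks, num_legal))


def uniform_bits(num_legal):
    """Same game with every legal move equally likely."""
    return sum(math.log2(k) for k in num_legal)


# Made-up game fragment: rank of the move played, number of legal moves.
ranks = [0, 0, 1, 0, 2, 0, 1, 0]
num_legal = [20, 20, 29, 30, 33, 35, 31, 38]
print(round(ideal_bits(ranks, num_legal), 1))  # 14.6
print(round(uniform_bits(num_legal), 1))       # 38.8
```

    Neither total needs to be a whole number of bits per move, which is the point made above about arithmetic coding.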

  • 2021-01-29 17:33

    Most people have been encoding the board state, but regarding the moves themselves... Here's a bit-encoding description.

    Bits per piece:

    • Piece-ID: Max 4 bits to identify the 16 pieces per side. White/black can be inferred. Have an ordering defined on the pieces. As the number of pieces drops below the respective powers of two, use fewer bits to describe the remaining pieces.
    • Pawn: 3 possibilities on the first move, so +2 bits (forward by one square, forward by two squares, en passant). Subsequent moves do not allow moving forward by two, so +1 bit is sufficient. Promotion can be inferred in the decoding process by noting when the pawn has hit the last rank. If the pawn is known to be promoted, the decoder will expect another 2 bits indicating which of the 4 pieces (queen, rook, bishop or knight) it has been promoted to.
    • Bishop: +1 bit for the diagonal used, up to +4 bits for the distance along the diagonal (16 possibilities). The decoder can infer the maximum possible distance that the piece can move along that diagonal, so if it's a shorter diagonal, use fewer bits.
    • Knight: 8 possible moves, +3 bits
    • Rook: +1 bit for horizontal / vertical, +4 bits for the distance along the line.
    • King: 8 possible moves, +3 bits. Indicate castling with an 'impossible' move: since castling is only possible while the king is on the first rank, encode this move as an instruction to move the king 'backwards', i.e. off the board.
    • Queen: 8 possible directions, +3 bits. Up to +4 more bits for the distance along the line / diagonal (fewer if the diagonal is shorter, as in the bishop's case).

    Assuming all pieces are on the board, these are the bits per move: Pawn - 6 bits on first move, 5 subsequently. 7 if promoted. Bishop: 9 bits (max), Knight: 7, Rook: 9, King: 7, Queen: 11 (max).
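    A sketch that tallies the per-move sizes of this fixed-field scheme, using the maximum extra bits listed for each piece. The field names are illustrative only. The piece-ID width is computed as the number of bits needed to index the moving side's remaining pieces.

```python
def id_bits(pieces_left):
    """Bits to name one of the moving side's remaining pieces."""
    return (pieces_left - 1).bit_length()


# Maximum extra bits per move type, as listed above.
EXTRA_BITS = {
    "pawn_first": 2,  # one step, two steps, en passant
    "pawn": 1,
    "bishop": 1 + 4,  # diagonal + distance
    "knight": 3,
    "rook": 1 + 4,    # direction + distance
    "king": 3,        # castling uses an off-board "move"
    "queen": 3 + 4,   # direction + distance
}
PROMOTION_BITS = 2


def move_bits(kind, pieces_left=16, promotes=False):
    extra = EXTRA_BITS[kind] + (PROMOTION_BITS if promotes else 0)
    return id_bits(pieces_left) + extra


for kind in EXTRA_BITS:
    print(kind, move_bits(kind))
print("promoting pawn", move_bits("pawn", promotes=True))
```

    Passing a smaller `pieces_left` shows the savings once captures bring the count to 8 or fewer.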

  • 2021-01-29 17:35

    Just like games are encoded in books and papers: every piece has a symbol. Since it's a "legal" game, White moves first, so there's no need to encode White or Black separately; just count the number of moves to determine who moved. Also, every move is encoded as (piece, ending position), where 'ending position' is reduced to the least number of symbols that resolves ambiguities (can be zero). The length of the game determines the number of moves. One can also encode the time in minutes (since the last move) at every step.

    Encoding of the piece could be done either by assigning a symbol to each piece (32 in total) or by assigning a symbol to the class and using the ending position to work out which piece of that class was moved. For example, a pawn has 6 possible ending positions, but on average only a couple are available to it at every turn. So, statistically, encoding by ending position might be best for this scenario.

    Similar encodings are used for spike trains in computational neuroscience (AER).

    Drawbacks: you need to replay the entire game to get at the current state and generate a subset, much like traversing a linked list.

  • 2021-01-29 17:36

    I saw this question last night and it intrigued me so I sat in bed thinking up solutions. My final answer is pretty similar to int3's actually.

    Basic solution

    Assuming a standard chess game and that you don't encode the rules (like White always goes first), then you can save a lot by encoding just the moves each piece makes.

    There are 32 pieces in total, but on each move you know which colour is moving, so there are only 16 pieces to worry about. That is 4 bits for which piece moves this turn.

    Each piece only has a limited moveset, which you would enumerate in some way.

    • Pawn: 4 options, 2 bits (1 step forward, 2 steps forward, 1 each diagonal)
    • Rook: 14 options, 4 bits (max of 7 in each direction)
    • Bishop: 13 options, 4 bits (if you have 7 in one diagonal, you only have 6 in the other)
    • Knight: 8 options, 3 bits
    • Queen: 27 options, 5 bits (Rook+Bishop)
    • King: 10 options, 4 bits (8 one-step moves, plus castling on either side)

    For promotion, there are 4 pieces to choose from (Rook, Bishop, Knight, Queen) so on that move we would add 2 bits to specify that. I think all the other rules are covered automatically (e.g. en passant).

    Further optimizations

    First, after 8 pieces of one colour have been captured, you could reduce the piece encoding to 3 bits, then 2 bits for 4 pieces and so on.

    The main optimization though is to enumerate only the possible moves at each point in the game. Assume we store a Pawn's moves as {00, 01, 10, 11} for 1 step forward, 2 steps forward, diagonal left and diagonal right respectively. If some moves are not possible we can remove them from the encoding for this turn.

    We know the game state at every stage (from following all the moves), so after reading which piece is going to move, we can always determine how many bits we need to read. If we realize a pawn's only moves at this point are capture diagonally right or move forward one, we know to only read 1 bit.

    In short, the bit storage listed above for each piece is a maximum only. Nearly every move will have fewer options and often fewer bits.
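    The variable-width idea can be sketched as follows: each choice among k options costs `(k - 1).bit_length()` bits, which is zero for a forced choice. The list `ks` passed to the hypothetical `read_choices` stands in for the option counts a real decoder would recompute by replaying the game.

```python
# Largest option counts per piece type.
MAX_OPTIONS = {"pawn": 4, "rook": 14, "bishop": 13,
               "knight": 8, "queen": 27,
               "king": 10}  # 8 steps + castling on either side


def width(k):
    """Bits needed to choose one of k options (0 if the choice is forced)."""
    return (k - 1).bit_length()


def write_choices(choices):
    """choices: (index, k) pairs in game order; returns a '0'/'1' string."""
    parts = []
    for index, k in choices:
        w = width(k)
        if w:
            parts.append(format(index, f"0{w}b"))
    return "".join(parts)


def read_choices(bits, ks):
    """Decode, given the option count k the decoder derives at each step."""
    pos, result = 0, []
    for k in ks:
        w = width(k)
        result.append(int(bits[pos:pos + w], 2) if w else 0)
        pos += w
    return result


print({piece: width(k) for piece, k in MAX_OPTIONS.items()})
# Piece 5 of 16, then a pawn with only two legal moves: 4 + 1 bits.
bits = write_choices([(5, 16), (1, 2)])
print(bits, read_choices(bits, [16, 2]))
```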

  • 2021-01-29 17:36

    Thomas has the right approach for encoding the board. However this should be combined with ralu's approach for storing moves. Make a list of all possible moves, write out the number of bits needed to express this number. Since the decoder is doing the same calculation it knows how many are possible and can know how many bits to read, no length codes are needed.

    Thus we get 164 bits for the pieces, 4 bits for castling info (assuming we are storing a fragment of a game, otherwise it can be reconstructed), 3 bits for en passant eligibility info--simply store the column where the move occurred (If en passant isn't possible store a column where it's not possible--such columns must exist) and 1 for who is to move.

    Moves will typically take 5 or 6 bits but can vary from 0 (a forced move) to 8.

    One additional shortcut--if the encode starts with 12 1 bits (an invalid situation--not even a fragment will have two kings on one side) you abort the decode, wipe the board and set up a new game. The next bit will be a move bit.
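    Here is a sketch of the small state header and the reset check described above. The field order (castling, en passant file, side to move) and the function names are assumptions for illustration. The 164 piece bits from Thomas's approach are not modelled. The move width uses 218, the commonly cited maximum number of legal moves in a chess position, to show where the upper bound of 8 bits comes from.

```python
RESET_MARKER = "1" * 12  # invalid as piece data, so it can flag "new game"


def starts_new_game(bits):
    return bits.startswith(RESET_MARKER)


def pack_state(castling, ep_file, white_to_move):
    """4 castling flags + 3-bit en passant file + 1 side-to-move bit."""
    assert len(castling) == 4 and 0 <= ep_file < 8
    return ("".join("1" if c else "0" for c in castling)
            + format(ep_file, "03b")
            + ("1" if white_to_move else "0"))


def unpack_state(bits):
    castling = [b == "1" for b in bits[:4]]
    ep_file = int(bits[4:7], 2)
    white_to_move = bits[7] == "1"
    return castling, ep_file, white_to_move


def width(k):
    """Bits for a move index when k legal moves exist."""
    return (k - 1).bit_length()


header = pack_state([True, True, False, True], 4, True)
print(header, unpack_state(header))
print(width(30), width(40), width(218))  # 5 6 8
```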
