Not strictly a question, more of a puzzle...
Over the years, I\'ve been involved in a few technical interviews of new employees. Other than asking the standard \"do you
Storing the board state
The simplest way I thought of is too first have a array of 8*8 bits representing the location of each piece (So 1 if there is a chess piece there and 0 if there isn't). As this is a fixed length we don't need any terminator.
Next represent every chess piece in order of its location. Using 4 bits per piece, this takes 32*4 bits (128 in total). Which is really really wasteful.
Using a binary tree, we can represent a pawn in one byte, a knight and rook and bishop in 3 and a king and queen in 4. As we also need to store the color of the piece, which takes an extra byte it ends up as (forgive me if this is wrong, I have never looked at Huffman coding in any detail before):
Given the totals:
2*16 + 4*4 + 4*4 + 4*4 + 2*5 + 2*5 = 100
Which beats using a fixed size set of bits by 28 bits.
So the best method I have found is to store it in a 82 + 100 bit array
8*8 + 100 = 164
Storing Moves
The first thing we need to know with is which piece is moving to where. Given that there are at maximum 32 pieces on the board and we know what each piece is, rather than an integer representing the square, we can have an integer representing the piece offset, which means we only have to fit 32 possible values to represent a piece.
Unfortunately there are various special rules, like castling or overthrowing the king and forming a republic (Terry Pratchett reference), so before we store the piece to move we need a single bit indicating if it is a special move or not.
So for each normal move we have a necessary 1 + 5 = 6
bits. (1 bit type, 5 bits for the piece)
After the piece number has been decoded, we then know the type of piece, and each piece should represent its move in the most efficient way. For example (If my chess rules are up to scratch) a pawn has a total of 4 possible moves (take left, take right, move one forward, move two forward).
So to represent a pawn move we need '6 + 2 = 8' bits. (6 bit for initial move header, 2 bits for what move)
Moving for the queen would be more complex, in that it would be best to have a direction (8 possible directions, so 3 bits) and a total of 8 possible squares to move to for each direction (so another 3 bits). So to represent moving a queen it would require 6 + 3 + 3 = 12
bits.
The last thing that occurrences to me is that we need to store which players turn it is. This should be a single bit (White or black to move next)
Resultant Format
So the file format would look something like this
[64 bits] Initial piece locations
[100 bits max] Initial pieces
[1 bit] Player turn
[n bits] Moves
Where a Move is
[1 bit] Move type (special or normal)
[n bits] Move Detail
If the Move is a normal move, the Move Detail looks something like
[5 bits] piece
[n bits] specific piece move (usually in the range of 2 to 6 bits]
If it is a special move
It should have an integer type and then any additional information (like if it is castling). I cannot remember the number of special moves, so it may be OK just to indicate that it is a special move (if there is only one)
There are 64 possible board positions, so you need 6 bits per position. There are 32 initial pieces, so we have 192 bits total so far, where every 6 bits indicates the position of the given piece. We can pre-determine the order the pieces appear in, so we don't have to say which is which.
What if a piece is off the board? Well, we can place a piece on the same spot as another piece to indicate that it is off the board, since that would be illegal otherwise. But we also don't know whether the first piece will be on the board or not. So we add 5 bits indicating which piece is the first one (32 possibilities = 5 bits to represent the first piece). Then we can use that spot for subsequent pieces that are off the board. That brings us to 197 bits total. There has to be at least one piece on the board, so that will work.
Then we need one bit for whose turn it is - brings us to 198 bits.
What about pawn promotion? We can do it a bad way by adding 3 bits per pawn, adding on 42 bits. But then we can notice that most of the time, pawns aren't promoted.
So, for every pawn that is on the board, the bit '0' indicates it is not promoted. If a pawn is not on the board then we don't need a bit at all. Then we can use variable length bit strings for which promotion he has. The most often it will be a queen, so "10" can mean QUEEN. Then "110" means rook, "1110" means bishop, and "1111" means knight.
The initial state will take 198 + 16 = 214 bits, since all 16 pawns are on the board and unpromoted. An end-game with two promoted pawn-queens might take something like 198 + 4 + 4, meaning 4 pawns alive and not promoted and 2 queen pawns, for 206 bits total. Seems pretty robust!
===
Huffman encoding, as others have pointed out, would be the next step. If you observe a few million games, you'll notice each piece is much more likely to be on certain squares. For example, most of the time, the pawns stay in a straight line, or one to the left / one to the right. The king will usually stick around the home base.
Therefore, devise a Huffman encoding scheme for each separate position. Pawns will likely only take on average 3-4 bits instead of 6. The king should take few bits as well.
Also in this scheme, include "taken" as a possible position. This can very robustly handle castling as well - each rook and king will have an extra "original position, moved" state. You can also encode en passant in the pawns this way - "original position, can en passant".
With enough data this approach should yield really good results.
There are 32 pieces on the board. Each piece has a position (one in 64 squares). So you just need 32 positive integers.
I know 64 positions hold in 6 bits but I wouldn't do that. I'd keep the last bits for a few flags (dropped piece, queen'ed pawn)
I've thought about that one for a long time (+- 2hours). And there is no obvious answers.
Assuming:
... so up to date modern rules it is. First irrespective of repetition and move repitition limit.
-C 25 bytes rounded ( 64b+32*4b+5b= 325b)
=64 bits (something/nothing) +32*4 bits [ 1bit=color{black/withe} +3bit=type of piece{King,Queen,Bishop,kNight,Rook,Pawn,MovedPawn} NB:Moved pawn... e.g if it was the last moved pawn in the previous turn indicating that an 'en passant' is feasible. ] +5bit for the actual state (who's turn, en passant, possibility of rooking or not on each sides)
So far so good. Probably can be enhanced but then there would be variable length and promotion to take in consideration!?
Now, following rules are only applicable WHEN a player apllies for a draw, IT IS NOT automatic! So consider this 90 moves without a capture or a a pawn move is feasible if no player calls for a draw! Meaning that all moves need to be recorded... and available.
-D repetion of position... e.g. state of board as mentioned above (see C) or not... (see following regarding the FIDE rules) -E That leaves the complex problem of 50 move allowance without capture or pawn move there a counter is necessary... However.
So how do you deal with that?... Well really there is no way. Because neither player may want to draw, or realize that it has occurred. Now in case E a counter might suffice... but here is the trick and even reading the FIDE rules (http://www.fide.com/component/handbook/?id=124&view=article) I can't find an answer... what about loss of ability of rooking. Is that a repetition? I think not but then this is a blurred subject not addressed, not clarified.
So here is two rules that are two complex or undefined even to try to encode... Cheers.
So the only way to truly encode a game is to record all from start... which then conflict (or not?) with the "board state" question.
Hope this help... not too much math :-) Just to show that some question are not as easy, too open for interpretation or pre-knowledge to be a correct and efficient. Not one I would consider for interviewing as it open too much of a can of worm.
Great puzzle!
I see that most people are storing the position of each piece. How about taking a more simple-minded approach, and storing the contents of each square? That takes care of promotion and captured pieces automatically.
And it allows for Huffman encoding. Actually, the initial frequency of pieces on the board is nearly perfect for this: half of the squares are empty, half of the remaining squares are pawns, etcetera.
Considering the frequency of each piece, I constructed a Huffman tree on paper, which I won't repeat here. The result, where c
stands for the colour (white = 0, black = 1):
For the entire board in its initial situation, we have
Total: 164 bits for the initial board state. Significantly less than the 235 bits of the currently highest voted answer. And it's only going to get smaller as the game progresses (except after a promotion).
I looked only at the position of the pieces on the board; additional state (whose turn, who has castled, en passant, repeating moves, etc.) will have to be encoded separately. Maybe another 16 bits at most, so 180 bits for the entire game state. Possible optimizations:
c
bit, which can then be deduced from the square that the bishop is on. (Pawns promoted to bishops disrupt this scheme...)Like Robert G, I'd tend to use PGN since it's standard and can be used by a wide range of tools.
If, however, I'm playing a chess AI that's on a distant space probe, and thus every bit is precious, this is what I'd do for the moves. I'll come bock to encoding the initial state later.
The moves don't need to record state; the decoder can take keep track of state as well as what moves are legal at any given point. All the moves need to record is which of the various legal alternatives is chosen. Since players alternate, a move doesn't need to record player color. Since a player can only move their own color pieces, the first choice is which piece the player moves (I'll come back to an alternative that uses another choice later). With at most 16 pieces, this requires at most 4 bits. As a player loses pieces, the number of choices decreases. Also, a particular game state may limit the choice of pieces. If a king can't move without placing itself in check, the number of choices is reduced by one. If a king is in check, any piece that can't get the king out of check isn't a viable choice. Number the pieces in row major order starting at a1 (h1 comes before a2).
Once the piece is specified, it will only have a certain number of legal destinations. The exact number is highly dependent on board layout and game history, but we can figure out certain maximums and expected values. For all but the knight and during castling, a piece can't move through another piece. This will be a big source of move-limits, but it's hard to quantify. A piece can't move off of the board, which will also largely limit the number of destinations.
We encode the destination of most pieces by numbering squares along lines in the following order: W, NW, N, NE (black side is N). A line starts in the square furthest in the given direction that's legal to move to and proceeds towards the. For an unencumbered king, the list of moves is W, E, NW, SE, N, S, NE, SW. For the knight, we start with 2W1N and proceed clockwise; destination 0 is the first valid destination in this order.
Since the number of choices isn't always a power of two, the above still wastes bits. Suppose the number of choices is C and the particular choice is numbered c, and n = ceil(lg(C)) (the number of bits required to encode the choice). We make use of these otherwise wasted values by examining the first bit of the next choice. If it's 0, do nothing. If it's 1 and c+C < 2n, then add C to c. Decoding a number reverses this: if the received c >= C, subtract C and set the first bit for the next number to 1. If c < 2n-C, then set the first bit for the next number to 0. If 2n-C <= c < C, then do nothing. Call this scheme "saturation".
Another potential type of choice that might shorten the encoding is to choose an opponents piece to capture. This increases the number of choices for the first part of a move, picking a piece, for at most an additional bit (the exact number depends on how many pieces the current player can move). This choice is followed by a choice of capturing piece, which is probably much smaller than the number of moves for any of the given players pieces. A piece can only be attacked by one piece from any cardinal direction plus the knights for a total of at most 10 attacking pieces; this gives a total maximum of 9 bits for a capture move, though I'd expect 7 bits on average. This would be particularly advantageous for captures by the queen, since it will often have quite a few legal destinations.
With saturation, capture-encoding probably doesn't afford an advantage. We could allow for both options, specifying in the initial state which are used. If saturation isn't used, the game encoding could also use unused choice numbers (C <= c < 2n) to change options during the game. Any time C is a power of two, we couldn't change options.