Strategy Game Programming - The Basics

The Internal Representation

Before we start with moves and search trees, we need an internal representation of the position of our game. For the sake of simplicity, I will define a position struct in such a way that all information about the position is contained in it. An example for TicTacToe would be:
typedef struct 
	{
	int board[3][3];	/* EMPTY, WHITE or BLACK for each square */
	int color;		/* the side to move next: WHITE or BLACK */
	} TTTPOSITION;
	
	// use it like this:
	TTTPOSITION p;	
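Each entry of the board will hold one of the constants EMPTY, WHITE or BLACK, which are just #defined integers; a possible choice (the exact values are mine and arbitrary, they only need to be distinct):
#define EMPTY 0
#define WHITE 1
#define BLACK 2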
Here the board is represented by a 3x3 array holding one of these values for each square, and an int color which says which side will move next, again WHITE or BLACK. A more sophisticated example would be a chess program, which has to take more things into account:
typedef struct
	{
	int board[8][8];
	int color;
	int whitecancastleshort;	/* castling rights: 1 while still allowed, 0 otherwise */
	int whitecancastlelong;
	int blackcancastleshort;
	int blackcancastlelong;
	int movestodraw;		/* moves left until the 50-move rule kicks in */
	} CHESSPOSITION;
Here, the board and the color are again just integers. Additionally, in chess we must remember whether each side may still castle, and how many moves are left until the 50-move rule kicks in. Actually, this representation is still not complete: for instance, we cannot check whether the position has occurred three times in the game, and en-passant captures are not built in yet.
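To give an idea of what is still missing, the struct might grow like the following sketch. The fields epsquare, historycount and history, and the constant MAXHISTORY, are my own additions, one possible design among many:
#define MAXHISTORY 1000			/* the longest game we expect to handle */

typedef struct
	{
	int board[8][8];
	int color;
	int whitecancastleshort;
	int whitecancastlelong;
	int blackcancastleshort;
	int blackcancastlelong;
	int movestodraw;
	int epsquare;				/* square on which an en-passant capture is possible, or -1 */
	int historycount;			/* number of earlier positions stored below */
	unsigned int history[MAXHISTORY];	/* e.g. hash keys of earlier positions, for the repetition check */
	} CHESSPOSITION;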

Anyway, I do not want to talk too much about game-specific things, but rather about general algorithms. So let's just assume that we have some kind of representation of the game we want to program. However, it is important to keep an open mind about the position representation. In checkers, for instance, one can do much better than an 8x8 array: checkers is played on 32 squares, so a representation based on 32-bit words suggests itself - use one 32-bit integer each for black men, white men, black kings and white kings. You might ask: what's so good about this? The reason is simple: speed. With this kind of representation you can write much faster move generators than with an array. In a later part of this series, you can read just how important speed is.
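A sketch of what such a representation could look like (the struct and field names are mine, not from any particular program; on most platforms unsigned int is 32 bits, otherwise use uint32_t):
typedef struct
	{
	unsigned int blackmen;		/* bit i is set if a black man occupies square i (0..31) */
	unsigned int whitemen;
	unsigned int blackkings;
	unsigned int whitekings;
	int color;			/* the side to move */
	} CHECKERSPOSITION;
Now a question like "is this square occupied?" becomes a single bitwise operation on blackmen|whitemen|blackkings|whitekings - which is what makes bitboard move generators fast.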

Move Generation

Next, we need to be able to generate all legal moves for a certain position. I will assume that I have a function
int makemovelist(POSITION *p, MOVE list[MAXMOVES])
This function generates all possible moves and stores them in the array list; it returns the number of legal moves, n. Note that the position is passed by reference rather than by value, although we don't need to change anything in the position at all - passing a struct by reference is much faster than passing it by value, because only a single pointer has to be passed instead of all the position data. For some games a movelist with a fixed maximum size is quite appropriate, e.g. for connect 4, where you never have more than 7 possible moves. Sometimes, however, a fixed-size movelist wastes a lot of space - to avoid this, you can allocate a global move stack for your program. The move stack is just another array of moves, large enough to hold the movelists of all plies along the search path you are currently looking at. A sketch follows below, and hopefully it all gets even clearer in the search tree section!
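To make the idea concrete, here is a sketch of how a search routine might carve its movelist out of a global stack. MOVESTACKSIZE and countleaves are names I made up for illustration, it uses the domove and undomove functions introduced in the next section, and I assume makemovelist is happy to write into any caller-supplied array:
#define MOVESTACKSIZE 4096

MOVE movestack[MOVESTACKSIZE];		/* one shared array for all plies of the search path */
int stacktop = 0;			/* index of the first free slot */

/* example: count the leaf positions of the game tree to a given depth */
int countleaves(POSITION *p, int depth)
	{
	MOVE *list = &movestack[stacktop];	/* this ply's movelist starts at the stack top */
	int i,n,count=0;

	if(depth == 0)
		return 1;

	n = makemovelist(p,list);
	stacktop += n;			/* reserve the slice for this ply */
	for(i=0; i<n; i++)
		{
		domove(&list[i],p);
		count += countleaves(p,depth-1);
		undomove(&list[i],p);
		}
	stacktop -= n;			/* release the slice again */
	return count;
	}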

DoMove and UndoMove

Now that we can generate moves, we still need functions to do and undo moves. We use
void domove(MOVE *m, POSITION *p)
to make a move m in the position p and
void undomove(MOVE *m, POSITION *p)
to undo the move again. Obviously, the MOVE struct must contain all information necessary to support these two operations. As always, the structs are passed by reference; in this case it is not only a question of speed - the position is actually modified by these two functions.
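For TicTacToe these functions are about as simple as they get; a minimal sketch, where the TTTMOVE fields are my own choice:
typedef struct
	{
	int x;		/* the square which gets filled */
	int y;
	} TTTMOVE;

void domove(TTTMOVE *m, TTTPOSITION *p)
	{
	p->board[m->x][m->y] = p->color;		/* put a stone on the square */
	p->color = (p->color==WHITE) ? BLACK : WHITE;	/* now the other side moves */
	}

void undomove(TTTMOVE *m, TTTPOSITION *p)
	{
	p->board[m->x][m->y] = EMPTY;			/* take the stone off again */
	p->color = (p->color==WHITE) ? BLACK : WHITE;
	}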

The Evaluation Function

The last thing we need before we can start tree search is an evaluation function. We are going to look a couple of moves ahead, and at the end of each such variation we need an evaluation of the resulting position. This will be
int evaluation(POSITION *p)
In the MiniMax formulation, the evaluation function returns positive values if the position is good for white and negative values if it is bad for white.
Many things could be said about evaluation functions. For me, the two main objectives in designing one are speed and accuracy: the faster your evaluation function is, the better, and the more accurate it is, the better. Obviously, these two goals are somewhat at odds - an accurate evaluation function will probably be slower than a 'quick-and-dirty' one. The evaluation function I'm talking about here is a heuristic one, not an exact one. For games like checkers and chess there are endgame databases, where positions with few pieces are listed as won, drawn or lost - obviously a completely accurate evaluation function! Normally, though, we deal with positions we cannot evaluate exactly. For chess, an evaluation function would consist of a large term describing the material balance, plus many positional features: doubled pawns, passed pawns, king safety, piece centralization, and whatever else you might think of.
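To make this concrete, here is a bare-bones material counter for chess, in the spirit of the CHESSPOSITION struct above. The piece coding and the values (in hundredths of a pawn) are entirely my own assumptions:
/* hypothetical piece coding: 0 = empty, 1..6 = white pawn...king, 7..12 = black pawn...king;
   the kings are never actually captured, so they get no material value */
static const int piecevalue[13] =
	{ 0, 100, 300, 300, 500, 900, 0, -100, -300, -300, -500, -900, 0 };

/* bare-bones material-only evaluation; a real program adds positional terms here */
int evaluation(CHESSPOSITION *p)
	{
	int i,j,value=0;

	for(i=0; i<8; i++)
		for(j=0; j<8; j++)
			value += piecevalue[p->board[i][j]];
	return value;		/* positive means good for white */
	}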

Tree Searching

The simplest way to search the game tree, and also the easiest to understand, is the MiniMax algorithm. It searches all possible moves up to a fixed depth, evaluates all resulting positions, and propagates these evaluations back up to the root of the search tree. Here it is (INFINITY is just a suitably large integer constant, #defined somewhere like WHITE and BLACK):
int minimax(POSITION *p, int depth)
	{
	MOVE list[MAXMOVES];
	int i,n,bestvalue,value;
	
	if(checkwin(p))		/* did the previous move win the game? */
		{
		if (p->color == WHITE) 
			return -INFINITY;
		else 
			return INFINITY;
		}

	if(depth == 0)	
		return evaluation(p);

	if(p->color==WHITE) 
		bestvalue = -INFINITY;
	else 
		bestvalue = INFINITY;
	
	n = makemovelist(p,list);
	if(n == 0) 
		return handlenomove(p);
	
	for(i=0; i<n; i++)
		{
		domove(&list[i],p);
		value = minimax(p,depth-1);
		undomove(&list[i],p);
		if(p->color == WHITE) 
			bestvalue = max(value,bestvalue);
		else 
			bestvalue = min(value,bestvalue);
		}
  
	return bestvalue;
	}
The idea here is that both players try all possible moves in their position and then choose, respectively, the one which makes the value of the position as high as possible (the white side) or as low as possible (black). I have called one color 'WHITE'; this is the side which tries to maximize the value, while the other side tries to minimize it. You can see that player WHITE starts with a best value of -INFINITY and then tries every move, always taking the maximum of the best value so far and the value of the current move. The other player, BLACK, starts out with +INFINITY and tries to reduce this value.

Note how I use a function checkwin(p) to detect a winning position during the tree search: if you only check winning conditions at the end of your variation, you can generate variations where both sides have won - for instance, in connect 4 you could generate a variation where first one side connects four, and later the other side does too. Also, note the use of handlenomove(p) - that is what you need when there is no legal move left. In checkers you will lose, in chess it's a draw.

If the (average) number of possible moves at each node is N, you have to search N^D positions to reach depth D; N is called the branching factor. Typical branching factors are about 40 for chess, 7 for connect 4, 10 for checkers and 300 for go. For chess, searching just 5 plies deep therefore already means roughly 40^5, about 100 million, positions. The larger the branching factor, the less deep you will be able to search with this technique. This is the main reason that a game like connect 4 has been solved, that checkers programs are better than humans, that chess programs are already very strong, and that go programs still play very poorly - always compared to humans.

NegaMax

The plain MiniMax code is a bit clumsy: one side tries to maximize the value and the other tries to minimize it, so we constantly have to check which side we are. A neat way to get rid of this, and to get a simpler function, is NegaMax. With the NegaMax algorithm both sides try to maximize all the time. NegaMax is identical to MiniMax - it's just a nicer formulation. Here's the basic NegaMax code:
int negamax(POSITION *p, int depth)
	{
	MOVE list[MAXMOVES];
	int i,n,value,bestvalue=-INFINITY;

	if(checkwin(p)) 
		return -INFINITY;	/* the side to move has lost */
	
	if(depth == 0)	
		return evaluation(p);
	n = makemovelist(p,list);
	if(n == 0) 
		return handlenomove(p);
	
	for(i=0; i<n; i++)
		{
		domove(&list[i],p);
		value = -negamax(p,depth-1);	/* the minus sign converts the opponent's score to our point of view */
		undomove(&list[i],p);
		bestvalue = max(value,bestvalue);
		}
	return bestvalue;
	}
You can see that the NegaMax code is shorter and simpler than the MiniMax code. The point is that the call value = -negamax(p,depth-1); takes care of the signs - or nearly. One further modification is needed for this code to work: the evaluation function must be sensitive to the side to move. For a position with white to move it must return its normal evaluation, for a position with black to move it must return -evaluation.
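If evaluation(p) scores positions from white's point of view as defined earlier, one way to meet this requirement is a small wrapper - negaevaluation is my own name for it - which the depth == 0 line would then call instead of evaluation:
/* the evaluation as seen from the side to move */
int negaevaluation(POSITION *p)
	{
	if(p->color==WHITE)
		return evaluation(p);
	else
		return -evaluation(p);
	}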

At first sight NegaMax is a bit harder to understand than MiniMax, but it's in fact much easier to use: the side to move is always trying to maximize the value. NegaMax is no better and no worse than MiniMax - it's identical. It's just a nicer framework to work in.

AlphaBeta

The major improvement over MiniMax/NegaMax is the AlphaBeta algorithm. Here you exploit the fact that you don't have to search the whole game tree: if you have found one winning continuation, you don't have to look at any others. More generally, if you have found one continuation which gives you the value V, you can stop searching another continuation as soon as you find a single reply for your opponent which gives you a score below V. You don't have to look at all the other possibilities your opponent might have - one refutation is enough! Here is the code for AlphaBeta, extending the earlier NegaMax code. It receives two extra parameters, alpha and beta, which define an interval within which the value of the position is expected to lie; if the search shows that the value falls outside this interval, the function can return immediately. Your first call to AlphaBeta uses the interval -INFINITY...INFINITY; the recursive calls then make the window smaller and smaller.
int alphabeta(POSITION *p, int depth, int alpha, int beta)
	{
	MOVE list[MAXMOVES];
	int i,n,value,localalpha=alpha,bestvalue=-INFINITY;
	
	if(checkwin(p)) 
		return -INFINITY;

	if(depth == 0)	
		return evaluation(p);

	n = makemovelist(p,list);
	if(n == 0) 
		return handlenomove(p);
	
	for(i=0; i<n; i++)
		{
		domove(&list[i],p);
		value = -alphabeta(p,depth-1,-beta,-localalpha);
		undomove(&list[i],p);
		bestvalue = max(value,bestvalue);
		if(bestvalue>=beta) 
			break;			/* beta cutoff: the opponent won't allow this position anyway */
		if(bestvalue>localalpha) 
			localalpha=bestvalue;	/* raise the lower bound for the remaining moves */
		}
	return bestvalue;
	}
Note how AlphaBeta receives the parameters alpha and beta, which tell it in what range the value of the current position is expected to lie. Once a move has returned a value higher than alpha, this best value is saved in the variable localalpha and used for the next recursive call of AlphaBeta. If the best value is larger than beta, the search terminates immediately: we have found a move which refutes the notion that this position has a value in the range from alpha to beta, and we do not need to look for another one. Note that my AlphaBeta function returns the highest value it found, which can be higher than beta. Some people prefer to return beta instead of the best value on such a fail high; that formulation of AlphaBeta is known as fail-hard, while my formulation above is called fail-soft. The names come from the fact that in fail-hard the bounds alpha and beta are "hard": the return value can never lie outside the alpha-beta window. Fail-soft would seem more sensible, as it might lead to more cutoffs: if you can return a value higher than beta (or lower than alpha), then perhaps you get a cutoff in another instance of AlphaBeta further up the search tree that you wouldn't get otherwise. However, the fail-hard camp says they get fewer search instabilities when using advanced techniques such as pruning.
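For comparison, here is what the fail-hard version of the same function would look like; alphabeta_failhard is my name for it, and only the loop bookkeeping and the final return change:
int alphabeta_failhard(POSITION *p, int depth, int alpha, int beta)
	{
	MOVE list[MAXMOVES];
	int i,n,value;

	if(checkwin(p)) 
		return -INFINITY;

	if(depth == 0)	
		return evaluation(p);

	n = makemovelist(p,list);
	if(n == 0) 
		return handlenomove(p);

	for(i=0; i<n; i++)
		{
		domove(&list[i],p);
		value = -alphabeta_failhard(p,depth-1,-beta,-alpha);
		undomove(&list[i],p);
		if(value >= beta) 
			return beta;		/* fail hard: never return more than beta */
		if(value > alpha) 
			alpha = value;		/* alpha itself acts as the running maximum */
		}
	return alpha;			/* never less than the original alpha */
	}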

Putting things together

Now that we have all the basic building blocks of a two-person strategy game program, we can put them together. There are two issues I have not addressed yet: the way my tree-searching functions are written, they only return the evaluation of the position, but no move; and how deep should we search?

I usually define a function firstalphabeta which is an exact copy of alphabeta, except that it also returns the best move at the root. The second issue is resolved with something called 'iterative deepening': first we search 1 ply deep, then 2 plies, then 3, and so on. Once the user interrupts the search, we return the best move of the last completed iteration. Of course this can also be automated by measuring the elapsed time and stopping once it exceeds some specified value.
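A sketch of such a driver loop; MAXDEPTH and timeleft() are placeholders of mine, and I assume firstalphabeta reports the best root move through its last parameter:
MOVE getbestmove(POSITION *p)
	{
	MOVE best, m;
	int d;

	/* assumes at least the 1-ply iteration completes before time runs out */
	for(d=1; d<=MAXDEPTH && timeleft(); d++)
		{
		firstalphabeta(p, d, -INFINITY, INFINITY, &m);	/* fills in the best root move */
		best = m;		/* remember the move of the last completed iteration */
		}
	return best;
	}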

Comments and questions are welcome!

