A General Upper Bound for the Runtime of a Coevolutionary Algorithm on Impartial Combinatorial Games
Abstract
Due to their complex dynamics, combinatorial games are a key test case and application for algorithms that train game playing agents. Among those algorithms that train using self-play are coevolutionary algorithms (CoEAs). However, the successful application of CoEAs for game playing is difficult due to pathological behaviours such as cycling, an issue especially critical for games with intransitive payoff landscapes.
Insight into how to design CoEAs to avoid such behaviours can be provided by runtime analysis. In this paper, we push the scope of runtime analysis for CoEAs to combinatorial games, proving a general upper bound for the number of simulated games needed for UMDA to discover (with high probability) an optimal strategy. This result applies to any impartial combinatorial game, and for many games the implied bound is polynomial or quasipolynomial as a function of the number of game positions. After proving the main result, we provide several applications to simple well-known games: Nim, Chomp, Silver Dollar, and Turning Turtles. As the first runtime analysis for CoEAs on combinatorial games, this result is a critical step towards a comprehensive theoretical framework for coevolution.
1 Introduction
Many of the most well-known games in the world are combinatorial games. Combinatorial games are typically perfect-information games played by two players without chance moves. The game has a finite number of possible positions, and players take turns moving the game from one position to another, according to a set of rules describing which moves are legal. Combinatorial games are an exceptionally broad class of games, including famous games enjoyed the world over such as Chess or Go. Even those with simple rules can engender deep and complex strategic interactions between players. While this strategic depth is a key part of the appeal for human players, it can also render the task of computing a winning strategy extremely difficult. Indeed, games for which this task is known to be EXPTIME-complete (in terms of board size) include Chess [20], Go (with the ko rule) [53], and Checkers [54]. It is also known that determining an optimal strategy for a poset game (a class of combinatorial games which we will encounter in Section 6.4) is PSPACE-complete in terms of the size of the underlying poset [24]. (For further results, see [10].)
While classical methods are impractical for such cases, strong strategies can still be developed by using heuristic approaches, such as neural networks, Monte Carlo tree search, or genetic programming. Indeed, combinatorial games are a long-standing focus in the development of artificial intelligence, from Donald Michie’s seminal use of reinforcement learning on Tic-Tac-Toe [45], to Deep Blue’s famous matches against then-world Chess champion Garry Kasparov [9], to the recent groundbreaking results of DeepMind [60]. Many recent successes in this area train game playing agents using self-play, and among those self-play heuristics are coevolutionary algorithms (CoEAs) [51]. For a CoEA, self-play is realised through one or more evolving populations of individuals who compete against their contemporaries. In each iteration, the strongest individuals are selected based on their competitive interactions. Through genetic mutation and crossover, these strongest individuals are then used as parents for the individuals in the next iteration.
The successful application of CoEAs is deeply challenging, often due to the potential for games with intransitive payoff landscapes to induce cyclic behaviour [17]. For standard evolutionary algorithms, which apply similar methods to traditional optimisation problems, insight into how to avoid pathological behaviours can be provided by runtime analysis, which exists in great breadth and depth in literature and continues to be actively developed [12]. However, despite clear demand (see [51]), runtime analysis that addresses the challenges unique to CoEAs is far more limited. Indeed, while existing coevolutionary runtime analysis concerns a range of algorithms and design features, there are only three problem settings to which it so far applies: Bilinear, a game played on bitstrings whose outcome depends only on the number of -bits selected by each player [40, 31, 32]; Diagonal, a benchmark problem inspired by binary test-based optimisation [42]; and a class of symmetric zero-sum games with a payoff landscape that is globally very simple, but possibly locally intransitive [2]. Accordingly, our core research aim is to push the scope of runtime analysis for CoEAs towards games which feature more complex strategic interaction between players, and more closely reflect real-world games.
Motivated by the numerous empirical investigations into the topic (see Section 1.1), we focus our analysis on the use of CoEAs for combinatorial games, and in particular impartial combinatorial games. A combinatorial game is said to be impartial if both players share the same set of available moves at each game position [26]. It is common to also adopt the normal play convention, which assumes that a player loses if they have no legal moves available. For instance, consider a simple subtraction game (formally introduced in Section 6.1), in which the game positions are 0, 1, ..., 6 and a player must subtract either 1 or 2 from the position on their turn. A strategy may be encoded as a string of length 6, with entry i indicating whether a 1 or a 2 is to be subtracted when the game position is i. One way the game may play out is then as follows (P1 denotes Player 1 and P2 denotes Player 2):
Turn 1: P1 subtracts 1. The new position is 5.
Turn 2: P2 subtracts 2. The new position is 3.
Turn 3: P1 subtracts 2. The new position is 1.
Turn 4: P2 subtracts 1. The new position is 0.
Turn 5: P1 has no legal moves. P2 is the winner.
(In fact, any strategy of the form 12*12* will always win this game, provided the corresponding player does not move first.)
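The play-out above can be reproduced with a short simulation. This is a minimal sketch for illustration only, not part of the analysed algorithm: the function name `play` and the two concrete strategy strings are hypothetical choices, and the sketch assumes that character i of a strategy string (counting from 1) is the amount subtracted at position i.

```python
def play(first, second, start=6):
    """Play the subtract-1-or-2 game from `start` under normal play.

    Each strategy is a string of '1'/'2' characters; character i (1-indexed)
    is the amount subtracted when the position is i.
    Returns (winner, positions), where winner is 1 or 2.
    """
    players = (first, second)
    position, turn = start, 0
    positions = [position]
    while position > 0:
        step = int(players[turn][position - 1])
        if step not in (1, 2) or step > position:
            raise ValueError(f"illegal move {step} at position {position}")
        position -= step
        positions.append(position)
        turn = 1 - turn
    # Normal play: whoever is to move at position 0 has no legal move and loses.
    return (2 if turn == 0 else 1), positions


# Hypothetical strategies chosen so that the moves match the listed turns.
winner, positions = play("112111", "111121")
print(winner, positions)  # 2 [6, 5, 3, 1, 0]
```

Running `play` against every legal first-player string with "121121" or "122122" as the second player gives a quick check of the remark about strategies of the form 12*12*.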
The main result of this paper (Theorem 5.2 and Corollary 5.5) is the first runtime analysis for a coevolutionary algorithm on impartial combinatorial games. In broad terms, it says the following.
Theorem 1.1 (Corollary 5.5, informal version).
Let be the coevolutionary algorithm specified in Section 3, and let be an impartial combinatorial game with possible positions. Then, with high probability, discovers an optimal strategy for within game evaluations, where is a precisely defined invariant of the corresponding game graph.
We note that the notion of a game graph is defined in Section 2 and the invariant is defined in Section 4. For many games we find or , and so this result implies a range of polynomial and quasipolynomial runtimes. While it appears likely that the upper bound provided is higher than the true runtime for specific games, a major strength is that it is immediately applicable to any impartial combinatorial game. As we also provide an easy method for bounding above when its exact value is not obvious (see Proposition 4.4), deriving runtimes for well-known games is straightforward. Indeed, after distilling into a more concise form (Corollary 5.5), we will see applications to games including Nim, Silver Dollar, Turning Turtles, and Chomp.
To understand the significance of our result, it is helpful to first clarify what it is not. In no uncertain terms, this paper is not an account of a superior ready-to-use method for efficiently finding optimal strategies for combinatorial games. Strategies will here be encoded by exhaustively listing a preferred action for every possible game position, and thus the methods presented are necessarily at least linear in the number of game states, both in terms of memory and of time. With this naive representation, classical algorithms can already establish optimal strategies in time using Sprague-Grundy theory (see Section 2.1), which is best possible. However, many games are parameterised in such a way that the number of possible game positions grows exponentially (accordingly, we emphasise that Theorem 1.1 is not in contradiction with the aforementioned EXPTIME and PSPACE results). While the classical approach breaks down in such cases, a CoEA can still find success by replacing the exhaustive listing of actions with a model that maps features of the game position onto an action.
However, even when using the naive representation, our understanding of how to successfully apply CoEAs is very limited. If we wish to consistently apply CoEAs to advanced problems, whether they are rooted in game-playing or not, we must attain a comprehensive understanding of their behaviour in these simpler settings. Indeed, seemingly simple instances still produce payoff landscapes with features that make them difficult to optimise heuristically, such as intransitivity (as an example, in the already-introduced representation for , 111121 defeats 122112, which in turn defeats 12122, which in turn defeats 111121, regardless of who plays first).
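Intransitivity of this kind is easy to check by direct simulation. The sketch below assumes the subtract-1-or-2 game on positions 0 to 6 with strategies written as strings whose i-th character is the amount subtracted at position i. The helper names are illustrative, and the third strategy `c` is a hypothetical choice made for this sketch rather than one taken from the text.

```python
def first_player_wins(first, second, start=6):
    """Simulate the subtract-1-or-2 game; True if `first` (who moves first) wins."""
    players, position, turn = (first, second), start, 0
    while position > 0:
        position -= int(players[turn][position - 1])
        turn = 1 - turn
    # The player left to move at position 0 loses (normal play).
    return turn == 1


def defeats(a, b):
    """True if a beats b both when a moves first and when b moves first."""
    return first_player_wins(a, b) and not first_player_wins(b, a)


a, b = "111121", "122112"  # the first two strategies named in the text
c = "121222"               # hypothetical third strategy, chosen for this sketch
print(defeats(a, b), defeats(b, c), defeats(c, a))  # True True True
```

Since each strategy in the cycle beats the next regardless of move order, no ranking of the three by pairwise results is consistent, which is the difficulty a selection mechanism has to cope with.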
Thus, the main contribution of this paper is precisely this: a first step towards a theoretical understanding of CoEAs on combinatorial games. This greatly expands the scope of rigorous runtime analysis available for coevolution (which so far does not apply to any turn-based game, let alone combinatorial ones), and additionally complements the abundance of existing empirical analysis, which we review in Section 1.1. While it remains a long term goal to push analysis towards more sophisticated representations, insights into algorithm design gained here still hold great relevance to coevolution in general. Furthermore, we believe our addition to the range of techniques available in this critical domain will in turn further the development of future runtime analysis of CoEAs.
Finally, we note here that the algorithm we analyse is a type of coevolutionary algorithm called an estimation of distribution algorithm (EDA), and moreover that this EDA applies to multi-valued decision variables (these notions are covered in Section 3). While not the main focus of this paper, there is only a small amount of preexisting analysis for EDAs operating over non-binary search domains, despite the clear utility of such algorithms. Our proof includes a detailed treatment of this setting, and may also provide methods useful in future analysis in this area.
In the remainder of this section, we review existing related work before stating notation. In Section 2 we give a more comprehensive discussion of impartial combinatorial games and review some Sprague-Grundy theory that will be relevant to our proof. In Section 3 we state the algorithm to which our result applies (UMDA), with an emphasis on its extension to multi-valued decision variables. In Section 4 we motivate and define the graph property appearing in Theorem 1.1, before then presenting the main result in Section 5. Following this, we apply the main result to a menagerie of selected impartial combinatorial games in Section 6.
1.1 Related work
Empirical analysis of coevolutionary algorithms for game playing. As game playing is a natural application for CoEAs, there have been a large number of empirical investigations into this topic, of which we can only list a small fraction here. In terms of impartial combinatorial games, Rosin and Belew [55] investigated the effect of using features such as fitness sharing and archives in CoEAs optimising a 4-pile instance of Nim, noting that Nim was a difficult coevolutionary problem despite lending itself to simple crossover-friendly representations. Additionally, Jaśkowski, Krawiec, and Wieloch [35] observed in relation to experiments on that intransitivity presents a strong challenge for CoEAs. Non-impartial (yet still almost symmetric) combinatorial games studied in the context of coevolution include Tic-Tac-Toe [35, 55], Backgammon [50], Othello [36, 63, 64], Senet [16], Checkers [5], Chess [18, 30], and Go [43]. More general game-playing applications include Pong [46], Bomberman [23], Poker [48], Resistance [39], as well as games invented to emulate real-world applications such as cyber security and defense [28, 41]. For a general survey, see [38].
Runtime analysis of coevolutionary algorithms. Until recently, the only existing coevolutionary runtime analysis result, due to Jansen and Wiegand [34], applied to a cooperative coevolutionary algorithm, which uses multiple populations to collectively solve traditional optimisation problems. The first runtime analysis applicable to competitive coevolution was established by Lehre [40], who showed that a population-based CoEA which selects using a pairwise dominance relation is able to approximate the Nash equilibrium of instances of a game called Bilinear in expected polynomial time. A key theoretical insight into algorithm design from the same paper was the identification of an error threshold for mutation rate, above which no CoEA can efficiently optimise Bilinear. Further runtime analysis for CoEAs on Bilinear has concerned the roles played by fitness aggregation methods [31] and archives [32] in algorithm behaviour. Inspired by promising applications of CoEAs for optimising binary test-based problems, Lin and Lehre [42] provided runtime analysis establishing the benefit of using a CoEA over a traditional EA for optimising a benchmark problem called Diagonal. In [2], Benford and Lehre considered the importance of maintaining a diverse set of opponents when coevolving game strategies, showing that any CoEA able to retain only one individual between generations cannot efficiently find optimal strategies on a certain class of symmetric zero-sum games, even though with high probability a coevolutionary EDA finds an optimal strategy in polynomial time.
1.2 Notation
Given a finite set , a probability distribution over is a function satisfying . We say that an -valued random variable is distributed according to , written , if holds for every . Given also a subset , we write . Given a number we use to denote the set of probability distributions over satisfying for every , and we also write .
A rooted directed graph is a triple , where is a vertex set, is a function mapping each vertex onto its out-neighbourhood, and is a distinguished root vertex. Throughout we will assume all directed graphs are acyclic. We write for the set of edges of and for the maximum degree of . A directed path in is a sequence of vertices such that for each . For a path we have . If has no out-neighbours, then we say is a sink. We use to denote the set of non-sink vertices of (the interior vertices).
All logarithms are the natural logarithm unless stated otherwise, and given we write .
2 Impartial combinatorial games
Let us briefly review the representation of impartial games via directed graphs and some Sprague-Grundy theory (see, for example, [27, 47]). An impartial combinatorial game is a finite acyclic rooted directed graph (see Section 1.2), where is a vertex set of size , and is the initial game position. Players take it in turns to move the current position to one of its out-neighbours. We adopt the convention that if a player is unable to make a move because the current position has no out-neighbours (i.e., it is a sink), then that player loses. This is usually referred to as the normal play convention. We will also always assume that for each , there is a directed path from to , so that every game position is reachable.
We will encode strategies for impartial combinatorial games as an assignment of each non-sink game position to an element of (that is, an out-neighbour of ), with this assignment indicating the preferred move at each game position. Formally, recalling that denotes the set of with , then
will be the set of strategies for . Note that an element may be regarded as a mapping , and so we will write for the image of a position under this mapping. This formulation coincides closely with that featured in the aforementioned work of Michie on reinforcement learning for optimal Tic-Tac-Toe play [45], and has similarities to subsequent ‘move selector’ representations which identify a preferred action based on the current game position using, for example, genetic programming [23], neural networks [43, 46], or a game-specific mapping [48, 39]. However, it stands distinct from ‘state evaluator’ representations which play by evaluating board positions, whether by recording evaluations for all possible positions [35, 55], genetic programming [29, 16, 30], neural networks [50, 64, 5, 18], or otherwise.
As is typical for the uses of coevolution for gameplaying discussed in Section 1.1, players receive a payoff depending only on whether the final outcome of the game was win or lose. Accordingly, let be the payoff function for , where indicates that wins against and indicates that loses against (where makes the first move). Precisely, if we recursively define for ,
then . It will also be convenient to define for ,
We will always assume that there is some such that for every (i.e., that the first player has a winning strategy for ). Indeed, if this is not the case, then the second player has a winning strategy, and so we can add a fictitious initial position to with to obtain a game equally challenging as but with a winning strategy for the first player. We thus define the set of optimal strategies for to be
and remark that the above assumption implies that will always be non-empty.
2.1 The Sprague-Grundy function
First introduced independently by Sprague [61, 62] and Grundy [25], the Sprague-Grundy function of an impartial combinatorial game is a function mapping game positions onto non-negative integers, which contains information about the game’s strategic landscape and how optimal play is affected when building new games out of smaller ones [19]. Formally, given , the Sprague-Grundy function is defined recursively. First, all sink vertices are given the value . Then, once all out-neighbours of have a value assigned, we define
where denotes the smallest non-negative number not in a finite set (the ‘minimum excluded integer’).
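The recursive definition translates directly into a short memoised computation. The following is a minimal sketch: the names `mex` and `sprague_grundy` and the dictionary-of-out-neighbours representation are illustrative assumptions, and the example graph is the subtract-1-or-2 game on positions 0 to 6.

```python
from functools import lru_cache


def mex(values):
    """Smallest non-negative integer not contained in `values`."""
    values = set(values)
    n = 0
    while n in values:
        n += 1
    return n


def sprague_grundy(out_neighbours):
    """Map each position of an acyclic game graph to its Sprague-Grundy value.

    `out_neighbours` maps every position to the list of positions reachable
    in one move.
    """
    @lru_cache(maxsize=None)
    def value(v):
        # A sink has no out-neighbours, and the mex of the empty set is 0.
        return mex(value(u) for u in out_neighbours[v])

    return {v: value(v) for v in out_neighbours}


# Illustrative game: positions 0..6, a move subtracts 1 or 2.
graph = {v: [v - s for s in (1, 2) if v - s >= 0] for v in range(7)}
print(sprague_grundy(graph))
# {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0}
```

Each value is computed once, so the work is proportional to the number of edges of the game graph.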
Given , if the current position is then the player making the next move has a winning strategy if and only if . Accordingly, if then the player making the next move will always lose against an opponent who plays optimally. Thus, victory can be assured for the player making the first move by always choosing to move to vertices in . Because this happens automatically whenever , an optimal strategy can be guaranteed by learning optimal moves at a set (which we refer to as critical positions) defined in the following way.
Definition 2.1.
Given an impartial combinatorial game , let
where is the Sprague-Grundy function for .
The following lemma formalises this notion in a general form that will be useful to quote later.
Lemma 2.2.
Let denote the Sprague-Grundy function of a combinatorial game . Let be an ordering of such that for every . Then, the following holds for every .
- A1: If and satisfies for every , then holds for every .
- A2: If and satisfies for every , then holds for every .
In particular, with our assumption that the first player always has a winning strategy for , if satisfies for every , then .
Proof.
We prove that the conditions A1 and A2 always hold by induction on . For the case , note that we must have (as ) and for any . For the inductive stage, there are two cases to consider. First, if and satisfies for every , then because we must have for any , and hence
On the other hand, if and satisfies for every , then in fact (for this holds by default if ), and so for any ,
as required. ∎
Note that the final conclusion of Lemma 2.2 is a sufficient condition, but not a necessary condition, as demonstrated by Figure 1.
3 UMDA
Rather than storing a population as a set of points in the search space, as is the case for most EAs, an estimation of distribution algorithm (EDA) represents its population as a probability distribution over the search space [49]. Whereas most algorithms sample candidates for selection from their current population uniformly at random, an EDA instead samples from its probability distribution. After selection has been completed, the selected individuals are then used to update the probability distribution for the next generation. Much of the existing runtime analysis for EDAs (see [8, 11, 13, 65, 66]) has emphasised the benefit provided by a high level of diversity among generated search points. This is also the case in the recent first runtime analysis of a coevolutionary EDA [2], wherein the difficulty presented by locally intransitive payoff landscapes could be provably averted by evaluating strategies against a diverse set of opponents. As intransitivity is also apparent in impartial combinatorial games, a coevolutionary EDA is a good candidate for a first runtime analysis on this topic too.
Most existing theoretical analysis of EDAs concerns those operating over bitstrings – that is, is the search domain. However, as outlined in Section 2, our formulation of strategies gives rise to a more complicated search domain. For a parent set , we are considering search domains of the form , where is an indexing set and for each . Given a tuple , let denote the probability distribution over such that if then for any ,
so that the distribution of is that of an independent univariate sampling for each . For notational convenience, given a tuple we will often write for and ,
The coevolutionary EDA we consider will represent its current population as an element , with individuals being generated according to . In the case where and for each , we recover the standard framework for univariate EDAs operating over bitstrings. For these EDAs, the tuple is often represented as a frequency vector , where is the probability that has a -bit in position . A common feature for EDAs operating over bitstrings is to constrain these frequencies to the interval for some small at the end of each generation. For the general case, where we track a tuple , we need to constrain each to the set . To achieve this, we adopt the following minor variation of the multi-valued EDA framework proposed by Ben Jedidia, Doerr, and Krejca [1]. Given and , let
Let then be the function given by
For the case the definition reduces to , and so this model fits the usual method for constraining univariate EDAs over bitstrings.
Despite some differences in notation, the function is nearly identical to the restriction described in [1]. In the context of [1, Section 4.2], our only modification is to forego an initial clamping of probabilities to the interval , as the upper border of is already implied by the fact that the remaining steps produce an element of . Indeed, an actual difference between the two methods only arises for inputs satisfying , and even in such cases the difference is not significant.
The fact that always outputs an element of is verified by B1 in the following lemma, which also establishes several further properties of which will be useful for our later proofs.
Lemma 3.1.
Let , , and be as defined in Section 3. Then, the following properties hold.
- B1: For any , .
- B2: If , then .
- B3: For any , and , .
- B4: For any and , .
Proof.
We first note that the definitions of and imply that for any and ,
| (1) |
Because it immediately follows from (1) that
| (2) |
With these observations, we are now ready to prove the desired properties.
B1: If , then setting and we have
A description of the algorithm we analyse is now provided by Algorithm 1, which effectively generalises the version appearing in [2] (which applied only to bitstrings and omitted the step involving ). Note that due to the use of in line 23, we always have .
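To make the overall loop concrete, here is a minimal sketch of a coevolutionary UMDA over multi-valued decision variables. It is not a transcription of Algorithm 1. It assumes that each selected individual is the winner of a game between two independently sampled strategies, that the next distribution at each position is the empirical move frequency among the selected individuals, and that a simple mixing-with-uniform step (a stand-in, not the restriction of [1]) keeps every probability at least `gamma`. The game (subtract 1 or 2, started from position 5 so that the first player can win), the function names, and the parameter values are all illustrative assumptions.

```python
import random


def moves(v):
    """Out-neighbours of position v in the subtract-1-or-2 game."""
    return [v - s for s in (1, 2) if v - s >= 0]


def sample(p, rng):
    """Draw a strategy: one independent move choice per non-sink position."""
    return {v: rng.choices(list(d), weights=list(d.values()))[0]
            for v, d in p.items()}


def first_player_wins(x, y, start):
    """Play x (moving first) against y; True if x wins under normal play."""
    players, position, turn = (x, y), start, 0
    while moves(position):
        position = players[turn][position]
        turn = 1 - turn
    return turn == 1  # the player left to move at a sink loses


def restrict(freq, gamma):
    """Stand-in border step: mix with uniform so every entry is >= gamma."""
    k = len(freq)
    return {u: gamma + (1 - k * gamma) * f for u, f in freq.items()}


def umda(start=5, lam=50, gamma=0.05, generations=20, seed=0):
    rng = random.Random(seed)
    # Start from the uniform distribution over legal moves at each position.
    p = {v: {u: 1 / len(moves(v)) for u in moves(v)}
         for v in range(1, start + 1)}
    for _ in range(generations):
        selected = []
        for _ in range(lam):
            x, y = sample(p, rng), sample(p, rng)
            selected.append(x if first_player_wins(x, y, start) else y)
        for v in p:
            freq = {u: sum(z[v] == u for z in selected) / lam for u in p[v]}
            p[v] = restrict(freq, gamma)
    return p


print(umda())
```

The stand-in `restrict` requires `gamma` times the number of moves at a position to be at most 1, which holds here because no position has more than two moves.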
A key step towards analysing the performance of Algorithm 1 on impartial combinatorial games is understanding the distribution of a selected individual . This will be handled by the following lemma. Its conclusion gives an exact expression for how the probability a selected individual would choose to move from to compares to the probability a sampled individual would choose to move from to (where a selected individual is simply the winner of a game played between two independent sampled individuals).
Lemma 3.2.
Let be an impartial combinatorial game, and let . Suppose that are independent, and
Then, for any and ,
| (3) |
For an intuition behind (3), the comparative factor has effectively three terms (here interpreted in the context of Algorithm 1):
- , the probability the algorithm encounters position ;
- , the probability that is observed as a winning position; and
- , the probability that is observed as a losing position.
If it is likely for to be observed as a losing position, but unlikely for to be observed as a winning position, then it is beneficial to deliberately move from to (placing your opponent in a likely losing position) rather than play out with whatever the current strategy is from (where you are unlikely to win), thus incurring an increase in the prevalence of among selected individuals. On the other hand, if the reverse is true, then it is beneficial to deliberately avoid moving from to and instead play out normally, thus incurring a decrease in prevalence of . This helps motivate the effect of the term . The magnitude of this effect scales with the relative frequency with which is encountered as a game position, which corresponds to .
Proof of Lemma 3.2.
First, we will introduce some notation to assist with this proof. Let us write , , and . Let us also write
and note that , , and are pairwise disjoint sets. Finally, if we have when regarded as a directed path (where here and throughout we drop the subscript from to simplify notation), then we will define
Note that because is the disjoint union of and , we have
| (4) |
The event can be written as the disjoint union of the following six events.
Let us examine the probability of each of these events occurring. For , the event can be determined using only and , and so is independent of the event . Similarly, in the event is independent of the event . Therefore,
| (5) |
For , the event can be determined using only and , and the event can be determined using only and . Therefore, all three component events in are independent of each other. The same is also true of . Therefore, noting that , we can write
| (6) |
For , the event can be determined using only and , and the event can be determined using only and . Therefore, all three component events in are independent of each other. The same is also true of . Therefore, noting that , we can write
| (7) |
We can now combine these observations to obtain
as required. ∎
As an aside, we note here a parallel with evolutionary game theory. Consider the discrete time replicator equation with nonlinear payoff functions (see (2.1) of [57]; also [33] for the more standard continuous and linear versions),
| (8) |
where we interpret as the proportion of type in a population and as the fitness of a type individual. The following proposition demonstrates that by identifying and appropriately, (3) can be seen to be of the form provided by (8).
Proposition 3.3.
Proof.
First, note the identity
| (9) |
Therefore,
as required. ∎
In this sense, when executing Algorithm 1 on a game , the evolution of the distribution at each vertex of stochastically emulates these replicator dynamics. However, a key difference is that in standard evolutionary game theory, each fitness function typically depends only on the distribution of types in the population; whereas the expression depends on the distribution of ‘types’ not just at the node , but also at possibly all other nodes as well, and so the dynamics of each node cannot be considered in isolation.
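For reference, a single step of the discrete-time replicator dynamics can be sketched as follows. The sketch assumes the standard form of the update, in which each proportion is multiplied by its fitness and the result is divided by the population mean fitness. The function name and the constant-fitness example are illustrative assumptions.

```python
def replicator_step(x, fitness):
    """One discrete-time replicator update.

    x       -- list of type proportions summing to 1
    fitness -- function mapping x to a list of positive fitness values
    """
    f = fitness(x)
    mean = sum(xi * fi for xi, fi in zip(x, f))  # population mean fitness
    return [xi * fi / mean for xi, fi in zip(x, f)]


# Two types with constant fitness 2 and 1: the fitter type grows in proportion.
print(replicator_step([0.5, 0.5], lambda x: [2.0, 1.0]))
```

Dividing by the mean fitness keeps the proportions summing to 1, and a type whose fitness exceeds the mean increases its share.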
4 Switchability
In Section 1 we noted that our main result implies a probabilistic upper bound of on an impartial combinatorial game, where is an (often small) invariant of the corresponding game graph. In this section, we define this invariant and prove a key lemma.
Rather than defining this property, which we call switchability, for a game as a whole, we will actually define switchability as a property of each vertex in the game’s vertex set . Then later we will take (see Corollary 5.5). Intuitively, measures the ‘smallest’ possible set of edges such that any pair of strategies satisfying and must also satisfy . The motivation is that if for some , then is assured by having and take certain values at the vertices appearing at the tail of some edge in , which occurs with probability at least . A property that places a lower bound on in such a way will be very useful as we seek to apply Lemma 3.2 later.
In the description above, a naive approach would be to take ‘smallest’ to simply mean having fewest edges. However, while this gives a working definition, when then bounding below, it is clear that significant improvements can be made in many cases. Consider the example shown in Figure 2. Our naive approach suggests that if for some , then . However, it would be better to observe that , as visiting can be assured by a single choice to move to made by the player who makes the first move after reaching the layer (in this case, always the player ).
To better capture this notion, we will not take ‘smallest’ to mean fewest edges, but rather smallest depth, defined in the following way (we recall here that all graphs are assumed to be acyclic).
Definition 4.1.
Given a set of edges , we define the depth of to be
With this, the full description of switchability is provided by the following two definitions.
Definition 4.2.
Given a set , we (inductively) say that a directed path is -compatible if any of the following conditions hold.
- C1: .
- C2: is -compatible and .
- C3: is -compatible and there is no such that .
Then, given a vertex , we say that is a -switcher if is contained in every -compatible directed path with .
Definition 4.3.
The switchability of a vertex is the smallest possible depth of a -switcher. We will also write .
Thus, while the set shown in Figure 2 has 5 edges, it has , and so in that case we have . Figure 3 shows two further illustrations of switchability. For certain games, constructing a small -switcher is quite straightforward (see proof of Proposition 6.1 later); in other cases where determining switchability is not obvious, the following upper bound may be used instead.
Proposition 4.4.
If there is a directed path of length from to , then .
Proof.
If is a directed path from to , then every -compatible path with includes as a prefix, and hence also includes . Thus, is a -switcher of depth . ∎
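The depth of an edge set can be computed by dynamic programming over the acyclic game graph. The sketch below assumes that depth means the largest number of edges of the set lying on any single directed path, which is how depth is used in the proofs of Proposition 4.4 and Lemma 4.5. The function name, the graph representation, and the example edge sets are illustrative assumptions.

```python
from functools import lru_cache


def depth(out_neighbours, edges):
    """Largest number of edges from `edges` on a single directed path.

    `out_neighbours` maps each vertex of an acyclic graph to its
    out-neighbours; `edges` is an iterable of (tail, head) pairs.
    """
    edges = set(edges)

    @lru_cache(maxsize=None)
    def best(v):
        # Best count over all directed paths that start at v.
        return max((((v, u) in edges) + best(u) for u in out_neighbours[v]),
                   default=0)

    return max(best(v) for v in out_neighbours)


# Illustrative graph: positions 0..6, a move subtracts 1 or 2.
graph = {v: [v - s for s in (1, 2) if v - s >= 0] for v in range(7)}

path_edges = [(6, 5), (5, 3), (3, 1)]  # edges of the path 6 -> 5 -> 3 -> 1
fan_edges = [(6, 5), (6, 4)]           # two edges leaving the same vertex
print(depth(graph, path_edges), depth(graph, fan_edges))  # 3 1
```

The first value matches the path length in Proposition 4.4, while the second shows that an edge set can contain several edges and still have depth 1 when no directed path uses more than one of them.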
To complete this section, the required lower bound on is provided by the following lemma. Note that as well as improving the naive approach by using instead of , we also deduce a result of instead of by carefully accounting for the fact that at most one player can visit each possible game position (due to the previous assumption that the impartial combinatorial games considered are acyclic).
Lemma 4.5.
Suppose that and . Then for every , .
Proof.
The distribution of is the same as the random set produced by the following process.
1. Initially, set .
2. For , do the following.
   (a) If , then set .
   (b) Otherwise if , sample .
We will generate an instance of the above process in a very specific way using a collection of independent random variables. First, let be a -switcher of depth , and let . Next, for each , let be any function satisfying the following properties.
- D1: If , then for every .
- D2: If and , then .
Note that this is possible because holds for every and . The modified process is then as follows.
0. Let be independent random variables.
1. Initially, set .
2. For , do the following.
   (a) If , then set .
   (b) If , then set and .
   (c) If , then set and .
From D1 it follows that has the same distribution as , and hence also as .
We now claim that whenever . The key observation is that under this regime, the first visits that makes to must be followed by an edge in . Let us label . We will show by induction on that is -compatible for every , noting that the case holds because . For the inductive step, if is -compatible, then the only way for to not be -compatible is to have and . But then, because the first visits that makes to are followed by an edge in , we can infer that already includes at least edges in . Letting be such that , we then have that is a directed path with , a contradiction to the depth of . So in fact the inductive step holds, and is -compatible for every . In particular, is -compatible, and hence .
Thus, whenever , and hence
as required. ∎
5 Main result
In order to state runtime results, we adopt the standard black box convention where runtime is defined as the number of times a function is queried until the algorithm reaches the desired search objective (see [14]), as follows.
Definition 5.1.
Suppose that is an impartial combinatorial game, and that is an algorithm which makes queries of during each generation. Then, given a set , the runtime of on is defined to be the random variable
where is the population of at the start of generation . (If the game is clear from context, we will write instead of .)
Our main result is now provided by Theorem 5.2. In simple terms, it states that if Algorithm 1 is executed on an impartial combinatorial game using a sufficiently large population size , then with high probability its runtime is at most , where is a formula of the game graph expressed in terms of its number of vertices , maximum degree , and a summation involving the switchability (Definition 4.3) at each critical position (Definition 2.1). Notably, is increasing in each of , , and , indicating that games for which these quantities are high may be the most difficult to optimise. We remark that the exact parameter settings for Algorithm 1 appearing in the statement have not been chosen to guarantee an optimal runtime, but rather to make the proof more comprehensible.
Theorem 5.2.
There is a constant such that the following holds. Let be an -vertex impartial combinatorial game with maximum degree , and let . Let , and let be described by Algorithm 1, where and
| (10) |
Then,
The asymptotic behaviour of the runtime bound may not be immediately obvious from the form stated here. Accordingly, we will shortly provide an easier-to-digest corollary using the facts and to remove the role of and the corresponding summation. For many games (including the applications considered later), this simplified bound has the same asymptotic behaviour as the one provided by Theorem 5.2. Nonetheless, as it is possible to construct games for which Theorem 5.2 offers a significant improvement over the simplified bound, we opt to retain the more general form above.
We will now briefly provide some intuition for the proof of Theorem 5.2. As characterised by Lemma 2.2, we know that any strategy that makes the correct decision at every critical position is an element of (where here, making a correct decision means ensuring has a Sprague-Grundy value of ). Thus, we consider the sequence appearing in Algorithm 1, and estimate the time until the algorithm arrives at some such that, with high probability, an makes the correct decision at every critical position. The progress to arrive at such a is effectively broken down into steps: fixing an ordering of such that for every (a reverse topological ordering), step finishes when, with high probability, an makes the correct decision at the first critical positions appearing in the ordering. Bounding the length of time to complete step is accomplished by combining Lemmas 2.2, 3.2, and 4.5 to show that if sampled individuals are usually making the correct decision at the first critical positions in the ordering, then the algorithm has a bias towards retaining individuals who also make the correct decision at the next critical position in the ordering. Note that this step-by-step process does not appear explicitly in the proof, but is implicit from the definition of a function measuring progress towards the optimality condition (see (12)).
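As an informal illustration of the backward, position-by-position view described above (it is not part of the proof), the following Python sketch computes Sprague-Grundy values on a small acyclic game graph, lists the positions so that every successor precedes its predecessor, and reads off a strategy that moves to a position of value 0 whenever one exists. The function names and the toy game are assumptions made for this example only; in particular, treating the positions of nonzero value as the ones requiring a correct decision is our reading of the text rather than a restatement of Definition 2.1.

```python
from functools import lru_cache


def mex(values):
    """Return the smallest non-negative integer that is not in `values`."""
    seen = set(values)
    m = 0
    while m in seen:
        m += 1
    return m


def grundy_values(moves):
    """Sprague-Grundy value of every position of an acyclic game graph.

    `moves` maps each position to the list of positions reachable in one move.
    """
    @lru_cache(maxsize=None)
    def g(v):
        return mex(g(w) for w in moves[v])

    return {v: g(v) for v in moves}


def backward_order(moves):
    """List positions so that every successor of a position precedes it."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for w in moves[v]:
            visit(w)
        order.append(v)  # appended only after all of its successors

    for v in moves:
        visit(v)
    return order


def zero_seeking_strategy(moves):
    """At each position of nonzero value, pick a move to a position of value 0."""
    g = grundy_values(moves)
    return {v: next(w for w in moves[v] if g[w] == 0)
            for v in backward_order(moves) if g[v] != 0}


# Hypothetical toy game: a heap of 5 items, remove 1 or 2 items per turn.
toy_moves = {m: [m - r for r in (1, 2) if m - r >= 0] for m in range(6)}
toy_grundy = grundy_values(toy_moves)
toy_order = backward_order(toy_moves)
toy_strategy = zero_seeking_strategy(toy_moves)
```

The recursion is only intended for small graphs; a position of nonzero value always has a successor of value 0 by the definition of the minimum excludant, so the `next(...)` call cannot fail on an acyclic game graph.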
Proof of Theorem 5.2.
First, let us introduce some further notation. Given a set we will write . Recalling that Algorithm 1 ensures that at every step, we will write . Let denote the Sprague-Grundy function, and let and . We also will assume that , as any impartial combinatorial game with satisfies (such games satisfy , and so in fact in these cases).
Let be an ordering of such that for every (note that such an ordering exists as is assumed to be acyclic). Let us write for the set of such that for all . If for some generation we have , then for every we have
| (11) |
Recalling from Lemma 2.2 that if then for some , we can deduce
and hence . In particular, if then .
We will define a map that will measure progress towards . To do this, first let be given by
so that is a monotone increasing function. Then, given , let and define
| (12) |
Define also , and note that if and only if . The motivation for the function is that the value of increases as moves through towards . Indeed, the first term of (12) is a summation depending on only; its role ensures that whenever and , and hence increases true to the sequence . The second term measures progress within some as we move towards (it increases as the value of decreases toward ).
Denote . We will later show the following two claims, where the second is a direct consequence of the first.
Claim 5.3.
If and , then
Claim 5.4.
If , then .
Claim 5.4 asserts that if has not yet reached , then we should expect the value of to increase by at least in the next generation. However, cannot increase by at least more than times. Precisely, if then we must have and for every . In particular, it would then hold that for some . Therefore, using a union bound with Claim 5.4, we have
| (13) |
Noting (using Lemma A.3) that
| (14) |
we can bound
| (15) | ||||
| (16) |
We now have
as required. Therefore, all that remains is to prove Claims 5.3 and 5.4.
Proof of Claim 5.3.
Assume are independent. To assist with this claim, we will introduce some further notation. Let , and note that from Lemma 4.5 we have
| (17) |
Given , let us write as a shorthand for the event , and note that because and are independent and identically distributed,
| (18) |
Finally, let us also write and .
Given , we wish to consider , where is the winner of the game played between and (as in Lemma 3.2). This will be useful, as the individuals selected in lines 10-17 of Algorithm 1 are independent and with the same distribution as , and hence for every ,
| (19) |
To analyse , first note that we have
| (20) |
Therefore, applying Lemma 3.2 with ,
| (21) |
In particular, we also have
| (22) |
Next, we would like to place some simple bounds on for . If satisfies , then by using the fact that ,
On the other hand, if satisfies , then by using the fact that ,
In summary, we have
| (23) | ||||
| (24) |
Finally, we will apply Corollary A.2 to establish that certain events occur with very low probability. A straightforward numerical manipulation we will use after each application is that for every , because ,
| (25) |
We now complete the proof of the claim by dividing into two cases. Note that the properties E1-E3 and F1-F2 quoted hereafter are from the results of Section A.
Case 1: . If , then
| (26) |
By using (19) and (26) to apply E2 with , it holds for any fixed that
Therefore, by taking a union bound over , it occurs with probability at least that
| for every , | (27) |
and so we proceed under the assumption that this occurs. Note that this automatically gives us for any that
| (28) |
So if then and hence . On the other hand, if then there is some such that , and hence
| (29) |
In particular, this would then imply that
Combining the cases and shows that the event holds with probability at least .
Case 2: . If , then
| (30) |
By using (19) and (30) to apply E1 with , it holds for every that
By using (19) and (22) to apply E3 with , it holds for every that
Therefore, by taking a union bound over and also , it occurs with probability at least that
| for every , | (31) | |||
| for every , | (32) |
and so we proceed under the assumption that this occurs. Recalling that and the assumption that , we can now bound above as
| (33) |
Hence, using that for every ,
| (34) |
This would then imply that
Thus, the event holds with probability at least . ∎
Proof of Claim 5.4.
Suppose that , so that . For every with , it follows from the fact that that and hence
| (35) |
Let so that . For , let be the event that
We will now show that if holds for every , then . Indeed, if holds for every , then it follows from (35) that for every with , and hence . If additionally , then is immediate from (12). On the other hand, if , then implies and hence,
Therefore, using a union bound we have
as required. ∎
∎
For many applications, rather than applying Theorem 5.2 directly it will be convenient to use the following corollary.
Corollary 5.5.
There is a constant such that the following holds. Let be an impartial combinatorial game with maximum degree , and let . Let , and assume uses parameters and . Then,
Proof.
6 Applications
In this section we will apply Theorem 5.2 to obtain several runtimes for Algorithm 1 on a number of well-established combinatorial games. Throughout, we state runtimes in terms of , the number of possible game positions, and always assume that is described by Algorithm 1. All described games are played under the normal play convention (that a player unable to move loses), as established in Section 2.
6.1 Subtraction Nim
Nim is a strategic game in which players take turns removing items from distinct heaps. Variants have been played across cultures since ancient history [56, 67], and it was also the game of choice for some of the earliest machines and computers dedicated to game playing [37, 44, 52]. Nim is also perhaps the most important impartial combinatorial game from a mathematical perspective, with the Sprague-Grundy theorem establishing that, for a particular formulation of equivalence which characterises strategic continuation, every position in any impartial combinatorial game is equivalent to some position of a one-heap game of Nim [7].
While the version central to combinatorial game theory typically allows players to remove any positive number of items on their turn, here we consider the well-known one-heap variant in which there is an upper limit on the number of items that can be taken at once (see, for example, [19]). Given parameters and , begins with an initial heap of items, and on each turn a player may remove between and items from the heap. The game graph for is shown in Figure 3. This game constitutes the simplest example of a subtraction game [26] and also of a take-away game [59], both of which are expansive and well-studied classes of impartial combinatorial games. We have the following polynomial runtime for .
Proposition 6.1.
satisfies and . Thus, for each there exists such that for appropriately chosen parameters in Algorithm 1,
Proof.
For we have , , and . Note that .
We need to verify that for every . Given , let . We have , as any directed path in can visit at most once. To see that is a -switcher, suppose that is an -compatible directed path from to . Because at most items are removed on each turn, there is some such that . But then we either have or, in order for to remain -compatible, and hence . In either case, we deduce that lies on every -compatible directed path from to . Thus, is a -switcher of depth , and hence .
From this, we have that . Combined with the observation that , the result then follows from Corollary 5.5. ∎
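To make the structure of this game concrete, the following Python sketch builds the game graph of a small instance and computes its Sprague-Grundy values from the empty heap upwards. The names `n` (initial heap size) and `k` (maximum number of items removed per turn, with at least one item removed) are assumptions of this example, as is the instance size. The sketch reproduces the well-known fact that a heap of m items has value m mod (k + 1).

```python
def subtraction_nim_graph(n, k):
    """Game graph: position m (items left) -> positions reachable in one move."""
    return {m: [m - r for r in range(1, min(k, m) + 1)] for m in range(n + 1)}


def grundy_bottom_up(graph):
    """Sprague-Grundy values, filled in from the empty heap upwards."""
    g = {}
    for m in sorted(graph):  # every successor of m is smaller than m
        reachable = {g[w] for w in graph[m]}
        value = 0
        while value in reachable:
            value += 1
        g[m] = value
    return g


n, k = 10, 3  # hypothetical small instance
graph = subtraction_nim_graph(n, k)
grundy = grundy_bottom_up(graph)
num_positions = len(graph)
max_degree = max(len(successors) for successors in graph.values())

# From a heap of m items with m % (k + 1) != 0, removing m % (k + 1) items
# leaves a heap whose Sprague-Grundy value is 0.
winning_moves = {m: m - m % (k + 1) for m in graph if m % (k + 1) != 0}
```

For this instance the graph has 11 positions and maximum out-degree 3, and every entry of `winning_moves` is a legal move to a position of value 0.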
6.2 Silver Dollar
We consider the variant of Silver Dollar played without the eponymous silver dollar [7, 15, 26]; however, it should be noted that Theorem 5.2 also implies a similar polynomial runtime for the original version of Silver Dollar attributed to de Bruijn (see also [7]).
Given parameters and , is played using coins on a horizontal strip of squares, with the coins initially placed on the rightmost squares (most descriptions actually have the coins placed on arbitrary starting squares, however this does not significantly affect our analysis). A turn consists of moving one coin leftwards any number of spaces, provided the coin does not go past any other coins. In addition, coins may never occupy the same square. Assuming is a fixed constant, the number of game positions is . We have the following polynomial runtime for .
Proposition 6.2.
Let be fixed. satisfies and . Thus, for each there exists such that for appropriately chosen parameters in Algorithm 1,
Proof.
On each turn, for each empty square there is at most one possible move that places a coin onto that square. Therefore, . Next, any possible game position can be reached from the starting position in at most moves (simply move each coin in order from left to right onto the required square). Therefore, using Proposition 4.4, we have . The required result then follows from Corollary 5.5 using these bounds on and . ∎
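The two counting observations used in this proof can be checked by brute force on a small instance. The sketch below is illustrative only: it assumes `n` squares and `k` coins, encodes a position as the increasing tuple of occupied squares, and uses hypothetical helper names.

```python
from collections import deque
from itertools import combinations
from math import comb


def silver_dollar_moves(position):
    """All positions reachable in one move.

    `position` is a strictly increasing tuple of occupied squares, with
    square 1 the leftmost square of the strip.
    """
    result = []
    for i, square in enumerate(position):
        left_limit = position[i - 1] if i > 0 else 0  # nearest coin to the left
        for target in range(left_limit + 1, square):  # empty squares in between
            result.append(position[:i] + (target,) + position[i + 1:])
    return result


def distances_from(start, moves):
    """Breadth-first search: fewest moves needed to reach each position."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in moves(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


n, k = 6, 2  # hypothetical small instance
positions = list(combinations(range(1, n + 1), k))
expected_count = comb(n, k)
start = tuple(range(n - k + 1, n + 1))  # coins on the rightmost k squares
dist = distances_from(start, silver_dollar_moves)
max_degree = max(len(silver_dollar_moves(p)) for p in positions)
max_distance = max(dist.values())
```

On this instance every one of the 15 positions is reachable from the starting position within `k` moves, and no position has more than `n - k` moves available, matching the reasoning above.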
6.3 Turning Turtles
Here we consider one instance of a large class of coin turning games [3, 26]. Given a parameter , is played using a row of coins, initially all showing heads. A turn consists of turning over one coin from heads to tails, and then optionally turning over one more coin anywhere to the left of that one (regardless of whether it is showing heads or tails). Play continues until all coins show tails. Noting that the total number of game positions is , we have the following quasipolynomial runtime for .
Proposition 6.3.
satisfies and . Thus, for each there exists such that for appropriately chosen parameters in Algorithm 1,
Proof.
On each turn, there are at most possible moves that turn over only one coin and at most possible moves that turn over two coins. Therefore, . Next, any possible game position can be reached from the starting position in at most moves (simply turn over the required coins from heads to tails one by one). Therefore, using Proposition 4.4 we have . Noting that , and hence
the required result then follows from Corollary 5.5. ∎
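As a sanity check of the move count and the reachability argument in this proof, the following Python sketch enumerates a small instance. It assumes `n` coins, encodes a position as a bitmask of the coins showing heads, and uses hypothetical helper names; it is a brute-force illustration rather than part of the analysis.

```python
from collections import deque


def turning_turtles_moves(heads, n):
    """All positions reachable in one move.

    `heads` is a bitmask over n coins: bit i is set when coin i (counted from
    the left, starting at 0) shows heads.
    """
    result = []
    for i in range(n):
        if (heads >> i) & 1:  # coin i shows heads and can be turned to tails
            after_first = heads ^ (1 << i)
            result.append(after_first)  # turn over one coin only
            for j in range(i):  # optionally flip one coin to the left
                result.append(after_first ^ (1 << j))
    return result


def distances_from(start, n):
    """Breadth-first search: fewest moves needed to reach each position."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in turning_turtles_moves(v, n):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


n = 4  # hypothetical small instance
start = (1 << n) - 1  # all coins showing heads
dist = distances_from(start, n)
num_positions = len(dist)
max_degree = max(len(turning_turtles_moves(p, n)) for p in range(1 << n))
max_distance = max(dist.values())
```

With four coins all 16 positions are reachable within four moves, and the all-heads position attains the largest number of moves, namely 4 single-coin moves plus 6 two-coin moves.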
6.4 Chomp
Since its introduction by Schuh [58] and later by Gale [21], Chomp has inspired a great deal of theoretical and empirical analysis, as well as numerous variants incorporating, for example, graphs and simplicial complexes [22]. While typically played on any rectangular board, we focus on square instances for the sake of conciseness.
Given a parameter , is played on an board. A turn consists of removing one square, as well as all squares to the right and above. However, if a player removes the square in the lower-left corner (the ‘poison’ square) they immediately lose. Note that to instantiate this game under our normal play convention, we can make removing the lower-left corner fatal by simply removing the position that has no remaining squares. We can establish the following quasipolynomial runtime for .
Proposition 6.4.
satisfies and . Thus, for each there exists such that for appropriately chosen parameters in Algorithm 1,
Proof.
In each possible game position, every row must be at least as long as the row above it. In particular, there is a correspondence between game positions and lattice paths (i.e., paths that only move right and down along the squares’ edges) from the top-left corner to bottom-right corner, with the path marking out the boundary of the remaining squares. Using stars and bars counting (see [4, Theorem 8.5.1] for a full treatment) and removing the position that has no remaining squares, the total number of game positions is . On each turn, there are at most moves available, and so . Next, any possible game position can be reached from the starting position in at most moves (simply make the appropriate chomp row by row working from top to bottom). Therefore, using Proposition 4.4, we have . Thus, as with Proposition 6.3, we have and where , and so the result follows. ∎
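The counting in this proof can likewise be checked by enumeration. The sketch below assumes an `n` by `n` board, encodes a position as the tuple of row lengths listed from the bottom row upwards, and excludes both the empty board and the move that takes the poison square, following the normal-play instantiation described above. Helper names and the instance size are assumptions of the example.

```python
from collections import deque
from itertools import combinations_with_replacement
from math import comb


def chomp_moves(rows):
    """All positions reachable in one move.

    `rows` lists the row lengths from the bottom row upwards, so the tuple is
    non-increasing. Removing the square in column c of row r truncates row r
    and every row above it to length at most c.
    """
    result = []
    for r, length in enumerate(rows):
        for c in range(length):
            if r == 0 and c == 0:
                continue  # taking the poison square is not a legal move here
            result.append(rows[:r] + tuple(min(x, c) for x in rows[r:]))
    return result


def distances_from(start):
    """Breadth-first search: fewest moves needed to reach each position."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in chomp_moves(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


n = 3  # hypothetical small instance
# Non-increasing tuples with entries in 0..n, minus the empty board.
positions = [p for p in combinations_with_replacement(range(n, -1, -1), n)
             if any(p)]
expected_count = comb(2 * n, n) - 1
start = (n,) * n  # the full board
dist = distances_from(start)
max_degree = max(len(chomp_moves(p)) for p in positions)
max_distance = max(dist.values())
```

For the 3 by 3 board this gives 19 positions, all reachable from the full board within three moves, with at most 8 moves available from any position.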
7 Concluding remarks
We conclude with some brief remarks about the main result and future work.
In order to accommodate the high degree of generality in Theorem 5.2, the proof makes a number of assumptions about the route taken to the search objective. A notable one is that, if is the next critical position to be optimised, or one that has already been learned, then the probability that is encountered in a game played out by sampled individuals is bounded below by . Lemma 4.5 demonstrates that analysis of is a major contribution to the eventual runtime, serving a role akin to a dynamic learning rate for the algorithm at position . A key insight is that encountering a large range of game positions by evaluating diverse sets of opponents is essential to an algorithm’s success. However, it is apparent that the general bound could be greatly improved through closer analysis of coevolutionary dynamics, especially for specific games. For example, if individuals often misplay at a winning position , opponents should begin to exploit this by steering the game towards ; the resulting feedback mechanism between and can assist more efficient learning.
A related assumption is that game positions are optimised sequentially, moving from the end of the game and working backwards, not unlike a recursive computation of the Sprague-Grundy function. However, this is not the route to optimality we would expect CoEAs to adopt for all games (consider Figure 4, where UMDA would naturally optimise starting from and working forwards). Moreover, because Lemma 2.2 is not a necessary condition, there is potential for CoEAs to demonstrate bias towards learning simpler elements of without the need to implicitly deduce all zeros of the Sprague-Grundy function (for example, when played on a square board, there is an optimal strategy for Chomp that can be described by specifying an action at only of the game positions).
In future work, we aim to provide more detailed analysis related to both of the above assumptions in order to provide stronger runtime results on classes of impartial combinatorial games. A longer term goal is the development of runtime analysis applicable to game representations that are practical even for games with exponentially many positions, such as in situations encountered in genetic programming.
References
- [1] F. Ben Jedidia, B. Doerr, and M. S. Krejca. Estimation-of-distribution algorithms for multi-valued decision variables. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, page 230–238, 2023.
- [2] A. Benford and P. K. Lehre. Runtime analysis of coevolutionary algorithms on a class of symmetric zero-sum games. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’24, page 1542–1550, 2024.
- [3] E. R. Berlekamp, J. H. Conway, and R. K. Guy. Winning Ways for Your Mathematical Plays, volume 3. A. K. Peters, 2003.
- [4] R. A. Brualdi. Introductory Combinatorics. Pearson, 5th edition, 2009.
- [5] K. Chellapilla and D. Fogel. Evolving neural networks to play checkers without relying on expert knowledge. IEEE Transactions on Neural Networks, 10(6):1382–1391, 1999.
- [6] F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79–127, 2006.
- [7] J. H. Conway. On Numbers and Games. A.K. Peters, 2nd edition, 2001.
- [8] D.-C. Dang and P. K. Lehre. Simplified runtime analysis of estimation of distribution algorithms. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO ’15, page 513–518, 2015.
- [9] D. DeCoste. The significance of Kasparov versus DEEP BLUE and the future of computer chess. ICGA Journal, 21(1):33–43, 1998.
- [10] E. Demaine and R. Hearn. Playing games with algorithms: Algorithmic combinatorial game theory. In Games of No Chance 3, volume 56 of Mathematical Sciences Research Institute Publications, pages 3–56. Cambridge University Press, 2009.
- [11] B. Doerr. The runtime of the compact genetic algorithm on jump functions. Algorithmica, 83(10):3059–3107, 2021.
- [12] B. Doerr and F. Neumann. A survey on recent progress in the theory of evolutionary algorithms for discrete optimization. ACM Transactions on Evolutionary Learning and Optimization, 1(4), 2021.
- [13] S. Droste. A rigorous analysis of the compact genetic algorithm for linear functions. Natural Computing, 5:257–283, 2006.
- [14] S. Droste, T. Jansen, and I. Wegener. Upper and lower bounds for randomized search heuristics in black-box optimization. Theory of Computing Systems, 39(4):525–544, 2006.
- [15] G. Farr and N. B. Ho. The Sprague–Grundy function for some nearly disjunctive sums of nim and silver dollar games. Theoretical Computer Science, 732:46–59, 2018.
- [16] G. Ferrer and W. Martin. Using genetic programming to evolve board evaluation functions. In Proceedings of 1995 IEEE International Conference on Evolutionary Computation, volume 2, pages 747–752, 1995.
- [17] S. G. Ficici. Solution concepts in coevolutionary algorithms. PhD thesis, Brandeis University, 2004.
- [18] D. Fogel, T. Hays, S. Hahn, and J. Quon. A self-learning evolutionary chess program. Proceedings of the IEEE, 92(12):1947–1954, 2004.
- [19] A. S. Fraenkel. Scenic trails ascending from sea-level nim to alpine chess and back. In Games of No Chance, volume 29 of Mathematical Sciences Research Institute Publications, pages 13–42. Cambridge University Press, 1996.
- [20] A. S. Fraenkel and D. Lichtenstein. Computing a perfect strategy for n × n chess requires time exponential in n. Journal of Combinatorial Theory, Series A, 31(2):199–214, 1981.
- [21] D. Gale. A curious nim-type game. American Mathematical Monthly, 81:876–879, 1974.
- [22] I. García-Marco, K. Knauer, and L. P. Montejano. Chomp on generalized Kneser graphs and others. International Journal of Game Theory, 50(3):603–621, 2021.
- [23] R. Gold, H. Branquinho, E. Hemberg, U.-M. O’Reilly, and P. García-Sánchez. Genetic programming and coevolution to play the Bomberman™ video game. In Applications of Evolutionary Computation, pages 765–779, 2023.
- [24] D. Grier. Deciding the winner of an arbitrary finite poset game is PSPACE-complete. In Automata, Languages, and Programming, pages 497–503, 2013.
- [25] P. M. Grundy. Mathematics and games. Eureka, 2:6–8, 1939.
- [26] R. K. Guy. Impartial games. In Games of No Chance, volume 29 of Mathematical Sciences Research Institute Publications, pages 61–78. Cambridge University Press, 1996.
- [27] R. K. Guy. What is a game? In Games of No Chance, volume 29 of Mathematical Sciences Research Institute Publications, pages 43–60. Cambridge University Press, 1996.
- [28] S. N. Harris and D. R. Tauritz. Competitive coevolution for defense and security: Elo-based similar-strength opponent sampling. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’21, page 1898–1906, 2021.
- [29] A. Hauptman. Evolving search heuristics for combinatorial games with genetic programming. PhD thesis, Ben-Gurion University of the Negev, 2009.
- [30] A. Hauptman and M. Sipper. GP-EndChess: using genetic programming to evolve chess endgame players. In Proceedings of the 8th European Conference on Genetic Programming, EuroGP ’05, page 120–131, 2005.
- [31] M. A. Hevia Fajardo and P. K. Lehre. How fitness aggregation methods affect the performance of competitive CoEAs on bilinear problems. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, page 1593–1601, 2023.
- [32] M. A. Hevia Fajardo, P. K. Lehre, and S. Lin. Runtime analysis of a co-evolutionary algorithm: Overcoming negative drift in maximin-optimisation. In Proceedings of the 17th Conference on Foundations of Genetic Algorithms, FOGA ’23, page 73–83, 2023.
- [33] J. Hofbauer and K. Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society, 40(4):479–519, 2003.
- [34] T. Jansen and R. P. Wiegand. The cooperative coevolutionary (1+1) EA. Evolutionary Computation, 12(4):405–434, 2004.
- [35] W. Jaśkowski, K. Krawiec, and B. Wieloch. Fitnessless coevolution. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’08, page 355–362, 2008.
- [36] W. Jaśkowski, M. Szubert, and P. Liskowski. Multi-criteria comparison of coevolution and temporal difference learning on othello. In Applications of Evolutionary Computation, pages 301–312, 2014.
- [37] A. H. Jorgensen. Context and driving forces in the development of the early computer game Nimbi. IEEE Annals of the History of Computing, 31(3):44–53, 2009.
- [38] K. Krawiec and M. Heywood. Solving complex problems with coevolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’20, page 832–858, 2020.
- [39] J. Lange, M. Stanke, and M. Ebner. Co-evolution of spies and resistance fighters. In Applications of Evolutionary Computation, pages 487–502, 2022.
- [40] P. K. Lehre. Runtime analysis of competitive co-evolutionary algorithms for maximin optimisation of a bilinear function. Algorithmica, 86(7):2352–2392, 2024.
- [41] P. K. Lehre, M. A. Hevia Fajardo, J. Toutouh, E. Hemberg, and U.-M. O’Reilly. Analysis of a pairwise dominance coevolutionary algorithm and DefendIt. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, page 1027–1035, 2023.
- [42] S. Lin and P. K. Lehre. Overcoming binary adversarial optimisation with competitive coevolution. In Parallel Problem Solving from Nature XVIII, 2024.
- [43] A. Lubberts and R. Miikkulainen. Co-evolving a go-playing neural network. In Coevolution: Turning Adaptive Algorithms Upon Themselves, 2001.
- [44] H. K. McCoy. The game of nim - the Nimatron. Carnegie Technical, page 14, February 1951.
- [45] D. Michie. Experiments on the mechanization of game-learning part I: characterization of the model and its parameters. The Computer Journal, 6(3):232–236, 1963.
- [46] G. A. Monroy, K. O. Stanley, and R. Miikkulainen. Coevolution of neural networks using a layered pareto archive. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’06, page 329–336, 2006.
- [47] G. Nivasch. More on the Sprague-Grundy function for Wythoff’s game. In Games of No Chance 3, volume 56 of Mathematical Sciences Research Institute Publications, pages 377–410. Cambridge University Press, 2009.
- [48] J. Noble and R. A. Watson. Pareto coevolution: using performance against coevolved opponents in a game as dimensions for pareto selection. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’01, page 493–500, 2001.
- [49] M. Pelikan, M. Hauschild, and F. G. Lobo. Estimation of distribution algorithms. In Springer Handbook of Computational Intelligence, pages 899–928. Springer, 2015.
- [50] J. B. Pollack and A. D. Blair. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 32(3):225–240, 1998.
- [51] E. Popovici, A. Bucci, R. P. Wiegand, and E. D. De Jong. Coevolutionary Principles, pages 987–1033. Springer, 2012.
- [52] R. Redheffer. A machine for playing the game nim. The American Mathematical Monthly, 55(6):343–349, 1948.
- [53] J. M. Robson. The complexity of go. In Proceedings of the IFIP 9th World Computer Congress on Information Processing, pages 413–417, 1983.
- [54] J. M. Robson. N by N checkers is exptime complete. SIAM Journal on Computing, 13(2):252–267, 1984.
- [55] C. D. Rosin and R. K. Belew. New methods for competitive coevolution. Evolutionary Computation, 5(1):1–29, 1997.
- [56] L. Rougetet. A prehistory of nim. The College Mathematics Journal, 45(5):358–363, 2014.
- [57] M. Saburov. On discrete-time replicator equations with nonlinear payoff functions. Dynamic Games and Applications, 12(2):643–661, 2022.
- [58] F. Schuh. Spel van delers. Nieuw Tijdschrift voor Wiskunde, 39:299–304, 1952.
- [59] A. J. Schwenk. Take-away games. The Fibonacci Quarterly, 8:225–234, 1970.
- [60] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144, 2018.
- [61] R. Sprague. Über mathematische kampfspiele. Tohoku Mathematical Journal, First Series, 41:438–444, 1935.
- [62] R. Sprague. Über zwei abarten von nim. Tohoku Mathematical Journal, First Series, 43:351–354, 1937.
- [63] M. Szubert, W. Jaśkowski, P. Liskowski, and K. Krawiec. The role of behavioral diversity and difficulty of opponents in coevolving game-playing agents. In Applications of Evolutionary Computation, pages 394–405, 2015.
- [64] M. Szubert, W. Jaśkowski, and K. Krawiec. On scalability, generalization, and hybridization of coevolutionary learning: a case study for othello. IEEE Transactions on Computational Intelligence and AI in Games, 5(3):214–226, 2013.
- [65] C. Witt. Upper bounds on the running time of the univariate marginal distribution algorithm on onemax. Algorithmica, 81:632–667, 2019.
- [66] C. Witt. How majority-vote crossover and estimation-of-distribution algorithms cope with fitness valleys. Theoretical Computer Science, 940:18–42, 2023.
- [67] I. M. Yaglom. Two games with matchsticks. In Kvant Selecta: Combinatorics, I, volume 17 of Mathematical World, pages 1–8. American Mathematical Society, 2001.
Appendix A Preliminary results
Here we provide two straightforward results that will be useful to quote throughout the proof of Theorem 5.2. The first is derived from the Chernoff bounds for binomial random variables given by Theorem A.1, which is in turn an immediate consequence of [6, Theorem 3.2]. We remark that the conclusions E1-E3 have been optimised for ease of integration with the proofs in this paper, rather than tightness of bound.
Theorem A.1.
If , then for any it holds that
| (36) | ||||
| (37) |
Corollary A.2.
Suppose and .
E1 For any and satisfying ,
E2 For any and satisfying ,
E3 For any and satisfying ,
Proof.
Lemma A.3.
Given , let be given by
Then, the following properties hold.
F1 If and , then .
F2 If and , then .
F3 .