
A General Upper Bound for the Runtime of a Coevolutionary Algorithm on Impartial Combinatorial Games

Alistair Benford and Per Kristian Lehre
School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK. [email protected], [email protected]
This research was supported by a Turing AI Fellowship (EPSRC grant ref EP/V025562/1).

Abstract

Due to their complex dynamics, combinatorial games are a key test case and application for algorithms that train game playing agents. Among those algorithms that train using self-play are coevolutionary algorithms (CoEAs). However, the successful application of CoEAs for game playing is difficult due to pathological behaviours such as cycling, an issue especially critical for games with intransitive payoff landscapes.

Insight into how to design CoEAs to avoid such behaviours can be provided by runtime analysis. In this paper, we push the scope of runtime analysis for CoEAs to combinatorial games, proving a general upper bound for the number of simulated games needed for UMDA to discover (with high probability) an optimal strategy. This result applies to any impartial combinatorial game, and for many games the implied bound is polynomial or quasipolynomial as a function of the number of game positions. After proving the main result, we provide several applications to simple well-known games: Nim, Chomp, Silver Dollar, and Turning Turtles. As the first runtime analysis for CoEAs on combinatorial games, this result is a critical step towards a comprehensive theoretical framework for coevolution.

1 Introduction

Many of the most well-known games in the world are combinatorial games. Combinatorial games are typically perfect-information games played by two players without chance moves. The game has a finite number of possible positions, and players alternately take turns moving the game from one position to another, according to a set of rules describing which moves are legal. Combinatorial games are an exceptionally broad class of games, including famous games enjoyed the world over such as Chess or Go. Even those with simple rules can engender deep and complex strategic interactions between players. While this strategic depth is a key part of the appeal for human players, it can also render the task of computing a winning strategy to be extremely difficult. Indeed, games for which this task is known to be EXPTIME-complete (in terms of board size) include Chess [20], Go (with the ko rule) [53], and Checkers [54]. It is also known that determining an optimal strategy for a poset game (a class of combinatorial games which we will encounter in Section 6.4) is PSPACE-complete in terms of the size of the underlying poset [24]. (For further results, see [10].)

While classical methods are impractical for such cases, strong strategies can still be developed by using heuristic approaches, such as neural networks, Monte Carlo tree search, or genetic programming. Indeed, combinatorial games are a long-standing focus in the development of artificial intelligence, from Donald Michie’s seminal use of reinforcement learning on Tic-Tac-Toe [45], to Deep Blue’s famous matches against then-world Chess champion Garry Kasparov [9], to the recent groundbreaking results of DeepMind [60]. Many recent successes in this area train game playing agents using self-play, and among those self-play heuristics are coevolutionary algorithms (CoEAs) [51]. For a CoEA, self-play is realised through one or more evolving populations of individuals who compete against their contemporaries. In each iteration, the strongest individuals are selected based on their competitive interactions. Through genetic mutation and crossover, these strongest individuals are then used as parents for the individuals in the next iteration.

The successful application of CoEAs is deeply challenging, often due to the potential for games with intransitive payoff landscapes to induce cyclic behaviour [17]. For standard evolutionary algorithms, which apply similar methods to traditional optimisation problems, insight into how to avoid pathological behaviours can be provided by runtime analysis, which exists in great breadth and depth in literature and continues to be actively developed [12]. However, despite clear demand (see [51]), runtime analysis that addresses the challenges unique to CoEAs is far more limited. Indeed, while existing coevolutionary runtime analysis concerns a range of algorithms and design features, there are only three problem settings to which it so far applies: Bilinear, a game played on bitstrings whose outcome depends only on the number of $1$ -bits selected by each player [40, 31, 32]; Diagonal, a benchmark problem inspired by binary test-based optimisation [42]; and a class of symmetric zero-sum games with a payoff landscape that is globally very simple, but possibly locally intransitive [2]. Accordingly, our core research aim is to push the scope of runtime analysis for CoEAs towards games which feature more complex strategic interaction between players, and more closely reflect real-world games.

Motivated by the numerous empirical investigations into the topic (see Section 1.1), we focus our analysis on the use of CoEAs for combinatorial games, and in particular impartial combinatorial games. A combinatorial game is said to be impartial if both players share the same set of available moves at each game position [26]. It is common to also adopt the normal play convention, which assumes that a player loses if they have no legal moves available. For instance, consider $\textsc{SubtractionNim}_{7}^{2}$ (formally introduced in Section 6.1), in which the game positions are $\{0,1,2,3,4,5,6\}$ and a player must subtract either $1$ or $2$ from the position on their turn. A strategy may be encoded as a string of length $6$ , with entry $i$ indicating whether a $1$ or a $2$ is to be subtracted when the game position is $i$ . One way the game may play out is then:

Player 1: $\texttt{122111}$
Player 2: $\texttt{122122}$

Turn 1: P1 subtracts 1. The new position is 5.
Turn 2: P2 subtracts 2. The new position is 3.
Turn 3: P1 subtracts 2. The new position is 1.
Turn 4: P2 subtracts 1. The new position is 0.
Turn 5: P1 has no legal moves. P2 is the winner.

(In fact, any strategy of the form 12*12* will always win this game, provided the corresponding player does not move first.)
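The play-out above can be reproduced with a short simulation. The following is a minimal sketch, assuming the string encoding just described and an initial position of $6$; the function name `play` and its parameters are illustrative choices and do not appear elsewhere in this paper.

```python
def play(p1: str, p2: str, start: int = 6) -> int:
    """Simulate SubtractionNim from `start`; return 1 if player 1 wins, else 2.

    Entry i (1-indexed) of each strategy string is the amount (1 or 2)
    that the player subtracts when the game position is i.
    """
    strategies = (p1, p2)
    position, mover = start, 0  # mover 0 is player 1, mover 1 is player 2
    while position > 0:
        amount = int(strategies[mover][position - 1])
        if amount > position:
            raise ValueError("strategy prescribes an illegal move")
        position -= amount
        mover = 1 - mover
    # The player who must move at position 0 has no legal move and loses.
    return 2 - mover


print(play("122111", "122122"))  # the play-out from the text: player 2 wins
```

Under the same assumptions, calling `play(first, second)` with any `second` of the form 12*12* returns 2 for every legal `first`, in line with the remark above.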

The main result of this paper (Theorem 5.2 and Corollary 5.5) is the first runtime analysis for a coevolutionary algorithm on impartial combinatorial games. In broad terms, it says the following.

Theorem 1.1 (Corollary 5.5, informal version).

Let $\mathcal{A}$ be the coevolutionary algorithm specified in Section 3, and let $G$ be an impartial combinatorial game with $n$ possible positions. Then, with high probability, $\mathcal{A}$ discovers an optimal strategy for $G$ within $n^{O(\overline{s})}$ game evaluations, where $\overline{s}$ is a precisely defined invariant of the corresponding game graph.

We note that the notion of a game graph is defined in Section 2 and the invariant $\overline{s}$ is defined in Section 4. For many games we find $\overline{s}=O(1)$ or $\overline{s}=O(\log{n})$ , and so this result implies a range of polynomial and quasipolynomial runtimes. While it appears likely that the upper bound provided is higher than the true runtime for specific games, a major strength is that it is immediately applicable to any impartial combinatorial game. As we also provide an easy method for bounding $\overline{s}$ above when its exact value is not obvious (see Proposition 4.4), deriving runtimes for well-known games is straightforward. Indeed, after distilling into a more concise form (Corollary 5.5), we will see applications to games including Nim, Silver Dollar, Turning Turtles, and Chomp.

To understand the significance of our result, it is helpful to first clarify what it is not. In no uncertain terms, this paper is not an account of a superior ready-to-use method for efficiently finding optimal strategies for combinatorial games. Strategies will here be encoded by exhaustively listing a preferred action for every possible game position, and thus the methods presented are necessarily at least linear in the number of game states, both in terms of memory and of time. With this naive representation, classical algorithms can already establish optimal strategies in time $O(n)$ using Sprague-Grundy theory (see Section 2.1), which is best possible. However, many games are parameterised in such a way that the number of possible game positions grows exponentially (accordingly, we emphasise that Theorem 1.1 is not in contradiction with the aforementioned EXPTIME and PSPACE results). While the classical approach breaks down in such cases, a CoEA can still find success by replacing the exhaustive listing of actions with a model that maps features of the game position onto an action.

However, even when using the naive representation, our understanding of how to successfully apply CoEAs is very limited. If we wish to consistently apply CoEAs to advanced problems, whether they are rooted in game-playing or not, we must attain a comprehensive understanding of their behaviour in these simpler settings. Indeed, seemingly simple instances still produce payoff landscapes with features that make them difficult to optimise heuristically, such as intransitivity (as an example, in the already-introduced representation for $\textsc{SubtractionNim}_{7}^{2}$ , 111121 defeats 122112, which in turn defeats 12122, which in turn defeats 111121, regardless of who plays first).

Thus, the main contribution of this paper is precisely this: a first step towards a theoretical understanding of CoEAs on combinatorial games. This greatly expands the scope of rigorous runtime analysis available for coevolution (which so far does not apply to any turn-based game, let alone combinatorial ones), and additionally complements the abundance of existing empirical analysis, which we review in Section 1.1. While it remains a long term goal to push analysis towards more sophisticated representations, insights into algorithm design gained here still hold great relevance to coevolution in general. Furthermore, we believe our addition to the range of techniques available in this critical domain will in turn further the development of future runtime analysis of CoEAs.

Finally, we note here that the algorithm we analyse is a type of coevolutionary algorithm called an estimation of distribution algorithm (EDA), and moreover that this EDA applies to multi-valued decision variables (these notions are covered in Section 3). While not the main focus of this paper, there is only a small amount of preexisting analysis for EDAs operating over non-binary search domains, despite the clear utility of such algorithms. Our proof includes a detailed treatment of this setting, and may also provide methods useful in future analysis in this area.

In the remainder of this section, we review existing related work before stating notation. In Section 2 we give a more comprehensive discussion of impartial combinatorial games and review some Sprague-Grundy theory that will be relevant to our proof. In Section 3 we state the algorithm to which our result applies (UMDA), with an emphasis on its extension to multi-valued decision variables. In Section 4 we motivate and define the graph property $\overline{s}$ appearing in Theorem 1.1, before then presenting the main result in Section 5. Following this, we apply the main result to a menagerie of selected impartial combinatorial games in Section 6.

1.1 Related work

Empirical analysis of coevolutionary algorithms for game playing. As game playing is a natural application for CoEAs, there have been a large number of empirical investigations into this topic, of which we can only list a small fraction here. In terms of impartial combinatorial games, Rosin and Belew [55] investigated the effect of using features such as fitness sharing and archives in CoEAs optimising a 4-pile instance of Nim, noting that Nim was a difficult coevolutionary problem despite lending itself to simple crossover-friendly representations. Additionally, Jaśkowski, Krawiec, and Wieloch [35] observed in relation to experiments on $\textsc{SubtractionNim}_{200}^{3}$ that intransitivity presents a strong challenge for CoEAs. Non-impartial (yet still almost symmetric) combinatorial games studied in the context of coevolution include Tic-Tac-Toe [35, 55], Backgammon [50], Othello [36, 63, 64], Senet [16], Checkers [5], Chess [18, 30], and Go [43]. More general game-playing applications include Pong [46], Bomberman [23], Poker [48], Resistance [39], as well as games invented to emulate real-world applications such as cyber security and defense [28, 41]. For a general survey, see [38].

Runtime analysis of coevolutionary algorithms. Until recently, the only existing coevolutionary runtime analysis result, due to Jansen and Wiegand [34], applied to a cooperative coevolutionary algorithm, which uses multiple populations to collectively solve traditional optimisation problems. The first runtime analysis applicable to competitive coevolution was established by Lehre [40], who showed that a population-based CoEA which selects using a pairwise dominance relation is able to approximate the Nash equilibrium of instances of a game called Bilinear in expected polynomial time. A key theoretical insight into algorithm design from the same paper was the identification of an error threshold for mutation rate, above which no CoEA can efficiently optimise Bilinear. Further runtime analysis for CoEAs on Bilinear has concerned the roles played by fitness aggregation methods [31] and archives [32] in algorithm behaviour. Inspired by promising applications of CoEAs for optimising binary test-based problems, Lin and Lehre [42] provided runtime analysis establishing the benefit of using a CoEA over a traditional EA for optimising a benchmark problem called Diagonal. In [2], Benford and Lehre considered the importance of maintaining a diverse set of opponents when coevolving game strategies, showing that any CoEA able to retain only one individual between generations cannot efficiently find optimal strategies on a certain class of symmetric zero-sum games, even though with high probability a coevolutionary EDA finds an optimal strategy in polynomial time.

1.2 Notation

Given a finite set $S$ , a probability distribution over $S$ is a function $p:S\to[0,1]$ satisfying $\sum_{s\in S}p(s)=1$ . We say that an $S$ -valued random variable $x$ is distributed according to $p$ , written $x\sim p$ , if $\mathbb{P}(x=s)=p(s)$ holds for every $s\in S$ . Given also a subset $A\subseteq S$ , we write $p(A)=\sum_{s\in A}p(s)$ . Given a number $\gamma\in[0,1]$ we use $\mathcal{P}_{\gamma}(S)$ to denote the set of probability distributions $p$ over $S$ satisfying $p(s)\geqslant\gamma$ for every $s\in S$ , and we also write $\mathcal{P}(S)=\mathcal{P}_{0}(S)$ .

A rooted directed graph is a triple $G=(V,F,v_{0})$ , where $V$ is a vertex set, $F$ is a function mapping each vertex onto its out-neighbourhood, and $v_{0}\in V$ is a distinguished root vertex. Throughout we will assume all directed graphs are acyclic. We write $E(G)=\{(u,v)\in V^{2}:v\in F(u)\}$ for the set of edges of $G$ and $\Delta=\max_{v\in V}|F(v)|$ for the maximum degree of $G$ . A directed path in $G$ is a sequence of vertices $u_{0}u_{1}\ldots u_{\ell}$ such that $u_{i}\in F(u_{i-1})$ for each $i\in[\ell]$ . For a path $P=u_{0}u_{1}\ldots u_{\ell}$ we have $|P|=\ell+1$ . If $v\in V$ has no out-neighbours, then we say $v$ is a sink. We use $\text{Int}(G)=\{v\in V:F(v)\neq\emptyset\}$ to denote the set of non-sink vertices of $G$ (the interior vertices).

All logarithms are the natural logarithm unless stated otherwise, and given $k\in\mathbb{N}$ we write $\log^{k}{n}=(\log{n})^{k}$ .

2 Impartial combinatorial games

Let us briefly review the representation of impartial games via directed graphs and some Sprague-Grundy theory (see, for example, [27, 47]). An impartial combinatorial game is a finite acyclic rooted directed graph $G=(V,F,v_{0})$ (see Section 1.2), where $V$ is a vertex set of size $n$ , and $v_{0}\in V$ is the initial game position. Players take it in turns to move the current position to one of its out-neighbours. We adopt the convention that if a player is unable to make a move because the current position has no out-neighbours (i.e., it is a sink), then that player loses. This is usually referred to as the normal play convention. We will also always assume that for each $v\in V$ , there is a directed path from $v_{0}$ to $v$ , so that every game position is reachable.

We will encode strategies for impartial combinatorial games as an assignment of each non-sink game position $v$ to an element of $F(v)$ (that is, an out-neighbour of $v$ ), with this assignment indicating the preferred move at each game position. Formally, recalling that $\text{Int}(G)$ denotes the set of $v\in V$ with $F(v)\neq\emptyset$ , then

\mathcal{X}_{G}=\prod_{v\in\text{Int}(G)}F(v)

will be the set of strategies for $G$ . Note that an element $x\in\mathcal{X}_{G}$ may be regarded as a mapping $\text{{Int}}(G)\to V$ , and so we will write $x(v)$ for the image of a position $v\in V$ under this mapping. This formulation coincides closely with that featured in the aforementioned work of Michie on reinforcement learning for optimal Tic-Tac-Toe play [45], and has similarities to subsequent ‘move selector’ representations which identify a preferred action based on the current game position using, for example, genetic programming [23], neural networks [43, 46], or a game-specific mapping [48, 39]. However, it stands distinct from ‘state evaluator’ representations which play by evaluating board positions, whether by recording evaluations for all possible positions [35, 55], genetic programming [29, 16, 30], neural networks [50, 64, 5, 18], or otherwise.

As is typical for the uses of coevolution for gameplaying discussed in Section 1.1, players receive a payoff depending only on whether the final outcome of the game was win or lose. Accordingly, let $f_{G}:\mathcal{X}_{G}\times\mathcal{X}_{G}\to\{-1,1\}$ be the payoff function for $G$ , where $f_{G}(x,y)=1$ indicates that $x$ wins against $y$ and $f_{G}(x,y)=-1$ indicates that $x$ loses against $y$ (where $x$ makes the first move). Precisely, if we recursively define for $v\in V$ ,

f_{G}^{v}(x,y)=\begin{cases}-f_{G}^{x(v)}(y,x)&\qquad\text{if $v\in\text{Int}(G)$,}\\ -1&\qquad\text{otherwise,}\end{cases}

then $f_{G}(x,y)=f_{G}^{v_{0}}(x,y)$ . It will also be convenient to define for $x,y\in\mathcal{X}_{G}$ ,

\text{Path}_{G}(x,y)=\{v_{0},x(v_{0}),y(x(v_{0})),x(y(x(v_{0}))),\ldots\}.

We will always assume that there is some $x\in\mathcal{X}_{G}$ such that $f_{G}(x,y)=1$ for every $y\in\mathcal{X}_{G}$ (i.e., that the first player has a winning strategy for $G$ ). Indeed, if this is not the case, then the second player has a winning strategy, and so we can add a fictitious initial position $v^{\ast}$ to $G$ with $F(v^{\ast})=\{v_{0}\}$ to obtain a game equally challenging as $G$ but with a winning strategy for the first player. We thus define the set of optimal strategies for $G$ to be

\text{Opt}(G)=\{x\in\mathcal{X}_{G}:\text{$f_{G}(x,y)=1$ for every $y\in\mathcal{X}_{G}$}\},

and remark that the above assumption implies that $\text{Opt}(G)$ will always be non-empty.
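To make the recursion for $f_{G}^{v}$ and the definitions of $\text{Path}_{G}$ and $\text{Opt}(G)$ concrete, the following is a minimal sketch on a small subtraction game with positions $\{0,\ldots,4\}$ and $v_{0}=4$. The dictionary encoding of $F$ and of strategies, and all function names, are assumptions made for illustration only; the brute-force enumeration of $\text{Opt}(G)$ is feasible only for very small games.

```python
from itertools import product


def payoff(F, x, y, v):
    """f_G^v(x, y): +1 if x, moving first from v, beats y; otherwise -1."""
    if not F[v]:  # v is a sink, so the player to move loses
        return -1
    return -payoff(F, y, x, x[v])


def path(F, x, y, v0):
    """Positions visited, in order, when x moves first from v0 against y."""
    players, v, turn = (x, y), v0, 0
    visited = [v0]
    while F[v]:
        v = players[turn][v]
        visited.append(v)
        turn = 1 - turn
    return visited


def strategies(F):
    """Enumerate X_G: one out-neighbour chosen at every non-sink position."""
    interior = [v for v in F if F[v]]
    for choice in product(*(F[v] for v in interior)):
        yield dict(zip(interior, choice))


def optimal(F, v0):
    """Brute-force Opt(G): strategies that win as first player against all y."""
    all_x = list(strategies(F))
    return [x for x in all_x if all(payoff(F, x, y, v0) == 1 for y in all_x)]


# Hypothetical example: subtract 1 or 2 from positions 0..4, starting at 4.
F = {v: [w for w in (v - 1, v - 2) if w >= 0] for v in range(5)}
x = {1: 0, 2: 0, 3: 2, 4: 3}
y = {1: 0, 2: 1, 3: 1, 4: 2}
print(path(F, x, y, 4), payoff(F, x, y, 4))
print(optimal(F, 4))
```

Note that `path` returns an ordered list rather than the set used in the definition of $\text{Path}_{G}(x,y)$; membership queries are unaffected by this choice.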

2.1 The Sprague-Grundy function

First introduced independently by Sprague [61, 62] and Grundy [25], the Sprague-Grundy function of an impartial combinatorial game is a function mapping game positions onto non-negative integers, which contains information about the game’s strategic landscape and how optimal play is affected when building new games out of smaller ones [19]. Formally, given $G=(V,F,v_{0})$ , the Sprague-Grundy function $h:V\to\mathbb{N}_{0}$ is defined recursively. First, all sink vertices are given the value $0$ . Then, once all out-neighbours of $v$ have a value assigned, we define

h(v)=\text{{mex}}\,{\{h(w):w\in F(v)\}},

where $\text{{mex}}\,{S}=\min{(\mathbb{N}_{0}\setminus S)}$ denotes the smallest non-negative number not in a finite set $S$ (the ‘minimum excluded integer’).

Given $v\in V$ , if the current position is $v$ then the player making the next move has a winning strategy if and only if $h(v)\neq 0$ . Accordingly, if $h(v)=0$ then the player making the next move will always lose against an opponent who plays optimally. Thus, victory can be assured for the player making the first move by always choosing to move to vertices in $h^{-1}(\{0\})$ . Because this happens automatically whenever $F(v)\setminus h^{-1}(\{0\})=\emptyset$ , an optimal strategy can be guaranteed by learning optimal moves at a set $W_{G}$ (which we refer to as critical positions) defined in the following way.

Definition 2.1.

Given an impartial combinatorial game $G=(V,F,v_{0})$ , let

W_{G}=\{v\in\text{Int}(G):\text{$h(v)\neq 0$ and $F(v)\setminus h^{-1}(\{0\})\neq\emptyset$}\},

where $h:V\to\mathbb{N}_{0}$ is the Sprague-Grundy function for $G$ .
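The recursive definition of $h$ and the set $W_{G}$ translate directly into code. The sketch below is one possible implementation, assuming the game is given as a dictionary mapping each position to a list of its out-neighbours; the names `mex`, `sprague_grundy`, and `critical_positions` are illustrative. It is applied to $\textsc{SubtractionNim}_{7}^{2}$ from Section 1.

```python
def mex(values):
    """Smallest non-negative integer not contained in `values`."""
    m = 0
    while m in values:
        m += 1
    return m


def sprague_grundy(F):
    """Compute h(v) for every position of an acyclic game graph F."""
    h = {}

    def value(v):
        # Memoised recursion; sinks receive mex of the empty set, which is 0.
        if v not in h:
            h[v] = mex({value(w) for w in F[v]})
        return h[v]

    for v in F:
        value(v)
    return h


def critical_positions(F, h):
    """W_G: positions with h(v) != 0 having an out-neighbour w with h(w) != 0."""
    return {v for v in F if h[v] != 0 and any(h[w] != 0 for w in F[v])}


# SubtractionNim with positions 0..6: subtract 1 or 2 on each turn.
F = {v: [w for w in (v - 1, v - 2) if w >= 0] for v in range(7)}
h = sprague_grundy(F)
print(h)                         # h(v) equals v mod 3 for this game
print(critical_positions(F, h))  # the positions 2, 4 and 5
```

The recursion is adequate for small examples; for large game graphs an iterative pass over a topological ordering would avoid Python's recursion limit.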

The following lemma formalises this notion in a general form that will be useful to quote later.

Lemma 2.2.

Let $h:V\to\mathbb{N}_{0}$ denote the Sprague-Grundy function of a combinatorial game $G$ . Let $u_{1},\ldots,u_{n}$ be an ordering of $V$ such that $F(u_{i})\subseteq\{u_{1},\ldots,u_{i-1}\}$ for every $i\in[n]$ . Then, the following holds for every $i\in[n]$ .

A1

If $h(u_{i})\neq 0$ and $x\in\mathcal{X}_{G}$ satisfies $h(x(v))=0$ for every $v\in W_{G}\cap\{u_{1},\ldots,u_{i}\}$ , then $f_{G}^{u_{i}}(x,y)=1$ holds for every $y\in\mathcal{X}_{G}$ .
A2

If $h(u_{i})=0$ and $y\in\mathcal{X}_{G}$ satisfies $h(y(v))=0$ for every $v\in W_{G}\cap\{u_{1},\ldots,u_{i}\}$ , then $f_{G}^{u_{i}}(x,y)=-1$ holds for every $x\in\mathcal{X}_{G}$ .

In particular, with our assumption that the first player always has a winning strategy for $G$ , if $x\in\mathcal{X}_{G}$ satisfies $h(x(v))=0$ for every $v\in W_{G}$ , then $x\in\text{\emph{Opt}}(G)$ .

Proof.

We prove that the conditions A1 and A2 always hold by induction on $i$ . For the case $i=1$ , note that we must have $h(u_{1})=0$ (as $u_{1}\notin\text{Int}(G)$ ) and $f_{G}^{u_{1}}(x,y)=-1$ for any $x,y\in\mathcal{X}_{G}$ . For the inductive stage, there are two cases to consider. First, if $h(u_{i})=0$ and $y\in\mathcal{X}_{G}$ satisfies $h(y(v))=0$ for every $v\in W_{G}\cap\{u_{1},\ldots,u_{i}\}$ , then because $\text{{mex}}\,{\{h(w):w\in F(u_{i})\}}=0$ we must have $h(x(u_{i}))\neq 0$ for any $x\in\mathcal{X}_{G}$ , and hence

f_{G}^{u_{i}}(x,y)=-f_{G}^{x(u_{i})}(y,x)\overset{\text{A1}}{=}-1.

On the other hand, if $h(u_{i})\neq 0$ and $x\in\mathcal{X}_{G}$ satisfies $h(x(v))=0$ for every $v\in W_{G}\cap\{u_{1},\ldots,u_{i}\}$ , then in fact $h(x(u_{i}))=0$ (for this holds by default if $u_{i}\notin W_{G}$ ), and so for any $y\in\mathcal{X}_{G}$ ,

f_{G}^{u_{i}}(x,y)=-f_{G}^{x(u_{i})}(y,x)\overset{\text{A2}}{=}1,

as required. ∎

Note that the final conclusion of Lemma 2.2 is a sufficient condition, but not a necessary condition, as demonstrated by Figure 1.

Figure 1: In the combinatorial game illustrated above, Sprague-Grundy values at each game position are shown in red. In this game, $W_{G}=\{v_{0},b\}$ . However, any strategy $x$ with $x(v_{0})=d$ is automatically optimal (the first player wins on their first turn), and so the condition of Lemma 2.2 is not a necessary one.
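The sufficient condition of Lemma 2.2 can also be checked by brute force on a small instance. The sketch below does not reproduce the game of Figure 1; instead it uses, as an assumed example, a subtraction game on positions $\{0,\ldots,7\}$ with $v_{0}=7$ (so that the first player has a winning strategy). All function names are illustrative.

```python
from itertools import product


def mex(values):
    m = 0
    while m in values:
        m += 1
    return m


def sprague_grundy(F):
    h = {}

    def value(v):
        if v not in h:
            h[v] = mex({value(w) for w in F[v]})
        return h[v]

    for v in F:
        value(v)
    return h


def payoff(F, x, y, v):
    """f_G^v(x, y): +1 if x, moving first from v, beats y; otherwise -1."""
    return -1 if not F[v] else -payoff(F, y, x, x[v])


# Assumed example: subtract 1 or 2 from positions 0..7, starting at 7.
F = {v: [w for w in (v - 1, v - 2) if w >= 0] for v in range(8)}
v0 = 7
h = sprague_grundy(F)
W = {v for v in F if h[v] != 0 and any(h[w] != 0 for w in F[v])}

interior = [v for v in F if F[v]]
all_x = [dict(zip(interior, c)) for c in product(*(F[v] for v in interior))]

# Strategies that move to a position of Sprague-Grundy value 0 at every v in W_G.
sufficient = [x for x in all_x if all(h[x[v]] == 0 for v in W)]
# Opt(G) by exhaustive search over all opponents.
opt = [x for x in all_x if all(payoff(F, x, y, v0) == 1 for y in all_x)]

print(sorted(W), len(sufficient), len(opt))
print(all(x in opt for x in sufficient))
```

In this particular instance the two sets happen to coincide, so it illustrates sufficiency only; a game such as that of Figure 1 is needed to see that the condition is not necessary.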

3 UMDA

Rather than storing a population as a set of points in the search space, as is the case for most EAs, an estimation of distribution algorithm (EDA) represents its population as a probability distribution over the search space [49]. Whereas most algorithms sample candidates for selection from their current population uniformly at random, an EDA instead samples from its probability distribution. After selection has been completed, the selected individuals are then used to update the probability distribution for the next generation. Much of the existing runtime analysis for EDAs (see [8, 11, 13, 65, 66]) has emphasised the benefit provided by a high level of diversity among generated search points. This is also the case in the recent first runtime analysis of a coevolutionary EDA [2], wherein the difficulty presented by locally intransitive payoff landscapes could be provably averted by evaluating strategies against a diverse set of opponents. As intransitivity is also apparent in impartial combinatorial games, a coevolutionary EDA is a good candidate for a first runtime analysis on this topic too.

Most existing theoretical analysis of EDAs concerns those operating over bitstrings – that is, $\{0,1\}^{n}$ is the search domain. However, as outlined in Section 2, our formulation of strategies gives rise to a more complicated search domain. For a parent set $S$ , we are considering search domains of the form $\mathcal{X}=\prod_{i\in I}S_{i}$ , where $I$ is an indexing set and $S_{i}\subseteq S$ for each $i\in I$ . Given a tuple $p\in\prod_{i\in I}\mathcal{P}(S_{i})$ , let $\text{Univ}(\mathcal{X},p)$ denote the probability distribution over $\mathcal{X}$ such that if $x\sim\text{Univ}(\mathcal{X},p)$ then for any $y\in\mathcal{X}$ ,

\mathbb{P}(x=y)=\prod_{i\in I}p(i)(y_{i}),

so that the distribution of $x$ is that of an independent univariate sampling for each $i\in I$ . For notational convenience, given a tuple $p\in\prod_{i\in I}\mathcal{P}(S_{i})$ we will often write for $i\in I$ and $s\in S$ ,

p(i,s)=\begin{cases}p(i)(s)&\qquad\text{if $s\in S_{i}$,}\\ 0&\qquad\text{otherwise.}\end{cases}

The coevolutionary EDA we consider will represent its current population as an element $p\in\prod_{i\in I}\mathcal{P}(S_{i})$ , with individuals being generated according to $\text{Univ}(\mathcal{X},p)$ . In the case where $I=[n]$ and $S_{i}=\{0,1\}$ for each $i\in[n]$ , we recover the standard framework for univariate EDAs operating over bitstrings. For these EDAs, the tuple $p\in\prod_{i\in[n]}\mathcal{P}(\{0,1\})$ is often represented as a frequency vector $(p(1),\ldots,p(n))\in[0,1]^{n}$ , where $p(i)$ is the probability that $x\sim\text{Univ}(\{0,1\}^{n},p)$ has a $1$ -bit in position $i$ . A common feature for EDAs operating over bitstrings is to constrain these frequencies to the interval $[\gamma,1-\gamma]$ for some small $\gamma$ at the end of each generation. For the general case, where we track a tuple $p\in\prod_{i\in I}\mathcal{P}(S_{i})$ , we need to constrain each $p(i)\in\mathcal{P}(S_{i})$ to the set $\mathcal{P}_{\gamma}(S_{i})$ . To achieve this, we adopt the following minor variation of the multi-valued EDA framework proposed by Ben Jedidia, Doerr, and Krejca [1]. Given $\gamma\in[0,\textstyle{\frac{1}{|S|}})$ and $p\in\mathcal{P}(S)$ , let

\beta^{+}_{\gamma}(p)=\sum_{s\in S}\max{\{p(s)-\gamma,0\}},\qquad\beta^{-}_{\gamma}(p)=\sum_{s\in S}\max{\{\gamma-p(s),0\}}.

Let $\pi_{\gamma}^{S}:\mathcal{P}(S)\to\mathcal{P}_{\gamma}(S)$ then be the function given by

\pi_{\gamma}^{S}(p)(s)=\begin{cases}\gamma&\qquad\text{if $p(s)\leqslant\gamma$,}\\ \gamma+\left(1-\textstyle{\frac{\beta_{\gamma}^{-}(p)}{\beta_{\gamma}^{+}(p)}}\right)(p(s)-\gamma)&\qquad\text{if $p(s)\geqslant\gamma$.}\end{cases}

For the case $|S|=2$ the definition reduces to $\pi_{\gamma}^{S}(p)(s)=\min{\{\max{\{p(s),\gamma\}},1-\gamma\}}$ , and so this model fits the usual method for constraining univariate EDAs over bitstrings.
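As an illustration of the restriction $\pi_{\gamma}^{S}$, the following minimal sketch implements the definition above with exact rational arithmetic. Representing a distribution as a dictionary from values to probabilities, and the function name `pi_gamma`, are assumptions made for this example.

```python
from fractions import Fraction


def pi_gamma(p, gamma):
    """Map a distribution p over S to an element of P_gamma(S)."""
    if not 0 <= gamma < Fraction(1, len(p)):
        raise ValueError("gamma must lie in [0, 1/|S|)")
    beta_plus = sum(max(q - gamma, 0) for q in p.values())
    beta_minus = sum(max(gamma - q, 0) for q in p.values())
    # beta_plus >= 1 - gamma*|S| > 0, so the division below is safe.
    scale = 1 - beta_minus / beta_plus
    return {s: gamma if q <= gamma else gamma + scale * (q - gamma)
            for s, q in p.items()}


gamma = Fraction(1, 10)

# Three values: mass is moved onto 'a' and removed proportionally from 'b', 'c'.
p3 = {"a": Fraction(0), "b": Fraction(1, 4), "c": Fraction(3, 4)}
r3 = pi_gamma(p3, gamma)
print(r3, sum(r3.values()))

# Two values: the map reduces to clamping into [gamma, 1 - gamma].
p2 = {0: Fraction(1), 1: Fraction(0)}
print(pi_gamma(p2, gamma))
```

Here `r3` assigns probabilities $\tfrac{1}{10}$, $\tfrac{37}{160}$, and $\tfrac{107}{160}$, which sum to $1$, and the two-valued case returns $\tfrac{9}{10}$ and $\tfrac{1}{10}$.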

Despite some differences in notation, the function $\pi_{\gamma}^{S}$ is nearly identical to the restriction described in [1]. In the context of [1, Section 4.2], our only modification is to forego an initial clamping of probabilities to the interval $[\gamma,1-(|S|-1)\gamma]$ , as the upper border of $1-(|S|-1)\gamma$ is already implied by the fact that the remaining steps produce an element of $\mathcal{P}_{\gamma}(S)$ . Indeed, an actual difference between the two methods only arises for inputs $p$ satisfying $\max_{s\in S}p(s)>1-(|S|-1)\gamma$ , and even in such cases the difference is not significant.

The fact that $\pi_{\gamma}^{S}$ always outputs an element of $\mathcal{P}_{\gamma}(S)$ is verified by B1 in the following lemma, which also establishes several further properties of $\pi_{\gamma}^{S}$ which will be useful for our later proofs.

Lemma 3.1.

Let $\beta_{\gamma}^{+}$ , $\beta_{\gamma}^{-}$ , and $\pi_{\gamma}^{S}$ be as defined in Section 3. Then, the following properties hold.

B1

For any $p\in\mathcal{P}(S)$ , $\sum_{s\in S}\pi_{\gamma}^{S}(p)(s)=1$ .
B2

If $p(s)\geqslant\gamma$ , then $\left(1-\textstyle{\frac{\beta_{\gamma}^{-}(p)}{1-\gamma|S|}}\right)p(s)\leqslant\pi_{\gamma}^{S}(p)(s)\leqslant p(s)$ .
B3

For any $S_{i}\subseteq S$ , $p\in\mathcal{P}(S_{i})$ and $s\in S$ , $\pi_{\gamma}^{S_{i}}(p)(s)\leqslant\max{\{\gamma,p(s)\}}$ .
B4

For any $S_{i},A\subseteq S$ and $p\in\mathcal{P}(S_{i})$ , $\pi_{\gamma}^{S_{i}}(p)(A)\leqslant p(A)+\gamma|S_{i}|$ .

Proof.

We first note that the definitions of $\beta_{\gamma}^{+}$ and $\beta_{\gamma}^{-}$ imply that for any $\gamma\in[0,1/|S|)$ and $p\in\mathcal{P}(S)$ ,

\begin{aligned}\beta_{\gamma}^{+}(p)-\beta_{\gamma}^{-}(p)&=\sum_{s\in S}(\max{\{p(s)-\gamma,0\}}-\max{\{\gamma-p(s),0\}})\\ &=\sum_{s\in S}(\max{\{p(s)-\gamma,0\}}+\min{\{p(s)-\gamma,0\}})=\sum_{s\in S}(p(s)-\gamma)=1-\gamma|S|.\end{aligned}

(1)

Because $1-\gamma|S|>0$ it immediately follows from (1) that

\beta_{\gamma}^{-}(p)<\beta_{\gamma}^{+}(p).

(2)

With these observations, we are now ready to prove the desired properties.

B1: If $p\in\mathcal{P}(S)$ , then setting $S^{+}=\{s\in S:p(s)\geqslant\gamma\}$ and $S^{-}=S\setminus S^{+}$ we have

\sum_{s\in S}\pi_{\gamma}^{S}(p)(s)=\gamma|S|+\sum_{s\in S^{+}}\left(1-\textstyle{\frac{\beta_{\gamma}^{-}(p)}{\beta_{\gamma}^{+}(p)}}\right)(p(s)-\gamma)=\gamma|S|+\left(\textstyle{\frac{\beta_{\gamma}^{+}(p)-\beta_{\gamma}^{-}(p)}{\beta_{\gamma}^{+}(p)}}\right)\beta_{\gamma}^{+}(p)\overset{(1)}{=}1.

B2: If $p(s)\geqslant\gamma$ , then by setting $\alpha=\beta_{\gamma}^{-}(p)/\beta_{\gamma}^{+}(p)$ ,

\begin{aligned}\left(1-\textstyle{\frac{\beta_{\gamma}^{-}(p)}{1-\gamma|S|}}\right)p(s)&\overset{(1)}{\leqslant}\left(1-\textstyle{\frac{\beta_{\gamma}^{-}(p)}{\beta_{\gamma}^{+}(p)}}\right)p(s)=(1-\alpha)p(s)\leqslant(1-\alpha)p(s)+\alpha\gamma=\gamma+(1-\alpha)(p(s)-\gamma)\\ &=\pi_{\gamma}^{S}(p)(s)=\gamma+\left(1-\textstyle{\frac{\beta_{\gamma}^{-}(p)}{\beta_{\gamma}^{+}(p)}}\right)(p(s)-\gamma)\overset{(2)}{\leqslant}\gamma+(p(s)-\gamma)=p(s),\end{aligned}

and so B2 holds.

B3: If $p(s)\leqslant\gamma$ then $\pi_{\gamma}^{S_{i}}(p)(s)\leqslant\gamma=\max{\{\gamma,p(s)\}}$ . On the other hand, if $p(s)\geqslant\gamma$ , then B2 implies that $\pi_{\gamma}^{S_{i}}(p)(s)\leqslant p(s)=\max{\{\gamma,p(s)\}}$ . In either case, B3 holds.

B4: We can compute

\begin{aligned}\pi_{\gamma}^{S_{i}}(p)(A)&=\pi_{\gamma}^{S_{i}}(p)(A\cap S_{i})=\sum_{s\in A\cap S_{i}}\pi_{\gamma}^{S_{i}}(p)(s)\overset{\text{B3}}{\leqslant}\sum_{s\in A\cap S_{i}}\max{\{\gamma,p(s)\}}\leqslant\sum_{s\in A\cap S_{i}}(p(s)+\gamma)\\ &=p(A)+\gamma|A\cap S_{i}|\leqslant p(A)+\gamma|S_{i}|,\end{aligned}

as required. ∎

Algorithm 1 UMDA with binary tournament selection

1: Search domain $\mathcal{X}=\prod_{i\in I}S_{i}$
2: Function $f:\mathcal{X}\times\mathcal{X}\to\{-1,1\}$
3: Algorithm parameters $\mu\in\mathbb{N}$ and $\gamma>0$
4: for $i\in I$ do
5:     for $s\in S_{i}$ do
6:         Set $p_{0}(i)(s)=\frac{1}{|S_{i}|}$
7:     end for
8: end for
9: for $t\in\mathbb{N}$ until termination criterion met do
10:     for $j\in[\mu]$ do
11:         Sample $x\sim\text{Univ}(\mathcal{X},p_{t})$
12:         Sample $y\sim\text{Univ}(\mathcal{X},p_{t})$
13:         if $f(x,y)=1$ then
14:             Set $P_{t+1}(j)=x$
15:         else if $f(x,y)=-1$ then
16:             Set $P_{t+1}(j)=y$
17:         end if
18:     end for
19:     for $i\in I$ do
20:         for $s\in S_{i}$ do
21:             Set $q_{t+1}(i)(s)=\frac{1}{\mu}|\{j:\text{$P_{t+1}(j)$ has an $s$ in position $i$}\}|$
22:         end for
23:         Set $p_{t+1}(i)=\pi_{\gamma}^{S_{i}}(q_{t+1}(i))$
24:     end for
25: end for

A description of the algorithm we analyse is now provided by Algorithm 1, which effectively generalises the version appearing in [2] (which applied only to bitstrings and omitted the step involving $\pi_{\gamma}^{S_{i}}$ ). Note that due to the use of $\pi_{\gamma}^{S_{i}}$ in line 23, we always have $p_{t}\in\prod_{i\in I}\mathcal{P}_{\gamma}(S_{i})$ .
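As an informal illustration, the loop of Algorithm 1 can be sketched in Python. This is only a sketch under stated assumptions: the helper names `project`, `sample` and `umda` are hypothetical, and the form of $\pi_{\gamma}^{S}$ used in `project` (entries below $\gamma$ are raised to $\gamma$, and the excess of the remaining entries is shrunk by the factor $1-\beta_{\gamma}^{-}(p)/\beta_{\gamma}^{+}(p)$) is inferred from the verification of B1 to B3 above rather than quoted from its definition. It also assumes $\gamma|S_{i}|\leqslant 1$ for every $i$.

```python
import random

def project(q, gamma):
    """Assumed form of pi_gamma: lift entries below gamma up to gamma and
    shrink the excess of the other entries so that the total mass stays 1."""
    beta_plus = sum(v - gamma for v in q.values() if v >= gamma)
    beta_minus = sum(gamma - v for v in q.values() if v < gamma)
    if beta_plus == 0:
        # Only possible when every entry already equals gamma.
        return {s: gamma for s in q}
    shrink = 1 - beta_minus / beta_plus
    return {s: (gamma + shrink * (v - gamma)) if v >= gamma else gamma
            for s, v in q.items()}

def sample(p, rng):
    """Draw one individual from the product (univariate) distribution p."""
    return tuple(rng.choices(list(dist), weights=list(dist.values()))[0]
                 for dist in p)

def umda(domains, f, mu, gamma, generations, rng):
    """Sketch of Algorithm 1; f(x, y) must return 1 or -1."""
    # Lines 4-8: uniform initial marginals.
    p = [{s: 1 / len(S) for s in S} for S in domains]
    for _ in range(generations):
        # Lines 10-18: binary tournaments between two sampled individuals.
        population = []
        for _ in range(mu):
            x, y = sample(p, rng), sample(p, rng)
            population.append(x if f(x, y) == 1 else y)
        # Lines 19-24: empirical marginals, then projection with margin gamma.
        p = [project({s: sum(ind[i] == s for ind in population) / mu
                      for s in S}, gamma)
             for i, S in enumerate(domains)]
    return p

if __name__ == "__main__":
    # Hypothetical toy payoff on bitstrings: x beats y if it has at least as many ones.
    toy_f = lambda x, y: 1 if sum(x) >= sum(y) else -1
    result = umda([(0, 1)] * 4, toy_f, mu=50, gamma=0.05,
                  generations=20, rng=random.Random(0))
    print(result)
```

The toy payoff in the usage example is not an impartial combinatorial game; it is used only to exercise the loop. A fixed number of generations stands in for the unspecified termination criterion.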

A key step towards analysing the performance of Algorithm 1 on impartial combinatorial games is understanding the distribution of a selected individual $P_{t+1}(j)$ . This will be handled by the following lemma. Its conclusion gives an exact expression for how the probability a selected individual would choose to move from $u$ to $v$ compares to the probability a sampled individual would choose to move from $u$ to $v$ (where a selected individual is simply the winner of a game played between two independent sampled individuals).

Lemma 3.2.

Let $G$ be an impartial combinatorial game, and let $p\in\prod_{v\in\text{\emph{Int}}(G)}\mathcal{P}(F(v))$ . Suppose that $x,y\sim\text{\emph{Univ}}(\mathcal{X}_{G},p)$ are independent, and

z=\begin{cases}x&\qquad\text{if $f_{G}(x,y)=1$,}\\ y&\qquad\text{if $f_{G}(x,y)=-1$.}\end{cases}

Then, for any $u\in V$ and $v\in F(u)$ ,

\mathbb{P}(z(u)=v)=p(u,v)\cdot[1+\mathbb{P}(u\in\text{Path}_{G}(x,y))\cdot(1-\mathbb{P}(f_{G}^{v}(x,y)=1)-\mathbb{P}(f_{G}^{u}(x,y)=1))].

(3)

For an intuition behind (3), the comparative factor has effectively three terms (here interpreted in the context of Algorithm 1):

• $\mathbb{P}(u\in\text{Path}_{G}(x,y))$, the probability the algorithm encounters position $u$;

• $\mathbb{P}(f_{G}^{u}(x,y)=1)$, the probability that $u$ is observed as a winning position; and

• $1-\mathbb{P}(f_{G}^{v}(x,y)=1)$, the probability that $v$ is observed as a losing position.

If it is likely for $v$ to be observed as a losing position, but unlikely for $u$ to be observed as a winning position, then it is beneficial to deliberately move from $u$ to $v$ (placing your opponent in a likely losing position) rather than play out with whatever the current strategy is from $u$ (where you are unlikely to win), thus incurring an increase in the prevalence of $z(u)=v$ among selected individuals. On the other hand, if the reverse is true, then it is beneficial to deliberately avoid moving from $u$ to $v$ and instead play out normally, thus incurring a decrease in prevalence of $z(u)=v$ . This helps motivate the effect of the term $1-\mathbb{P}(f_{G}^{v}(x,y)=1)-\mathbb{P}(f_{G}^{u}(x,y)=1)$ . The magnitude of this effect scales with the relative frequency with which $u$ is encountered as a game position, which corresponds to $\mathbb{P}(u\in\text{{Path}}_{G}(x,y))$ .

Proof of Lemma 3.2.

First, we will introduce some notation to assist with this proof. Let us write $r=\mathbb{P}(u\in\text{Path}_{G}(x,y))$ , $s_{u}=\mathbb{P}(f_{G}^{u}(x,y)=1)$ , and $s_{v}=\mathbb{P}(f_{G}^{v}(x,y)=1)$ . Let us also write

	$\displaystyle A$	$\displaystyle=\{w\in V\setminus\{u\}:\text{there is a directed path from $w$ to $u$}\},$
	$\displaystyle B$	$\displaystyle=\{w\in V\setminus\{u\}:\text{there is a directed path from $u$ to $w$}\},$

and note that $A$ , $B$ , and $\{u\}$ are pairwise disjoint sets. Finally, if we have $\text{Path}(x,y)=v_{0}v_{1}\ldots v_{\ell}$ when regarded as a directed path (where here and throughout we drop the subscript from $\text{Path}_{G}$ to simplify notation), then we will define

	$\displaystyle\text{Path}^{1}(x,y)$	$\displaystyle=\{v_{i}:\text{$i$ is even}\},$
	$\displaystyle\text{Path}^{2}(x,y)$	$\displaystyle=\{v_{i}:\text{$i$ is odd}\}.$

Note that because $\text{Path}(x,y)$ is the disjoint union of $\text{Path}^{1}(x,y)$ and $\text{Path}^{2}(x,y)$ , we have

r=\mathbb{P}(u\in\text{Path}^{1}(x,y))+\mathbb{P}(u\in\text{Path}^{2}(x,y)).

(4)

The event $z(u)=v$ can be written as the disjoint union of the following six events.

	$\displaystyle E_{1}$	$\displaystyle=u\notin\text{Path}(x,y)\wedge f_{G}(x,y)=1\wedge x(u)=v$
	$\displaystyle E_{2}$	$\displaystyle=u\notin\text{Path}(x,y)\wedge f_{G}(x,y)=-1\wedge y(u)=v$
	$\displaystyle E_{3}$	$\displaystyle=u\in\text{Path}^{1}(x,y)\wedge x(u)=v\wedge f_{G}^{v}(y,x)=-1$
	$\displaystyle E_{4}$	$\displaystyle=u\in\text{Path}^{1}(x,y)\wedge f_{G}^{u}(x,y)=-1\wedge y(u)=v$
	$\displaystyle E_{5}$	$\displaystyle=u\in\text{Path}^{2}(x,y)\wedge y(u)=v\wedge f_{G}^{v}(x,y)=-1$
	$\displaystyle E_{6}$	$\displaystyle=u\in\text{Path}^{2}(x,y)\wedge f_{G}^{u}(y,x)=-1\wedge x(u)=v$

Let us examine the probability of each of these events occurring. For $E_{1}$ , the event $u\notin\text{Path}(x,y)\wedge f_{G}(x,y)=1$ can be determined using only $(x(w))_{w\neq u}$ and $(y(w))_{w\neq u}$ , and so is independent of the event $x(u)=v$ . Similarly, in $E_{2}$ the event $u\notin\text{Path}(x,y)\wedge f_{G}(x,y)=-1$ is independent of the event $y(u)=v$ . Therefore,

$\displaystyle\mathbb{P}(E_{1})+\mathbb{P}(E_{2})=$	$\displaystyle\,\mathbb{P}(u\notin\text{Path}(x,y)\wedge f_{G}(x,y)=1)\cdot p(u,v)$
	$\displaystyle+\mathbb{P}(u\notin\text{Path}(x,y)\wedge f_{G}(x,y)=-1)\cdot p(u,v)$
$\displaystyle=$	$\displaystyle\,\mathbb{P}(u\notin\text{Path}(x,y))\cdot p(u,v)$
$\displaystyle=$	$\displaystyle\,(1-r)\cdot p(u,v).$	(5)

For $E_{3}$, the event $u\in\text{Path}^{1}(x,y)$ can be determined using only $(x(w))_{w\in A}$ and $(y(w))_{w\in A}$, and the event $f_{G}^{v}(y,x)=-1$ can be determined using only $(x(w))_{w\in B}$ and $(y(w))_{w\in B}$. Therefore, all three component events in $E_{3}$ are independent of each other. The same is also true of $E_{5}$. Therefore, noting that $\mathbb{P}(f_{G}^{v}(x,y)=-1)=\mathbb{P}(f_{G}^{v}(y,x)=-1)$, we can write

$\displaystyle\mathbb{P}(E_{3})+\mathbb{P}(E_{5})=$	$\displaystyle\,\mathbb{P}(u\in\text{Path}^{1}(x,y))\cdot p(u,v)\cdot\mathbb{P}(f_{G}^{v}(y,x)=-1)$
	$\displaystyle+\mathbb{P}(u\in\text{Path}^{2}(x,y))\cdot p(u,v)\cdot\mathbb{P}(f_{G}^{v}(x,y)=-1)$
$\displaystyle=$	$\displaystyle\,(\mathbb{P}(u\in\text{Path}^{1}(x,y))+\mathbb{P}(u\in\text{Path}^{2}(x,y)))\cdot p(u,v)\cdot\mathbb{P}(f_{G}^{v}(x,y)=-1)$
$\displaystyle\overset{(4)}{=}$	$\displaystyle\,r\cdot p(u,v)\cdot(1-s_{v}).$	(6)

For $E_{4}$ , the event $u\in\text{Path}^{1}(x,y)$ can be determined using only $(x(w))_{w\in A}$ and $(y(w))_{w\in A}$ , and the event $f_{G}^{u}(x,y)=-1$ can be determined using only $(x(w))_{w\in\{u\}\cup B}$ and $(y(w))_{w\in B}$ . Therefore, all three component events in $E_{4}$ are independent of each other. The same is also true of $E_{6}$ . Therefore, noting that $\mathbb{P}(f_{G}^{u}(x,y)=-1)=\mathbb{P}(f_{G}^{u}(y,x)=-1)$ , we can write

$\displaystyle\mathbb{P}(E_{4})+\mathbb{P}(E_{6})=$	$\displaystyle\,\mathbb{P}(u\in\text{Path}^{1}(x,y))\cdot\mathbb{P}(f_{G}^{u}(x,y)=-1)\cdot p(u,v)$
	$\displaystyle+\mathbb{P}(u\in\text{Path}^{2}(x,y))\cdot\mathbb{P}(f_{G}^{u}(y,x)=-1)\cdot p(u,v)$
$\displaystyle=$	$\displaystyle\,(\mathbb{P}(u\in\text{Path}^{1}(x,y))+\mathbb{P}(u\in\text{Path}^{2}(x,y)))\cdot\mathbb{P}(f_{G}^{u}(x,y)=-1)\cdot p(u,v)$
$\displaystyle\overset{(4)}{=}$	$\displaystyle\,r\cdot(1-s_{u})\cdot p(u,v).$	(7)

We can now combine these observations to obtain

	$\displaystyle\mathbb{P}(z(u)=v)$	$\displaystyle=\sum_{i\in[6]}\mathbb{P}(E_{i})\overset{(5),(6),(7)}{=}(1-r)\cdot p(u,v)+r\cdot p(u,v)(1-s_{v})+r\cdot(1-s_{u})p(u,v)$
		$\displaystyle=p(u,v)\cdot[1+r\cdot(1-s_{v}-s_{u})],$

as required. ∎
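Identity (3) can be checked by exhaustive enumeration on a small game. The sketch below is illustrative only: the toy game (a heap of 4 tokens from which a move removes 1 or 2), the probabilities in `p`, and the helper names are assumptions. It also assumes the conventions suggested by the proof above, namely that $x$ moves first, that $f_{G}^{u}(x,y)=1$ exactly when the first mover $x$ wins the game started from $u$, and that a player who cannot move loses.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy game: heap of 4 tokens, each move removes 1 or 2 tokens.
F = {0: [], 1: [0], 2: [1, 0], 3: [2, 1], 4: [3, 2]}  # F[u]: positions reachable from u
V0 = 4                                                 # initial position
INTERNAL = [u for u in F if F[u]]                      # positions with at least one move

# Arbitrary illustrative move probabilities p(u, v).
p = {
    1: {0: Fraction(1)},
    2: {1: Fraction(1, 3), 0: Fraction(2, 3)},
    3: {2: Fraction(1, 4), 1: Fraction(3, 4)},
    4: {3: Fraction(2, 5), 2: Fraction(3, 5)},
}

def play(x, y, start):
    """Play from `start` with x moving first; return (path, result),
    where result is 1 if x wins and -1 otherwise."""
    path, cur, movers, turn = [start], start, (x, y), 0
    while F[cur]:
        cur = movers[turn][cur]
        path.append(cur)
        turn ^= 1
    # The player to move at a terminal position loses (assumed convention).
    return path, (-1 if turn == 0 else 1)

def weighted_strategies():
    """Yield every strategy together with its probability under p."""
    for choice in product(*(F[u] for u in INTERNAL)):
        strategy = dict(zip(INTERNAL, choice))
        weight = Fraction(1)
        for u in INTERNAL:
            weight *= p[u][strategy[u]]
        yield strategy, weight

def lemma_3_2_sides(u, v):
    """Return both sides of (3), computed exactly by enumeration."""
    lhs = r = s_u = s_v = Fraction(0)
    strategies = list(weighted_strategies())
    for (x, wx), (y, wy) in product(strategies, repeat=2):
        w = wx * wy
        path, result = play(x, y, V0)
        z = x if result == 1 else y       # the selected individual
        if z[u] == v:
            lhs += w
        if u in path:
            r += w
        if play(x, y, u)[1] == 1:
            s_u += w
        if play(x, y, v)[1] == 1:
            s_v += w
    rhs = p[u][v] * (1 + r * (1 - s_v - s_u))
    return lhs, rhs

if __name__ == "__main__":
    for u in INTERNAL:
        for v in F[u]:
            print(u, v, lemma_3_2_sides(u, v))
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than up to rounding.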

As an aside, we note here a parallel with evolutionary game theory. Consider the discrete time replicator equation with nonlinear payoff functions (see (2.1) of [57]; also [33] for the more standard continuous and linear versions),

q_{i}^{\prime}=q_{i}(1+a_{i}-\textstyle\sum_{j}q_{j}a_{j}),

(8)

where we interpret $q_{i}$ as the proportion of type $i$ in a population and $a_{i}$ as the fitness of a type $i$ individual. The following proposition demonstrates that by identifying $q_{i}$ and $a_{i}$ appropriately, (3) can be seen to be of the form provided by (8).

Proposition 3.3.

In the setting of Lemma 3.2, let $u\in V$ be fixed and enumerate $F(u)=\{v_{1},\ldots,v_{k}\}$ . Let us identify

	$\displaystyle q_{i}$	$\displaystyle=p(u,v_{i})$
	$\displaystyle a_{i}$	$\displaystyle=\mathbb{P}(u\in\text{Path}_{G}(x,y))\cdot(1-\mathbb{P}(f_{G}^{v_{i}}(x,y)=1)).$

Then (3) can be rewritten as $q_{i}^{\prime}=q_{i}(1+a_{i}-\sum_{j\in[k]}q_{j}a_{j})$ .

Proof.

First, note the identity

\mathbb{P}(f_{G}^{u}(x,y)=1)=\sum_{j\in[k]}p(u,v_{j})\cdot\mathbb{P}(f_{G}^{v_{j}}(y,x)=-1)=\sum_{j\in[k]}q_{j}\cdot(1-\mathbb{P}(f_{G}^{v_{j}}(x,y)=1)).

(9)

Therefore,

	$\displaystyle\mathbb{P}(z(u)=v_{i})$	$\displaystyle\overset{(3)}{=}p(u,v_{i})\cdot[1+\mathbb{P}(u\in\text{Path}_{G}(x,y))\cdot(1-\mathbb{P}(f_{G}^{v_{i}}(x,y)=1)-\mathbb{P}(f_{G}^{u}(x,y)=1))]$
		$\displaystyle=q_{i}\cdot[1+a_{i}-\mathbb{P}(u\in\text{Path}_{G}(x,y))\cdot\mathbb{P}(f_{G}^{u}(x,y)=1)]$
		$\displaystyle\overset{(9)}{=}q_{i}\cdot[1+a_{i}-\mathbb{P}(u\in\text{Path}_{G}(x,y))\cdot\textstyle\sum_{j\in[k]}q_{j}\cdot(1-\mathbb{P}(f_{G}^{v_{j}}(x,y)=1))]$
		$\displaystyle=q_{i}(1+a_{i}-\textstyle\sum_{j\in[k]}q_{j}a_{j}),$

as required. ∎
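A single step of the replicator update (8) is easy to compute directly. The following sketch uses a hypothetical helper `replicator_step` and made-up values for $q_{i}$ and $a_{i}$. It illustrates that the update preserves total mass and that types with above-average fitness gain share.

```python
from fractions import Fraction

def replicator_step(q, a):
    """One step of (8): q_i' = q_i * (1 + a_i - sum_j q_j * a_j)."""
    mean_fitness = sum(qj * aj for qj, aj in zip(q, a))
    return [qi * (1 + ai - mean_fitness) for qi, ai in zip(q, a)]

if __name__ == "__main__":
    # Illustrative proportions and fitness values (not taken from any game).
    q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    a = [Fraction(1, 4), Fraction(1, 2), Fraction(0)]
    print(replicator_step(q, a))
```

With these values the mean fitness is $7/24$, so only the second type, whose fitness $1/2$ exceeds the mean, increases its proportion.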

In this sense, when executing Algorithm 1 on a game $G$, the evolution of the distribution $p(u,\,\cdot\,)$ at each vertex $u$ of $G$ stochastically emulates these replicator dynamics. However, a key difference is that in standard evolutionary game theory, each fitness function $a_{i}:=a_{i}(q_{1},\ldots,q_{k})$ typically depends only on the distribution of types in the population; whereas the expression $a_{i}=\mathbb{P}(u\in\text{Path}_{G}(x,y))\cdot(1-\mathbb{P}(f_{G}^{v_{i}}(x,y)=1))$ depends on the distribution of ‘types’ not just at the node $u$, but also at possibly all other nodes as well, and so the dynamics of each node cannot be considered in isolation.

4 Switchability

In Section 1 we noted that our main result implies a probabilistic upper bound of $n^{O(\overline{s})}$ on the runtime for an impartial combinatorial game, where $\overline{s}$ is an (often small) invariant of the corresponding game graph. In this section, we define this invariant and prove a key lemma.

Rather than defining this property, which we call switchability, for a game as a whole, we will actually define switchability as a property $s(u)$ of each vertex $u$ in the game’s vertex set $V$ . Then later we will take $\overline{s}=\max_{u\in V}s(u)$ (see Corollary 5.5). Intuitively, $s(u)$ measures the ‘smallest’ possible set of edges $A\subseteq E(G)$ such that any pair of strategies $x,y\in\mathcal{X}_{G}$ satisfying $A\subseteq\{(v,x(v)):v\in V\}$ and $A\subseteq\{(v,y(v)):v\in V\}$ must also satisfy $u\in\text{Path}_{G}(x,y)$ . The motivation is that if $x,y\sim\text{Univ}(\mathcal{X}_{G},p)$ for some $p\in\prod_{v\in\text{Int}(G)}\mathcal{P}_{\gamma}(F(v))$ , then $u\in\text{Path}_{G}(x,y)$ is assured by having $x$ and $y$ take certain values at the vertices appearing at the tail of some edge in $A$ , which occurs with probability at least $\gamma^{2s(u)}$ . A property that places a lower bound on $\mathbb{P}(u\in\text{Path}_{G}(x,y))$ in such a way will be very useful as we seek to apply Lemma 3.2 later.

Figure 2: An example of switchability.

In the description above, a naive approach would be to take ‘smallest’ to simply mean having fewest edges. However, while this gives a working definition, when then bounding $\mathbb{P}(u\in\text{Path}_{G}(x,y))$ below, it is clear that significant improvements can be made in many cases. Consider the example shown in Figure 2. Our naive approach suggests that if $x,y\sim\text{Univ}(\mathcal{X}_{G},p)$ for some $p\in\prod_{v\in\text{Int}(G)}\mathcal{P}_{\gamma}(F(v))$, then $\mathbb{P}(u\in\text{Path}_{G}(x,y))\geqslant\gamma^{10}$. However, it would be better to observe that $\mathbb{P}(u\in\text{Path}_{G}(x,y))\geqslant\gamma$, as visiting $u$ can be assured by a single choice to move to $u$ made by the player who makes the first move after reaching the layer $B$ (in this case, always the player $y$).

To better capture this notion, we will not take ‘smallest’ to mean fewest edges, but rather smallest depth, defined in the following way (we recall here that all graphs are assumed to be acyclic).

Definition 4.1.

Given a set of edges $A\subseteq E(G)$ , we define the depth of $A$ to be

\text{Depth}(A)=\max{\{|A\cap E(P)|:\text{$P$ is a directed path in $G$}\}}.

With this, the full description of switchability is provided by the following two definitions.

Definition 4.2.

Given a set $A\subseteq E(G)$ , we (inductively) say that a directed path $P=v_{0}\ldots v_{\ell}$ is $A$ -compatible if any of the following conditions hold.

C1 $P=v_{0}$.

C2 $v_{0}\ldots v_{\ell-1}$ is $A$-compatible and $v_{\ell-1}v_{\ell}\in A$.

C3 $v_{0}\ldots v_{\ell-1}$ is $A$-compatible and there is no $w\in V$ such that $v_{\ell-1}w\in A$.

Then, given a vertex $v$ , we say that $A$ is a $v$ -switcher if $v$ is contained in every $A$ -compatible directed path $v_{0}\ldots v_{\ell}$ with $v_{\ell}\notin\text{\emph{Int}}(G)$ .

Definition 4.3.

The switchability $s(v)$ of a vertex $v$ is the smallest possible depth of a $v$ -switcher. We will also write $\overline{s}=\max_{v\in V}s(v)$ .

Figure 3: Two illustrations of switchability. In the first, $s(v)=1$, and a $v$-switcher of depth $1$ is shown in blue. In the second, $s(v)=2$, a $v$-switcher of depth $2$ is shown in blue, and one example of an $A$-compatible path is shown in red.

Thus, while the set $A$ shown in Figure 2 has 5 edges, it has $\text{Depth}(A)=1$ , and so in that case we have $s(u)=1$ . Figure 3 shows two further illustrations of switchability. For certain games, constructing a small $v$ -switcher is quite straightforward (see proof of Proposition 6.1 later); in other cases where determining switchability is not obvious, the following upper bound may be used instead.

Proposition 4.4.

If there is a directed path of length $\ell$ from $v_{0}$ to $v$ , then $s(v)\leqslant\ell$ .

Proof.

If $P$ is a directed path from $v_{0}$ to $v$ , then every $E(P)$ -compatible path $v_{0}\ldots v_{\ell}$ with $v_{\ell}\notin\text{Int}(G)$ includes $P$ as a prefix, and hence also includes $v$ . Thus, $E(P)$ is a $v$ -switcher of depth $\ell$ . ∎
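Definitions 4.1 to 4.3 and Proposition 4.4 can be made concrete on a small game graph. The sketch below is illustrative only: the toy game (a heap of 4 tokens from which a move removes 1 or 2) and the function names `depth` and `is_switcher` are assumptions. `depth` computes $\text{Depth}(A)$ by dynamic programming over the acyclic graph, and `is_switcher` follows every $A$-compatible path from $v_{0}$ to a terminal position, using the observation that conditions C2 and C3 force an edge of $A$ to be followed whenever the current vertex has one.

```python
from functools import lru_cache

# Hypothetical toy game: heap of 4 tokens, each move removes 1 or 2 tokens.
F = {0: [], 1: [0], 2: [1, 0], 3: [2, 1], 4: [3, 2]}  # F[u]: positions reachable from u
V0 = 4                                                 # initial position

def depth(A):
    """Depth(A): the largest number of edges of A on any directed path."""
    @lru_cache(maxsize=None)
    def best(w):
        # Most A-edges on a directed path starting at w.
        return max((best(t) + ((w, t) in A) for t in F[w]), default=0)
    return max(best(w) for w in F)

def is_switcher(A, v):
    """Check whether every A-compatible path from V0 to a terminal position visits v."""
    constrained = {w for (w, _) in A}  # vertices with an outgoing edge in A
    def every_path_hits(w, seen):
        seen = seen or w == v
        if not F[w]:
            return seen
        # C2/C3: at a constrained vertex only edges of A may be followed.
        nxt = [t for t in F[w] if (w, t) in A] if w in constrained else F[w]
        return all(every_path_hits(t, seen) for t in nxt)
    return every_path_hits(V0, False)

if __name__ == "__main__":
    path_edges = {(4, 3), (3, 1)}  # edges of the directed path 4 -> 3 -> 1
    print(is_switcher(path_edges, 1), depth(path_edges))
```

In the usage example, the edge set of a directed path of length 2 from the initial position to position 1 is a 1-switcher of depth 2, in line with Proposition 4.4. The bound need not be tight: in this toy game the set $\{(3,1),(2,1)\}$ is also a 1-switcher and has depth 1.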

To complete this section, the required lower bound on $\mathbb{P}(u\in\text{Path}_{G}(x,y))$ is provided by the following lemma. Note that as well as improving the naive approach by using $\text{Depth}(A)$ instead of $|A|$ , we also deduce a result of $\gamma^{s(v)}$ instead of $\gamma^{2s(v)}$ by carefully accounting for the fact that at most one player can visit each possible game position (due to the previous assumption that the impartial combinatorial games considered are acyclic).

Lemma 4.5.

Suppose that $p\in\prod_{v\in\text{\emph{Int}}(G)}\mathcal{P}_{\gamma}(F(v))$ and $x,y\sim\text{\emph{Univ}}(\mathcal{X}_{G},p)$ . Then for every $v\in V$ , $\mathbb{P}(v\in\text{\emph{Path}}_{G}(x,y))\geqslant\gamma^{s(v)}$ .

Proof.

The distribution of $\text{Path}_{G}(x,y)$ is the same as the random set $P$ produced by the following process.

1. Initially, set $z_{0}=v_{0}$.

2. For $i\geqslant 0$, do the following.

  (a) If $z_{i}\notin\text{Int}(G)$, then set $P=\{z_{0},\ldots,z_{i}\}$.

  (b) Otherwise, if $z_{i}\in\text{Int}(G)$, sample $z_{i+1}\sim p(z_{i})$.

We will generate an instance of the above process in a very specific way using a collection of independent $\text{Unif}([0,1])$ random variables. First, let $A$ be a $v$ -switcher of depth $s(v)$ , and let $B=\{u\in V:\text{$(u,w)\in A$ for some $w\in F(u)$}\}$ . Next, for each $u\in V$ , let $\phi_{u}:[0,1]\to F(u)$ be any function satisfying the following properties.

D1 If $X\sim\text{Unif}([0,1])$, then $\mathbb{P}(\phi_{u}(X)=w)=p(u,w)$ for every $w\in F(u)$.

D2 If $s\in[0,\gamma]$ and $u\in B$, then $(u,\phi_{u}(s))\in A$.

Note that this is possible because $p(u,w)\geqslant\gamma$ holds for every $u\in V$ and $w\in F(u)$ . The modified process is then as follows.

0. Let $X_{1},\ldots,X_{n},Y_{1},\ldots,Y_{n}$ be independent $\text{Unif}([0,1])$ random variables.

1. Initially, set $z_{0}=v_{0}$.

2. For $i\geqslant 0$, do the following.

  (a) If $z_{i}\notin\text{Int}(G)$, then set $Q=\{z_{0},\ldots,z_{i}\}$.

  (b) If $z_{i}\in B$, then set $r_{i}=|\{z_{0},\ldots,z_{i}\}\cap B|$ and $z_{i+1}=\phi_{z_{i}}(X_{r_{i}})$.

  (c) If $z_{i}\in\text{Int}(G)\setminus B$, then set $r_{i}=|\{z_{0},\ldots,z_{i}\}\setminus B|$ and $z_{i+1}=\phi_{z_{i}}(Y_{r_{i}})$.

From D1 it follows that $Q$ has the same distribution as $P$ , and hence also as $\text{Path}_{G}(x,y)$ .

We now claim that $v\in Q$ whenever $X_{1},\ldots,X_{s(v)}\in[0,\gamma]$ . The key observation is that under this regime, the first $s(v)$ visits that $Q$ makes to $B$ must be followed by an edge in $A$ . Let us label $Q=z_{0}z_{1}\ldots z_{\ell}$ . We will show by induction on $i$ that $z_{0}\ldots z_{i}$ is $A$ -compatible for every $i\in[\ell]$ , noting that the case $i=0$ holds because $z_{0}=v_{0}$ . For the inductive step, if $z_{0}\ldots z_{i}$ is $A$ -compatible, then the only way for $z_{0}\ldots z_{i+1}$ to not be $A$ -compatible is to have $z_{i}\in B$ and $(z_{i},z_{i+1})\notin A$ . But then, because the first $s(v)$ visits that $Q$ makes to $B$ are followed by an edge in $A$ , we can infer that $z_{0}\ldots z_{i}$ already includes at least $s(v)$ edges in $A$ . Letting $w\in V$ be such that $(z_{i},w)\in A$ , we then have that $R:=z_{0}\ldots z_{i}w$ is a directed path with $|E(R)\cap A|\geqslant s(v)+1$ , a contradiction to the depth of $A$ . So in fact the inductive step holds, and $z_{0}\ldots z_{i}$ is $A$ -compatible for every $i$ . In particular, $Q$ is $A$ -compatible, and hence $v\in Q$ .

Thus, $v\in Q$ whenever $X_{1},\ldots,X_{s(v)}\in[0,\gamma]$ , and hence

\displaystyle\mathbb{P}(v\in\text{Path}_{G}(x,y))=\mathbb{P}(v\in Q)\geqslant% \mathbb{P}(X_{1},\ldots,X_{s(v)}\leqslant\gamma)=\gamma^{s(v)},

as required. ∎
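The first process in the proof above shows that $\text{Path}_{G}(x,y)$ is distributed as the trajectory of a Markov chain on the game graph, so $\mathbb{P}(v\in\text{Path}_{G}(x,y))$ can be computed exactly by propagating visit probabilities. The sketch below does this for a hypothetical toy game (a heap of 4 tokens from which a move removes 1 or 2) with made-up probabilities, all at least $\gamma=1/4$, and compares the result with $\gamma$ raised to an upper bound on $s(v)$. Since $\gamma\leqslant 1$, this is a weaker bound than the one given by Lemma 4.5.

```python
from fractions import Fraction
from functools import lru_cache

# Hypothetical toy game: heap of 4 tokens, each move removes 1 or 2 tokens.
F = {0: [], 1: [0], 2: [1, 0], 3: [2, 1], 4: [3, 2]}  # F[u]: positions reachable from u
V0 = 4                                                 # initial position

# Illustrative move probabilities p(u, w), each at least gamma = 1/4.
p = {
    1: {0: Fraction(1)},
    2: {1: Fraction(1, 3), 0: Fraction(2, 3)},
    3: {2: Fraction(1, 4), 1: Fraction(3, 4)},
    4: {3: Fraction(2, 5), 2: Fraction(3, 5)},
}

@lru_cache(maxsize=None)
def visit_probability(w):
    """Probability that the chain z_0 = V0, z_{i+1} ~ p(z_i) ever visits w.
    The graph is acyclic, so w is entered at most once and the
    contributions of its predecessors can simply be added."""
    if w == V0:
        return Fraction(1)
    return sum((visit_probability(u) * p[u][w] for u in F if w in F[u]),
               Fraction(0))

if __name__ == "__main__":
    gamma = Fraction(1, 4)
    # Upper bounds on s(v): {(4, 2)} is a 2-switcher of depth 1, and
    # Proposition 4.4 applied to the path 4 -> 3 -> 1 gives s(1) <= 2.
    for v, s_upper in [(2, 1), (1, 2)]:
        print(v, visit_probability(v), gamma ** s_upper)
```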

5 Main result

In order to state runtime results, we adopt the standard black box convention where runtime is defined as the number of times a function is queried until the algorithm reaches the desired search objective (see [14]), as follows.

Definition 5.1.

Suppose that $G$ is an impartial combinatorial game, and that $\mathcal{A}$ is an algorithm which makes $\tau$ queries of $f_{G}$ during each generation. Then, given a set $B\subseteq\mathcal{X}_{G}$ , the runtime of $\mathcal{A}$ on $f_{G}$ is defined to be the random variable

T_{\mathcal{A}}^{G}(B)=\tau\cdot\min{\{t:P_{t}\cap B\neq\emptyset\}}

where $P_{t}\subseteq\mathcal{X}_{G}$ is the population of $\mathcal{A}$ at the start of generation $t$ . (If the game $G$ is clear from context, we will write $T_{\mathcal{A}}$ instead of $T_{\mathcal{A}}^{G}$ .)

Our main result is now provided by Theorem 5.2. In simple terms, it states that if Algorithm 1 is executed on an impartial combinatorial game $G$ using a sufficiently large population size $\mu$ , then with high probability its runtime is at most $O(\mu\cdot r(G))$ , where $r(G)$ is a formula of the game graph expressed in terms of its number of vertices $n$ , maximum degree $\Delta$ , and a summation involving the switchability $s(v)$ (Definition 4.3) at each critical position $v\in W_{G}$ (Definition 2.1). Notably, $r(G)$ is increasing in each of $n$ , $\Delta$ , and $s(v)$ , indicating that games for which these quantities are high may be the most difficult to optimise. We remark that the exact parameter settings for Algorithm 1 appearing in the statement have not been chosen to guarantee an optimal runtime, but rather to make the proof more comprehensible.

Theorem 5.2.

There is a constant $C>0$ such that the following holds. Let $G$ be an $n$ -vertex impartial combinatorial game with maximum degree $\Delta$ , and let $\hat{s}=\max_{v\in W_{G}}s(v)$ . Let $K>0$ , and let $\mathcal{A}$ be described by Algorithm 1, where $\gamma=1/(20\Delta n)$ and

\mu\geqslant C(K+\hat{s}+1)(20\Delta n)^{1+2\hat{s}}\log{n}.

(10)

Then,

\mathbb{P}\Biggl{[}T_{\mathcal{A}}^{G}(\text{Opt}(G))\geqslant C\mu\sum_{v\in W_{G}}(20\Delta n)^{s(v)}\log{n}\Biggr{]}\leqslant n^{-K}.

The asymptotic behaviour of the runtime bound may not be immediately obvious from the form stated here. Accordingly, we will shortly provide an easier to digest corollary using the facts $s(v)\leqslant\hat{s}\leqslant\max_{v\in V}s(v)$ and $|W_{G}|\leqslant n$ to remove the role of $W_{G}$ and the corresponding summation. For many games (including the applications considered later), this simplified bound has the same asymptotic behaviour as the one provided by Theorem 5.2. Nonetheless, as it is possible to construct games for which Theorem 5.2 offers significant improvement of the simplified bound, we opt to retain the more general form above.
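To get a feel for the quantities in Theorem 5.2, they can be evaluated numerically. The sketch below is hypothetical in several respects: the function name is invented, the theorem only asserts that some constant $C>0$ exists, so `C` is a placeholder supplied by the caller, and the natural logarithm is assumed. It returns $\gamma$, the lower bound (10) on $\mu$, the runtime threshold per unit of $\mu$, and the simplified threshold obtained from $s(v)\leqslant\hat{s}$ and $|W_{G}|\leqslant n$.

```python
import math

def theorem_5_2_quantities(n, delta, s_values, K, C):
    """Evaluate the expressions in Theorem 5.2.

    n, delta : number of vertices and maximum degree of the game graph
    s_values : switchabilities s(v) of the critical positions v in W_G
    K        : exponent in the failure probability n^(-K)
    C        : placeholder for the unspecified constant of the theorem
    """
    base = 20 * delta * n
    s_hat = max(s_values)
    gamma = 1 / base
    # Right-hand side of (10): lower bound on the population size mu.
    mu_min = C * (K + s_hat + 1) * base ** (1 + 2 * s_hat) * math.log(n)
    # Runtime threshold divided by mu, in the form stated in the theorem.
    per_mu = C * sum(base ** s for s in s_values) * math.log(n)
    # Simplified form using s(v) <= s_hat and |W_G| <= n.
    simplified_per_mu = C * n * base ** s_hat * math.log(n)
    return gamma, mu_min, per_mu, simplified_per_mu

if __name__ == "__main__":
    # Made-up parameters for illustration only.
    print(theorem_5_2_quantities(n=5, delta=2, s_values=[1, 1, 2], K=1, C=1.0))
```

When many critical positions have switchability well below $\hat{s}$, the summation form can be much smaller than the simplified form, which is the situation the remark above refers to.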

We will now briefly provide some intuition for the proof of Theorem 5.2. As characterised by Lemma 2.2, we know that any strategy $x\in\mathcal{X}_{G}$ that makes the correct decision at every critical position $v\in W_{G}$ is an element of $\text{Opt}(G)$ (where here, making a correct decision means ensuring $x(v)$ has a Sprague-Grundy value of $0$ ). Thus, we consider the sequence $p_{0},p_{1},\ldots$ appearing in Algorithm 1, and estimate the time until the algorithm arrives at some $p$ such that, with high probability, an $x\sim\text{Univ}(\mathcal{X}_{G},p_{t})$ makes the correct decision at every critical position. The progress to arrive at such a $p$ is effectively broken down into $|W_{G}|$ steps: fixing an ordering $u_{1},\ldots,u_{n}$ of $V$ such that $F(u_{i})\subseteq\{u_{1},\ldots,u_{i-1}\}$ for every $i\in[n]$ (a reverse topological ordering), step $k$ finishes when, with high probability, an $x\sim\text{Univ}(\mathcal{X}_{G},p_{t})$ makes the correct decision at the first $k$ critical positions appearing in the ordering. Bounding the length of time to complete step $k$ is accomplished by combining Lemmas 2.2, 3.2, and 4.5 to show that if sampled individuals are usually making the correct decision at the first $k$ critical positions in the ordering, then the algorithm has a bias towards retaining individuals who also make the correct decision at the next critical position in the ordering. Note that this step-by-step process does not appear explicitly in the proof, but is implicit from the definition of a function $\hat{g}$ measuring progress towards the optimality condition (see (12)).

Proof of Theorem 5.2.

First, let us introduce some further notation. Given a set $V^{\prime}\subseteq V$ we will write $p_{t}(u,V^{\prime})=\sum_{v\in V^{\prime}}p_{t}(u,v)$. Recalling that Algorithm 1 ensures that $p_{t}\in\prod_{v\in\text{Int}(G)}\mathcal{P}_{\gamma}(F(v))$ at every step, we will write $\mathcal{Q}=\prod_{v\in\text{Int}(G)}\mathcal{P}_{\gamma}(F(v))$. Let $h:V\to\mathbb{N}_{0}$ denote the Sprague-Grundy function, and let $V_{0}=h^{-1}(\{0\})$ and $V_{1}=V\setminus V_{0}$. We will also assume that $n\geqslant 3$, as any impartial combinatorial game $G$ with $n<3$ satisfies $\text{Opt}(G)=\mathcal{X}_{G}$ (such games satisfy $\Delta\leqslant 1$, and so in fact $|\mathcal{X}_{G}|=1$ in these cases).

Let $u_{1},\ldots,u_{n}$ be an ordering of $V$ such that $F(u_{i})\subseteq\{u_{1},\ldots,u_{i-1}\}$ for every $i\in[n]$ (note that such an ordering exists as $G$ is assumed to be acyclic). Let us write $A_{i}$ for the set of $p\in\mathcal{Q}$ such that $p(u,V_{1})\leqslant\textstyle{\frac{1}{10n}}$ for all $u\in W_{G}\cap\{u_{1},\ldots,u_{i}\}$ . If for some generation $t$ we have $p_{t}\in A_{n}$ , then for every $v\in W_{G}$ we have

q_{t}(v)(V_{0})\overset{\text{B4}}{\geqslant}\pi_{\gamma}^{F(v)}(q_{t}(v))(V_{0})-\gamma|F(v)|=p_{t}(v,V_{0})-\gamma|F(v)|\geqslant 1-\textstyle{\frac{1}{10n}}-\gamma\Delta>1-\textstyle{\frac{1}{5n}}.

(11)

Recalling from Lemma 2.2 that if $x\in\mathcal{X}_{G}\setminus\text{Opt}(G)$ then $x(v)\in V_{1}$ for some $v\in W_{G}$ , we can deduce

	$\displaystyle|\{j\in[\mu]:P_{t}(j)\notin\text{Opt}(G)\}|$	$\displaystyle\leqslant\sum_{v\in W_{G}}|\{j\in[\mu]:P_{t}(j)(v)\in V_{1}\}|=\sum_{v\in W_{G}}\mu\cdot q_{t}(v)(V_{1})$
		$\displaystyle=\sum_{v\in W_{G}}\mu\cdot(1-q_{t}(v)(V_{0}))\overset{(11)}{<}\frac{|W_{G}|\mu}{5n}\leqslant\frac{\mu}{5}<\mu,$

and hence $P_{t}\cap\text{Opt}(G)\neq\emptyset$ . In particular, if $T^{\ast}=\min{\{t:p_{t}\in A_{n}\}}$ then $T_{\mathcal{A}}^{G}(\text{Opt}(G))\leqslant\mu\cdot T^{\ast}$ .

We will define a map $\hat{g}:\mathcal{Q}\to\mathbb{R}_{\geqslant 0}$ that will measure progress towards $A_{n}$ . To do this, first let $g:[\gamma,1-\gamma]\to\mathbb{R}_{\geqslant 0}$ be given by

g(y)=\log{\left(\frac{y}{1-y}\right)}-\log{\left(\frac{\gamma}{1-\gamma}\right)},

so that $g$ is a monotone increasing function. Then, given $p\in\mathcal{Q}$ , let $\ell(p)=\max{\{i\in[n]:p\in A_{i}\}}$ and define

\hat{g}(p)=\begin{cases}\displaystyle\sum_{i\in[\ell(p)]}\text{1}(u_{i}\in W_{G})\cdot\left(g(1-\gamma)\cdot\left(\frac{32}{\gamma^{s(u_{i})}}\right)+1\right)+g(p(u_{\ell(p)+1},V_{0}))\cdot\left(\frac{32}{\gamma^{s(u_{\ell(p)+1})}}\right)&\text{if $\ell(p)<n$,}\\ \displaystyle\sum_{i\in[\ell(p)]}\text{1}(u_{i}\in W_{G})\cdot\left(g(1-\gamma)\cdot\left(\frac{32}{\gamma^{s(u_{i})}}\right)+1\right)&\text{if $\ell(p)=n$.}\end{cases}

(12)

Define also $g_{\text{max}}=\max_{p\in\mathcal{Q}}\hat{g}(p)$, and note that $p\in A_{n}$ if and only if $\hat{g}(p)=g_{\text{max}}$. The motivation for the function $\hat{g}:\mathcal{Q}\to\mathbb{R}_{\geqslant 0}$ is that the value of $\hat{g}(p_{t})$ increases as $p_{t}$ moves through $\mathcal{Q}$ towards $A_{n}$. Indeed, the first term of (12) is a summation depending on $\ell(p)$ only; its role is to ensure that $\hat{g}(p)\geqslant\hat{g}(p^{\prime})$ whenever $p\in A_{i}$ and $p^{\prime}\notin A_{i}$, and hence $\hat{g}$ increases true to the sequence $A_{0}\supseteq A_{1}\supseteq\ldots\supseteq A_{n}$. The second term measures progress within some $A_{i}$ as we move towards $A_{i+1}$ (it increases as the value of $p_{t}(u_{i+1},V_{1})$ decreases toward $\textstyle{\frac{1}{10n}}$).
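The scalar function $g$ is a shifted log-odds transform, and its basic properties are easy to check numerically. The sketch below uses a hypothetical helper `make_g` and an illustrative value of $\gamma$. It confirms that $g(\gamma)=0$, that $g$ is increasing on $[\gamma,1-\gamma]$, and that $g(1-\gamma)=2\log((1-\gamma)/\gamma)\leqslant 2\log(1/\gamma)$, which is the first inequality used in (14).

```python
import math

def make_g(gamma):
    """Return g(y) = log(y / (1 - y)) - log(gamma / (1 - gamma)) on [gamma, 1 - gamma]."""
    offset = math.log(gamma / (1 - gamma))
    def g(y):
        return math.log(y / (1 - y)) - offset
    return g

if __name__ == "__main__":
    gamma = 1 / 200  # illustrative value, e.g. gamma = 1/(20 * delta * n) with delta = 2, n = 5
    g = make_g(gamma)
    print(g(gamma), g(0.5), g(1 - gamma), 2 * math.log(1 / gamma))
```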

Denote $X_{t}(i)=g(p_{t}(u_{i},V_{0}))$ . We will later show the following two claims, where the second is a direct consequence of the first.

Claim 5.3.

If $p_{t}\in A_{i-1}$ and $u_{i}\in W_{G}$ , then

\mathbb{P}(X_{t+1}(i)\leqslant\min{\{g(1-2\Delta\gamma),X_{t}(i)+\gamma^{s(u_{i})}/32\}})\leqslant n^{-K-3\hat{s}-5}.

Claim 5.4.

If $p_{t}\notin A_{n}$, then $\mathbb{P}(\hat{g}(p_{t+1})\geqslant\hat{g}(p_{t})+1)\geqslant 1-n^{-K-3\hat{s}-4}$.

Claim 5.4 asserts that if $p_{t}$ has not yet reached $A_{n}$ , then we should expect the value of $\hat{g}(p_{t})$ to increase by at least $1$ in the next generation. However, $\hat{g}(p_{t})$ cannot increase by at least $1$ more than $\lfloor g_{\text{max}}\rfloor$ times. Precisely, if $T^{\ast}>g_{\text{max}}$ then we must have $p_{t}\notin A_{n}$ and $\hat{g}(p_{t})<g_{\text{max}}$ for every $t\leqslant g_{\text{max}}$ . In particular, it would then hold that $\hat{g}(p_{t})<\hat{g}(p_{t-1})+1$ for some $t\in[\lfloor g_{\text{max}}\rfloor]$ . Therefore, using a union bound with Claim 5.4, we have

\mathbb{P}[T^{\ast}>g_{\text{max}}]\leqslant g_{\text{max}}\cdot n^{-K-3\hat{s}-4}.

(13)

Noting (using Lemma A.3) that

g(1-\gamma)\overset{\emph{\ref{g-simp-3}}}{\leqslant}2\log{(1/\gamma)}=2\log{(20\Delta n)}\leqslant 5\log{n}-1,

(14)

we can bound

$\displaystyle g_{\text{max}}=\sum_{v\in W_{G}}\left(g(1-\gamma)\cdot\left(\frac{32}{\gamma^{s(v)}}\right)+1\right)$	$\displaystyle\overset{(14)}{\leqslant}\sum_{v\in W_{G}}(5\log{n})\cdot 32\cdot(20\Delta n)^{s(v)}$
	$\displaystyle<C\sum_{v\in W_{G}}(20\Delta n)^{s(v)}\log{n}$	(15)
	$\displaystyle\leqslant C|W_{G}|(20\Delta n)^{\hat{s}}\log{n}\leqslant n^{3\hat{s}+4}.$	(16)

We now have

	$\displaystyle\mathbb{P}\Bigl{[}T_{\mathcal{A}}^{G}(\text{Opt}(G))\geqslant C\mu$	$\displaystyle\sum_{v\in W_{G}}(20\Delta n)^{s(v)}\log{n}\Bigr{]}\leqslant\mathbb{P}\Bigl{[}T^{\ast}\geqslant C\sum_{v\in W_{G}}(20\Delta n)^{s(v)}\log{n}\Bigr{]}$
		$\displaystyle\overset{(15)}{\leqslant}\mathbb{P}\left[T^{\ast}>g_{\text{max}}\right]\overset{(13)}{\leqslant}g_{\text{max}}\cdot n^{-K-3\hat{s}-4}\overset{(16)}{\leqslant}n^{3\hat{s}+4}\cdot n^{-K-3\hat{s}-4}=n^{-K},$

as required. Therefore, all that remains is to prove Claims 5.3 and 5.4.

Proof of Claim 5.3.

Assume $x,y\sim\text{Univ}(\mathcal{X}_{G},p_{t})$ are independent. To assist with this claim, we will introduce some further notation. Let $r=\mathbb{P}(u_{i}\in\text{Path}_{G}(x,y))$ , and note that from Lemma 4.5 we have

r\geqslant\gamma^{s(u_{i})}.

(17)

Given $w\in V$ , let us write $N_{w}$ as a shorthand for the event $f_{G}^{w}(x,y)=1$ , and note that because $x$ and $y$ are independent and identically distributed,

\mathbb{P}(N_{w})=\mathbb{P}(f_{G}^{w}(x,y)=1)=\mathbb{P}(f_{G}^{w}(y,x)=1).

(18)

Finally, let us also write $F_{0}=F(u_{i})\cap V_{0}$ and $F_{1}=F(u_{i})\cap V_{1}$ .

Given $v\in V$ , we wish to consider $\mathbb{P}(z(u_{i})=v)$ , where $z$ is the winner of the game $G$ played between $x$ and $y$ (as in Lemma 3.2). This will be useful, as the individuals $P_{t+1}(1),\ldots,P_{t+1}(\mu)$ selected in lines 10-17 of Algorithm 1 are independent and with the same distribution as $z$ , and hence for every $v\in V$ ,

\mu\cdot q_{t+1}(u_{i},v)\sim\text{Bin}(\mu,\mathbb{P}(z(u_{i})=v)).

(19)

To analyse $\mathbb{P}(z(u_{i})=v)$ , first note that we have

1-\mathbb{P}(N_{u_{i}})=\mathbb{P}(f_{G}^{u_{i}}(x,y)=-1)=\sum_{w\in F(u_{i})}p_{t}(u_{i},w)\mathbb{P}(f_{G}^{w}(y,x)=1)\overset{(18)}{=}\sum_{w\in F(u_{i})}p_{t}(u_{i},w)\mathbb{P}(N_{w}).

(20)

Therefore, applying Lemma 3.2 with $u=u_{i}$ ,

$\displaystyle\mathbb{P}(z(u_{i})=v)$	$\displaystyle\overset{(3)}{=}p_{t}(u_{i},v)\cdot[1+r\cdot(1-\mathbb{P}(N_{v})-\mathbb{P}(N_{u_{i}}))]$
	$\displaystyle\overset{(20)}{=}p_{t}(u_{i},v)\cdot\Bigl{[}1+r\cdot\Bigl{(}-\mathbb{P}(N_{v})+\sum_{w\in F(u_{i})}p_{t}(u_{i},w)\mathbb{P}(N_{w})\Bigr{)}\Bigr{]}$
	$\displaystyle=p_{t}(u_{i},v)\cdot\Bigl{[}1+r\cdot\Bigl{(}-\mathbb{P}(N_{v})+\sum_{w\in F_{0}}p_{t}(u_{i},w)\mathbb{P}(N_{w})+\sum_{w\in F_{1}}p_{t}(u_{i},w)\mathbb{P}(N_{w})\Bigr{)}\Bigr{]}.$	(21)

In particular, we also have

\mathbb{P}(z(u_{i})=v)\overset{(21)}{\geqslant}p_{t}(u_{i},v)\cdot[1-r].

(22)

Next, we would like to place some simple bounds on $\mathbb{P}(N_{w})$ for $w\in F_{0}\cup F_{1}$ . If $w\in\{u_{1},\ldots,u_{i-1}\}$ satisfies $h(w)\neq 0$ , then by using the fact that $p_{t}\in A_{i-1}$ ,

	$\displaystyle\mathbb{P}(N_{w})$	$\displaystyle=\mathbb{P}(f_{G}^{w}(x,y)=1)\overset{\text{Lemma 2.2}}{\geqslant}\mathbb{P}(\text{$h(x(v))=0$ for all $v\in W_{G}\cap\{u_{1},\ldots,u_{i-1}\}$})$
		$\displaystyle=\prod_{v\in W_{G}\cap\{u_{1},\ldots,u_{i-1}\}}p_{t}(v,V_{0})=\prod_{v\in W_{G}\cap\{u_{1},\ldots,u_{i-1}\}}(1-p_{t}(v,V_{1}))$
		$\displaystyle\geqslant(1-\textstyle{\frac{1}{10n}})^{n}\geqslant\textstyle{\frac{9}{10}}.$

On the other hand, if $w\in\{u_{1},\ldots,u_{i-1}\}$ satisfies $h(w)=0$ , then by using the fact that $p_{t}\in A_{i-1}$ ,

1-\mathbb{P}(N_{w})=\mathbb{P}(f_{G}^{w}(x,y)=-1)\overset{\text{Lemma~\ref{lm:optimality-characterisation}}}{\geqslant}\mathbb{P}(\text{$h(y(v))=0$ for all $v\in W_{G}\cap\{u_{1},\ldots,u_{i-1}\}$})\geqslant\textstyle{\frac{9}{10}}.

In summary, we have

	$\displaystyle\mathbb{P}(N_{w})\leqslant\textstyle{\frac{1}{10}}$	$\displaystyle\qquad\text{whenever $w\in F_{0}$},$		(23)
	$\displaystyle\mathbb{P}(N_{w})\geqslant\textstyle{\frac{9}{10}}$	$\displaystyle\qquad\text{whenever $w\in F_{1}$}.$		(24)

Finally, we will apply Corollary A.2 to establish that certain events occur with very low probability. A straightforward numerical manipulation we will use after each application is that for every $v\in F(u_{i})$ , because $p_{t}(u_{i},v)\geqslant\gamma$ ,

	$\displaystyle\exp{\left(-\frac{r^{2}\mu p_{t}(u_{i},v)/16}{8(1+r/4)}\right)}$	$\displaystyle\overset{\eqref{eq:r-bound}}{\leqslant}\exp{\left(-\frac{\gamma^{% 2s(u_{i})+1}\mu}{200}\right)}\leqslant\exp{\left(-\frac{\gamma^{2\hat{s}+1}\mu% }{200}\right)}$
		$\displaystyle\overset{\eqref{eq:mu-lower}}{\leqslant}\exp{\left(-\frac{C(K+% \hat{s}+1)\log{n}}{200}\right)}\leqslant\frac{1}{2}n^{-K-3\hat{s}-6}.$		(25)

We now complete the proof of the claim by dividing into two cases. Note that the properties E1-E3 and F1-F2 quoted hereafter are from the results of Appendix A.

Case 1: $p_{t}(u_{i},F_{1})\leqslant\textstyle{\frac{1}{2}}$ . If $v\in F_{1}$ , then

	$\displaystyle\mathbb{P}(z(u_{i})=v)$	$\displaystyle\overset{\eqref{eq:cg-sel-press},\eqref{eq:Nw-F0},\eqref{eq:Nw-F1% }}{\leqslant}p_{t}(u_{i},v)\cdot[1+r\cdot(-\textstyle{\frac{9}{10}}+\textstyle% {\frac{1}{10}}p_{t}(u_{i},F_{0})+p_{t}(u_{i},F_{1}))]$
		$\displaystyle\hskip 19.91684pt\leqslant p_{t}(u_{i},v)\cdot[1-\textstyle{\frac% {1}{4}}r].$		(26)
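The last inequality in (26) is a short piece of arithmetic. The following worked step spells it out, assuming (as the decomposition of the sum in (21) suggests) that $F_{0}$ and $F_{1}$ partition $F(u_{i})$ , so that $p_{t}(u_{i},F_{0})=1-p_{t}(u_{i},F_{1})$ , and using the Case 1 hypothesis $p_{t}(u_{i},F_{1})\leqslant\frac{1}{2}$ .

```latex
-\tfrac{9}{10}+\tfrac{1}{10}\,p_{t}(u_{i},F_{0})+p_{t}(u_{i},F_{1})
  = -\tfrac{8}{10}+\tfrac{9}{10}\,p_{t}(u_{i},F_{1})
  \leqslant -\tfrac{8}{10}+\tfrac{9}{20}
  = -\tfrac{7}{20}
  \leqslant -\tfrac{1}{4}.
```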

By using (19) and (26) to apply E2 with $\alpha=r/4$ , it holds for any fixed $v\in F_{1}$ that

\mathbb{P}(q_{t+1}(u_{i},v)>(1-r/8)p_{t}(u_{i},v))\leqslant\exp{\left(-\frac{r% ^{2}\mu p_{t}(u_{i},v)/16}{8(1+r/4)}\right)}\overset{\eqref{eq:bb-manip}}{% \leqslant}\frac{1}{2}n^{-K-3\hat{s}-6}\leqslant n^{-K-3\hat{s}-6}.

Therefore, by taking a union bound over $F_{1}$ , it occurs with probability at least $1-n^{-K-3\hat{s}-5}$ that

q_{t+1}(u_{i},v)\leqslant(1-r/8)p_{t}(u_{i},v)\qquad\text{for every $v\in F_{1}$,}

(27)

and so we proceed under the assumption that this occurs. Note that this automatically gives us for any $v\in F_{1}$ that

p_{t+1}(u_{i},v)=\pi_{\gamma}^{F(u_{i})}(q_{t+1}(u_{i},\,\cdot\,))(v)\overset{% \emph{\ref{pi-2}}}{\leqslant}\max{\{\gamma,q_{t+1}(u_{i},v)\}}\overset{\eqref{% eq:c1-hp1}}{\leqslant}\max{\{\gamma,p_{t}(u_{i},v)\}}=p_{t}(u_{i},v).

(28)

So if $p_{t}(u_{i},F_{1})\leqslant 2\Delta\gamma$ then, by summing (28) over $v\in F_{1}$ , $p_{t+1}(u_{i},F_{1})\leqslant 2\Delta\gamma$ and hence $X_{t+1}(i)=g(p_{t+1}(u_{i},V_{0}))=g(p_{t+1}(u_{i},F_{0}))=g(1-p_{t+1}(u_{i},F_{1}))\geqslant g(1-2\Delta\gamma)$ . On the other hand, if $p_{t}(u_{i},F_{1})\geqslant 2\Delta\gamma$ then there is some $v\in F_{1}$ such that $p_{t}(u_{i},v)\geqslant\textstyle{\frac{1}{\Delta}}p_{t}(u_{i},F_{1})\geqslant 2\gamma$ , and hence

$\displaystyle p_{t+1}(u_{i},F_{1})$	$\displaystyle=p_{t+1}(u_{i},v)+p_{t+1}(u_{i},F_{1}\setminus\{v\})\overset{% \emph{\ref{pi-2}}}{\leqslant}\max{\{\gamma,q_{t+1}(u_{i},v)\}}+p_{t+1}(u_{i},F% _{1}\setminus\{v\})$
	$\displaystyle\overset{\eqref{eq:c1-hp1},\eqref{eq:pt-on-F1}}{\leqslant}\max{\{% \gamma,(1-r/8)p_{t}(u_{i},v)\}}+p_{t}(u_{i},F_{1}\setminus\{v\})$
	$\displaystyle=(1-r/8)p_{t}(u_{i},v)+p_{t}(u_{i},F_{1}\setminus\{v\})$
	$\displaystyle=p_{t}(u_{i},F_{1})-(r/8)p_{t}(u_{i},v)\overset{\eqref{eq:r-bound% }}{\leqslant}(1-\gamma^{s(u_{i})}/8)p_{t}(u_{i},F_{1}).$	(29)

In particular, this would then imply that

	$\displaystyle X_{t+1}(i)$	$\displaystyle=g(p_{t+1}(u_{i},V_{0}))=g(1-p_{t+1}(u_{i},V_{1}))\overset{\eqref% {eq:p-adjust-1}}{\geqslant}g(1-(1-(\gamma^{s(u_{i})}/8))p_{t}(u_{i},V_{1}))$
		$\displaystyle\overset{\emph{\ref{g-simp-2}}}{\geqslant}g(1-p_{t}(u_{i},V_{1}))% +\frac{\gamma^{s(u_{i})}}{16}=X_{t}(i)+\frac{\gamma^{s(u_{i})}}{16}.$

Combining the cases $p_{t}(u_{i},F_{1})\leqslant 2\Delta\gamma$ and $p_{t}(u_{i},F_{1})\geqslant 2\Delta\gamma$ shows that the event $X_{t+1}(i)\geqslant\min{\{g(1-2\Delta\gamma),X_{t}(i)+\gamma^{s(u_{i})}/32\}}$ holds with probability at least $1-n^{-K-3\hat{s}-5}$ .

Case 2: $p_{t}(u_{i},F_{1})\geqslant\textstyle{\frac{1}{2}}$ . If $v\in F_{0}$ , then

	$\displaystyle\mathbb{P}(z(u_{i})=v)$	$\displaystyle\overset{\eqref{eq:cg-sel-press},\eqref{eq:Nw-F0},\eqref{eq:Nw-F1% }}{\geqslant}p_{t}(u_{i},v)\cdot[1+r\cdot(-\textstyle{\frac{1}{10}}+\textstyle% {\frac{9}{10}}p_{t}(u_{i},F_{1}))]$
		$\displaystyle\hskip 19.91684pt\geqslant p_{t}(u_{i},v)\cdot[1+\textstyle{\frac% {1}{4}}r].$		(30)

By using (19) and (30) to apply E1 with $\alpha=r/4$ , it holds for every $v\in F_{0}$ that

\mathbb{P}(q_{t+1}(u_{i},v)<(1+r/8)p_{t}(u_{i},v))\leqslant\exp{\left(-\frac{r% ^{2}\mu p_{t}(u_{i},v)/16}{8(1+r/4)}\right)}\overset{\eqref{eq:bb-manip}}{% \leqslant}\frac{1}{2}n^{-K-3\hat{s}-6}.

By using (19) and (22) to apply E3 with $\alpha=r$ , it holds for every $v\in F(u_{i})$ that

\mathbb{P}(q_{t+1}(u_{i},v)<(1-2r)p_{t}(u_{i},v))\leqslant\exp{\left(-\frac{r^% {2}\mu p_{t}(u_{i},v)/16}{8(1+r/4)}\right)}\overset{\eqref{eq:bb-manip}}{% \leqslant}\frac{1}{2}n^{-K-3\hat{s}-6}.

Therefore, by taking a union bound over $F_{0}$ and also $F(u_{i})$ , it occurs with probability at least $1-n^{-K-3\hat{s}-5}$ that

	$\displaystyle q_{t+1}(u_{i},v)\geqslant(1+r/8)p_{t}(u_{i},v)$	for every $v\in F_{0}$ ,		(31)
	$\displaystyle q_{t+1}(u_{i},v)\geqslant(1-2r)p_{t}(u_{i},v)$	for every $v\in F(u_{i})$ ,		(32)

and so we proceed under the assumption that this occurs. Recalling that $\gamma=1/(20\Delta n)$ and the assumption that $n\geqslant 3$ , we can now bound $\beta_{\gamma}^{-}(q_{t+1}(u_{i},\,\cdot\,))$ above as

	$\displaystyle\beta_{\gamma}^{-}(q_{t+1}(u_{i},\,\cdot\,))$	$\displaystyle=\sum_{v\in F(u_{i})}\max{\{\gamma-q_{t+1}(u_{i},v),0\}}\overset{% \eqref{eq:c2-hp2}}{\leqslant}\sum_{v\in F(u_{i})}\max{\{\gamma-(1-2r)\gamma,0\}}$
		$\displaystyle\leqslant\sum_{v\in F(u_{i})}2r\gamma\leqslant 2\Delta r\gamma=% \frac{r}{10n}\leqslant\frac{r}{30}\leqslant\frac{r}{24}\cdot(1-\textstyle{% \frac{1}{60}})\leqslant\displaystyle\frac{r}{24}\cdot(1-\gamma\Delta).$		(33)

Hence, using that $q_{t+1}(u_{i},v)\geqslant\gamma$ for every $v\in F_{0}$ ,

	$\displaystyle p_{t+1}(u_{i},F_{0})$	$\displaystyle=\pi_{\gamma}^{F(u_{i})}(q_{t+1}(u_{i},\,\cdot\,))(F_{0})\overset% {\emph{\ref{pi-1}}}{\geqslant}\left(1-\frac{\beta_{\gamma}^{-}(q_{t+1}(u_{i},% \,\cdot\,))}{1-\gamma\Delta}\right)q_{t+1}(u_{i},F_{0})$
		$\displaystyle\overset{\eqref{eq:b-minus},\eqref{eq:c2-hp1}}{\geqslant}(1-r/24)% (1+r/8)p_{t}(u_{i},F_{0})\overset{\eqref{eq:r-bound}}{\geqslant}(1+\gamma^{s(u% _{i})}/16)p_{t}(u_{i},F_{0}).$		(34)

This would then imply that

	$\displaystyle X_{t+1}(i)$	$\displaystyle=g(p_{t+1}(u_{i},F_{0}))\overset{\eqref{eq:p-adjust-2}}{\geqslant% }g((1+\gamma^{s(u_{i})}/16)p_{t}(u_{i},F_{0}))$
		$\displaystyle\overset{\emph{\ref{g-simp-1}}}{\geqslant}g(p_{t}(u_{i},F_{0}))+% \frac{\gamma^{s(u_{i})}}{32}=X_{t}(i)+\frac{\gamma^{s(u_{i})}}{32}.$

Thus, the event $X_{t+1}(i)\geqslant\min{\{g(1-2\Delta\gamma),X_{t}(i)+\gamma^{s(u_{i})}/32\}}$ holds with probability at least $1-n^{-K-3\hat{s}-5}$ . ∎

Proof of Claim 5.4.

Suppose that $p_{t}\notin A_{n}$ , so that $\ell:=\ell(p_{t})<n$ . For every $i\in[\ell]$ with $u_{i}\in W_{G}$ , it follows from the fact that $p_{t}\in A_{i}$ that $p_{t}(u_{i},V_{0})=1-p_{t}(u_{i},V_{1})\geqslant 1-\textstyle{\frac{1}{10n}}$ and hence

X_{t}(i)=g(p_{t}(u_{i},V_{0}))\geqslant g(1-\textstyle{\frac{1}{10n}})=g(1-2\Delta\gamma).

(35)

Let $I=\{i\in[\ell+1]:u_{i}\in W_{G}\}$ so that $\ell+1\in I$ . For $i\in I$ , let $E_{i}$ be the event that

X_{t+1}(i)\geqslant\min{\{g(1-2\Delta\gamma),X_{t}(i)+\gamma^{s(u_{i})}/32\}}.

We will now show that if $E_{i}$ holds for every $i\in I$ , then $\hat{g}(p_{t+1})\geqslant\hat{g}(p_{t})+1$ . Indeed, if $E_{i}$ holds for every $i\in I$ , then it follows from (35) that $X_{t+1}(i)\geqslant g(1-2\Delta\gamma)$ for every $i\in[\ell]$ with $u_{i}\in W_{G}$ , and hence $p_{t+1}\in A_{\ell}$ . If additionally $\ell(p_{t+1})>\ell$ , then $\hat{g}(p_{t+1})\geqslant\hat{g}(p_{t})+1$ is immediate from (12). On the other hand, if $\ell(p_{t+1})=\ell$ , then $E_{\ell+1}$ implies $X_{t+1}(\ell+1)\geqslant X_{t}(\ell+1)+\gamma^{s(u_{\ell+1})}/32$ and hence,

\hat{g}(p_{t+1})-\hat{g}(p_{t})=(X_{t+1}(\ell+1)-X_{t}(\ell+1))\cdot\left(% \frac{32}{\gamma^{s(u_{\ell+1})}}\right)\geqslant 1.

Therefore, using a union bound we have

\displaystyle\mathbb{P}(\hat{g}(p_{t+1})\geqslant\hat{g}(p_{t})+1)\geqslant% \mathbb{P}(\wedge_{i\in I}E_{i})\geqslant 1-\sum_{i\in I}\mathbb{P}(E_{i}^{c})% \overset{\text{Claim~{}\ref{clm:cg-logit-drift}}}{\geqslant}1-|I|\cdot n^{-K-3% \hat{s}-5}\geqslant 1-n^{-K-3\hat{s}-4},

as required. ∎

∎

For many applications, rather than applying Theorem 5.2 directly it will be convenient to use the following corollary.

Corollary 5.5.

There is a constant $C>0$ such that the following holds. Let $G$ be an impartial combinatorial game with maximum degree $\Delta$ , and let $\overline{s}=\max_{v\in V}s(v)$ . Let $K>0$ , and assume $\mathcal{A}$ uses parameters $\gamma=1/(20\Delta n)$ and $\mu=C(K+\overline{s}+1)(20\Delta n)^{1+2\overline{s}}\log{n}$ . Then,

\mathbb{P}[T_{\mathcal{A}}^{G}(\text{\emph{Opt}}(G))\geqslant C^{2}(K+% \overline{s}+1)(20\Delta n)^{2+3\overline{s}}\log^{2}{n}]\leqslant n^{-K}.

Proof.

By noting that

\hat{s}=\max_{v\in W_{G}}s(v)\leqslant\max_{v\in V}s(v)=\overline{s},

and

\sum_{v\in W_{G}}(20\Delta n)^{s(v)}\leqslant\sum_{v\in V}(20\Delta n)^{s(v)}% \leqslant\sum_{v\in V}(20\Delta n)^{\overline{s}}=n\cdot(20\Delta n)^{% \overline{s}}\leqslant(20\Delta n)^{\overline{s}+1},

this is an immediate consequence of Theorem 5.2. ∎
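To get a feel for how the quantities in Corollary 5.5 scale, the expressions can be evaluated numerically. The sketch below is illustrative only: the function name is hypothetical, the corollary does not specify the constant $C$ (the default `C=1.0` is a placeholder), and $\log$ is taken to be the natural logarithm.

```python
import math

def corollary_5_5_parameters(n, delta, s_bar, K, C=1.0):
    """Evaluate gamma, mu and the runtime threshold written in Corollary 5.5.

    C stands for the unspecified constant of the corollary; C=1.0 is an
    assumption made purely to illustrate the scaling in n, delta and s_bar.
    """
    base = 20 * delta * n
    gamma = 1 / base
    mu = C * (K + s_bar + 1) * base ** (1 + 2 * s_bar) * math.log(n)
    threshold = C ** 2 * (K + s_bar + 1) * base ** (2 + 3 * s_bar) * math.log(n) ** 2
    return gamma, mu, threshold

# Example: a game with n = 100 positions, maximum degree 3 and s_bar = 1.
gamma, mu, threshold = corollary_5_5_parameters(n=100, delta=3, s_bar=1, K=1)
print(f"gamma = {gamma:.3e}, mu = {mu:.3e}, threshold = {threshold:.3e}")
```

With $C=1$ the threshold equals $\mu\cdot(20\Delta n)^{1+\overline{s}}\log{n}$ , which makes the dependence on $\overline{s}$ visible: each unit increase in $\overline{s}$ multiplies the threshold by roughly $(20\Delta n)^{3}$ .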

6 Applications

In this section we apply Theorem 5.2 to obtain runtime bounds for Algorithm 1 on a number of well-established combinatorial games. Throughout, we state runtimes in terms of $n$ , the number of possible game positions, and always assume that $\mathcal{A}$ is described by Algorithm 1. All described games are played under the normal play convention (that a player unable to move loses), as established in Section 2.

6.1 Subtraction Nim

Nim is a strategic game in which players take turns removing items from distinct heaps. Variants have been played across cultures since ancient history [56, 67], and it was also the game of choice for some of the earliest machines and computers dedicated to game playing [37, 44, 52]. Nim is also perhaps the most important impartial combinatorial game from a mathematical perspective, with the Sprague-Grundy theorem establishing that, for a particular formulation of equivalence which characterises strategic continuation, every position in any impartial combinatorial game is equivalent to some position of a one-heap game of Nim [7].

While the version central to combinatorial game theory typically allows players to remove any positive number of items on their turn, here we consider the well-known one-heap variant in which there is an upper limit on the number of items that can be taken at once (see, for example, [19]). Given parameters $n$ and $k$ , $\textsc{SubtractionNim}_{n}^{k}$ begins with an initial heap of $(n-1)$ items, and on each turn a player may remove between $1$ and $k$ items from the heap. The game graph for $\textsc{SubtractionNim}_{n}^{2}$ is shown in Figure 3. This game constitutes the simplest example of a subtraction game [26] and also of a take-away game [59], both of which are expansive and well-studied classes of impartial combinatorial games. We have the following polynomial runtime for $\textsc{SubtractionNim}_{n}^{k}$ .

Proposition 6.1.

$\emph{{SubtractionNim}}_{n}^{k}$ satisfies $\overline{s}\leqslant 1$ and $\Delta\leqslant k$ . Thus, for each $K>0$ there exists $C>0$ such that for appropriately chosen parameters in Algorithm 1,

\mathbb{P}[T_{\mathcal{A}}(\text{\emph{Opt}}(\emph{{SubtractionNim}}_{n}^{k}))% \geqslant C(kn)^{5}\log^{2}{n}]\leqslant n^{-K}.

Proof.

For $\textsc{SubtractionNim}_{n}^{k}$ we have $V=\{0,1,\ldots,n-1\}$ , $v_{0}=n-1$ , and $F(v)=\{v-1,\ldots,v-k\}\cap V$ . Note that $V\setminus\text{Int}(G)=\{0\}$ .

We need to verify that $s(v)\leqslant 1$ for every $v\in V$ . Given $v$ , let $A_{v}=\{(v+i,v):i\in[k-1]\}\cap E(G)$ . We have $\text{Depth}(A_{v})=1$ , as any directed path in $G$ can visit $v$ at most once. To see that $A_{v}$ is a $v$ -switcher, suppose that $z_{0}\ldots z_{\ell}$ is an $A_{v}$ -compatible directed path from $z_{0}=n-1$ to $V\setminus\text{Int}(G)=\{0\}$ . Because at most $k$ items are removed on each turn, there is some $i$ such that $z_{i}\in\{v,v+1,\ldots,v+(k-1)\}$ . But then we either have $z_{i}=v$ or, in order for $z_{0}\ldots z_{i+1}$ to remain $A_{v}$ -compatible, $z_{i}z_{i+1}\in A_{v}$ and hence $z_{i+1}=v$ . In either case, we deduce that $v$ lies on every $A_{v}$ -compatible directed path from $n-1$ to $0$ . Thus, $A_{v}$ is a $v$ -switcher of depth $1$ , and hence $s(v)\leqslant 1$ .

From this, we have that $\overline{s}\leqslant 1$ . Combined with the observation that $\Delta\leqslant k$ , the result then follows from Corollary 5.5. ∎
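The game graph used in this proof is small enough to build explicitly. The following sketch (function names are illustrative, not from the paper) constructs $F(v)=\{v-1,\ldots,v-k\}\cap V$ , checks that the maximum degree is at most $k$ , and computes Sprague-Grundy values by the usual minimum-excludant recursion. For this game the values are known to follow the closed form $v \bmod (k+1)$ , which the code compares against.

```python
def subtraction_nim_successors(n, k):
    """Successor sets F(v) = {v-1, ..., v-k} intersected with V = {0, ..., n-1}."""
    return {v: [v - j for j in range(1, k + 1) if v - j >= 0] for v in range(n)}

def sprague_grundy(successors):
    """Sprague-Grundy values; relies on every successor of v being smaller than v."""
    values = {}
    for v in sorted(successors):
        seen = {values[w] for w in successors[v]}
        g = 0
        while g in seen:  # minimum excludant of the successors' values
            g += 1
        values[v] = g
    return values

n, k = 20, 3
F = subtraction_nim_successors(n, k)
grundy = sprague_grundy(F)
max_degree = max(len(ws) for ws in F.values())
losing_positions = [v for v in F if grundy[v] == 0]
print(max_degree, losing_positions)
```

Positions with value zero are exactly those from which the player to move loses under optimal play; here they are the multiples of $k+1$ .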

6.2 Silver Dollar

We consider the variant of Silver Dollar played without the eponymous silver dollar [7, 15, 26]; however, it should be noted that Theorem 5.2 also implies a similar polynomial runtime for the original version of Silver Dollar attributed to de Bruijn (see also [7]).

Given parameters $m$ and $k$ , $\textsc{SilverDollar}_{m}^{k}$ is played using $k$ coins on a horizontal strip of $m$ squares, with the coins initially placed on the rightmost $k$ squares (most descriptions place the coins on arbitrary starting squares; however, this does not significantly affect our analysis). A turn consists of moving one coin leftwards any number of spaces, provided the coin does not go past any other coins. In addition, coins may never occupy the same square. The number of game positions is $n=\binom{m}{k}$ , which is $\Theta(m^{k})$ when $k$ is a fixed constant. We have the following polynomial runtime for $\textsc{SilverDollar}_{m}^{k}$ .

Proposition 6.2.

Let $k\in\mathbb{N}$ be fixed. $\emph{{SilverDollar}}_{m}^{k}$ satisfies $\overline{s}\leqslant k$ and $\Delta\leqslant m=O(n^{1/k})$ . Thus, for each $K>0$ there exists $C>0$ such that for appropriately chosen parameters in Algorithm 1,

\mathbb{P}[T_{\mathcal{A}}(\text{\emph{Opt}}(\emph{{SilverDollar}}_{m}^{k}))% \geqslant Cn^{5+3k+(2/k)}\log^{2}{n}]\leqslant n^{-K}.

Proof.

On each turn, for each empty square there is at most one possible move that places a coin onto that square. Therefore, $\Delta\leqslant m-k\leqslant m=O(n^{1/k})$ . Next, any possible game position can be reached from the starting position in at most $k$ moves (simply move each coin in order from left to right onto the required square). Therefore, using Proposition 4.4, we have $\overline{s}\leqslant k$ . The required result then follows from Corollary 5.5 using these bounds on $\Delta$ and $\overline{s}$ . ∎
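The three facts used in this proof (the position count, the degree bound, and reachability within $k$ moves) can be checked by exhaustive search on a small instance. This is a sanity-check sketch under an assumed encoding, not part of the paper: a position is a sorted tuple of occupied squares with square $0$ leftmost, and the helper names are hypothetical.

```python
from collections import deque
from math import comb

def silver_dollar_moves(position):
    """Positions reachable in one move: slide one coin left without passing another."""
    result = []
    for j, square in enumerate(position):
        left_limit = position[j - 1] if j > 0 else -1
        for target in range(left_limit + 1, square):
            result.append(position[:j] + (target,) + position[j + 1:])
    return result

def explore(start, moves):
    """Breadth-first search; maps each reachable position to its minimum move count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        for nxt in moves(pos):
            if nxt not in depth:
                depth[nxt] = depth[pos] + 1
                queue.append(nxt)
    return depth

m, k = 8, 3
start = tuple(range(m - k, m))  # coins on the rightmost k squares
depth = explore(start, silver_dollar_moves)
n = len(depth)
max_degree = max(len(silver_dollar_moves(p)) for p in depth)
print(n, comb(m, k), max_degree, max(depth.values()))
```

On this instance the search finds $\binom{8}{3}$ positions, a maximum degree of $m-k$ , and no position further than $k$ moves from the start.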

6.3 Turning Turtles

Here we consider one instance of a large class of coin turning games [3, 26]. Given a parameter $m$ , $\textsc{TurningTurtles}_{m}$ is played using a row of $m$ coins, initially all showing heads. A turn consists of turning over one coin from heads to tails, and then optionally turning over one more coin anywhere to the left of that one (regardless of whether it is showing heads or tails). Play continues until all coins show tails. Noting that the total number of game positions is $n=2^{m}$ , we have the following quasipolynomial runtime for $\textsc{TurningTurtles}_{m}$ .

Proposition 6.3.

$\emph{{TurningTurtles}}_{m}$ satisfies $\overline{s}\leqslant\log_{2}{n}$ and $\Delta\leqslant(\log_{2}{n})^{2}$ . Thus, for each $K>0$ there exists $c>0$ such that for appropriately chosen parameters in Algorithm 1,

\mathbb{P}[T_{\mathcal{A}}(\text{\emph{Opt}}(\emph{{TurningTurtles}}_{m}))% \geqslant n^{c\log{n}}]\leqslant n^{-K}.

Proof.

On each turn, there are at most $m$ possible moves that turn over only one coin and at most $\binom{m}{2}$ possible moves that turn over two coins. Therefore, $\Delta\leqslant m+\binom{m}{2}\leqslant m^{2}$ . Next, any possible game position can be reached from the starting position in at most $m$ moves (simply turn over the required coins from heads to tails one by one). Therefore, using Proposition 4.4 we have $\overline{s}\leqslant m$ . Noting that $m=\log_{2}{n}$ , and hence

C^{2}(K+\overline{s}+1)(20\Delta n)^{2+3\overline{s}}\log^{2}{n}\leqslant C^{2% }(K+\log_{2}{n}+1)(20(\log_{2}{n})^{2}n)^{2+3\log_{2}{n}}(\log{n})^{2}% \leqslant n^{c\log{n}},

the required result then follows from Corollary 5.5. ∎
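The quasipolynomial form of this bound can be made concrete by computing the exponent $c$ for which the threshold of Corollary 5.5 equals $n^{c\log{n}}$ . The sketch below works in log-space to avoid overflow. It substitutes $\overline{s}=m$ and $\Delta=m^{2}$ (the upper bounds from the proof), uses natural logarithms, and treats the unspecified constant $C$ as $1$ ; all three are assumptions of the illustration.

```python
import math

def log_threshold_turning_turtles(m, K=1.0, C=1.0):
    """Natural log of C^2 (K+s+1) (20 Delta n)^(2+3s) log^2(n) with s=m, Delta=m^2, n=2^m."""
    log_n = m * math.log(2)
    s_bar, delta = m, m * m
    return (2 * math.log(C)
            + math.log(K + s_bar + 1)
            + (2 + 3 * s_bar) * (math.log(20 * delta) + log_n)
            + 2 * math.log(log_n))

def implied_exponent(m, K=1.0, C=1.0):
    """The c such that the threshold equals n ** (c * log n), i.e. exp(c * log(n)**2)."""
    log_n = m * math.log(2)
    return log_threshold_turning_turtles(m, K, C) / log_n ** 2

for m in (10, 20, 40, 1000):
    print(m, round(implied_exponent(m), 3))
```

Under these assumptions the exponent decreases as $m$ grows and approaches $3/\log{2}$ from above, so a single constant $c$ suffices for all sufficiently large $n$ .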

6.4 Chomp

Since its introduction by Schuh [58] and later by Gale [21], Chomp has inspired a great deal of theoretical and empirical analysis, as well as numerous variants incorporating, for example, graphs and simplicial complexes [22]. While typically played on any rectangular board, we focus on square instances for the sake of conciseness.

Given a parameter $m$ , $\textsc{Chomp}_{m}$ is played on an $m\times m$ board. A turn consists of removing one square, as well as all squares to the right and above. However, if a player removes the square in the lower-left corner (the ‘poison’ square) they immediately lose. Note that to instantiate this game under our normal play convention, we can make removing the lower-left corner fatal by simply removing the position that has no remaining squares. We can establish the following quasipolynomial runtime for $\textsc{Chomp}_{m}$ .

Proposition 6.4.

$\emph{{Chomp}}_{m}$ satisfies $\overline{s}\leqslant O(\log_{2}{n})$ and $\Delta\leqslant O((\log_{2}{n})^{2})$ . Thus, for each $K>0$ there exists $c>0$ such that for appropriately chosen parameters in Algorithm 1,

\mathbb{P}[T_{\mathcal{A}}(\text{\emph{Opt}}(\emph{{Chomp}}_{m}))\geqslant n^{% c\log{n}}]\leqslant n^{-K}.

Proof.

In each possible game position, every row must be at least as long as the row above it. In particular, there is a correspondence between game positions and lattice paths (i.e., paths that only move right and down along the squares’ edges) from the top-left corner to bottom-right corner, with the path marking out the boundary of the remaining squares. Using stars and bars counting (see [4, Theorem 8.5.1] for a full treatment) and removing the position that has no remaining squares, the total number of game positions is $n=\binom{2m}{m}-1=\Theta(4^{m}/\sqrt{m})$ . On each turn, there are at most $m^{2}$ moves available, and so $\Delta\leqslant m^{2}$ . Next, any possible game position can be reached from the starting position in at most $m$ moves (simply make the appropriate chomp row by row working from top to bottom). Therefore, using Proposition 4.4, we have $\overline{s}\leqslant m$ . Thus, as with Proposition 6.3, we have $\Delta\leqslant m^{2}$ and $\overline{s}\leqslant m$ where $m=O(\log{n})$ , and so the result follows. ∎
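The counting and reachability arguments in this proof can be verified directly for small boards. In the sketch below (an illustration with hypothetical helper names), a position is encoded as a non-increasing tuple of row lengths with the bottom row first, and `reach` carries out the row-by-row sequence of chomps described above, working from the top row down.

```python
from itertools import combinations_with_replacement
from math import comb

def chomp(rows, r, c):
    """Remove square (row r, column c) together with everything above and to the right."""
    return rows[:r] + tuple(min(x, c) for x in rows[r:])

def chomp_positions(m):
    """All non-empty boards: non-increasing row lengths in 0..m, bottom row first."""
    boards = combinations_with_replacement(range(m, -1, -1), m)
    return [b for b in boards if b[0] > 0]

def reach(target, m):
    """Chomp row by row from the top; returns the number of moves used to reach target."""
    rows, moves = (m,) * m, 0
    for r in range(m - 1, -1, -1):
        if rows[r] > target[r]:
            rows, moves = chomp(rows, r, target[r]), moves + 1
    assert rows == target
    return moves

m = 4
positions = chomp_positions(m)
n = len(positions)
# Every remaining square except the poison square is a distinct legal move.
max_degree = max(sum(b) - 1 for b in positions)
max_moves = max(reach(b, m) for b in positions)
print(n, comb(2 * m, m) - 1, max_degree, max_moves)
```

For $m=4$ this gives $n=\binom{8}{4}-1$ positions, maximum degree $m^{2}-1$ , and at most $m$ moves to reach any position.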

7 Concluding remarks

Figure 4: A game that should be easy to optimise, but contains vertices with switchability $\Theta(n)$ .

We conclude with some brief remarks about the main result and future work.

In order to accommodate the high degree of generality in Theorem 5.2, the proof makes a number of assumptions about the route taken to the search objective. A notable one is that, if $v$ is the next critical position to be optimised, or one that has already been learned, then the probability $\mathbb{P}(v\in\text{Path}_{G}(x,y))$ that $v$ is encountered in a game played out by sampled individuals $x,y$ is bounded below by $\gamma^{s(v)}$ . Lemma 4.5 demonstrates that analysis of $\mathbb{P}(v\in\text{Path}_{G}(x,y))$ is a major contribution to the eventual runtime, serving a role akin to a dynamic learning rate for the algorithm at position $v$ . A key insight is that encountering a large range of game positions by evaluating diverse sets of opponents is essential to an algorithm’s success. However, it is apparent that the general bound $\mathbb{P}(v\in\text{Path}_{G}(x,y))\geqslant\gamma^{s(v)}$ could be greatly improved through closer analysis of coevolutionary dynamics, especially for specific games. For example, if individuals often misplay at a winning position $v$ , opponents should begin to exploit this by steering the game towards $v$ ; the resulting feedback mechanism between $\mathbb{P}(v\in\text{Path}_{G}(x,y))$ and $p_{t}(v,\,\cdot\,)$ can assist more efficient learning.

A related assumption is that game positions are optimised sequentially, moving from the end of the game and working backwards, not unlike a recursive computation of the Sprague-Grundy function. However, this is not the route to optimality we would expect CoEAs to adopt for all games (consider Figure 4, where UMDA would naturally optimise starting from $v_{0}$ and working forwards). Moreover, because Lemma 2.2 is not a necessary condition, there is potential for CoEAs to demonstrate bias towards learning simpler elements of $\text{Opt}(G)$ without the need to implicitly deduce all zeros of the Sprague-Grundy function (for example, when played on a square board, there is an optimal strategy for Chomp that can be described by specifying an action at only $\Theta(m^{2})$ of the $\Theta(4^{m}/\sqrt{m})$ game positions).

In future work, we aim to provide more detailed analysis related to both of the above assumptions in order to provide stronger runtime results on classes of impartial combinatorial games. A longer term goal is the development of runtime analysis applicable to game representations that are practical even for games with exponentially many positions, such as in situations encountered in genetic programming.

References

[1] F. Ben Jedidia, B. Doerr, and M. S. Krejca. Estimation-of-distribution algorithms for multi-valued decision variables. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, page 230–238, 2023.
[2] A. Benford and P. K. Lehre. Runtime analysis of coevolutionary algorithms on a class of symmetric zero-sum games. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’24, page 1542–1550, 2024.
[3] E. R. Berlekamp, J. H. Conway, and R. K. Guy. Winning Ways for Your Mathematical Plays, volume 3. A. K. Peters, 2003.
[4] R. A. Brualdi. Introductory Combinatorics. Pearson, 5th edition, 2009.
[5] K. Chellapilla and D. Fogel. Evolving neural networks to play checkers without relying on expert knowledge. IEEE Transactions on Neural Networks, 10(6):1382–1391, 1999.
[6] F. Chung and L. Lu. Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79–127, 2006.
[7] J. H. Conway. On Numbers and Games. A.K. Peters, 2nd edition, 2001.
[8] D.-C. Dang and P. K. Lehre. Simplified runtime analysis of estimation of distribution algorithms. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, GECCO ’15, page 513–518, 2015.
[9] D. DeCoste. The significance of Kasparov versus DEEP BLUE and the future of computer chess. ICGA Journal, 21(1):33–43, 1998.
[10] E. Demaine and R. Hearn. Playing games with algorithms: Algorithmic combinatorial game theory. In Games of No Chance 3, volume 56 of Mathematical Sciences Research Institute Publications, pages 3–56. Cambridge University Press, 2009.
[11] B. Doerr. The runtime of the compact genetic algorithm on jump functions. Algorithmica, 83(10):3059–3107, 2021.
[12] B. Doerr and F. Neumann. A survey on recent progress in the theory of evolutionary algorithms for discrete optimization. ACM Transactions on Evolutionary Learning and Optimization, 1(4), 2021.
[13] S. Droste. A rigorous analysis of the compact genetic algorithm for linear functions. Natural Computing, 5:257–283, 2006.
[14] S. Droste, T. Jansen, and I. Wegener. Upper and lower bounds for randomized search heuristics in black-box optimization. Theory of Computing Systems, 39(4):525–544, 2006.
[15] G. Farr and N. B. Ho. The Sprague–Grundy function for some nearly disjunctive sums of nim and silver dollar games. Theoretical Computer Science, 732:46–59, 2018.
[16] G. Ferrer and W. Martin. Using genetic programming to evolve board evaluation functions. In Proceedings of 1995 IEEE International Conference on Evolutionary Computation, volume 2, pages 747–752, 1995.
[17] S. G. Ficici. Solution concepts in coevolutionary algorithms. PhD thesis, Brandeis University, 2004.
[18] D. Fogel, T. Hays, S. Hahn, and J. Quon. A self-learning evolutionary chess program. Proceedings of the IEEE, 92(12):1947–1954, 2004.
[19] A. S. Fraenkel. Scenic trails ascending from sea-level nim to alpine chess and back. In Games of No Chance, volume 29 of Mathematical Sciences Research Institute Publications, pages 13–42. Cambridge University Press, 1996.
[20] A. S. Fraenkel and D. Lichtenstein. Computing a perfect strategy for n × n chess requires time exponential in n. Journal of Combinatorial Theory, Series A, 31(2):199–214, 1981.
[21] D. Gale. A curious nim-type game. American Mathematical Monthly, 81:876–879, 1974.
[22] I. García-Marco, K. Knauer, and L. P. Montejano. Chomp on generalized Kneser graphs and others. International Journal of Game Theory, 50(3):603–621, 2021.
[23] R. Gold, H. Branquinho, E. Hemberg, U.-M. O’Reilly, and P. García-Sánchez. Genetic programming and coevolution to play the Bomberman™ video game. In Applications of Evolutionary Computation, pages 765–779, 2023.
[24] D. Grier. Deciding the winner of an arbitrary finite poset game is PSPACE-complete. In Automata, Languages, and Programming, pages 497–503, 2013.
[25] P. M. Grundy. Mathematics and games. Eureka, 2:6–8, 1939.
[26] R. K. Guy. Impartial games. In Games of No Chance, volume 29 of Mathematical Sciences Research Institute Publications, pages 61–78. Cambridge University Press, 1996.
[27] R. K. Guy. What is a game? In Games of No Chance, volume 29 of Mathematical Sciences Research Institute Publications, pages 43–60. Cambridge University Press, 1996.
[28] S. N. Harris and D. R. Tauritz. Competitive coevolution for defense and security: Elo-based similar-strength opponent sampling. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’21, page 1898–1906, 2021.
[29] A. Hauptman. Evolving search heuristics for combinatorial games with genetic programming. PhD thesis, Ben-Gurion University of the Negev, 2009.
[30] A. Hauptman and M. Sipper. GP-EndChess: using genetic programming to evolve chess endgame players. In Proceedings of the 8th European Conference on Genetic Programming, EuroGP ’05, page 120–131, 2005.
[31] M. A. Hevia Fajardo and P. K. Lehre. How fitness aggregation methods affect the performance of competitive CoEAs on bilinear problems. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, page 1593–1601, 2023.
[32] M. A. Hevia Fajardo, P. K. Lehre, and S. Lin. Runtime analysis of a co-evolutionary algorithm: Overcoming negative drift in maximin-optimisation. In Proceedings of the 17th Conference on Foundations of Genetic Algorithms, FOGA ’23, page 73–83, 2023.
[33] J. Hofbauer and K. Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society, 40(4):479–519, 2003.
[34] T. Jansen and R. P. Wiegand. The cooperative coevolutionary (1+1) EA. Evolutionary Computation, 12(4):405–434, 2004.
[35] W. Jaśkowski, K. Krawiec, and B. Wieloch. Fitnessless coevolution. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’08, page 355–362, 2008.
[36] W. Jaśkowski, M. Szubert, and P. Liskowski. Multi-criteria comparison of coevolution and temporal difference learning on othello. In Applications of Evolutionary Computation, pages 301–312, 2014.
[37] A. H. Jorgensen. Context and driving forces in the development of the early computer game Nimbi. IEEE Annals of the History of Computing, 31(3):44–53, 2009.
[38] K. Krawiec and M. Heywood. Solving complex problems with coevolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’20, page 832–858, 2020.
[39] J. Lange, M. Stanke, and M. Ebner. Co-evolution of spies and resistance fighters. In Applications of Evolutionary Computation, pages 487–502, 2022.
[40] P. K. Lehre. Runtime analysis of competitive co-evolutionary algorithms for maximin optimisation of a bilinear function. Algorithmica, 86(7):2352–2392, 2024.
[41] P. K. Lehre, M. A. Hevia Fajardo, J. Toutouh, E. Hemberg, and U.-M. O’Reilly. Analysis of a pairwise dominance coevolutionary algorithm and DefendIt. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’23, page 1027–1035, 2023.
[42] S. Lin and P. K. Lehre. Overcoming binary adversarial optimisation with competitive coevolution. In Parallel Problem Solving from Nature XVIII, 2024.
[43] A. Lubberts and R. Miikkulainen. Co-evolving a go-playing neural network. In Coevolution: Turning Adaptive Algorithms Upon Themselves, 2001.
[44] H. K. McCoy. The game of nim - the Nimatron. Carnegie Technical, page 14, February 1951.
[45] D. Michie. Experiments on the mechanization of game-learning part I: characterization of the model and its parameters. The Computer Journal, 6(3):232–236, 1963.
[46] G. A. Monroy, K. O. Stanley, and R. Miikkulainen. Coevolution of neural networks using a layered pareto archive. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’06, page 329–336, 2006.
[47] G. Nivasch. More on the Sprague-Grundy function for Wythoff’s game. In Games of No Chance 3, volume 56 of Mathematical Sciences Research Institute Publications, pages 377–410. Cambridge University Press, 2009.
[48] J. Noble and R. A. Watson. Pareto coevolution: using performance against coevolved opponents in a game as dimensions for pareto selection. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’01, page 493–500, 2001.
[49] M. Pelikan, M. Hauschild, and F. G. Lobo. Estimation of distribution algorithms. In Springer Handbook of Computational Intelligence, pages 899–928. Springer, 2015.
[50] J. B. Pollack and A. D. Blair. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 32(3):225–240, 1998.
[51] E. Popovici, A. Bucci, R. P. Wiegand, and E. D. De Jong. Coevolutionary Principles, pages 987–1033. Springer, 2012.
[52] R. Redheffer. A machine for playing the game nim. The American Mathematical Monthly, 55(6):343–349, 1948.
[53] J. M. Robson. The complexity of go. In Proceedings of the IFIP 9th World Computer Congress on Information Processing, pages 413–417, 1983.
[54] J. M. Robson. N by N checkers is exptime complete. SIAM Journal on Computing, 13(2):252–267, 1984.
[55] C. D. Rosin and R. K. Belew. New methods for competitive coevolution. Evolutionary Computation, 5(1):1–29, 1997.
[56] L. Rougetet. A prehistory of nim. The College Mathematics Journal, 45(5):358–363, 2014.
[57] M. Saburov. On discrete-time replicator equations with nonlinear payoff functions. Dynamic Games and Applications, 12(2):643–661, 2022.
[58] F. Schuh. Spel van delers. Nieuw Tijdschrift voor Wiskunde, 39:299–304, 1952.
[59] A. J. Schwenk. Take-away games. The Fibonacci Quarterly, 8:225–234, 1970.
[60] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144, 2018.
[61] R. Sprague. Über mathematische kampfspiele. Tohoku Mathematical Journal, First Series, 41:438–444, 1935.
[62] R. Sprague. Über zwei abarten von nim. Tohoku Mathematical Journal, First Series, 43:351–354, 1937.
[63] M. Szubert, W. Jaśkowski, P. Liskowski, and K. Krawiec. The role of behavioral diversity and difficulty of opponents in coevolving game-playing agents. In Applications of Evolutionary Computation, pages 394–405, 2015.
[64] M. Szubert, W. Jaśkowski, and K. Krawiec. On scalability, generalization, and hybridization of coevolutionary learning: a case study for Othello. IEEE Transactions on Computational Intelligence and AI in Games, 5(3):214–226, 2013.
[65] C. Witt. Upper bounds on the running time of the univariate marginal distribution algorithm on OneMax. Algorithmica, 81:632–667, 2019.
[66] C. Witt. How majority-vote crossover and estimation-of-distribution algorithms cope with fitness valleys. Theoretical Computer Science, 940:18–42, 2023.
[67] I. M. Yaglom. Two games with matchsticks. In Kvant Selecta: Combinatorics, I, volume 17 of Mathematical World, pages 1–8. American Mathematical Society, 2001.

Appendix A Preliminary results

Here we provide two straightforward results that will be useful to quote throughout the proof of Theorem 5.2. The first, Corollary A.2, is derived from the Chernoff bounds for binomial random variables given by Theorem A.1, which is in turn an immediate consequence of [6, Theorem 3.2]. We remark that the conclusions E1–E3 have been optimised for ease of integration with the proofs in this paper, rather than for tightness of the bounds.

Theorem A.1.

If $X\sim\text{\emph{Bin}}(m,q)$ , then for any $t\geqslant 0$ it holds that

	$\displaystyle\mathbb{P}(X\leqslant mq-t)$	$\displaystyle\leqslant\exp{\left(-\frac{t^{2}/2}{mq}\right)},$		(36)
	$\displaystyle\mathbb{P}(X\geqslant mq+t)$	$\displaystyle\leqslant\exp{\left(-\frac{t^{2}/2}{mq+t/3}\right)}.$		(37)
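
As an informal sanity check (not part of the proof), the bounds (36) and (37) can be compared against exact binomial tails. The following is a minimal sketch; the helper names `lower_tail`, `upper_tail`, `chernoff_lower` and `chernoff_upper`, and the parameter choices $m=200$, $q=0.25$, are illustrative assumptions rather than anything taken from the paper.

```python
from math import comb, exp


def binom_pmf(m, q, k):
    """P(X = k) for X ~ Bin(m, q)."""
    return comb(m, k) * q**k * (1 - q) ** (m - k)


def lower_tail(m, q, x):
    """Exact P(X <= x) for X ~ Bin(m, q)."""
    return sum(binom_pmf(m, q, k) for k in range(m + 1) if k <= x)


def upper_tail(m, q, x):
    """Exact P(X >= x) for X ~ Bin(m, q)."""
    return sum(binom_pmf(m, q, k) for k in range(m + 1) if k >= x)


def chernoff_lower(m, q, t):
    """Right-hand side of (36)."""
    return exp(-(t**2 / 2) / (m * q))


def chernoff_upper(m, q, t):
    """Right-hand side of (37)."""
    return exp(-(t**2 / 2) / (m * q + t / 3))


if __name__ == "__main__":
    m, q = 200, 0.25  # hypothetical parameters; m * q = 50 is exact in floating point
    for t in (5, 10, 20):
        exact_lo = lower_tail(m, q, m * q - t)
        exact_hi = upper_tail(m, q, m * q + t)
        print(t, exact_lo, chernoff_lower(m, q, t), exact_hi, chernoff_upper(m, q, t))
```

For each printed row, the exact tail should not exceed the corresponding Chernoff bound.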

Corollary A.2.

Suppose $q\in[0,1/2]$ and $X\sim\text{\emph{Bin}}(\mu,q)$ .

E1

For any $\alpha>0$ and $p\in[0,1/2]$ satisfying $q\geqslant(1+\alpha)p$ ,

$\displaystyle\mathbb{P}(X/\mu\leqslant(1+\alpha/2)p)\leqslant\exp{\left(-\frac{\alpha^{2}\mu p}{8(1+\alpha)}\right)}.$
E2

For any $\alpha>0$ and $p\in[0,1/2]$ satisfying $q\leqslant(1-\alpha)p$ ,

$\displaystyle\mathbb{P}(X/\mu\geqslant(1-\alpha/2)p)\leqslant\exp{\left(-\frac{\alpha^{2}\mu p}{8(1+\alpha)}\right)}.$
E3

For any $\alpha>0$ and $p\in[0,1/2]$ satisfying $q\geqslant(1-\alpha)p$ ,

$\displaystyle\mathbb{P}(X/\mu\leqslant(1-2\alpha)p)\leqslant\exp{\left(-\frac{\alpha^{2}\mu p/16}{8(1+\alpha/4)}\right)}.$

Proof.

For E1, let $Y_{1}\sim\text{Bin}(\mu,(1+\alpha)p)$ so that $X\succcurlyeq Y_{1}$ . We then have

	$\displaystyle\mathbb{P}(X/\mu\leqslant(1+\alpha/2)p)$	$\displaystyle=\mathbb{P}(X\leqslant(1+\alpha/2)p\mu)\leqslant\mathbb{P}(Y_{1}\leqslant(1+\alpha/2)p\mu)$
		$\displaystyle=\mathbb{P}(Y_{1}\leqslant(1+\alpha)p\mu-\alpha p\mu/2)\overset{(36)}{\leqslant}\exp{\left(-\frac{\alpha^{2}p^{2}\mu^{2}/8}{\mu(1+\alpha)p}\right)}\leqslant\exp{\left(-\frac{\alpha^{2}p\mu}{8(1+\alpha)}\right)},$

as required. For E2, let $Y_{2}\sim\text{Bin}(\mu,(1-\alpha)p)$ so that $X\preccurlyeq Y_{2}$ . We then have

	$\displaystyle\mathbb{P}(X/\mu\geqslant(1-\alpha/2)p)$	$\displaystyle=\mathbb{P}(X\geqslant(1-\alpha/2)p\mu)\leqslant\mathbb{P}(Y_{2}\geqslant(1-\alpha/2)p\mu)$
		$\displaystyle=\mathbb{P}(Y_{2}\geqslant(1-\alpha)p\mu+\alpha p\mu/2)\overset{(37)}{\leqslant}\exp{\left(-\frac{\alpha^{2}p^{2}\mu^{2}/8}{\mu(1-\alpha)p+(\alpha p\mu/6)}\right)}$
		$\displaystyle\leqslant\exp{\left(-\frac{\alpha^{2}p\mu}{8(1+\alpha)}\right)},$

as required. For E3, let $Y_{3}\sim\text{Bin}(\mu,(1-\alpha)p)$ so that $X\succcurlyeq Y_{3}$ . We then have

	$\displaystyle\mathbb{P}(X/\mu\leqslant(1-2\alpha)p)$	$\displaystyle=\mathbb{P}(X\leqslant(1-2\alpha)p\mu)\leqslant\mathbb{P}(Y_{3}\leqslant(1-2\alpha)p\mu)$
		$\displaystyle=\mathbb{P}(Y_{3}\leqslant(1-\alpha)p\mu-\alpha p\mu)\overset{(36)}{\leqslant}\exp{\left(-\frac{\alpha^{2}p^{2}\mu^{2}/2}{\mu(1-\alpha)p}\right)}\leqslant\exp{\left(-\frac{\alpha^{2}p\mu/16}{8(1+\alpha/4)}\right)},$

as required. ∎
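
The three conclusions of Corollary A.2 can also be checked numerically against exact binomial tails. The sketch below is illustrative only: the function names `check_e1`, `check_e2`, `check_e3` and the sample parameters ($\mu=100$, $p=0.2$, $\alpha=0.5$) are assumptions made for the example, and each caller is expected to supply a $q$ satisfying the relevant hypothesis.

```python
from math import comb, exp


def binom_pmf(mu, q, k):
    """P(X = k) for X ~ Bin(mu, q)."""
    return comb(mu, k) * q**k * (1 - q) ** (mu - k)


def prob_at_most(mu, q, x):
    """Exact P(X <= x)."""
    return sum(binom_pmf(mu, q, k) for k in range(mu + 1) if k <= x)


def prob_at_least(mu, q, x):
    """Exact P(X >= x)."""
    return sum(binom_pmf(mu, q, k) for k in range(mu + 1) if k >= x)


def check_e1(mu, p, alpha, q):
    """E1: assumes q >= (1 + alpha) * p."""
    lhs = prob_at_most(mu, q, (1 + alpha / 2) * p * mu)
    return lhs <= exp(-(alpha**2) * mu * p / (8 * (1 + alpha)))


def check_e2(mu, p, alpha, q):
    """E2: assumes q <= (1 - alpha) * p."""
    lhs = prob_at_least(mu, q, (1 - alpha / 2) * p * mu)
    return lhs <= exp(-(alpha**2) * mu * p / (8 * (1 + alpha)))


def check_e3(mu, p, alpha, q):
    """E3: assumes q >= (1 - alpha) * p."""
    lhs = prob_at_most(mu, q, (1 - 2 * alpha) * p * mu)
    return lhs <= exp(-(alpha**2 * mu * p / 16) / (8 * (1 + alpha / 4)))


if __name__ == "__main__":
    mu, p, alpha = 100, 0.2, 0.5  # hypothetical sample parameters
    print(check_e1(mu, p, alpha, q=0.35))  # q >= 0.30
    print(check_e2(mu, p, alpha, q=0.08))  # q <= 0.10
    print(check_e3(mu, p, alpha, q=0.12))  # q >= 0.10
```

Since E1–E3 are stated for ease of use rather than tightness, the exact tails are typically far below the bounds for parameters like these.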

Lemma A.3.

Given $\gamma\in(0,\textstyle{\frac{1}{2}})$ , let $g:[\gamma,1-\gamma]\to\mathbb{R}_{\geqslant 0}$ be given by

$\displaystyle g(y)=\log{\left(\frac{y}{1-y}\right)}-\log{\left(\frac{\gamma}{1-\gamma}\right)}.$

Then, the following properties hold.

F1

If $y\in[\gamma,\textstyle{\frac{1}{2}}]$ and $a\in[0,1)$ , then $g((1+a)y)-g(y)\geqslant a/2$ .
F2

If $y\in[\textstyle{\frac{1}{2}},1-\gamma]$ and $a\in[0,1)$ , then $g(1-(1-a)y)-g(1-y)\geqslant a/2$ .
F3

$\max_{y\in[\gamma,1-\gamma]}g(y)\leqslant 2\log{(1/\gamma)}$ .

Proof.

F1: If $y\in[\gamma,\textstyle{\frac{1}{2}}]$ and $a\in[0,1)$ , then

	$\displaystyle g((1+a)y)-g(y)$	$\displaystyle=\log{\left(\frac{(1+a)y}{1-(1+a)y}\right)}-\log{\left(\frac{y}{1-y}\right)}=\log{\left(\frac{(1+a)(1-y)}{1-(1+a)y}\right)}$
		$\displaystyle=\log{\left(1+\frac{a}{1-(1+a)y}\right)}\geqslant\log{(1+a)}\geqslant a/2.$

F2: If $y\in[\textstyle{\frac{1}{2}},1-\gamma]$ and $a\in[0,1)$ , then

	$\displaystyle g(1-(1-a)y)-g(1-y)$	$\displaystyle=\log{\left(\frac{1-(1-a)y}{(1-a)y}\right)}-\log{\left(\frac{1-y}{y}\right)}=\log{\left(\frac{1-(1-a)y}{(1-a)(1-y)}\right)}$
		$\displaystyle=\log{\left(1+\frac{a}{(1-a)(1-y)}\right)}\geqslant\log{(1+a)}\geqslant a/2.$

F3: Because $g$ is an increasing function,

$\displaystyle\max_{y\in[\gamma,1-\gamma]}g(y)=g(1-\gamma)=\log{\left(\frac{1-\gamma}{\gamma}\right)}-\log{\left(\frac{\gamma}{1-\gamma}\right)}=2\log{\left(\frac{1-\gamma}{\gamma}\right)}\leqslant 2\log{(1/\gamma)},$

as required. ∎
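
The properties F1–F3 can be spot-checked on a grid, which may help when adapting the lemma. The sketch below is an illustration under stated assumptions: the function name `check_lemma`, the grid resolution, and the floating-point tolerance are hypothetical choices. F2 is checked in the form used in the proof, namely $g(1-(1-a)y)-g(1-y)\geqslant a/2$, and grid points are skipped whenever the argument of $g$ would leave $[\gamma,1-\gamma]$.

```python
from math import log


def g(y, gamma):
    """The potential g(y) = log(y / (1 - y)) - log(gamma / (1 - gamma))."""
    return log(y / (1 - y)) - log(gamma / (1 - gamma))


def check_lemma(gamma, steps=50, tol=1e-12):
    """Return True if F1-F3 hold at every tested grid point for this gamma."""
    for i in range(steps + 1):
        a = 0.99 * i / steps  # a ranges over a grid in [0, 1)
        for j in range(steps + 1):
            y = gamma + (0.5 - gamma) * j / steps  # y in [gamma, 1/2]
            z = 1 - y  # z in [1/2, 1 - gamma]

            # F1: only tested where (1 + a) * y stays inside the domain of g.
            if (1 + a) * y <= 1 - gamma:
                if g((1 + a) * y, gamma) - g(y, gamma) < a / 2 - tol:
                    return False

            # F2 (proof form): only tested where 1 - (1 - a) * z stays in the domain.
            if (1 - a) * z >= gamma:
                if g(1 - (1 - a) * z, gamma) - g(1 - z, gamma) < a / 2 - tol:
                    return False

            # F3: g never exceeds 2 * log(1 / gamma) on [gamma, 1 - gamma].
            bound = 2 * log(1 / gamma) + tol
            if g(y, gamma) > bound or g(z, gamma) > bound:
                return False
    return True


if __name__ == "__main__":
    for gamma in (0.01, 0.05, 0.25):
        print(gamma, check_lemma(gamma))
```

A grid check of this kind is no substitute for the proof, but it can catch sign or domain slips when the inequalities are reused elsewhere.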