Towards a Rigorous Understanding of the Population Dynamics of the NSGA-III: Tight Runtime Bounds

   Andre Opris
Abstract

Evolutionary algorithms are widely used for solving multi-objective optimization problems. A prominent example is NSGA-III, which is particularly well suited for solving problems involving more than three objectives, distinguishing it from the classical NSGA-II. Despite its empirical success, the theoretical understanding of NSGA-III remains very limited, especially with respect to runtime analysis. A central open problem concerns its population dynamics, which involve controlling the maximum number of individuals sharing the same fitness value during the exploration process. In this paper, we make a significant step towards such an understanding by proving tight runtime bounds for NSGA-III on the bi-objective OneMinMax ($2$-OMM) problem. Firstly, we prove that NSGA-III requires $\Omega(n^{2}\log(n)/\mu)$ generations in expectation to optimize $2$-OMM assuming the population size $\mu$ satisfies $n+1\leq\mu=O(\log(n)^{c}(n+1))$ where $n$ denotes the problem size and $c<1$ is a constant. Apart from (opris2025multimodal), this is the first proven lower runtime bound for NSGA-III on a classical benchmark problem. Complementing this, we secondly improve the best known upper bound of NSGA-III on the $m$-objective OneMinMax problem ($m$-OMM) of $O(n\log(n))$ generations by a factor of $\mu/(2n/m+1)^{m/2}$ for a constant number $m$ of objectives and population size $(2n/m+1)^{m/2}\leq\mu\in O(\sqrt{\log(n)}(2n/m+1)^{m/2})$. This yields tight runtime bounds in the case $m=2$, and the surprising result that NSGA-III beats NSGA-II by a factor of $\mu/n$ in the expected runtime.

Introduction

Decision making is a fundamental aspect of many areas in artificial intelligence, where it is often important to explore trade-offs and compromises between different options before reaching a conclusion (LUUKKONEN2023102537). Such situations are often formulated as multi-objective optimization problems, which are typically tackled using evolutionary multi-objective algorithms (STEWART2021103830). They apply principles of nature to optimize functions with conflicting objectives, aiming to find a diverse Pareto-optimal set of solutions. This offers decision makers a range of trade-off solutions, enabling them to select the one that best aligns with their preferences (TAMSSAOUET202287). It is therefore not surprising that such algorithms have become essential tools, widely applied across various practical domains. These include artificial intelligence (ArtIntKoziel), often in combination with bioinformatics (HANDLBIO; MultiBioBook), as well as constraint optimization (GARCIA2021100983), machine learning (9515233; EMOAsMachine), and engineering (MultiobjEngineerring; EMOAEngineer). In particular, many of such real-world applications involve optimization problems with many objectives. However, a huge challenge is that, as the number of objectives increases, the Pareto front expands exponentially, making the problems increasingly complex. Additionally, identifying dependencies between individual objectives becomes more difficult. There are already differences between two and more objectives. In the case of two objectives, sorting non-dominated individuals according to the first objective naturally leads to a reverse sorting with respect to the second, making the crowding distance, which measures the proximity of search points based on their sorting across objectives, a reliable indicator of their relative closeness. 
However, this relationship breaks down for problems with three or more objectives, as a solution can have a crowding distance of zero even when it is not close to other solutions (see, for example, (Zheng2023Inefficiency)). As a result, NSGA-II (Deb2002), the most cited EMOA ($\sim 56000$ citations), which uses the crowding distance as a tie-breaker, succeeds in solving bi-objective problems (see for example (ZhengLuiDoerrAAAI22) for a rigorous analysis or (Deb2002) for empirical results), but fails in optimizing many problems where the number of objectives is large (compare with (Zheng2023Inefficiency) for large differences already between two and three objectives or (NSGAIINEFF2003) for empirical studies). To overcome this problem, DebJain2014 designed the NSGA-III algorithm. It uses a set of predefined reference points instead of the crowding distance. A major advantage is that these reference points can be predefined by users based on their specific needs. Hence, this algorithm has a huge practical impact ($\sim 6000$ citations) and it is empirically shown that it can efficiently solve problems with at least four objectives (DebJain2014; NSGAIIIAppl; NSGAIIIAPPLL). However, theoretical understanding of its success lags far behind its practical impact and the first papers addressing rigorous runtime analyses of this algorithm appeared only recently (see for example (WiethegerD23; OprisNSGAIII) for breakthroughs). Surprisingly, even in simple settings, opris2025multimodal showed that NSGA-III exhibits population dynamics that differ significantly from those of NSGA-II: NSGA-III successively iterates through all reference points, always choosing for the next generation a point associated with a reference point that has the fewest chosen individuals so far, while NSGA-II treats all points with zero crowding distance equally. Hence, NSGA-III tends to spread solutions very evenly across the Pareto front (compare also with (CHAUDHARI20221509) for empirical results). 
Indeed, it was shown in (opris2025multimodal) that NSGA-III outperforms NSGA-II on the pseudo-Boolean bi-objective multimodal function OJZJ for appropriate population sizes. However, that analysis only showed how NSGA-III spreads solutions evenly across the Pareto front after converging to local optima. How this distribution evolves during exploration, particularly before reaching a local optimum or when no local optima exist at all, as in $m$-OMM, remains unclear. It is still unknown when and why NSGA-III performs well, or how quickly it spreads solutions across the Pareto front. As a first step toward understanding its limitations on many-objective problems, we focus on the bi-objective case, which already exhibits complex population dynamics.

Our contribution: We significantly increase the understanding of the population dynamics of the NSGA-III on the pseudo-Boolean $2$-OMM by investigating the maximum cover number $\beta$, defined as the maximum number of individuals in the population sharing the same fitness vector. opris2025multimodal showed that $\beta$ is non-increasing. Our first two results are about the time to firstly cover a subset $\mathcal{A}$ of the Pareto front of a given cardinality $\alpha$ (Lemma 3), and then spread all solutions evenly on that set (Lemma 4). With high probability, this time is $O(\alpha)$. On the other hand, for a given maximum cover number $\beta$, we analyze the population’s exploration towards the extreme point $1^{n}$. Specifically, for two constants $0<a<b\leq 3/4$, we provide a lower bound on the time required to reduce the population’s distance to $1^{n}$ from at least $n^{b}$ to at most $n^{a}$. With high probability, this time is $\Omega(n\ln n/\beta)$ (Lemmas 5 and 6). This bound increases asymptotically with $1/\beta$, which is unsurprising since a smaller $\beta$ reduces the probability of selecting individuals already close to $1^{n}$, and further decreasing their distance through mutation. Then, this bound can be used, in conjunction with Lemmas 3 and 4, to further reduce the cover number by covering and afterwards spreading on a set of cardinality $\Omega(n\ln(n)/\beta)$ for carefully chosen population sizes. This again results, by Lemmas 5 and 6, in a larger lower bound of $\Omega(n\ln n/\beta)$. By repeating this process a finite number of times, we finally reduce $\beta$ to a value of $O(\mu/n)$ before creating a search point $x$ with distance at most $n^{1/16}$ to $1^{n}$. Notably, this is the smallest possible value for $\beta$ up to a multiplicative constant, and leads to a lower bound of $\Omega(n^{2}\ln(n)/\mu)$ expected generations for NSGA-III to optimize $2$-OMM (Theorem 7). 
To the best of our knowledge, this is the first established lower runtime bound of NSGA-III on a classical benchmark problem without multimodality. Finally, we improve the upper bound from (OprisNSGAIII) for any constant number $m$ of objectives and $(2n/m+1)^{m/2}\leq\mu\in O(\sqrt{\ln(n)}(2n/m+1)^{m/2})$ by a factor of $O((2n/m+1)^{m/2}/\mu)$, showing that $m$-OMM can be optimized in $O\left(n\ln(n)(2n/m+1)^{m/2}/\mu\right)$ generations in expectation (Theorem 8). This aligns with the earlier lower bound for the case $m=2$ and reveals the somewhat surprising result that NSGA-III outperforms NSGA-II by a factor of $\mu/n$ (see (DoerrQu2023a) for a lower runtime bound of $\Omega(n\ln(n))$ generations for NSGA-II to optimize $2$-OMM, if $4(n+1)\leq\mu\leq o(n^{\nu})(n+1)$ for $\nu<1$), despite the latter being the state-of-the-art algorithm for two objectives and widely used in practice (with around 60,000 citations).

Related work: The mathematical runtime analysis of modern practical MOEAs began only recently. ZhengLuiDoerrAAAI22 conducted the first runtime analysis of NSGA-II on classical benchmark functions, which was the starting point for many subsequent works on optimizing bi-objective functions by NSGA-II (Qu2022PPSN; DaOp2023; NSgaIIBeat; Dang2024; DoerrQ23b; DoerrApprox; UpBian; 2025Lessons). This line of work has even been extended to combinatorial optimization problems like minimum spanning trees or subset selection (NSGAIICombIJCAI; MOEASubset). Very recently, variants of the NSGA-II were proposed to overcome its shortcomings in solving many-objective problems by adding a simple tie-breaking rule (Krejca2025b) or by using an alternative version of the crowding distance (ZhengDoerrCrowding). The most prominent algorithms for many-objective problems are the SPEA2, SMS-EMOA and NSGA-III, which have also been analyzed successfully (WiethegerD23; Zheng_Doerr_2024; OprisNSGAIII; DoerrNearTight; Opris2025; opris2025multimodal). However, up to (opris2025multimodal; DoerrQu2023a) there were no proven tight runtime bounds on the performance of NSGA-II or NSGA-III on classical benchmark functions, even though investigating limitations of EMOAs and proving lower bounds by analyzing their population dynamics is a highly active area of research. For example, Opris2025PAES analyzed the PAES-25 evolutionary strategy with one-bit mutation on the $m$-objective LeadingOnesTrailingZeros problem, proving tight runtime bounds of $\Theta(n^{3})$ for $m=2$, $\Theta(n^{3}\ln n)$ for $m=4$, and $\Theta\left(n(2n/m)^{m/2}\ln(n/m)\right)$ for $m>4$. Additional tight runtime bounds for the GSEMO algorithm on the bi-objective COCZ and OMM benchmarks are provided in (doerr2025tightruntimeGSEMO). 
To the best of our knowledge, there are no existing results on the population dynamics of NSGA-III, apart from the investigations on OJZJ in (opris2025multimodal).

Preliminaries

Given two random variables $X$ and $Y$ on $\mathbb{N}_{0}$, we say that $Y$ stochastically dominates $X$ if $\Pr(Y\leq c)\leq\Pr(X\leq c)$ for all $c\geq 0$. The number of ones in a bit string $x$ is denoted by $|x|_{1}$ and the number of zeros by $|x|_{0}$, respectively. For any finite set $A$, we write $|A|$ to denote its cardinality. For $n\in\mathbb{N}$ let $[n]:=\{1,\ldots,n\}$, denote by $\ln$ the natural logarithm (i.e. to base $e$) and let $\text{poly}(n)$ be a placeholder for some polynomial in $n$.

This paper is about many-objective optimization, specifically the maximization of a discrete $m$-objective function $f(x):=(f_{1}(x),\ldots,f_{m}(x))$ for $m\in\mathbb{N}$ where each $f_{i}:\{0,1\}^{n}\to\mathbb{N}_{0}$ for $i\in\{1,\ldots,m\}$. When $m=2$, the function is also called bi-objective. For a bit string $x$ let $x:=(x^{1},\ldots,x^{m/2})$ where all $x^{j}$ are of equal length $2n/m$. For a subset $N\subseteq\{0,1\}^{n}$, we define $f(N):=\{f(x)\mid x\in N\}$. Given two search points $x,y\in\{0,1\}^{n}$, $x$ weakly dominates $y$, denoted by $x\succeq y$, if $f_{i}(x)\geq f_{i}(y)$ for all $i\in[m]$, and $x$ (strictly) dominates $y$, denoted by $x\succ y$, if additionally at least one of these inequalities is strict; if neither $x\succeq y$ nor $y\succeq x$ then $x$ and $y$ are incomparable. A set $S\subseteq\{0,1\}^{n}$ is a set of mutually incomparable solutions with respect to $f$ if all search points in $S$ are pairwise incomparable. Each solution not dominated by any other in $\{0,1\}^{n}$ is called Pareto-optimal. A mutually incomparable set of these solutions that covers all possible non-dominated fitness values is called a Pareto(-optimal) set of $f$. For a population $P_{t}\subset\{0,1\}^{n}$ and $v\in\mathbb{N}_{0}^{m}$ denote by $c_{t}(v):=|\{x\in P_{t}\mid f(x)=v\}|$ the cover number of $v$, i.e. the number of individuals from $P_{t}$ with fitness vector $v$. We say that $v$ is covered if $c_{t}(v)\geq 1$.
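The dominance relations and the cover number defined above can be made concrete with a small sketch. The helper names (`weakly_dominates`, `strictly_dominates`, `incomparable`, `cover_numbers`) and the toy fitness function `ones_zeros` are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Tuple

Fitness = Tuple[int, ...]


def weakly_dominates(u: Fitness, v: Fitness) -> bool:
    """u weakly dominates v: u is at least as good in every objective (maximization)."""
    return all(a >= b for a, b in zip(u, v))


def strictly_dominates(u: Fitness, v: Fitness) -> bool:
    """u dominates v: weak dominance plus at least one strict inequality."""
    return weakly_dominates(u, v) and any(a > b for a, b in zip(u, v))


def incomparable(u: Fitness, v: Fitness) -> bool:
    """Neither fitness vector weakly dominates the other."""
    return not weakly_dominates(u, v) and not weakly_dominates(v, u)


def cover_numbers(population: Iterable[Sequence[int]],
                  f: Callable[[Sequence[int]], Fitness]) -> Counter:
    """Map each fitness vector v to c_t(v), the number of individuals x with f(x) = v."""
    return Counter(f(x) for x in population)


def ones_zeros(x: Sequence[int]) -> Fitness:
    """Hypothetical toy bi-objective function: (number of ones, number of zeros)."""
    ones = sum(x)
    return (ones, len(x) - ones)
```

With this toy function, any two bit strings with different numbers of ones are incomparable, and a fitness vector is covered exactly when its entry in the returned counter is at least one.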

Initialize $P_{0}\sim\text{Unif}((\{0,1\}^{n})^{\mu})$
for $t:=0$ to $\infty$ do
   Initialize $Q_{t}:=\emptyset$
   for $i=1$ to $\mu$ do
      Sample $s$ from $P_{t}$ uniformly at random
      Create $r$ by standard bit mutation on $s$ with mutation probability $1/n$
      Update $Q_{t}:=Q_{t}\cup\{r\}$
   Set $R_{t}:=P_{t}\cup Q_{t}$
   Partition $R_{t}$ into layers $F^{1}_{t},F^{2}_{t},\ldots,F^{k}_{t}$ of non-dominated solutions
   Find $i^{*}\geq 1$ such that $\sum_{i=1}^{i^{*}-1}\lvert{F_{t}^{i}}\rvert<\mu$ and $\sum_{i=1}^{i^{*}}\lvert{F_{t}^{i}}\rvert\geq\mu$
   Compute $Y_{t}=\bigcup_{i=1}^{i^{*}-1}F_{t}^{i}$
   Choose $\tilde{F}_{t}^{i^{*}}\subset F_{t}^{i^{*}}$ such that $\lvert{Y_{t}\cup\tilde{F}_{t}^{i^{*}}}\rvert=\mu$ with Algorithm 2
   Create the next population $P_{t+1}:=Y_{t}\cup\tilde{F}^{i^{*}}_{t}$
Algorithm 1: NSGA-III (DebJain2014) with population size $\mu$ on an $m$-objective function $f$

Compute the normalisation $f^{n}$ of $f$
Associate each $x\in Y_{t}\cup F_{t}^{i^{*}}$ with its reference point $\mathrm{rp}(x)$ such that the distance between $f^{n}(x)$ and the line through the origin and $\mathrm{rp}(x)$ is minimized
For each $r\in\mathcal{R}_{p}$, set $\rho_{r}:=|\{x\in Y_{t}\mid\mathrm{rp}(x)=r\}|$
Initialize $\tilde{F}_{t}^{i^{*}}=\emptyset$ and $R^{\prime}:=\mathcal{R}_{p}$
while true do
   Determine $r_{\min}\in R^{\prime}$ such that $\rho_{r_{\min}}$ is minimal (where ties are broken randomly)
   Determine $x_{r_{\min}}\in F_{t}^{i^{*}}\setminus\tilde{F}_{t}^{i^{*}}$ which is associated with $r_{\min}$ and minimizes the distance between the vectors $f^{n}(x_{r_{\min}})$ and $r_{\min}$ (where ties are broken randomly)
   if $x_{r_{\min}}$ exists then
      $\tilde{F}_{t}^{i^{*}}=\tilde{F}_{t}^{i^{*}}\cup\{x_{r_{\min}}\}$
      $\rho_{r_{\min}}=\rho_{r_{\min}}+1$
      if $\lvert{Y_{t}}\rvert+\lvert{\tilde{F}_{t}^{i^{*}}}\rvert=\mu$ then
         return $\tilde{F}_{t}^{i^{*}}$
   else $R^{\prime}=R^{\prime}\setminus\{r_{\min}\}$
Algorithm 2: Selection procedure utilizing a set $\mathcal{R}_{p}$ of reference points to maximize an $m$-objective function $f$

The NSGA-III algorithm, originated in (DebJain2014), is shown in Algorithm 1. Initially, a population of size $\mu$ is created by choosing $\mu$ individuals from $\{0,1\}^{n}$ uniformly at random. Then in each iteration $t$, a multiset $Q_{t}$ of $\mu$ new offspring is created by $\mu$ times choosing an individual $s\in P_{t}$ uniformly at random and applying standard bit mutation on $s$, i.e. each bit is flipped independently with probability $1/n$. During the survival selection, the parent and offspring populations $P_{t}$ and $Q_{t}$ are merged into $R_{t}$. Then $R_{t}$ is partitioned into layers $F^{1}_{t},F^{2}_{t},\dots$ using the non-dominated sorting algorithm (Deb2002) where $F^{1}_{t}$ consists of all non-dominated individuals, and $F^{i}_{t}$ for $i>1$ of individuals only dominated by those from $F^{1}_{t},\dots,F^{i-1}_{t}$. Then the critical rank $i^{*}$ with $\sum_{i=1}^{i^{*}-1}\lvert{F_{t}^{i}}\rvert<\mu$ and $\sum_{i=1}^{i^{*}}\lvert{F_{t}^{i}}\rvert\geq\mu$ is determined. All individuals with a lower rank than $i^{*}$ are included in $P_{t+1}$, while the remaining individuals are selected from $F_{t}^{i^{*}}$ using Algorithm 2. Hereby, a normalized objective function $f^{n}$ is computed and then each individual with rank at most $i^{*}$ is associated with a reference point. For the former, we use the normalization procedure from (WiethegerD23) which can also be used for maximization problems as shown in (OprisNSGAIII). We omit detailed explanations as they are not needed for our purposes. For an $m$-objective function $f\colon\{0,1\}^{n}\rightarrow\mathbb{N}_{0}^{m}$, the normalized fitness vector $f^{n}(x):=(f_{1}^{n}(x),\dots,f_{m}^{n}(x))$ of a search point $x$ is computed as

$$f_{j}^{n}(x)=\frac{f_{j}(x)-y_{j}^{\min}}{y_{j}^{\text{nad}}-y_{j}^{\min}}$$

for each $j\in[m]$ where $y^{\text{nad}}:=(y_{1}^{\text{nad}},\ldots,y_{m}^{\text{nad}})$ and $y^{\min}:=(y_{1}^{\min},\dots,y_{m}^{\min})$ from the objective space are called nadir and ideal points, respectively. Computing the nadir point is not trivial and we have $y_{j}^{\text{nad}}\geq\varepsilon_{\text{nad}}$, and $y_{j}^{\min}\leq y_{j}^{\text{nad}}\leq y_{j}^{\max}$ for every $j\in[m]$ where $\varepsilon_{\text{nad}}$ is a positive threshold set by the user (see (Blank2019) or (WiethegerD23) for the details). Further, $y_{j}^{\max}$ and $y_{j}^{\min}$ are the maximum and minimum value in objective $j$ from all search points seen so far (i.e. from $P_{0},Q_{0},\ldots,P_{t},Q_{t}$). After computing the normalisation, each individual $x$ is associated with the reference point $\text{rp}(x)$ such that the distance between $f^{n}(x)$ and the line through the origin and $\text{rp}(x)$ is minimal. We use the same set of reference points $\mathcal{R}_{p}$ as proposed in (DebJain2014), originated in (Das1998). The points are defined as

$$\mathcal{R}_{p}=\left\{\left(\frac{a_{1}}{p},\ldots,\frac{a_{m}}{p}\right)\;\Big|\;(a_{1},\dots,a_{m})\in\mathbb{N}_{0}^{m},\ \sum_{i=1}^{m}a_{i}=p\right\}$$

where $p\in\mathbb{N}$ is a parameter one can choose according to the fitness function $f$. These are uniformly distributed on the simplex determined by the unit vectors $(1,0,\dots,0)^{\intercal},(0,1,\dots,0)^{\intercal},\dots,(0,0,\dots,1)^{\intercal}$.
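As an illustration of this definition, the following sketch enumerates $\mathcal{R}_{p}$ for given $m$ and $p$. The function name `reference_points` and the stars-and-bars enumeration are choices made for this example only; exact fractions are used so that the coordinates sum to exactly one.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Tuple


def reference_points(m: int, p: int) -> List[Tuple[Fraction, ...]]:
    """Enumerate all (a_1/p, ..., a_m/p) with non-negative integers a_i summing to p."""
    if m < 1 or p < 1:
        raise ValueError("m and p must be positive integers")
    points = []
    # Stars and bars: place m-1 bars among p+m-1 slots; the gaps between bars are the a_i.
    for bars in combinations(range(p + m - 1), m - 1):
        parts = []
        prev = -1
        for b in bars:
            parts.append(b - prev - 1)
            prev = b
        parts.append(p + m - 2 - prev)  # stars after the last bar
        points.append(tuple(Fraction(a, p) for a in parts))
    return points
```

For $m=2$ and $p=3$ this yields the four points $(0,1),(1/3,2/3),(2/3,1/3),(1,0)$; in general the number of points is $\binom{p+m-1}{m-1}$.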

Then, one iterates through all the reference points, always choosing the reference point with the fewest associated individuals that are already selected for the next generation $P_{t+1}$, where ties are broken uniformly at random. A reference point is omitted if it only has associated individuals that are already selected for $P_{t+1}$. Next, from the individuals associated with that reference point who have not yet been selected, the one closest to the chosen reference point is selected for the next generation, where ties are again broken uniformly at random. Once the required number of individuals is reached (i.e. if $\lvert{Y_{t}}\rvert+\lvert{\tilde{F}_{t}^{i^{*}}}\rvert=\mu$) the selection ends.
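The selection loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that normalization and association have already been carried out, so each candidate arrives as a hypothetical triple `(individual, ref_id, distance)`, and the function name `niching_select` and its parameters are invented for this example.

```python
import random
from collections import defaultdict
from typing import Any, Hashable, List, Sequence, Tuple


def niching_select(selected_refs: Sequence[Hashable],
                   candidates: Sequence[Tuple[Any, Hashable, float]],
                   ref_ids: Sequence[Hashable],
                   mu: int,
                   rng: random.Random) -> List[Any]:
    """Pick individuals from the critical layer until mu individuals are selected in total.

    selected_refs: reference point of each individual already in Y_t.
    candidates:    (individual, reference point id, distance) for the critical layer.
    """
    rho = {r: 0 for r in ref_ids}
    for r in selected_refs:
        rho[r] += 1
    pool = defaultdict(list)
    for individual, r, dist in candidates:
        pool[r].append((dist, individual))
    active = list(ref_ids)  # plays the role of R' in Algorithm 2
    chosen: List[Any] = []
    while len(selected_refs) + len(chosen) < mu and active:
        lowest = min(rho[r] for r in active)
        r_min = rng.choice([r for r in active if rho[r] == lowest])
        if not pool[r_min]:
            active.remove(r_min)  # no unselected individual left for this reference point
            continue
        best = min(d for d, _ in pool[r_min])
        ties = [k for k, (d, _) in enumerate(pool[r_min]) if d == best]
        _, individual = pool[r_min].pop(rng.choice(ties))
        chosen.append(individual)
        rho[r_min] += 1
    return chosen
```

Because the reference point with the smallest counter is always served first, the selected individuals are distributed as evenly over the reference points as the available candidates permit.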

For our analysis, we need the following key lemma regarding the cover number of a Pareto-optimal fitness vector. This also includes the fact that NSGA-III protects Pareto-optimal solutions, meaning that if the population size $\mu$ is at least the cardinality of a maximum set of mutually incomparable solutions and a Pareto-optimal fitness vector is covered, then it is covered for all future generations. It is a combination of Lemma 2 from (OprisNSGAIII) and Lemma 3.4 from (opris2025multimodal).

Lemma 1.

Consider NSGA-III on an $m$-objective function $f$ with $\varepsilon_{\text{nad}}\geq f_{\max}$ and a set $\mathcal{R}_{p}$ of reference points for $p\in\mathbb{N}$ with $p\geq 2m^{3/2}f_{\max}$. Denote by $P_{t}$ the current population and by $\mathcal{P}$ the Pareto front of $f$. Let $S$ be a maximum set of mutually incomparable solutions and suppose that $\mu\geq|S|$. Then the following properties hold.

  (1) If $\mu\geq|S|$ then for each $x\in F_{t}^{1}$ there is a $y\in P_{t+1}$ weakly dominating $x$.

  (2) Let $v\in\mathcal{P}$ and $0\leq\alpha\leq\mu/|S|$. If $c_{t}(v)\geq\alpha$ then also $c_{t+1}(v)\geq\alpha$.

  (3) Let $v\in\mathcal{P}$ and suppose that $c_{t+1}(v)<c_{t}(v)$. Then $c_{t+1}(w)\leq c_{t}(v)$ for every $w\in\mathcal{P}$.

  (4) Suppose that every $x\in P_{t}$ is Pareto-optimal. Then $d_{t}:=\max\{c_{t}(v)\mid v\in\mathcal{P}\}$ does not increase.

The $m$-OMM benchmark, originated in (Zheng2023Inefficiency), is defined as follows.

Definition 2.

Let $m$ be divisible by $2$ and let the problem size $n$ be a multiple of $m/2$. Then the $m$-objective function $m$-OMM is defined by $m\text{-}\textsc{OMM}:\{0,1\}^{n}\to\mathbb{N}_{0}^{m}$ as

$$m\text{-}\textsc{OMM}(x)=(f_{1}(x),\ldots,f_{m}(x))$$

with

$$f_{\ell}(x)=\begin{cases}\sum_{i=1}^{2n/m}x_{i+n(\ell-1)/m},&\text{ if $\ell$ is odd,}\\ \sum_{i=1}^{2n/m}(1-x_{i+n(\ell-2)/m}),&\text{ else,}\end{cases}$$

for all $x=(x_{1},\ldots,x_{n})\in\{0,1\}^{n}$.

In $m$-OMM, the bit string is divided into $m/2$ blocks, where $f_{2j-1}$ and $f_{2j}$ correspond to block $j$. Specifically, $f_{2j-1}$ counts the number of ones, and $f_{2j}$ counts the number of zeros in block $j$. Every search point is Pareto-optimal, as the total sum of objectives of any bit string is $n$. Thus, a Pareto-optimal set, which is also a maximum set of mutually incomparable solutions, has cardinality $(2n/m+1)^{m/2}$, since for each block $j\in[m/2]$ there are exactly $2n/m+1$ many fitness values $(f_{2j-1},f_{2j})$.
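The block structure of the benchmark can be sketched in a few lines. The function names `m_omm` and `pareto_front_size` are illustrative; the brute-force count is only meant as a sanity check for very small $n$.

```python
from itertools import product
from typing import Sequence, Tuple


def m_omm(x: Sequence[int], m: int) -> Tuple[int, ...]:
    """Evaluate m-OMM: for each of the m/2 blocks, return (#ones, #zeros) of that block."""
    n = len(x)
    if m < 2 or m % 2 != 0 or n % (m // 2) != 0:
        raise ValueError("m must be even and n must be a multiple of m/2")
    block = 2 * n // m  # block length 2n/m
    values = []
    for j in range(m // 2):
        ones = sum(x[j * block:(j + 1) * block])
        values.extend((ones, block - ones))  # (f_{2j-1}, f_{2j}) with 1-based j
    return tuple(values)


def pareto_front_size(n: int, m: int) -> int:
    """Brute-force count of distinct fitness vectors (feasible only for small n)."""
    return len({m_omm(x, m) for x in product((0, 1), repeat=n)})
```

For example, `m_omm((1, 1, 0, 0, 1, 0, 0, 0), 4)` returns `(2, 2, 1, 3)`, and for $n=4$, $m=4$ the brute-force count gives $(2n/m+1)^{m/2}=9$.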

Population Dynamics of NSGA-III on $2$-OMM

Bounding the Maximum Cover Number: First, we establish a general upper bound on the time required to cover a subset of the Pareto front of a given cardinality with high probability. Then, we provide an additional bound on the time needed to evenly distribute solutions across that subset.

Lemma 3.

Consider NSGA-III on $f\coloneqq 2$-OMM under the same conditions as in Lemma 1 (i.e. $p\geq 4\sqrt{2}n$ since $f_{\max}=n$, and $\mu\geq n+1$). Then for a given natural number $\alpha\leq 3n/8$ there is a subset $\mathcal{A}$ of the Pareto front with cardinality $\alpha$ which is covered in $64\alpha$ generations with probability at least $1-e^{-\Omega(\alpha)}$.

Proof.

Denote by $F$ the Pareto front of $2$-OMM. By a classical Chernoff bound the probability is at least $1-e^{-\Omega(n)}$ that there is an individual $x$ initialized with $f_{j}(x)\in[3n/8,5n/8]$ for $j\in\{1,2\}$. Suppose that this happens and fix such an individual $x_{0}$. Let $\mathcal{A}\coloneqq\{v\in F\mid v_{i}\in[f_{i}(x_{0})-\alpha/2,f_{i}(x_{0})+\alpha/2]\text{ for all }i\in\{1,2\}\}$, so that both components of every $v\in\mathcal{A}$ lie in $[3n/8-\alpha/2,5n/8+\alpha/2]$. Then we see that $|\mathcal{A}|\geq\alpha$. Fix a covered $v\in\mathcal{A}$ and another uncovered $w\in\mathcal{A}$. We show that with probability at least $1-e^{-\Omega(\alpha)}$ the vector $w$ is covered after $64\alpha$ generations. Let $B_{t}\coloneqq\{x\in P_{t}\mid f(x)\in\mathcal{A}\}$ and $d_{t}\coloneqq\min_{x\in B_{t}}|f_{1}(x)-w_{1}|$. Note that $0\leq d_{t}\leq\alpha$. Further, we see that $w$ is covered if $d_{t}=0$ since if $f_{1}(x)=w_{1}$ then also $f_{2}(x)=w_{2}$ due to $f_{1}(x)+f_{2}(x)=w_{1}+w_{2}=n$. By Lemma 1(1), $d_{t}$ cannot increase, but it can be decreased in one single trial by choosing $x\in P_{t}$ with $|f_{1}(x)-w_{1}|=d_{t}$ as parent (probability at least $1/\mu$) and then flipping a one bit to zero and not changing any other bit if $f_{1}(x)-w_{1}>0$. On the other hand, if $f_{1}(x)-w_{1}<0$, flip a zero bit to one. Both happen with probability at least $(3n/8-\alpha/2)/n\cdot(1-1/n)^{n-1}\geq(3n/8-3n/16)/(en)\geq 3/(16e)$. Then in one generation, $d_{t}$ decreases with probability at least $1-(1-\frac{3}{16e\mu})^{\mu}\geq\frac{3/(16e)}{1+3/(16e)}\geq\frac{3}{32e}\eqqcolon p$ where the first inequality is due to Lemma 10 in (Badkobeh2015). Note that $p=\Omega(1)$ and for each $i\in[\alpha]$, define the random variable $X_{i}$ as the number of generations such that $d_{t}=i$. Then the time until $d_{t}=0$ is $X\coloneqq\sum_{i=1}^{\alpha}X_{i}$, the latter stochastically dominated by the independent sum $Y\coloneqq\sum_{i=1}^{\alpha}Y_{i}$ of geometrically distributed random variables with success probability $p_{i}=p=\Omega(1)$. Note that $\mathrm{E}\left[Y\right]=\alpha/p$ and we obtain by Theorem 1 in (Witt14) for $s\coloneqq\sum_{i=1}^{\alpha}1/p_{i}^{2}=1024\alpha e^{2}/9$ and $\lambda\geq 0$ the inequality $\Pr(Y\geq\mathrm{E}\left[Y\right]+\lambda)\leq\exp(-\frac{1}{4}\min\{\frac{\lambda^{2}}{s},\lambda p\})$. For $\lambda=\alpha/p$ we obtain $\Pr(X\geq 64\alpha)\leq\Pr(X\geq 64e\alpha/3)=\Pr(X\geq 2\alpha/p)\leq\Pr(Y\geq 2\alpha/p)=\Pr(Y\geq\mathrm{E}\left[Y\right]+\alpha/p)\leq e^{-\Omega(\alpha)}$. By a union bound on all $w\in\mathcal{A}$ we see that $\mathcal{A}$ is covered in $64\alpha$ generations with probability $1-e^{-\Omega(\alpha)}$. ∎
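The constants in this proof can be checked numerically. The sketch below is only a sanity check under the choices made in the proof ($p=3/(32e)$, $\lambda=\alpha/p$, $s=\alpha/p^{2}$); the function names are invented for this example. It confirms that $2\alpha/p=64e\alpha/3\leq 64\alpha$ and that the exponent of the tail bound equals $\alpha/4$.

```python
import math


def lemma3_constants(alpha: int) -> dict:
    """Evaluate the quantities used in the tail bound for a given alpha."""
    p = 3 / (32 * math.e)           # lower bound on the per-generation success probability
    expected = alpha / p            # E[Y] for a sum of alpha geometric variables
    lam = alpha / p                 # deviation lambda chosen in the proof
    s = alpha / p ** 2              # sum of 1/p_i^2
    exponent = 0.25 * min(lam ** 2 / s, lam * p)
    return {"p": p, "budget": expected + lam, "exponent": exponent}


def generation_success_probability(mu: int) -> float:
    """1 - (1 - 3/(16 e mu))^mu: chance that at least one of mu trials decreases d_t."""
    return 1.0 - (1.0 - 3 / (16 * math.e * mu)) ** mu
```

For instance, `lemma3_constants(100)` gives a budget of roughly $5799<6400=64\alpha$ generations and an exponent of $25=\alpha/4$.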

Now we give the following upper bound on the time such that, with high probability, the solutions are evenly spread on a set $\mathcal{A}$ with cardinality $\alpha$ or, in other words, the cover number of each $v\in\mathcal{A}$ is bounded by $\lceil{\mu/\alpha}\rceil$ from above.

Lemma 4.

Consider NSGA-III on $f\coloneqq 2$-OMM and suppose that all conditions of Lemma 1 are satisfied. Suppose that $\mu=\text{poly}(n)$. Let $\alpha\leq 3n/8$ be a natural number and let $\gamma:=\min\{\lceil{n/\ln(n)}\rceil,\lceil{\mu/\alpha}\rceil\}$. Then after $84\alpha+46\gamma$ generations, each $v\in\mathbb{N}_{0}^{m}$ has cover number at most $\lceil{\mu/\alpha}\rceil$ with probability $1-o(1)$. Hence, if $\alpha\geq n/\ln(n)$ then $\gamma\leq\alpha$ and therefore, $130\alpha$ generations suffice. The expected number of generations is $O(\alpha+\gamma)$.
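To make the generation budget in the lemma tangible, the following small helper evaluates $\gamma$ and $84\alpha+46\gamma$ for concrete parameters. The names `gamma` and `lemma4_budget` are hypothetical and chosen for this sketch only.

```python
import math


def gamma(n: int, mu: int, alpha: int) -> int:
    """gamma = min(ceil(n / ln n), ceil(mu / alpha)) as in the lemma statement."""
    if n < 3:
        raise ValueError("n should be at least 3 so that ln(n) > 1")
    return min(math.ceil(n / math.log(n)), math.ceil(mu / alpha))


def lemma4_budget(n: int, mu: int, alpha: int) -> int:
    """Number of generations 84*alpha + 46*gamma stated in the lemma."""
    if not 1 <= alpha <= 3 * n / 8:
        raise ValueError("alpha must be a natural number with alpha <= 3n/8")
    return 84 * alpha + 46 * gamma(n, mu, alpha)
```

For $n=1000$, $\mu=1001$ and $\alpha=375$ one obtains $\gamma=3$ and a budget of $31638$ generations, which is below $130\alpha=48750$ as the lemma predicts for $\alpha\geq n/\ln(n)$.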

Proof.

Denote by $F$ the Pareto front of $2$-OMM. By Lemma 3 there is a set $\mathcal{A}\subset F$ with cardinality $\alpha$ which is covered after $64\alpha$ generations with probability at least $1-e^{-\Omega(\alpha)}$. Suppose that this happens. Denote the decrease of the cover number of a vector $v\in F$ before reaching $\lceil{\mu/\alpha}\rceil$ as a success. If a success occurs, we see that the cover number of all other Pareto-optimal vectors is at most $\lceil{\mu/\alpha}\rceil$ by Lemma 1(3) and it cannot increase by Lemma 1(4) (since every solution is Pareto-optimal) and hence, the lemma holds. We show with probability $1-o(1)$ that all $v\in\mathcal{A}$ have a cover number of at least $\lceil{\mu/\alpha}\rceil$ or a success occurred after further $46\gamma+20\alpha$ generations. In the former case we have that $\lceil{\mu/\alpha}\rceil=\mu/\alpha$ and all other $v\in F\setminus\mathcal{A}$ have cover number $0$ and hence, the cover number of all $v\in F$ is also bounded by $\lceil{\mu/\alpha}\rceil$. Depending on the value of $\gamma$, we consider two cases.

Case 1: Let $\lceil{\mu/\alpha}\rceil\leq\lceil{n/\ln(n)}\rceil$ (i.e. $\gamma=\lceil{\mu/\alpha}\rceil$). Fix $v\in F$, denote by $c_{t}$ its cover number and for $j\in[\gamma-1]$ let $X_{j}$ be a random variable that counts the number of generations with $c_{t}=j$. Then the number of generations until a success occurs or the cover number of $v$ is at least $\gamma$ is at most $X\coloneqq\sum_{j=1}^{\gamma-1}X_{j}$. Note that $c_{t}$ can be increased by choosing an individual $y$ with $f(y)=v$ as parent and flipping no bits (prob. $1/\mu\cdot(1-1/n)^{n}\geq 1/(4\mu)\eqqcolon\sigma_{t}$). Hence, the probability of increasing $c_{t}$ in one generation is at least $1-(1-\sigma_{t})^{\mu}\geq\frac{\sigma_{t}\mu}{1+\sigma_{t}\mu}=\frac{1/4}{1+1/4}=\frac{1}{5}$. Hence, $X$ is stochastically dominated by an independent sum $Z\coloneqq\sum_{j=1}^{\gamma-1}Z_{j}$ of geometrically distributed random variables $Z_{j}$ with parameter $p=1/5$. Then $\mathrm{E}\left[X\right]\leq\mathrm{E}\left[Z\right]\leq 5\gamma$ and hence, by Theorem 1 in (Witt14), we obtain for $s\coloneqq\sum_{i=1}^{\gamma-1}1/p_{i}^{2}\leq 25\gamma$, and $\lambda\geq 0$ the inequality $\Pr(Z\geq\mathrm{E}\left[Z\right]+\lambda)\leq\exp(-\frac{1}{4}\min\{\frac{\lambda^{2}}{s},\lambda p\})$ and for $\lambda=40\gamma+20\alpha$ we obtain $\Pr(X\geq(5+40)\gamma+20\alpha)=\Pr(X\geq 5\gamma+(40\gamma+20\alpha))\leq\Pr(Z\geq\mathrm{E}\left[Z\right]+40\gamma+20\alpha)\leq e^{-2\gamma-\alpha}$. By a union bound on $|\mathcal{A}|=\alpha$ different Pareto-optimal vectors, we see that with probability at least $1-\alpha\cdot e^{-2\gamma-\alpha}=1-o(1)$ a success occurred or the cover number of all $v\in\mathcal{A}$ is at least $\lceil{\mu/\alpha}\rceil$ after further $45\gamma+20\alpha$ generations.

Case 2: Suppose that $\lceil \mu/\alpha \rceil > \lceil n/\ln(n) \rceil$ (i.e. $\gamma = \lceil n/\ln(n) \rceil$). Fix $v \in F$, and let $Y_t$ denote the number of individuals $x$ such that $f(x) = v$ at generation $t$. Then with probability $1 - e^{-2\gamma-\alpha}$ after further $45\gamma + 20\alpha$ generations by Case 1, a success occurred or the cover number of $v$ is at least $n/\ln(n)$. Suppose the latter (otherwise the statement of the lemma holds) and let $Z_t$ be the number of newly created individuals with fitness vector $v$ in generation $t$. We have $\mathrm{E}[Z_t] \geq Y_t/4 \geq n/(4\ln(n))$: A generation consists of $\mu$ independent trials and in each trial, with probability $Y_t/\mu \geq n/(\ln(n)\mu)$, an individual $x$ with $f(x) = v$ is selected as the parent, and during mutation, no bit is flipped with probability at least $(1-1/n)^n \geq 1/4$. Hence, by a classical Chernoff bound, $\Pr(Z_t \leq 3/5 \cdot \mathrm{E}[Z_t]) = \Pr(Z_t \leq (1-2/5) \cdot \mathrm{E}[Z_t]) \leq e^{-\Omega(\mathrm{E}[Z_t])} = e^{-\Omega(n/\ln(n))}$. Hence, with probability at least $1 - e^{-\Omega(n/\ln(n))}$, we have $Y_{t+1} \geq \min\{Y_t + 3/5 \cdot \mathrm{E}[Z_t], \lceil \mu/\alpha \rceil\} \geq \min\{Y_t + 3Y_t/20, \lceil \mu/\alpha \rceil\} = \min\{23Y_t/20, \lceil \mu/\alpha \rceil\}$. In other words, $Y_t$ increases by a factor of at least $23/20$ unless the value $\lceil \mu/\alpha \rceil$ has already been reached. Note that $\lceil n/\ln(n) \rceil$ such generations in succession suffice for $v$ to reach a cover number of at least $\lceil \mu/\alpha \rceil$ (or for a success to occur), since $n/\ln(n) \cdot (23/20)^{n/\ln(n)} = \omega(\mu)$.
Moreover, such a sequence of generations occurs with probability at least $1 - e^{-\Omega(n/\ln(n))}$ by a union bound on all these generations. By a further union bound on all $v \in \mathcal{A}$, we see that after $\lceil n/\ln(n) \rceil = \gamma$ generations a success occurred or the cover number of each $v \in F$ is at least $\lceil \mu/\alpha \rceil$ with probability $1 - o(1)$.
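To illustrate how generous the budget of $\lceil n/\ln(n) \rceil$ generations in Case 2 is, the following sketch counts how many multiplications by $23/20$ are needed to grow from a starting cover number to a target value. The function name and the sample target $10^9$ are assumptions chosen for illustration only.

```python
import math


def generations_to_reach(start: float, target: float, factor: float = 23 / 20) -> int:
    """Count the generations needed until a quantity that grows by `factor`
    per generation, starting at `start`, is at least `target`."""
    generations = 0
    value = start
    while value < target:
        value *= factor
        generations += 1
    return generations


if __name__ == "__main__":
    n = 10**6
    start = n / math.log(n)   # cover number guaranteed at the start of the growth phase
    target = 10**9            # hypothetical stand-in for ceil(mu / alpha)
    print(generations_to_reach(start, target), "generations; budget:", math.ceil(start))
```

Since the growth is geometric, the number of required generations is only logarithmic in the ratio of target to start, far below the available $\lceil n/\ln(n) \rceil$ generations.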

Hence, in any case we see that with probability $1 - o(1)$, after $\kappa := 64\alpha + 45\gamma + 20\alpha + \gamma = 84\alpha + 46\gamma$ generations, each $v \in F$ has cover number at most $\lceil \mu/\alpha \rceil$. If this does not happen, we repeat the arguments from either Case 1 or Case 2 for another period of $\kappa$ generations, including the preceding phase from Lemma 3 to cover $\mathcal{A}$ if necessary. The expected number of periods is $1 + o(1)$, concluding the proof. ∎

Controlling the Exploration of Search Points: First, we bound the spread of solutions in $O(n/\ln(n))$ generations.

Lemma 5.

Consider NSGA-III on $2$-OMM under the same conditions as in Lemma 1. Suppose that $\mu = \text{poly}(n)$ and let $c > 0$ be a constant. Then, with probability $1 - o(1)$, after $cn/\ln(n)$ generations there is no $y \in P_t$ with $|y|_1 \geq 3n/4$.

Proof.

Let $d_t \coloneqq \min\{\max\{3n/4 - |y|_1, 0\} \mid y \in P_t\}$. By a classical Chernoff bound and a union bound over the $\mu$ individuals, every individual $x$ satisfies $3n/8 < |x|_1 < 5n/8$ with probability $1 - \mu e^{-\Omega(n)} = 1 - o(1)$ after initialization. Suppose that this happens. Then $d_0 \geq n/8$ and therefore, in order to create an individual $y$ with $|y|_1 \geq 3n/4$ within $\lceil cn/\ln(n) \rceil \leq 2cn/\ln(n)$ generations, it is necessary that $d_t$ reaches $0$, and in particular that it decreases by at least $\ln(n)/(16c)$ in one of these generations. This requires that at least $\ell := \lceil \ln(n)/(16c) \rceil$ many zero bits are flipped simultaneously in one individual. The latter happens with probability at most $\binom{n}{\ell}\left(\frac{1}{n}\right)^{\ell} = \frac{n!}{\ell!(n-\ell)!n^{\ell}} \leq \frac{1}{\ell!} \leq \frac{e^{\ell}}{\ell^{\ell}} = e^{-\omega(\ell)} = e^{-\omega(\ln(n))}$ in one single trial, where the last inequality is due to Stirling's formula. By a union bound on at most $\mu\lceil cn/\ln(n) \rceil$ mutation steps during these generations, we see that the probability is $o(1)$ to decrease $d_t$ by at least $\ell$ at least once within $\lceil cn/\ln(n) \rceil$ generations (since $\mu = \text{poly}(n)$), concluding the proof. ∎
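The chain of bounds on the probability of flipping $\ell$ zero bits at once can be evaluated for concrete values. The sketch below is illustrative only; the function name is hypothetical and the values of $n$ and $\ell$ are arbitrary samples.

```python
import math


def flip_probability_bounds(n: int, ell: int) -> tuple[float, float, float]:
    """Return the three quantities from the proof of Lemma 5:
    binom(n, ell) / n**ell, then 1 / ell!, then (e / ell)**ell."""
    exact = math.comb(n, ell) / n**ell
    factorial_bound = 1.0 / math.factorial(ell)
    stirling_bound = (math.e / ell) ** ell
    return exact, factorial_bound, stirling_bound


if __name__ == "__main__":
    for n, ell in ((100, 5), (1000, 10), (10**6, 14)):
        print(n, ell, flip_probability_bounds(n, ell))
```

The three values are non-decreasing from left to right, and the last one decays faster than any fixed exponential in $\ell$, which is what the union bound over polynomially many mutation steps needs.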

We now bound the exploration of search points on an interval of the form $[n - n^b, n - n^a]$ for constants $0 \leq a < b \leq 3/4$ towards the all-one string by providing a lower bound on the number of generations required to traverse this interval.

Lemma 6.

Consider NSGA-III on $2$-OMM under the same conditions as in Lemma 1. Let $0 \leq a < b \leq 3/4$ be two constants. Assume that the maximum cover number is at most $\beta = o(n^{1-b})$. Suppose every $x \in P_t$ satisfies $|x|_1 \leq n - n^b$. Then with probability $1 - o(1)$ NSGA-III requires more than $(b-a)n\ln(n)/(32e\beta)$ generations to create an individual $x$ with $|x|_1 \geq n - n^a$. Hence, the expected number of generations is at least $\Omega(n\ln(n)/\beta)$.

Proof.

Consider $Y_t \coloneqq \max\{\max\{|x|_1, n - \lceil n^b \rceil\} \mid x \in P_t\}$. If all individuals $x$ satisfy $|x|_1 \leq n - \lceil n^b \rceil$ (which is the case at the beginning) then $Y_t = n - \lceil n^b \rceil$, and we created an $x$ with $|x|_1 \geq n - n^a$ if $Y_t \geq n - \lfloor n^a \rfloor$. At first we bound the probability $p^*$ to increase $Y_t$ by at least $8$ in one generation from above as follows. In one single trial, for each $i \in \{0, \ldots, Y_t\}$ one can choose an $x \in P_t$ with $|x|_1 = Y_t - i$ (i.e. $|x|_0 = n - Y_t + i$) (prob. at most $\beta/\mu$) and then flip $i+8$ zero bits (prob. at most $\binom{n-Y_t+i}{i+8} \cdot 1/n^{i+8}$). By a union bound on all $i$, we obtain that $Y_t$ increases by at least $8$ in a single trial with probability at most $\frac{\beta}{\mu}\sum_{i=0}^{Y_t}\binom{n-Y_t+i}{i+8}\frac{1}{n^{i+8}} \leq \frac{\beta}{\mu}\sum_{i=0}^{Y_t}\binom{\lceil n^b \rceil+i}{i+8}\frac{1}{n^{i+8}} \leq \frac{\beta}{\mu}\left(\frac{\lceil n^b \rceil}{n}\right)^{8}\sum_{i=0}^{Y_t}\frac{(\lceil n^b \rceil+i)\cdot\ldots\cdot(\lceil n^b \rceil+1)}{n^{i}(i+8)!} \leq \frac{\beta}{\mu}\left(\frac{\lceil n^b \rceil}{n}\right)^{8}$ (where we used $\lceil n^b \rceil + i \leq 2n$ and $\sum_{i=0}^{Y_t}\frac{2^{i}}{(i+8)!} \leq 1$). Hence, by a union bound on $\mu$ single trials, we obtain the inequality $p^* \leq \beta \cdot (\lceil n^b \rceil/n)^{8} \leq 256\beta/n^{2}$ (since $b \leq 3/4$ and $\lceil r \rceil \leq 2r$ for all $r \geq 1$). Again by a union bound, $Y_t$ increases by at least $8$ in a single generation within $(b-a)n\ln(n)/(32e\beta)$ generations with probability $o(1)$.
So we assume that $Y_t$ never increases by at least $8$ in a single generation, and for $Y_t \in \{n - \lceil n^b \rceil, \ldots, n - \lfloor n^a \rfloor\}$ and natural $1 \leq \ell \leq (\lceil n^b \rceil - \lfloor n^a \rfloor)/8$ let $X_\ell$ be the random variable which counts the number of generations with $Y_t \in \{n - \lceil n^b \rceil + 8(\ell-1), \ldots, n - \lceil n^b \rceil + 8\ell - 1\}$. Now, for $k := k(n,\ell) := \lceil n^b \rceil - 8(\ell-1)$ we justify that $X_\ell$ stochastically dominates a geometrically distributed random variable $Z_\ell$ with success probability $p_\ell = 1.5e\beta k/n$:

A necessary condition for $Y_t$ to leave $\{n-k, \ldots, n - \lceil n^b \rceil + 8\ell - 1\}$ is that $Y_t$ increases by at least one in a generation. In a single trial this happens with probability at most $\frac{\beta}{\mu}\sum_{i=0}^{Y_t}\binom{n-Y_t+i}{i+1}\frac{1}{n^{i+1}} \leq \frac{\beta}{\mu}\sum_{i=0}^{Y_t}\binom{k+i}{i+1}\frac{1}{n^{i+1}} \leq \frac{\beta}{\mu}\frac{k}{n}\sum_{i=0}^{Y_t}\frac{(k+i)\cdot\ldots\cdot(k+1)}{n^{i}(i+1)!} \leq \frac{1.5e\beta k}{\mu n}$ (by choosing a parent $x$ with $|x|_1 = Y_t - i$ and then flipping $i+1$ zero bits for $i \in \{0, \ldots, Y_t\}$). For the last inequality we used $k + i \leq (1+\ln(1.5))n$ for all $i \in \{0, \ldots, Y_t\}$ and $n$ sufficiently large, and $\sum_{i=0}^{Y_t}\frac{(1+\ln(1.5))^{i}}{(i+1)!} \leq \sum_{i=0}^{\infty}\frac{(1+\ln(1.5))^{i}}{i!} = e^{1+\ln(1.5)} = 1.5e$. Then apply a union bound on $\mu$ trials to finish the justification.

This implies that for $\delta := \delta(a,b,n) := \lfloor \frac{\lceil n^b \rceil - \lfloor n^a \rfloor}{8} \rfloor$ the number $T$ of generations until $Y_t \geq n - \lfloor n^a \rfloor$ (which is at least $\sum_{\ell=1}^{\delta} X_\ell$) stochastically dominates the independent sum $Z \coloneqq \sum_{\ell=1}^{\delta} Z_\ell$ of geometrically distributed random variables $Z_\ell$. Note also that $\ln(n) \leq \sum_{i=1}^{n} 1/i \leq \ln(n) + 1$ and therefore $\sum_{i=1}^{n} 1/i - \sum_{i=1}^{q} 1/i \geq \ln(n) - (\ln(q) + 1) = \ln(n/q) - 1$ for $q \in [n]$. Therefore, we obtain $\mathrm{E}[Z] = \sum_{\ell=1}^{\delta}\mathrm{E}[Z_\ell] = \sum_{\ell=1}^{\delta}\frac{1}{p_\ell} = \sum_{\ell=1}^{\delta}\frac{n}{1.5e\beta(\lceil n^b \rceil - 8(\ell-1))} = \frac{n}{12e\beta}\sum_{\ell=0}^{\delta-1}\frac{1}{\lceil n^b \rceil/8 - \ell} \geq \frac{n}{12e\beta}\sum_{\ell=0}^{\delta-1}\frac{1}{\gamma-\ell} \geq \frac{n}{12e\beta}\left(\sum_{\ell=0}^{\gamma-1}\frac{1}{\gamma-\ell} - \sum_{\ell=\delta}^{\gamma-1}\frac{1}{\gamma-\ell}\right) \geq \frac{n}{12e\beta}\left(\sum_{\ell=1}^{\gamma}\frac{1}{\ell} - \sum_{\ell=1}^{\gamma-\delta}\frac{1}{\ell}\right) \geq \frac{n}{12e\beta}\left(\ln\left(\frac{\gamma}{\gamma-\delta}\right) - 1\right) \geq \frac{n}{12e\beta}\left(\ln\left(\frac{\lceil n^b \rceil/8}{\lfloor n^a \rfloor/8 + 1}\right) - 1\right) \geq \frac{n}{12e\beta}\left(\ln\left(\frac{n^b}{n^a + 8}\right) - 1\right) \geq \frac{(b-a)n\ln(n)}{16e\beta}$ for $n$ sufficiently large.
Then, under the condition that $Y_t$ never increases by at least $8$ in a single generation within $(b-a)n\ln(n)/(32e\beta)$ generations, we see by Theorem 1 in (Witt14) that for $\lambda := \mathrm{E}[Z]/2$ and $s := \sum_{\ell=1}^{\delta} 1/p_\ell^{2} = \sum_{\ell=1}^{\delta}\frac{n^{2}}{2.25e^{2}\beta^{2}(\lceil n^b \rceil - 8(\ell-1))^{2}} \leq \sum_{j=1}^{\infty}\frac{n^{2}}{2.25e^{2}\beta^{2}j^{2}} \leq \frac{n^{2}\pi^{2}}{13.5e^{2}\beta^{2}}$ (due to $\sum_{i=1}^{\infty} 1/i^{2} = \pi^{2}/6$) the inequality $\Pr\left(T \leq \frac{(b-a)n\ln(n)}{32e\beta}\right) \leq \Pr(Z \leq \mathrm{E}[Z]/2) = \Pr(Z \leq \mathrm{E}[Z] - \mathrm{E}[Z]/2) \leq \exp(-\lambda^{2}/(2s)) = o(1)$ holds. This proves the lemma with the law of total probability. ∎
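The lower bound on $\mathrm{E}[Z]$ can be compared with the exact sum $\sum_{\ell=1}^{\delta} 1/p_\ell$ for concrete parameters. The following sketch uses hypothetical function names and sample values ($n = 10^6$, $a = 1/2$, $b = 3/4$, $\beta = 1$) chosen only for illustration; it does not replace the asymptotic argument.

```python
import math


def expected_waiting_time(n: int, a: float, b: float, beta: float) -> float:
    """Exact value of E[Z] = sum over l of 1/p_l with
    p_l = 1.5 * e * beta * k_l / n and k_l = ceil(n^b) - 8 * (l - 1)."""
    nb = math.ceil(n**b)
    na = math.floor(n**a)
    delta = (nb - na) // 8
    return sum(
        n / (1.5 * math.e * beta * (nb - 8 * (level - 1)))
        for level in range(1, delta + 1)
    )


def closed_form_lower_bound(n: int, a: float, b: float, beta: float) -> float:
    """The bound (b - a) * n * ln(n) / (16 * e * beta) stated in the proof."""
    return (b - a) * n * math.log(n) / (16 * math.e * beta)


if __name__ == "__main__":
    args = (10**6, 0.5, 0.75, 1.0)
    print(expected_waiting_time(*args), closed_form_lower_bound(*args))
```

For these sample values the exact sum lies above the closed-form bound, in line with the claim that the inequality holds for $n$ sufficiently large.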

A Lower Runtime Bound

In this section, we establish the desired lower bound on the runtime of NSGA-III on the $2$-OMM problem by putting together the results from the previous section.

Theorem 7.

Consider NSGA-III on $2$-OMM under the same conditions as in Lemma 1. Further suppose that $\mu \in O(\ln(n)^{c} n)$ for a constant $0 < c < 1$. Then the expected number of generations to cover the whole Pareto front is at least $\Omega(n^{2}\ln(n)/\mu)$.

Proof.

Fix a constant $\chi > 0$ such that $\mu \leq \chi\ln(n)^{c} n$ for $n$ sufficiently large. At first we see by Lemma 5 that with probability $1 - o(1)$ there is no individual $y$ in $P_t$ with $|y|_1 \geq 3n/4$ within $130\lfloor n/\ln(n) \rfloor$ generations. Further, by Lemma 4 on $\alpha = \lfloor n/\ln(n) \rfloor$ we obtain that after $130\alpha$ generations the maximum cover number is at most $\lceil \mu/\alpha \rceil \leq \frac{\mu}{n/\ln(n) - 1} + 1 \leq 2\mu\ln(n)/n \leq 2\ln(n)^{1+c}$ with probability $1 - o(1)$. Suppose that this happens. We now apply Lemma 6 with $b = 3/4$ and $a = 1/2$ to obtain with probability $1 - o(1)$ that after further $(b-a)n\ln(n)/(32e \cdot 2\ln(n)^{1+c}) = n/(256e\ln(n)^{c}) = d_0 n/\ln(n)^{c}$ generations for $d_0 = 1/(256e)$, no solution $x$ with $|x|_1 \geq n - n^{1/2}$ is created. If this happens, apply Lemma 4 on that number of generations for $\alpha = \lfloor d_0 n/(130\ln(n)^{c}) \rfloor$ (note that $130\alpha \leq d_0 n/\ln(n)^{c}$) to obtain for $e_0 := 260\chi/d_0$ that with probability $1 - o(1)$ the maximum cover number is at most $\lceil \mu/\alpha \rceil \leq \lceil \chi\ln(n)^{c} n/\alpha \rceil \leq e_0\ln(n)^{2c} = \max\{e_0\ln(n)^{2c}, 16\mu/(3n)\}$ for $n$ sufficiently large (the latter equality holds due to $e_0\ln(n)^{2c} \in \omega(\mu/n)$).

Suppose that both events occur. In the following, we iteratively reduce the maximum cover number as the population approaches the extreme solution $1^{n}$. To this end, let $\ell := \lceil (2c+1)/(1-c) \rceil \in O(1)$ and suppose that for $j \in \{0, \ldots, \ell-1\}$ there are constants $0 < b_j < 1/2$ and $d_j, e_j \geq 0$ such that after $d_j n\ln(n)^{j}/\ln(n)^{(2+j-1)c}$ generations, no solution $x$ with $|x|_1 \geq n - n^{b_j}$ is created, that $(\ln(n))^{(2+j)c-j} = \omega(\mu/n)$ and the maximum cover number is at most $\beta = e_j(\ln(n))^{(2+j)c-j} = \max\{e_j(\ln(n))^{(2+j)c-j}, 16\mu/(3n)\}$ (where the case $j = 0$ already occurred). Now fix a further constant $b_{j+1}$ with $1/8 < b_{j+1} < b_j$. Then again by Lemma 6 we see that with probability $1 - o(1)$ in $\frac{(b_j - b_{j+1})n\ln(n)}{32e \cdot \beta} = \frac{(b_j - b_{j+1})n\ln(n)}{32e \cdot e_j(\ln(n))^{(2+j)c-j}} = \frac{d_{j+1} n\ln(n)^{j+1}}{\ln(n)^{(2+j)c}}$ generations for $d_{j+1} = \frac{b_j - b_{j+1}}{32e \cdot e_j}$ no solution $x$ with $|x|_1 \geq n - n^{b_{j+1}}$ is created. After this time, by Lemma 4 on $\alpha = \min\{\lfloor \frac{d_{j+1} n\ln(n)^{j+1}}{130\ln(n)^{(2+j)c}} \rfloor, \lfloor 3n/8 \rfloor\}$, the maximum cover number is at most $\lceil \mu/\alpha \rceil \leq \max\{\frac{260\mu\ln(n)^{(2+j)c}}{d_{j+1} n\ln(n)^{j+1}}, \frac{16\mu}{3n}\} \leq \max\{\frac{260\chi n\ln(n)^{(3+j)c}}{d_{j+1} n\ln(n)^{j+1}}, \frac{16\mu}{3n}\} = \max\{\frac{e_{j+1}\ln(n)^{(3+j)c}}{\ln(n)^{j+1}}, \frac{16\mu}{3n}\}$ for $e_{j+1} := 260\chi/d_{j+1}$ with probability $1 - o(1)$.
If $\ln(n)^{(3+j)c-(j+1)} = \omega(\mu/n)$, we increase $j$ by one and repeat this argument. We stop when $\ln(n)^{(3+j)c-(j+1)} = O(\mu/n)$. Since $(3+\ell)c - (\ell+1) \leq 0$, we have at most $\ell = O(1)$ such repetitions. After the last repetition we have that $\alpha = \Omega(n)$. Hence, by applying a union bound on all repetitions, we conclude that with probability $1 - o(1)$, there exists a generation $t^{\text{spread}}$ up to which no individual $x$ with $|x|_1 \geq n - n^{1/8}$ has been created and in which the maximum cover number is at most $e_{\text{spread}}\mu/n$ for a constant $e_{\text{spread}} > 0$. Suppose this event occurs, and apply Lemma 6 once more with $b = 1/8$ and $a = 1/16$. This yields that $\Omega(n\ln(n)/(e_{\text{spread}}\mu/n)) = \Omega(n^{2}\ln(n)/\mu)$ generations are needed in expectation (from time $t^{\text{spread}}$ onward) until a search point $x$ with $|x|_1 \geq n - n^{1/16}$ is created, concluding the proof. ∎
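The iteration in the proof lowers the exponent of $\ln(n)$ in the cover-number bound from $(2+j)c - j$ to $(3+j)c - (j+1)$ in round $j$. The following sketch, with a hypothetical function name, tabulates these exponents with exact rational arithmetic and confirms that they drop to at most $0$ after $\ell = \lceil (2c+1)/(1-c) \rceil$ rounds for sample values of $c$.

```python
import math
from fractions import Fraction


def cover_exponents(c: Fraction) -> tuple[int, list[Fraction]]:
    """Return ell = ceil((2c+1)/(1-c)) and the exponents (2+j)*c - j
    of ln(n) in the cover-number bound for j = 0, ..., ell + 1."""
    ell = math.ceil((2 * c + 1) / (1 - c))
    exponents = [(2 + j) * c - j for j in range(ell + 2)]
    return ell, exponents


if __name__ == "__main__":
    for c in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        ell, exponents = cover_exponents(c)
        print(f"c={c}: ell={ell}, final exponent={exponents[-1]}")
```

Each round decreases the exponent by $1 - c > 0$, so a constant number of rounds suffices for any constant $c < 1$.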

An Improved Upper Runtime Bound

To complement our analysis, we establish an improved upper bound on the expected runtime of NSGA-III on $m$-OMM for a constant number of objectives $m$. Our approach closely follows the methodology provided by (OprisNSGAIII), with the added consideration of the cover number.

Theorem 8.

Consider NSGA-III on $m$-OMM for a constant number $m$ of objectives under the same conditions as in Lemma 1 with population size $\mu \geq (2n/m+1)^{m/2} =: S_m$. Then a Pareto-optimal set of $m$-OMM is found in expected $O(\min\{S_m n\ln n/\mu + n\mu/S_m, n\ln(n)\})$ generations or, in other words, in expected $O(\min\{S_m n\ln n + n\mu^{2}/S_m, \mu n\ln(n)\})$ fitness evaluations.

Proof.

We can assume that $\mu \leq \ln(n)S_m$ and $\mu \in \omega(S_m)$ since otherwise the bound from Theorem 5.2 in (OprisNSGAIII) of $O(n\ln(n))$ expected generations holds. Fix a vector $v$ on the Pareto front. We estimate the probability not to cover $v$ after $6S_m en\ln(n)/\mu + 10n\lceil \mu/S_m \rceil$ generations. For each generation $t$ let $d_t := \min_{x \in P_t}\sum_{j=1}^{m/2}|f_{2j-1}(x) - v_{2j-1}|$. Note that $0 \leq d_t \leq n$ and that we have covered $v$ if $d_t = 0$. Further, by Lemma 1(1), $d_t$ cannot increase. Let $y \in P_t$ be an individual with $\sum_{j=1}^{m/2}|f_{2j-1}(y) - v_{2j-1}| = d_t$. We first increase the cover number of $f(y)$ to $\lfloor \mu/S_m \rfloor$ (which, by Lemma 1(2), can only decrease if it exceeds $\lfloor \mu/S_m \rfloor$, and even then not below this value), and then proceed to decrease $d_t$. For $d_t = i$, the latter then happens with probability at least $\lfloor \mu/S_m \rfloor/\mu \cdot i/n \cdot (1-1/n)^{n-1} \geq i/(2S_m en)$ in one single trial and hence, with probability at least $1 - (1 - i/(2S_m en))^{\mu} \geq \frac{i\mu/(2S_m en)}{i\mu/(2S_m en) + 1} =: p_i$ in one generation. Hence, the time $T$ until $d_t = 0$ is stochastically dominated by an independent sum of geometrically distributed random variables $Y_i$, $Z_j$ ($i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, \nu\}$ for $\nu := n \cdot \lfloor \mu/S_m \rfloor$) with success probability $p_i$ and $\tilde{p} = 1/5$ respectively (compare also with the proof of Lemma 4 for the latter). Let $Y := \sum_{i=1}^{n} Y_i$ and $Z := \sum_{j=1}^{\nu} Z_j$.
We have $\mathrm{E}[Y] = \sum_{i=1}^{n} 1/p_i = \sum_{i=1}^{n}(1 + S_m en/(i\mu)) \leq n + S_m en(\ln(n) + 1)/\mu \leq 2S_m en\ln(n)/\mu$ for $n$ sufficiently large and $\mathrm{E}[Z] = 5n \cdot \lfloor \mu/S_m \rfloor$. By Theorem 1 in (Witt14) we obtain for $s = \sum_{i=1}^{n} 1/p_i^{2} = \sum_{i=1}^{n}(1 + 2S_m en/(i\mu))^{2}$, $p = \min_{i \in [n]} p_i = p_1 \geq \mu/(4S_m en)$, and $\lambda = 8mS_m en\ln(n)/\mu$ that $\Pr(Y \geq \mathrm{E}[Y] + \lambda) \leq \exp\left(-\frac{1}{4}\min\left\{\frac{\lambda^{2}}{s}, \lambda p\right\}\right) \leq n^{-2m}$ since $\lambda^{2}/s = \Omega(\ln^{2}(n))$ and $\lambda p \geq 2m\ln(n)$. Further, we see for $\tilde{s} = 25\nu$ and $\tilde{\lambda} = \mathrm{E}[Z]$ that $\Pr(Z \geq \mathrm{E}[Z] + \tilde{\lambda}) \leq \exp(-\frac{1}{4}\min\{\frac{\tilde{\lambda}^{2}}{\tilde{s}}, \tilde{\lambda}\tilde{p}\}) = e^{-\Omega(n)}$. These two inequalities imply $\Pr(T \geq \mathrm{E}[Y] + \mathrm{E}[Z] + \lambda + \tilde{\lambda}) \leq \Pr(Y + Z \geq \mathrm{E}[Y] + \mathrm{E}[Z] + \lambda + \tilde{\lambda}) \leq \Pr(Y \geq \mathrm{E}[Y] + \lambda) + \Pr(Z \geq \mathrm{E}[Z] + \tilde{\lambda}) \leq 2n^{-2m}$. By a union bound on all possible $v$, the probability that there is a fitness vector $v$ such that $P_t$ does not contain a Pareto-optimal solution $x$ with $f(x) = v$ after $\mathrm{E}[Y] + \mathrm{E}[Z] + \lambda + \tilde{\lambda} \leq 10S_m en\ln(n)/\mu + 10n\lfloor \mu/S_m \rfloor = O(S_m n\ln(n)/\mu + \mu n/S_m)$ generations is at most $(2n/m+1)^{m/2} \cdot 2n^{-2m} = o(1)$.
If this does not happen, we repeat all the above arguments. Note that in expectation, $1 + o(1)$ such periods are sufficient. ∎

This result improves the corresponding upper bound from (OprisNSGAIII) by a factor of $\min\{S_m/\mu, \mu/(S_m\ln(n))\}$ both in terms of generations and fitness evaluations if $S_m \leq \mu = O(S_m\ln(n))$. Along with Theorem 7, we see in the case $m = 2$ that for $n+1 \leq \mu \leq (n+1)\ln(n)^{c}$ for a constant $0 \leq c \leq 1/2$ the full Pareto front is covered in expected $\Theta(n^{2}\ln(n)/\mu)$ generations, which is a tight runtime bound.
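To see how the two terms of the bound in Theorem 8 trade off, the expression inside the $O(\cdot)$ can be evaluated for sample parameters. The sketch below ignores all hidden constants, so the numbers only indicate the shape of the bound; the function names and the sample values are assumptions made for illustration.

```python
import math


def front_size(n: int, m: int) -> float:
    """S_m = (2n/m + 1)^(m/2), the lower bound on mu required in Theorem 8."""
    return (2 * n / m + 1) ** (m / 2)


def generation_bound(n: int, m: int, mu: float) -> float:
    """Expression inside the O(.) of Theorem 8 (constants omitted):
    min{ S_m * n * ln(n) / mu + n * mu / S_m, n * ln(n) }."""
    s_m = front_size(n, m)
    refined = s_m * n * math.log(n) / mu + n * mu / s_m
    return min(refined, n * math.log(n))


if __name__ == "__main__":
    n, m = 10**4, 2
    s_m = front_size(n, m)
    for mu in (s_m, s_m * math.sqrt(math.log(n)), s_m * math.log(n)):
        print(round(mu), generation_bound(n, m, mu))
```

For $m = 2$ and $\mu = S_m\sqrt{\ln(n)}$ the first term falls below $n\ln(n)$, whereas for $\mu = S_m$ the minimum is attained by $n\ln(n)$.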

Conclusions

In this paper, we analyzed the widely used NSGA-III algorithm on the simple $m$-OMM problem and established lower runtime bounds for $m = 2$, as well as improved upper runtime bounds for a constant number $m$ of objectives compared to (OprisNSGAIII). For $m = 2$, this leads to a tight runtime bound when employing a superconstant yet carefully chosen population size $\mu$. In this setting, NSGA-III even outperforms NSGA-II, due to its ability to distribute solutions very evenly across the Pareto front. This is very surprising, since the latter is the state-of-the-art algorithm for bi-objective problems (with around 60000 citations). Unlike previous work (opris2025multimodal), where NSGA-III's dynamics were analyzed on $m$-OJZJ by first exploring the local optima and then spreading the solutions evenly across the Pareto front, our analysis required a more refined investigation of the population dynamics. In particular, we bound the maximum cover number during the exploration process toward the all-ones string in several stages, where the spread of solutions is not hindered by local optima. These insights provide a deeper understanding of the strengths and limitations of NSGA-III and may serve as a foundation for analyzing its behavior on more complex fitness landscapes. Ultimately, this understanding can aid practitioners in developing enhanced versions of the algorithm with improved performance on problems with diverse and rugged fitness landscapes. Future research directions may include bounding the maximum cover number on benchmark problems where the Pareto front must first be reached, as well as applying the insights on population dynamics to practical scheduling and graph problems.