Towards a Rigorous Understanding of the Population Dynamics of the NSGA-III: Tight Runtime Bounds
Abstract
Evolutionary algorithms are widely used for solving multi-objective optimization problems. A prominent example is NSGA-III, which is particularly well suited for problems with more than three objectives, distinguishing it from the classical NSGA-II. Despite its empirical success, the theoretical understanding of NSGA-III remains very limited, especially with respect to runtime analysis. A central open problem concerns its population dynamics, in particular how the maximum number of individuals sharing the same fitness value is controlled during the exploration process. In this paper, we make a significant step towards such an understanding by proving tight runtime bounds for NSGA-III on the bi-objective OneMinMax (-OMM) problem. First, we prove that NSGA-III requires generations in expectation to optimize -OMM, assuming the population size satisfies where denotes the problem size and is a constant. Apart from (opris2025multimodal), this is the first proven lower runtime bound for NSGA-III on a classical benchmark problem. Complementing this, we improve the best known upper bound for NSGA-III on the -objective OneMinMax problem (-OMM) of generations by a factor of for a constant number of objectives and population size . This yields tight runtime bounds in the case , and the surprising result that NSGA-III beats NSGA-II by a factor of in the expected runtime.
Introduction
Decision making is a fundamental aspect of many areas in artificial intelligence, where it is often important to explore trade-offs and compromises between different options before reaching a conclusion (LUUKKONEN2023102537). Such situations are often formulated as multi-objective optimization problems, which are typically tackled using evolutionary multi-objective algorithms (EMOAs) (STEWART2021103830). These algorithms apply principles inspired by natural evolution to optimize functions with conflicting objectives, aiming to find a diverse set of Pareto-optimal solutions. This offers decision makers a range of trade-off solutions, enabling them to select the one that best aligns with their preferences (TAMSSAOUET202287). It is therefore not surprising that such algorithms have become essential tools, widely applied across various practical domains. These include artificial intelligence (ArtIntKoziel), often in combination with bioinformatics (HANDLBIO; MultiBioBook), as well as constrained optimization (GARCIA2021100983), machine learning (9515233; EMOAsMachine), and engineering (MultiobjEngineerring; EMOAEngineer). In particular, many such real-world applications involve optimization problems with many objectives. A major challenge, however, is that as the number of objectives increases, the Pareto front can grow exponentially, making the problems increasingly complex. Additionally, identifying dependencies between individual objectives becomes more difficult. Differences already appear when moving from two to three or more objectives. In the case of two objectives, sorting non-dominated individuals according to the first objective naturally yields a reverse sorting with respect to the second. This makes the crowding distance, which measures the proximity of search points based on their sorting across objectives, a reliable indicator of their relative closeness.
However, this relationship breaks down for problems with three or more objectives, as a solution can have a crowding distance of zero even when it is not close to other solutions (see, for example, (Zheng2023Inefficiency)). As a result, NSGA-II (Deb2002), the most cited EMOA (around 56,000 citations), which uses the crowding distance as a tie-breaker, succeeds in solving bi-objective problems (see, for example, (ZhengLuiDoerrAAAI22) for a rigorous analysis or (Deb2002) for empirical results), but fails to optimize many problems with a large number of objectives (see (Zheng2023Inefficiency) for large differences already between two and three objectives, and (NSGAIIINEFF2003) for empirical studies). To overcome this problem, DebJain2014 designed the NSGA-III algorithm, which uses a set of predefined reference points instead of the crowding distance. A major advantage is that users can define these reference points according to their specific needs. Hence, this algorithm has a huge practical impact (around 6,000 citations), and it has been shown empirically to solve problems with at least four objectives efficiently (DebJain2014; NSGAIIIAppl; NSGAIIIAPPLL). However, the theoretical understanding of its success lags far behind its practical impact, and the first rigorous runtime analyses of this algorithm appeared only recently (see, for example, (WiethegerD23; OprisNSGAIII) for breakthroughs). Surprisingly, even in simple settings, opris2025multimodal showed that NSGA-III exhibits population dynamics that differ significantly from those of NSGA-II: NSGA-III successively iterates through all reference points, always selecting for the next generation a point associated with a reference point that has the fewest chosen individuals so far, while NSGA-II treats all points with zero crowding distance equally. Hence, NSGA-III tends to spread solutions very evenly across the Pareto front (see also (CHAUDHARI20221509) for empirical results).
Indeed, it was shown in (opris2025multimodal) that NSGA-III outperforms NSGA-II on the pseudo-Boolean bi-objective multimodal function OJZJ for appropriate population sizes. However, that analysis only describes how NSGA-III spreads solutions evenly across the Pareto front after converging to local optima. How this distribution evolves during exploration, particularly before reaching a local optimum or when no local optima exist at all, as in -OMM, remains unclear. It is still unknown when and why NSGA-III performs well, or how quickly it spreads solutions across the Pareto front. As a first step toward understanding its limitations on many-objective problems, we focus on the bi-objective case, which already exhibits complex population dynamics.
Our contribution: We significantly increase the understanding of the population dynamics of NSGA-III on the pseudo-Boolean -OMM by investigating the maximum cover number , defined as the maximum number of individuals in the population sharing the same fitness vector. opris2025multimodal showed that is non-increasing. Our first two results concern the time to first cover a subset of the Pareto front of a given cardinality (Lemma 3) and then spread all solutions evenly on that set (Lemma 4). With high probability, this time is . On the other hand, for a given maximum cover number , we analyze the population’s exploration towards the extreme point . Specifically, for two constants , we provide a lower bound on the time required to reduce the population’s distance to from at least to at most . With high probability, this time is (Lemmas 5 and 6). This bound increases asymptotically with , which is unsurprising since a smaller reduces the probability of selecting individuals already close to and of further decreasing their distance through mutation. This bound can then be used, in conjunction with Lemmas 3 and 4, to further reduce the cover number by covering and afterwards spreading on a set of cardinality for carefully chosen population sizes. By Lemmas 5 and 6, this again yields a larger lower bound of . By repeating this process a finite number of times, we finally reduce to a value of before a search point with distance at most to is created. Notably, this is the smallest possible value for up to a multiplicative constant, and it leads to a lower bound of expected generations for NSGA-III to optimize -OMM (Theorem 7). To the best of our knowledge, this is the first established lower runtime bound for NSGA-III on a classical benchmark problem without multimodality. Finally, we improve the upper bound from (OprisNSGAIII) for any constant number of objectives and by a factor of , showing that -OMM can be optimized in generations in expectation (Theorem 8).
This matches the lower bound stated above for the case and reveals the somewhat surprising result that NSGA-III outperforms NSGA-II by a factor of (see (DoerrQu2023a) for a lower runtime bound of generations for NSGA-II to optimize -OMM if for ), even though NSGA-II is the state-of-the-art algorithm for two objectives and widely used in practice (with around 60,000 citations).
Related work: The mathematical runtime analysis of modern practical MOEAs began only recently. ZhengLuiDoerrAAAI22 conducted the first runtime analysis of NSGA-II on classical benchmark functions, which was the starting point for many subsequent works on optimizing bi-objective functions with NSGA-II (Qu2022PPSN; DaOp2023; NSgaIIBeat; Dang2024; DoerrQ23b; DoerrApprox; UpBian; 2025Lessons); this line of research has even been extended to combinatorial optimization problems such as minimum spanning trees or subset selection (NSGAIICombIJCAI; MOEASubset). Very recently, variants of NSGA-II were proposed to overcome its shortcomings on many-objective problems, either by adding a simple tie-breaking rule (Krejca2025b) or by using an alternative version of the crowding distance (ZhengDoerrCrowding). The most prominent algorithms for many-objective problems are SPEA2, SMS-EMOA, and NSGA-III, which have also been analyzed successfully (WiethegerD23; Zheng_Doerr_2024; OprisNSGAIII; DoerrNearTight; Opris2025; opris2025multimodal). However, prior to (opris2025multimodal; DoerrQu2023a), there were no proven tight runtime bounds on the performance of NSGA-II or NSGA-III on classical benchmark functions, even though investigating the limitations of EMOAs and proving lower bounds by analyzing their population dynamics is a highly active area of research. For example, Opris2025PAES analyzed the PAES-25 evolutionary strategy with one-bit mutation on the -objective LeadingOnesTrailingZeros problem, proving tight runtime bounds of for , for , and for . Additional tight runtime bounds for the GSEMO algorithm on the bi-objective COCZ and OMM benchmarks are provided in (doerr2025tightruntimeGSEMO). To the best of our knowledge, apart from the investigations on OJZJ in (opris2025multimodal), there are no existing results on the population dynamics of NSGA-III.
Preliminaries
Given two random variables and on , we say that stochastically dominates if for all . The number of ones in a bit string is denoted by and the number of zeros by , respectively. For any finite set , we write to denote its cardinality. For let , denote by the natural logarithm (i.e. to base ) and let be a placeholder for some polynomial in .
This paper is about many-objective optimization, specifically the maximization of a discrete -objective function for where each for . When , the function is also called bi-objective. For a bit string let where all are of equal length . For a subset , we define . Given two search points , weakly dominates , denoted by , if for all , and (strictly) dominates , denoted by , if additionally at least one inequality is strict; if neither nor , then and are incomparable. A set is a set of mutually incomparable solutions with respect to if all search points in are pairwise incomparable. Each solution not dominated by any other in is called Pareto-optimal. A mutually incomparable set of these solutions that covers all possible non-dominated fitness values is called a Pareto(-optimal) set of . For a population and , denote by the cover number of , i.e. the number of individuals from with fitness vector . We say that is covered if .
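The dominance relations and the cover number defined above translate directly into code. The following minimal Python sketch assumes that fitness vectors are given as integer tuples and that all objectives are maximized; the function names are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Iterable, Sequence


def weakly_dominates(fx: Sequence[int], fy: Sequence[int]) -> bool:
    """x weakly dominates y: f_i(x) >= f_i(y) in every objective (maximization)."""
    return all(a >= b for a, b in zip(fx, fy))


def dominates(fx: Sequence[int], fy: Sequence[int]) -> bool:
    """x strictly dominates y: weak dominance plus at least one strict inequality."""
    return weakly_dominates(fx, fy) and any(a > b for a, b in zip(fx, fy))


def incomparable(fx: Sequence[int], fy: Sequence[int]) -> bool:
    """Neither vector weakly dominates the other."""
    return not weakly_dominates(fx, fy) and not weakly_dominates(fy, fx)


def cover_numbers(fitnesses: Iterable[Sequence[int]]) -> Counter:
    """Cover number of each fitness vector: how many individuals attain it."""
    return Counter(tuple(f) for f in fitnesses)


def max_cover_number(fitnesses: Iterable[Sequence[int]]) -> int:
    """Largest cover number over all fitness vectors present in the population."""
    return max(cover_numbers(fitnesses).values(), default=0)
```

Note that two individuals with the same fitness vector weakly dominate each other, so they are not incomparable; this is why a set of mutually incomparable solutions contains each fitness vector at most once.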
The NSGA-III algorithm, originally proposed in (DebJain2014), is shown in Algorithm 1. Initially, a population of size is created by choosing individuals from uniformly at random. Then, in each iteration , a multiset of new offspring is created by times choosing an individual uniformly at random and applying standard bit mutation to it, i.e. flipping each bit independently with probability . During survival selection, the parent and offspring populations and are merged into . Then is partitioned into layers using the non-dominated sorting algorithm (Deb2002), where consists of all non-dominated individuals, and for of individuals only dominated by those from . Then the critical rank with and is determined. All individuals with a lower rank than are included in , while the remaining individuals are selected from using Algorithm 2. To this end, a normalized objective function is computed, and then each individual with rank at most is associated with a reference point. For the normalization, we use the procedure from (WiethegerD23), which can also be used for maximization problems as shown in (OprisNSGAIII). We omit detailed explanations as they are not needed for our purposes. For an -objective function , the normalized fitness vector of a search point is computed as
for each where and from the objective space are called nadir and ideal points, respectively. Computing the nadir point is not trivial; we have , and for every where is a positive threshold set by the user (see (Blank2019) or (WiethegerD23) for details). Further, and are the maximum and minimum values in objective over all search points seen so far (i.e. from ). After computing the normalization, each individual is associated with the reference point such that the distance between and the line through the origin and is minimal. We use the same set of reference points as proposed in (DebJain2014), which originates from (Das1998). The points are defined as
where is a parameter one can choose according to the fitness function . These are uniformly distributed on the simplex determined by the unit vectors .
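These are the Das-Dennis points, i.e. all vectors (a_1/p, ..., a_m/p) with non-negative integers a_i summing to p, of which there are C(p+m-1, m-1). The sketch below enumerates them and associates a fitness vector with the reference point whose ray from the origin is closest, as described above. It assumes the vector has already been normalized (the procedure from (WiethegerD23) is not reproduced), and breaking association ties by the lowest index is our own simplification; all names are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt
from typing import List, Sequence, Tuple


def num_reference_points(m: int, p: int) -> int:
    """Number of Das-Dennis points for m objectives and parameter p."""
    return comb(p + m - 1, m - 1)


def das_dennis(m: int, p: int) -> List[Tuple[float, ...]]:
    """All points (a_1/p, ..., a_m/p) with non-negative integers a_1 + ... + a_m = p."""
    points = []
    # stars and bars: m-1 dividers among p+m-1 slots split p stars into m parts
    for dividers in combinations(range(p + m - 1), m - 1):
        parts, prev = [], -1
        for d in dividers:
            parts.append(d - prev - 1)
            prev = d
        parts.append(p + m - 2 - prev)
        points.append(tuple(a / p for a in parts))
    return points


def perpendicular_distance(v: Sequence[float], r: Sequence[float]) -> float:
    """Euclidean distance from v to the line through the origin and r."""
    t = sum(a * b for a, b in zip(v, r)) / sum(b * b for b in r)
    return sqrt(sum((a - t * b) ** 2 for a, b in zip(v, r)))


def associate(v: Sequence[float], refs: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Index of the reference point whose ray is closest to v, and that distance."""
    dists = [perpendicular_distance(v, r) for r in refs]
    j = min(range(len(refs)), key=dists.__getitem__)  # ties: lowest index (our choice)
    return j, dists[j]
```

For example, `das_dennis(2, 4)` yields the five points (0, 1), (0.25, 0.75), (0.5, 0.5), (0.75, 0.25), (1, 0), and the normalized vector (0.3, 0.7) is associated with (0.25, 0.75).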
Then, the algorithm iterates through the reference points, each time choosing a reference point with the fewest associated individuals that are already selected for the next generation; ties are broken uniformly at random, and a reference point is omitted if all of its associated individuals are already selected. Next, among the not yet selected individuals associated with that reference point, the one closest to it is selected for the next generation, again breaking ties uniformly at random. Once the required number of individuals is reached (i.e. if ), the selection ends.
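The survival selection described above can be sketched as follows. This is a simplified Python illustration, not a reference implementation: the non-dominated sorting is a naive quadratic-time layering rather than the fast algorithm of (Deb2002), and the reference point associated with each individual (`assoc`) and its distance (`dist`) are assumed to be precomputed after normalization. All function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def standard_bit_mutation(x: Tuple[int, ...], rng: random.Random) -> Tuple[int, ...]:
    """Flip each of the n bits independently with probability 1/n."""
    n = len(x)
    return tuple(1 - b if rng.random() < 1.0 / n else b for b in x)


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """Strict Pareto dominance for maximization."""
    return all(u >= v for u, v in zip(a, b)) and any(u > v for u, v in zip(a, b))


def non_dominated_sorting(fits: Sequence[Sequence[int]]) -> List[List[int]]:
    """Naive layering: F_1 = non-dominated indices, F_2 = non-dominated after removing F_1, ..."""
    remaining = set(range(len(fits)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(fits[j], fits[i]) for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts


def niching(cands: List[int], assoc: Sequence[int], dist: Sequence[float], k: int,
            counts: Dict[int, int], rng: random.Random) -> List[int]:
    """Pick k candidates, always serving a reference point with the fewest selected so far."""
    pool: Dict[int, List[int]] = defaultdict(list)
    for i in cands:
        pool[assoc[i]].append(i)
    chosen: List[int] = []
    while len(chosen) < k:
        # reference points whose associated candidates are all selected are omitted
        active = [j for j in pool if pool[j]]
        low = min(counts.get(j, 0) for j in active)
        j = rng.choice([r for r in active if counts.get(r, 0) == low])
        best = min(dist[i] for i in pool[j])
        i = rng.choice([c for c in pool[j] if dist[c] == best])  # closest, ties at random
        pool[j].remove(i)
        counts[j] = counts.get(j, 0) + 1
        chosen.append(i)
    return chosen


def survival_selection(fits: Sequence[Sequence[int]], assoc: Sequence[int],
                       dist: Sequence[float], mu: int, rng: random.Random) -> List[int]:
    """Indices (into the merged population R_t) of the mu survivors."""
    survivors: List[int] = []
    for front in non_dominated_sorting(fits):
        if len(survivors) + len(front) <= mu:
            survivors += front  # fronts that fit completely are taken entirely
            if len(survivors) == mu:
                break
        else:
            # critical front: count survivors already associated with each reference point
            counts: Dict[int, int] = {}
            for i in survivors:
                counts[assoc[i]] = counts.get(assoc[i], 0) + 1
            survivors += niching(front, assoc, dist, mu - len(survivors), counts, rng)
            break
    return survivors
```

In the full algorithm, `fits`, `assoc` and `dist` would be computed for the merged population of size 2μ, and the survivors form the next population.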
For our analysis, we need the following key lemma regarding the cover number of a Pareto-optimal fitness vector. It also includes the fact that NSGA-III protects Pareto-optimal solutions: if the population size is larger than the size of a maximum set of mutually incomparable solutions and a Pareto-optimal fitness vector is covered, then it remains covered in all future generations. The lemma combines Lemma 2 from (OprisNSGAIII) and Lemma 3.4 from (opris2025multimodal).
Lemma 1.
Consider NSGA-III on an -objective function with and a set of reference points for with . Denote by the current population and by the Pareto front of . Let be a maximum set of mutually incomparable solutions and suppose that . Then the following properties hold.
(1) If then for each there is an weakly dominating .
(2) Let and . If then also .
(3) Let and suppose that . Then for every .
(4) Suppose that every is Pareto-optimal. Then does not increase.
The benchmark, introduced in (Zheng2023Inefficiency), is defined as follows.
Definition 2.
Let be divisible by and let the problem size be a multiple of . Then the -objective function -OMM is defined by as
with
for all .
In , the bit string is divided into blocks, where and correspond to block . Specifically, counts the number of ones, and counts the number of zeros in block . Every search point is Pareto-optimal: the objective values of any bit string sum to , so no search point can strictly dominate another. Thus, a Pareto-optimal set, which is also a maximum set of mutually incomparable solutions, has cardinality , since for each block there are at most many possible fitness values .
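For concreteness, here is a Python sketch of m-OMM following the block structure described above. It assumes m is even and n is a multiple of m/2, so that each of the m/2 blocks has length 2n/m; within block i, the first objective of the pair counts ones and the second counts zeros. The brute-force helper (hypothetical, and only feasible for tiny n) illustrates that all fitness vectors sum to n and that there are (2n/m + 1)^(m/2) distinct ones.

```python
from itertools import product
from typing import Sequence, Set, Tuple


def m_omm(x: Sequence[int], m: int) -> Tuple[int, ...]:
    """m-OMM: for each block, one objective counts ones and the next counts zeros."""
    n = len(x)
    if m <= 0 or m % 2 != 0 or n % (m // 2) != 0:
        raise ValueError("m must be a positive even number and n a multiple of m/2")
    block = 2 * n // m  # length of each of the m/2 blocks
    values = []
    for i in range(m // 2):
        ones = sum(x[i * block:(i + 1) * block])
        values.extend((ones, block - ones))  # (f_{2i-1}, f_{2i}) in 1-based indexing
    return tuple(values)


def pareto_front(n: int, m: int) -> Set[Tuple[int, ...]]:
    """All fitness vectors of m-OMM, by brute force over the 2^n bit strings."""
    return {m_omm(x, m) for x in product((0, 1), repeat=n)}
```

For instance, `m_omm((1, 1, 0, 1), 4)` gives (2, 0, 1, 1): the first block contains two ones and no zeros, the second one of each.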
Population Dynamics of NSGA-III on -OMM
Bounding the Maximum Cover Number: First, we establish a general upper bound on the time required to cover a subset of the Pareto front of a given cardinality with high probability. Then, we provide an additional bound on the time needed to evenly distribute solutions across that subset.
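To see the quantity studied in this section in action, the following toy simulation tracks the maximum cover number of a simplified NSGA-III on 2-OMM. It is a sketch under simplifying assumptions of our own: instead of the normalization from (WiethegerD23), both objectives are divided by n, and the Das-Dennis parameter is p = n, so each fitness vector (k, n - k)/n coincides with a reference point and is associated with it at distance 0. The function name and the parameter values are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def max_cover(pop: List[Tuple[int, ...]]) -> int:
    # On 2-OMM the fitness vector (k, n - k) is determined by the number of ones k.
    return max(Counter(sum(x) for x in pop).values())


def nsga3_2omm_cover_trace(n: int, mu: int, generations: int, seed: int = 0) -> List[int]:
    """Maximum cover number after initialization and after each generation (toy model)."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(mu)]
    trace = [max_cover(pop)]
    for _ in range(generations):
        # mu offspring: uniform parent choice plus standard bit mutation (rate 1/n)
        offspring = [tuple(1 - b if rng.random() < 1 / n else b for b in rng.choice(pop))
                     for _ in range(mu)]
        # All points are Pareto-optimal, so the merged population is a single front
        # and survival is decided entirely by niching. Simplification: with the
        # objectives divided by n and p = n, the vector (k, n - k) is associated
        # with "its own" reference point k at distance 0.
        pool: Dict[int, list] = {}
        for x in pop + offspring:
            pool.setdefault(sum(x), []).append(x)
        counts: Counter = Counter()
        survivors = []
        while len(survivors) < mu:
            active = [k for k in pool if pool[k]]
            low = min(counts[k] for k in active)
            k = rng.choice([r for r in active if counts[r] == low])
            # every candidate of point k has distance 0, so the tie is broken at random
            survivors.append(pool[k].pop(rng.randrange(len(pool[k]))))
            counts[k] += 1
        pop = survivors
        trace.append(max_cover(pop))
    return trace
```

In this toy model the trace is non-increasing, in line with Lemma 1(4), and it never drops below ⌈μ/(n+1)⌉, since 2-OMM has only n + 1 distinct fitness vectors.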
Lemma 3.
Consider NSGA-III on -OMM under the same conditions as in Lemma 1 (i.e. since and ). Then for any given natural number there is a set with cardinality that is covered in generations with probability at least .
Proof.
Denote by the Pareto front of -OMM. By a classical Chernoff bound, the probability is at least that there is an individual initialized with . Suppose that this happens and fix such an individual . Let . Then we see that . Fix a covered and another uncovered . We show that with probability at least the vector is covered after generations. Let and . Note that . Further, we see that is covered if , since if then also due to . By Lemma 1(1), cannot increase, but it can be decreased in one single trial by choosing with as parent (probability at least ) and then flipping a one-bit to zero without changing any other bit if ; if instead , a zero-bit is flipped to one. Both happen with probability at least . Then in one generation, decreases with probability at least where the first inequality is due to Lemma 10 in (Badkobeh2015). Note that , and for each define the random variable as the number of generations such that . Then the time until is , the latter being stochastically dominated by the independent sum of geometrically distributed random variables with success probability . Note that and we obtain by Theorem 1 in (Witt14) for and the inequality . For we obtain . By a union bound on all , we see that is covered in generations with probability . ∎
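The proof bounds a waiting time by a sum of independent geometric random variables and then applies the upper-tail bound from Theorem 1 in (Witt14). As we read that theorem, for X = X_1 + ... + X_k with X_i geometric with success probability p_i, any s ≥ Σ 1/p_i² and h = min p_i, it states P(X ≥ E[X] + δ) ≤ exp(-min(δ²/s, δh)/4); please check the constants against the original before relying on them. The Python sketch below implements this bound and a geometric sampler so that the bound can be compared with simulation, using the hypothetical choice p_i = i/(2n), whose success probability grows with the remaining distance i.

```python
import math
import random
from typing import Sequence


def expected_sum(ps: Sequence[float]) -> float:
    """E[X] for X = sum of independent geometric variables with success probabilities ps."""
    return sum(1.0 / p for p in ps)


def witt_upper_tail(ps: Sequence[float], delta: float) -> float:
    """Bound on P(X >= E[X] + delta): exp(-min(delta^2 / s, delta * h) / 4),
    with s = sum of 1/p_i^2 and h = min p_i (our reading of Witt 2014, Theorem 1)."""
    s = sum(1.0 / p ** 2 for p in ps)
    h = min(ps)
    return math.exp(-0.25 * min(delta ** 2 / s, delta * h))


def sample_geometric(p: float, rng: random.Random) -> int:
    """Number of trials up to and including the first success (inverse transform)."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))
```

For example, with n = 50 and δ = E[X], the empirical frequency of {X ≥ 2E[X]} over a few thousand samples can be compared with `witt_upper_tail(ps, delta)`; the bound is far from tight here but holds comfortably.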
We now give an upper bound on the time until, with high probability, the solutions are evenly spread on a set with cardinality ; in other words, until the cover number of each is bounded from above by .
Lemma 4.
Consider NSGA-III on -OMM and suppose that all conditions of Lemma 1 are satisfied. Suppose that . Let be a natural number and let . Then after generations, each has cover number at most with probability . Hence, if then and therefore, generations suffice. The expected number of generations is .
Proof.
Denote by the Pareto front of -OMM. By Lemma 3, there is a set with cardinality which is covered after generations with probability at least . Suppose that this happens. We call a decrease of the cover number of a vector before reaching a success. If a success occurs, then by Lemma 1(3) the cover number of every other Pareto-optimal vector is at most , and by Lemma 1(4) it cannot increase (since every solution is Pareto-optimal); hence, the lemma holds. We show that with probability , all have a cover number of at least or a success has occurred after further generations. In the former case, we have and all other have cover number , and hence the cover number of every is also bounded by . Depending on the value of , we consider two cases.
Case 1: Let (i.e. ). Fix , denote by its cover number and for let be a random variable that counts the number of generations with . Then the number of generations until a success occurs or the cover number of is at least is at most . Note that can be increased by choosing an individual with as parent and flipping no bits (prob. ). Hence, the probability of increasing in one generation is at least . Consequently, is stochastically dominated by an independent sum of geometrically distributed random variables with parameter . Then and hence, by Theorem 1 in (Witt14), we obtain for , and the inequality and for we obtain . By a union bound on different Pareto-optimal vectors, we see that with probability at least a success has occurred or the cover number of all is at least after further generations.
Case 2: Suppose that (i.e. ). Fix , and let denote the number of individuals such that at generation . By Case 1, with probability after further generations, a success has occurred or the cover number of is at least . Suppose the latter (otherwise the statement of the lemma holds) and let be the number of newly created individuals with fitness vector in generation . We have : a generation consists of independent trials, and in each trial, with probability at least , an individual with is selected as the parent, and during mutation no bit is flipped with probability at least . By a classical Chernoff bound, . Hence, with probability at least , we have . In other words, increases by a factor of at least unless the value has already been reached. Note that such generations in succession suffice to reach a cover number of of at least (or a success occurs), since . Moreover, such a sequence of generations occurs with probability at least by a union bound on all these generations. By a further union bound on all , we see that after generations a success has occurred or the cover number of each is at least with probability .
Hence, in both cases, after generations, each has cover number at most with probability . If this does not happen, we repeat the arguments from either Case 1 or Case 2 for another period of generations, including the preceding phase from Lemma 3 to cover if necessary. The expected number of periods is , concluding the proof. ∎
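Case 2 relies on a classical multiplicative Chernoff bound, P(X ≤ (1 - δ)E[X]) ≤ exp(-δ²E[X]/2) for 0 < δ < 1, applied to the number of new copies of a fitness vector v in one generation. The Python sketch below compares the exact binomial lower tail with this bound. It assumes, for illustration only, that each of the μ trials succeeds exactly when it picks one of the c individuals with fitness v and flips no bit, with probability (c/μ)(1 - 1/n)^n; the proof only needs a lower bound on this probability, since copies can also arise in other ways. The parameter values are hypothetical.

```python
import math


def copy_probability(c: int, mu: int, n: int) -> float:
    """P(one trial picks one of the c parents with fitness v and flips no bit)."""
    return (c / mu) * (1.0 - 1.0 / n) ** n


def binom_cdf(k: float, trials: int, q: float) -> float:
    """Exact P(X <= k) for X ~ Bin(trials, q)."""
    return sum(math.comb(trials, i) * q ** i * (1 - q) ** (trials - i)
               for i in range(0, math.floor(k) + 1))


def chernoff_lower_tail(expectation: float, delta: float) -> float:
    """Classical bound P(X <= (1 - delta) E[X]) <= exp(-delta^2 E[X] / 2), 0 < delta < 1."""
    return math.exp(-delta ** 2 * expectation / 2)
```

For example, with μ = 200, n = 100 and c = 40, the expected number of new copies is 200 · q ≈ 14.6, and the probability of seeing at most half of that is bounded by `chernoff_lower_tail(14.6, 0.5)`.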
Controlling the Exploration of Search Points: First, we bound the spread of solutions in generations.
Lemma 5.
Consider NSGA-III on -OMM under the same conditions as in Lemma 1. Suppose that and let be a constant. Then, after generations, there is no with with probability .
Proof.
Let . By a classical Chernoff bound each individual satisfies with probability after initialization. Suppose that this happens. Then and therefore, in order to create an individual with within generations, it is necessary that reaches , particularly decreases by at least in one such iteration. This requires that at least many zero bits are flipped simultaneously in one individual. The latter happens with probability at most in one single trial where the last inequality is due to Stirling’s formula. By a union bound on at most mutation steps after generations, we see that the probability is to decrease by at least one time within generations (since ), concluding the proof. ∎
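The key estimate in this proof is that standard bit mutation rarely flips many bits of a given type at once: for k out of d prescribed bits (for instance the zero bits), a union bound over all k-subsets gives C(d, k) n^(-k) ≤ 1/k!, and Stirling's formula, in the weak form k! ≥ (k/e)^k, shows that this becomes very small as k grows. The Python sketch below checks this chain of inequalities numerically for a few hypothetical values of n, d and k.

```python
import math


def prob_at_least_k_flips(d: int, n: int, k: int) -> float:
    """Exact P(at least k of d fixed bits flip) under standard bit mutation with rate 1/n."""
    q = 1.0 / n
    return sum(math.comb(d, i) * q ** i * (1 - q) ** (d - i) for i in range(k, d + 1))


def union_bound(d: int, n: int, k: int) -> float:
    """Union bound over all k-subsets of the d bits: C(d, k) * n^(-k), at most 1/k! if d <= n."""
    return math.comb(d, k) / n ** k


def stirling_lower(k: int) -> float:
    """Lower bound (k/e)^k <= k!, hence 1/k! <= (e/k)^k."""
    return (k / math.e) ** k
```

The exact tail is typically much smaller than the union bound, but only the latter is needed to conclude that such a large jump is unlikely within the considered number of generations.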
We now bound the exploration of search points towards the all-ones string on an interval of the form for constants by providing a lower bound on the number of generations required to traverse this interval.
Lemma 6.
Consider NSGA-III on -OMM under the same conditions as in Lemma 1. Let be two constants. Assume that the maximum cover number is at most . Suppose every satisfies . Then with probability NSGA-III requires more than generations to create an individual with . Hence, the expected number of generations is at least .
Proof.
Consider . If all individuals satisfy (which is the case at the beginning) then and we created an with if . At first we bound the probability to increase by at least in one generation from above as follows. In one single trial for each one can choose an with (i.e. ) (prob. at most ) and then flip zero bits (prob. at most ). By a union bound on all , we obtain that increases by at least in a single trial with probability at most (where we used and ). Hence, by a union bound on single trials, we obtain the inequality (since as well as for all is satisfied). Again by a union bound, changed by at least after generations with probability . So we assume that is never changed by at least and for and natural let be the random variable which counts the number of generations with . Now, for we justify that stochastically dominates a geometrically distributed random variable with success probability :
A necessary condition that leaves is that increases by one in a generation which happens with probability at most (by choosing a parent with and then flipping zero bits for ). For the last inequality we used for all and sufficiently large, and . Then apply a union bound on trials to finish the justification.
This implies that for the number of generations until (which is at least ) stochastically dominates the independent sum of geometrically distributed random variables . Note also that and therefore for . Therefore, we obtain for sufficiently large. Then, under the condition that never changes by at least within generations, we see by Theorem 1 in (Witt14) that for and (due to ) the inequality holds. This proves the lemma with the law of total probability. ∎
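In Lemma 6 the argument runs the other way: the waiting time stochastically dominates a sum of geometric random variables, so the lower tail of Theorem 1 in (Witt14) keeps the process slow. As we recall it, that tail reads P(X ≤ E[X] - δ) ≤ exp(-δ²/(2s)) for any s ≥ Σ 1/p_i²; please verify the constants against the original. The Python sketch below states this bound and estimates the same probability by simulation for the hypothetical coupon-collector-like choice p_i = i/n.

```python
import math
import random
from typing import Sequence


def witt_lower_tail(ps: Sequence[float], delta: float) -> float:
    """Bound on P(X <= E[X] - delta): exp(-delta^2 / (2 s)) with s = sum of 1/p_i^2
    (our reading of the lower-tail part of Witt 2014, Theorem 1)."""
    s = sum(1.0 / p ** 2 for p in ps)
    return math.exp(-delta ** 2 / (2 * s))


def sample_geometric(p: float, rng: random.Random) -> int:
    """Number of trials up to and including the first success (inverse transform)."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - p)))


def empirical_lower_tail(ps: Sequence[float], threshold: float, runs: int, seed: int) -> float:
    """Monte Carlo estimate of P(X <= threshold) for X = sum of Geo(p_i)."""
    rng = random.Random(seed)
    hits = sum(sum(sample_geometric(p, rng) for p in ps) <= threshold for _ in range(runs))
    return hits / runs
```

With n = 50 and δ = E[X]/2, the simulated probability of finishing in at most half the expected time is well below the bound; this one-sided concentration is what turns the expected-time estimate into a with-high-probability lower bound.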
A Lower Runtime Bound
In this section, we establish the desired lower bound on the runtime of NSGA-III on the -OMM problem by putting together the results from the previous section.
Theorem 7.
Consider NSGA-III on -OMM under the same conditions as in Lemma 1. Further suppose that for a constant . Then the expected number of generations to cover the whole Pareto front is at least .
Proof.
Fix a constant such that for sufficiently large. First, Lemma 5 shows that with probability there is no individual in with within generations. Further, by Lemma 4 on we obtain that after generations the maximum cover number is at most with probability . Suppose that both events occur. We now apply Lemma 6 with and to obtain with probability that after further generations for , no solution with is created. If this happens, we apply Lemma 4 over that number of generations for (note that ) to obtain for that with probability the maximum cover number is at most for sufficiently large (the latter equality holds due to ).
Suppose that these two happen. In the following, we iteratively reduce the maximum cover number as the population approaches the extreme solution . To this end, let and suppose that for there are constants and such that after generations, no solution with is created, that and the maximum cover number is at most (where the case already occurred). Now fix a further constant with . Then again by Lemma 6 we see that with probability in generations for no solution with is created. After this time, by Lemma 4 on , the maximum cover number is at most for with probability . If , we increase by one and repeat this argument. We stop when . Since , we have at most such repetitions. After the last repetition we have that . Hence, by applying a union bound on all repetitions, we conclude that with probability , there exists a generation such that no individual with is created and the maximum cover number is at most for a constant . Suppose this event occurs, and apply Lemma 6 once more with and . This yields that, after generations in expectation (from time onward), a search point with is created, concluding the proof. ∎
An Improved Upper Runtime Bound
To complement our analysis, we establish an improved upper bound on the expected runtime of NSGA-III on -OMM for a constant number of objectives . Our approach closely follows the methodology provided by (OprisNSGAIII), with the added consideration of the cover number.
Theorem 8.
Consider NSGA-III on for a constant number of objectives under the same conditions as in Lemma 1 with population size . Then a Pareto-optimal set of is found in expected generations or, in other words, in expected fitness evaluations.
Proof.
We can assume that and since otherwise the bound from Theorem 5.2 in (OprisNSGAIII) of expected generations holds. Fix a vector on the Pareto front. We estimate the probability not to cover after generations. For each generation let . Note that and that we have covered if . Further, by Lemma 1(1), cannot increase. Let be with . We first increase the cover number of to (which, by Lemma 1(2), can only decrease if it exceeds , and even then not below this value), and then proceed to decrease . The latter then happens with probability at least in one single trial and hence, with probability at least in one generation. Hence, the time until is stochastically dominated by an independent sum of geometrically distributed random variables , ( and for ) with success probability and respectively (compare also with the proof of Lemma 4 for the latter). Let and . We have for sufficiently large and . By Theorem 1 in (Witt14) we obtain for , , and that since and . Further, we see for , and that These two inequalities imply By a union bound on all possible , the probability that there is a fitness vector such that does not contain a Pareto-optimal solution with after generations is at most . If this does not happen we repeat all the above arguments. Note that in expectation, such periods are sufficient. ∎
This result improves the corresponding upper bound from (OprisNSGAIII) by a factor of , both in terms of generations and fitness evaluations, if . Together with Theorem 7, this shows that in the case , for for a constant , the full Pareto front is covered in expected generations, which is a tight runtime bound.
Conclusions
In this paper, we analyzed the widely used NSGA-III algorithm on the simple -OMM problem and established lower runtime bounds for , as well as improved upper runtime bounds for a constant number of objectives compared to (OprisNSGAIII). For , this leads to a tight runtime bound when employing a superconstant yet carefully chosen population size . In this setting, NSGA-III even outperforms NSGA-II due to its ability to distribute solutions very evenly across the Pareto front. This is very surprising, since NSGA-II is the state-of-the-art algorithm for bi-objective problems (with around 60,000 citations). Unlike previous work (opris2025multimodal), where NSGA-III’s dynamics were analyzed on -OJZJ by first exploring the local optima and then spreading the solutions evenly across the Pareto front, our analysis required a more refined investigation of the population dynamics. In particular, we bound the maximum cover number in several stages during the exploration process toward the all-ones string, where the spread of solutions is not hindered by local optima. These insights provide a deeper understanding of the strengths and limitations of NSGA-III and may serve as a foundation for analyzing its behavior on more complex fitness landscapes. Ultimately, this understanding can help practitioners develop enhanced versions of the algorithm that efficiently optimize problems with diverse and rugged fitness landscapes. Future research directions include bounding the maximum cover number on benchmark problems where the algorithm first has to reach the Pareto front, as well as applying the insights on population dynamics to practical scheduling and graph problems.