ADVANCED MICROECONOMIC THEORY
Third Edition
Geoffrey A. Jehle and Philip J. Reny
Advanced Microeconomic Theory remains a rigorous, up-to-date standard in microeconomics, giving
all the core mathematics and modern theory the advanced student must master.
Long known for careful development of complex theory, together with clear, patient explanation, this
student-friendly text, with its efficient theorem-proof organisation, and many examples and exercises,
is uniquely effective in advanced courses.
New in this edition
• General equilibrium with contingent commodities
• Expanded treatment of social choice, with a simplified proof of Arrow’s theorem and a
complete, step-by-step development of the Gibbard–Satterthwaite theorem
• Extensive development of Bayesian games
• New section on efficient mechanism design in the quasi-linear utility, private-values
environment, with the most complete and easy-to-follow presentation of any text
• Over fifty new exercises
Essential reading for students at Master’s level, those beginning a Ph.D., and advanced undergraduates,
and a book every professional economist wants in their collection.
Cover photograph © Getty Images
www.pearson-books.com
Geoffrey A. Jehle, Vassar College
Philip J. Reny, University of Chicago
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
© Geoffrey A. Jehle and Philip J. Reny 2011
The rights of Geoffrey A. Jehle and Philip J. Reny to be identified as authors of this work have been asserted by
them in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the
prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom
issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.
ISBN: 978-0-273-73191-7
PREFACE xv
PART I
ECONOMIC AGENTS 1
1.6 Exercises 63
PART II
MARKETS AND WELFARE 163
PART III
STRATEGIC BEHAVIOUR 303
REFERENCES 641
INDEX 645
PREFACE
In the first two chapters of this volume, we will explore the essential features of modern
consumer theory – a bedrock foundation on which so many theoretical structures in eco-
nomics are built. Some time later in your study of economics, you will begin to notice just
how central this theory is to the economist’s way of thinking. Time and time again you
will hear the echoes of consumer theory in virtually every branch of the discipline – how
it is conceived, how it is constructed, and how it is applied.
bundle x ∈ X is thus represented by a point x ∈ Rn+ . Usually, we’ll simplify things and just
think of the consumption set as the entire non-negative orthant, X = Rn+ . In this case, it is
easy to see that each of the following basic requirements is satisfied.
The notion of a feasible set is likewise very straightforward. We let B represent all
those alternative consumption plans that are both conceivable and, more important, realis-
tically obtainable given the consumer’s circumstances. What we intend to capture here are
precisely those alternatives that are achievable given the economic realities the consumer
faces. The feasible set B then is that subset of the consumption set X that remains after we
have accounted for any constraints on the consumer’s access to commodities due to the
practical, institutional, or economic realities of the world. How we specify those realities
in a given situation will determine the precise configuration and additional properties that
B must have. For now, we will simply say that B ⊂ X.
A preference relation typically specifies the limits, if any, on the consumer’s ability
to perceive in situations involving choice, the form of consistency or inconsistency in the
consumer’s choices, and information about the consumer’s tastes for the different objects
of choice. The preference relation plays a crucial role in any theory of choice. Its spe-
cial form in the theory of consumer behaviour is sufficiently subtle to warrant special
examination in the next section.
Finally, the model is ‘closed’ by specifying some behavioural assumption. This
expresses the guiding principle the consumer uses to make final choices and so identifies
the ultimate objectives in choice. It is supposed that the consumer seeks to identify and
select an available alternative that is most preferred in the light of his personal tastes.
accepted as a psychological ‘law’, and early statements of the Law of Demand depended
on it. These are awfully strong assumptions about the inner workings of human beings.
The more recent history of consumer theory has been marked by a drive to render its
foundations as general as possible. Economists have sought to pare away as many of the
traditional assumptions, explicit or implicit, as they could and still retain a coherent theory
with predictive power. Pareto (1896) can be credited with suspecting that the idea of a
measurable ‘utility’ was inessential to the theory of demand. Slutsky (1915) undertook the
first systematic examination of demand theory without the concept of a measurable sub-
stance called utility. Hicks (1939) demonstrated that the Principle of Diminishing Marginal
Utility was neither necessary, nor sufficient, for the Law of Demand to hold. Finally,
Debreu (1959) completed the reduction of standard consumer theory to those bare essen-
tials we will consider here. Today’s theory bears close and important relations to its earlier
ancestors, but it is leaner, more precise, and more general.
alternatives at a time, the assumption of transitivity requires that those pairwise compar-
isons be linked together in a consistent way. At first brush, requiring that the evaluation of
alternatives be transitive seems simple and only natural. Indeed, were they not transitive,
our instincts would tell us that there was something peculiar about them. Nonetheless, this
is a controversial axiom. Experiments have shown that in various situations, the choices
of real human beings are not always transitive. Nonetheless, we will retain it in our
description of the consumer, though not without some slight trepidation.
These two axioms together imply that the consumer can completely rank any finite
number of elements in the consumption set, X, from best to worst, possibly with some ties.
(Try to prove this.) We summarise the view that preferences enable the consumer to con-
struct such a ranking by saying that those preferences can be represented by a preference
relation.
There are two additional relations that we will use in our discussion of consumer
preferences. Each is determined by the preference relation, ≿, and they formalise the
notions of strict preference and indifference.
The relation ≻ is called the strict preference relation induced by ≿, or simply the strict
preference relation when ≿ is clear. The phrase x1 ≻ x2 is read, ‘x1 is strictly preferred
to x2’.
The relation ∼ is called the indifference relation induced by ≿, or simply the indifference
relation when ≿ is clear. The phrase x1 ∼ x2 is read, ‘x1 is indifferent to x2’.
Building on the underlying definition of the preference relation, both the strict prefer-
ence relation and the indifference relation capture the usual sense in which the terms ‘strict
preference’ and ‘indifference’ are used in ordinary language. Because each is derived from
CONSUMER THEORY 7
the preference relation, each can be expected to share some of its properties. Some, yes,
but not all. In general, both are transitive and neither is complete.
Using these two supplementary relations, we can establish something very concrete
about the consumer’s ranking of any two alternatives. For any pair x1 and x2, exactly one
of three mutually exclusive possibilities holds: x1 ≻ x2, or x2 ≻ x1, or x1 ∼ x2.
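Because ≻ and ∼ are defined entirely in terms of the preference relation, this trichotomy can be verified mechanically. The sketch below is our illustration, not the text’s: the bundles and index numbers are invented, and a numerical index stands in for a complete, transitive relation, with the two derived relations built exactly as defined above.

```python
# A complete, transitive preference relation over a finite set of bundles,
# encoded (for the sketch only) by a numerical index.

bundles = ["a", "b", "c"]
index = {"a": 2.0, "b": 1.0, "c": 2.0}   # hypothetical tastes

def weakly_preferred(x, y):              # x >=~ y
    return index[x] >= index[y]

def strictly_preferred(x, y):            # x > y  iff  x >=~ y and not y >=~ x
    return weakly_preferred(x, y) and not weakly_preferred(y, x)

def indifferent(x, y):                   # x ~ y  iff  x >=~ y and y >=~ x
    return weakly_preferred(x, y) and weakly_preferred(y, x)

# For every pair, exactly one of the three mutually exclusive cases holds.
for x in bundles:
    for y in bundles:
        cases = [strictly_preferred(x, y), strictly_preferred(y, x),
                 indifferent(x, y)]
        assert sum(cases) == 1
```

The final loop checks the claim in the text: the three cases are exhaustive and mutually exclusive for any pair of bundles.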
To this point, we have simply managed to formalise the requirement that prefer-
ences reflect an ability to make choices and display a certain kind of consistency. Let us
consider how we might describe graphically a set of preferences satisfying just those first
few axioms. To that end, and also because of their usefulness later on, we will use the
preference relation to define some related sets. These sets focus on a single alternative in
the consumption set and examine the ranking of all other alternatives relative to it.
one of three mutually exclusive categories relative to x0 ; every other point is worse than x0 ,
indifferent to x0, or preferred to x0. Thus, for any bundle x0 the three sets ≺(x0), ∼(x0),
and ≻(x0) partition the consumption set.
The preferences in Fig. 1.1 may seem rather odd. They possess only the most limited
structure, yet they are entirely consistent with and allowed for by the first two axioms
alone. Nothing assumed so far prohibits any of the ‘irregularities’ depicted there, such as
the ‘thick’ indifference zones, or the ‘gaps’ and ‘curves’ within the indifference set ∼ (x0 ).
Such things can be ruled out only by imposing additional requirements on preferences.
We shall consider several new assumptions on preferences. One has very little
behavioural significance and speaks almost exclusively to the purely mathematical aspects
of representing preferences; the others speak directly to the issue of consumer tastes over
objects in the consumption set.
The first is an axiom whose only effect is to impose a kind of topological regularity
on preferences, and whose primary contribution will become clear a bit later.
From now on we explicitly set X = Rn+ .
AXIOM 3: Continuity. For all x ∈ Rn+, the ‘at least as good as’ set, ≿(x), and the ‘no
better than’ set, ≾(x), are closed in Rn+.
Recall that a set is closed in a particular domain if its complement is open in that
domain. Thus, to say that ≿(x) is closed in Rn+ is to say that its complement, ≺(x), is
open in Rn+.
The continuity axiom guarantees that sudden preference reversals do not occur.
Indeed, the continuity axiom can be equivalently expressed by saying that if each element
yn of a sequence of bundles is at least as good as (no better than) x, and yn converges to y,
then y is at least as good as (no better than) x. Note that because ≿(x) and ≾(x) are closed,
so, too, is ∼(x) because the latter is the intersection of the former two. Consequently,
Axiom 3 rules out the open area in the indifference set depicted in the north-west of
Fig. 1.1.
Additional assumptions on tastes lend the greater structure and regularity to prefer-
ences that you are probably familiar with from earlier economics classes. Assumptions of
this sort must be selected for their appropriateness to the particular choice problem being
analysed. We will consider in turn a few key assumptions on tastes that are ordinarily
imposed in ‘standard’ consumer theory, and seek to understand the individual and collec-
tive contributions they make to the structure of preferences. Within each class of these
assumptions, we will proceed from the less restrictive to the more restrictive. We will
generally employ the more restrictive versions considered. Consequently, we let axioms
with primed numbers indicate alternatives to the norm, which are conceptually similar but
slightly less restrictive than their unprimed partners.
When representing preferences over ordinary consumption goods, we will want to
express the fundamental view that ‘wants’ are essentially unlimited. In a very weak sense,
we can express this by saying that there will always exist some adjustment in the compo-
sition of the consumer’s consumption plan that he can imagine making to give himself a
consumption plan he prefers. This adjustment may involve acquiring more of some com-
modities and less of others, or more of all commodities, or even less of all commodities.
By this assumption, we preclude the possibility that the consumer can even imagine hav-
ing all his wants and whims for commodities completely satisfied. Formally, we state this
assumption as follows, where Bε(x0) denotes the open ball of radius ε centred at x0:¹
AXIOM 4’: Local Non-satiation. For all x0 ∈ Rn+, and for all ε > 0, there exists some x ∈
Bε(x0) ∩ Rn+ such that x ≻ x0.
Axiom 4’ says that within any vicinity of a given point x0, no matter how small that
vicinity is, there will always be at least one other point x that the consumer prefers to x0.
Its effect on the structure of indifference sets is significant. It rules out the possibility of
having ‘zones of indifference’, such as that surrounding x1 in Fig. 1.2. To see this, note that
we can always find some ε > 0, and some Bε(x1), containing nothing but points indifferent
to x1. This of course violates Axiom 4’, because it requires there always be at least one
point strictly preferred to x1, regardless of the ε > 0 we choose. The preferences depicted
in Fig. 1.3 do satisfy Axiom 4’ as well as Axioms 1 to 3.
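If, purely for illustration, we suppose these preferences happen to be represented by a strictly increasing utility (the axiom itself makes no reference to utility), the strictly preferred point inside each ε-ball can be exhibited directly as a small step towards ‘more of everything’:

```python
import math

def u(x):                        # hypothetical strictly monotonic utility
    return sum(x)

def better_point_nearby(x0, eps):
    """A point in the open ball B_eps(x0) strictly preferred to x0:
    a step of length eps/2 along the direction (1, ..., 1)."""
    n = len(x0)
    step = (eps / 2) / math.sqrt(n)
    return tuple(xi + step for xi in x0)

x0 = (1.0, 2.0)
for eps in (1.0, 0.1, 1e-6):
    x = better_point_nearby(x0, eps)
    assert math.dist(x, x0) < eps      # x lies inside B_eps(x0)
    assert u(x) > u(x0)                # and is strictly preferred to x0
```

However small ε is chosen, the constructed point stays inside the ball and is strictly preferred, which is exactly what Axiom 4’ demands.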
A different and more demanding view of needs and wants is very common. Accor-
ding to this view, more is always better than less. Whereas local non-satiation requires
that a preferred alternative nearby always exist, it does not rule out the possibility that
the preferred alternative may involve less of some or even all commodities. Specifically,
it does not imply that giving the consumer more of everything necessarily makes that
consumer better off. The alternative view takes the position that the consumer will always
prefer a consumption plan involving more to one involving less. This is captured by the
axiom of strict monotonicity. As a matter of notation, if the bundle x0 contains at least
as much of every good as does x1 we write x0 ≥ x1, while if x0 contains strictly more of
every good than x1 we write x0 ≫ x1.
AXIOM 4: Strict Monotonicity. For all x0, x1 ∈ Rn+, if x0 ≥ x1 then x0 ≿ x1, while if
x0 ≫ x1, then x0 ≻ x1.
Axiom 4 says that if one bundle contains at least as much of every commodity as
another bundle, then the one is at least as good as the other. Moreover, it is strictly better
if it contains strictly more of every good. The impact on the structure of indifference and
related sets is again significant. First, it should be clear that Axiom 4 implies Axiom 4’,
so if preferences satisfy Axiom 4, they automatically satisfy Axiom 4’. Thus, to require
Axiom 4 will have the same effects on the structure of indifference and related sets as
Axiom 4’ does, plus some additional ones. In particular, Axiom 4 eliminates the possibility
that the indifference sets in R2+ ‘bend upward’, or contain positively sloped segments. It
also requires that the ‘preferred to’ sets be ‘above’ the indifference sets and that the ‘worse
than’ sets be ‘below’ them.
To help see this, consider Fig. 1.4. Under Axiom 4, no points north-east of x0 or
south-west of x0 may lie in the same indifference set as x0 . Any point north-east, such as
x1 , involves more of both goods than does x0 . All such points in the north-east quadrant
must therefore be strictly preferred to x0 . Similarly, any point in the south-west quadrant,
such as x2 , involves less of both goods. Under Axiom 4, x0 must be strictly preferred
to x2 and to all other points in the south-west quadrant, so none of these can lie in the
same indifference set as x0. For any x0, points north-east of the indifference set will be
contained in ≻(x0), and all those south-west of the indifference set will be contained in
the set ≺(x0). A set of preferences satisfying Axioms 1, 2, 3, and 4 is given in Fig. 1.5.
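The two vector orders in the notation above, and the rankings Axiom 4 then imposes, can be sketched with a hypothetical increasing utility standing in for the preference relation (the bundles and functional form are invented for the example):

```python
def geq(x, y):   # x >= y : at least as much of every good
    return all(xi >= yi for xi, yi in zip(x, y))

def gg(x, y):    # x >> y : strictly more of every good
    return all(xi > yi for xi, yi in zip(x, y))

def u(x):        # any strictly increasing utility will do for the sketch
    return sum(x)

x0, x1, x2 = (3.0, 2.0), (1.0, 2.0), (0.5, 1.0)
assert geq(x0, x1) and not gg(x0, x1)   # more of good 1, same of good 2
assert gg(x0, x2)                       # strictly more of both goods
assert u(x0) >= u(x1)                   # so x0 is at least as good as x1
assert u(x0) > u(x2)                    # and strictly better than x2
```

Note that x0 ≥ x1 holds even though the bundles tie in one coordinate, while x0 ≫ x2 requires a strict gap in every coordinate, mirroring the weak and strict conclusions of the axiom.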
The preferences in Fig. 1.5 are the closest we have seen to the kind undoubtedly
familiar to you from your previous economics classes. They still differ, however, in one
very important respect: typically, the kind of non-convex region in the north-west part of
∼ (x0 ) is explicitly ruled out. This is achieved by invoking one final assumption on tastes.
We will state two different versions of the axiom and then consider their meaning and
purpose.
AXIOM 5’: Convexity. If x1 ≿ x0, then tx1 + (1 − t)x0 ≿ x0 for all t ∈ [0, 1].
A slightly stronger version of this is the following:
AXIOM 5: Strict Convexity. If x1 ≠ x0 and x1 ≿ x0, then tx1 + (1 − t)x0 ≻ x0 for all
t ∈ (0, 1).
Notice first that either Axiom 5’ or Axiom 5 – in conjunction with Axioms 1, 2, 3,
and 4 – will rule out concave-to-the-origin segments in the indifference sets, such as those
in the north-west part of Fig. 1.5. To see this, choose two distinct points in the indifference
set depicted there. Because x1 and x2 are both indifferent to x0, we clearly have x1 ∼ x2.
Convex combinations of those two points, such as xt, will lie within ≺(x0), violating the
requirements of both Axiom 5’ and Axiom 5.
For the purposes of the consumer theory we shall develop, it turns out that Axiom 5’
can be imposed without any loss of generality. The predictive content of the theory would
be the same with or without it. Although the same statement does not quite hold for the
slightly stronger Axiom 5, it does greatly simplify the analysis.
There are at least two ways we can intuitively understand the implications of convexity
for consumer tastes. The preferences depicted in Fig. 1.6 are consistent with both
Axiom 5’ and Axiom 5. Again, suppose we choose x1 ∼ x2. Point x1 represents a bundle
containing a proportion of good 2 which is relatively ‘extreme’, compared to the
proportion of good 2 in the other bundle x2. The bundle x2, by contrast, contains a proportion
of the other good, good 1, which is relatively extreme compared to that contained in x1.
Although each contains a relatively high proportion of one good compared to the other,
the consumer is indifferent between the two bundles. Now, any convex combination of
x1 and x2, such as xt, will be a bundle containing a more ‘balanced’ combination of good 1
and good 2 than does either ‘extreme’ bundle x1 or x2. The thrust of Axiom 5’ or Axiom 5 is
to forbid the consumer from preferring such extremes in consumption. Axiom 5’ requires
that any such relatively balanced bundle as xt be no worse than either of the two extremes
between which the consumer is indifferent. Axiom 5 goes a bit further and requires that the
consumer strictly prefer any such relatively balanced consumption bundle to both of the
extremes between which he is indifferent. In either case, some degree of ‘bias’ in favour
of balance in consumption is required of the consumer’s tastes.
Another way to describe the implications of convexity for consumers’ tastes focuses
attention on the ‘curvature’ of the indifference sets themselves. When X = R2+ , the (abso-
lute value of the) slope of an indifference curve is called the marginal rate of substitution
of good two for good one. This slope measures, at any point, the rate at which the con-
sumer is just willing to give up good two per unit of good one received. Thus, the consumer
is indifferent after the exchange.
If preferences are strictly monotonic, any form of convexity requires the indifference
curves to be at least weakly convex-shaped relative to the origin. This is equivalent to
requiring that the marginal rate of substitution not increase as we move from bundles
such as x1 towards bundles such as x2. Loosely, this means that the consumer is no more
willing to give up good 2 in exchange for good 1 when he has relatively little of good 2
and much of good 1 than he is when he has relatively much of good 2 and little of good 1.
Axiom 5’ requires the rate at which the consumer would trade good 2 for good 1 and
remain indifferent to be either constant or decreasing as we move from north-west to
south-east along an indifference curve. Axiom 5 goes a bit further and requires that the
rate be strictly diminishing. The preferences in Fig. 1.6 display this property, sometimes
called the principle of diminishing marginal rate of substitution in consumption.
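The diminishing-MRS property is easy to check numerically. The sketch below uses a hypothetical Cobb-Douglas utility u(x1, x2) = x1^a · x2^(1−a), whose indifference curves are strictly convex; the functional form and all numbers are ours, not the text’s.

```python
a = 0.5
ubar = 4.0                       # a fixed utility level

def x2_on_curve(x1):
    """Solve x1**a * x2**(1-a) = ubar for x2, tracing the indifference curve."""
    return (ubar / x1 ** a) ** (1.0 / (1.0 - a))

def mrs(x1):
    """MRS of good 2 for good 1; for Cobb-Douglas it equals (a/(1-a))*x2/x1."""
    return (a / (1.0 - a)) * x2_on_curve(x1) / x1

# Moving north-west to south-east along the curve (x1 rising, x2 falling) ...
rates = [mrs(x1) for x1 in (1.0, 2.0, 4.0, 8.0)]

# ... the rate at which the consumer trades good 2 for good 1 strictly falls.
assert all(r1 > r2 for r1, r2 in zip(rates, rates[1:]))
```

With a = 0.5 the curve is x2 = 16/x1 and the MRS is 16/x1², so the slope flattens as consumption of good 1 grows, exactly the bias toward balance described above.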
We have taken some care to consider a number of axioms describing consumer pref-
erences. Our goal has been to gain some appreciation of their individual and collective
implications for the structure and representation of consumer preferences. We can sum-
marise this discussion rather briefly. The axioms on consumer preferences may be roughly
classified in the following way. The axioms of completeness and transitivity describe a
consumer who can make consistent comparisons among alternatives. The axiom of conti-
nuity is intended to guarantee the existence of topologically nice ‘at least as good as’ and
‘no better than’ sets, and its purpose is primarily a mathematical one. All other axioms
serve to characterise consumers’ tastes over the objects of choice. Typically, we require
that tastes display some form of non-satiation, either weak or strong, and some bias in
favour of balance in consumption, either weak or strong.
2 See, for example, Barten and Böhm (1982). The classic reference is Debreu (1954).
require it simultaneously simplifies the purely mathematical aspects of the problem and
increases the intuitive content of the proof. Notice, however, that we will not require any
form of convexity.
Notice carefully that this is only an existence theorem. It simply claims that under the
conditions stated, at least one continuous real-valued function representing the preference
relation is guaranteed to exist. There may be, and in fact there always will be, more than
one such function. The theorem itself, however, makes no statement on how many more
there are, nor does it indicate in any way what form any of them must take. Therefore, if we
can dream up just one function that is continuous and that represents the given preferences,
we will have proved the theorem. This is the strategy we will adopt in the following proof.
Proof: Let the relation ≿ be complete, transitive, continuous, and strictly monotonic. Let
e ≡ (1, . . . , 1) ∈ Rn+ be a vector of ones, and consider the mapping u : Rn+ → R defined so
that the following condition is satisfied:³
u(x)e ∼ x. (P.1)
Let us first make sure we understand what this says and how it works. In words, (P.1)
says, ‘take any x in the domain Rn+ and assign to it the number u(x) such that the bundle,
u(x)e, with u(x) units of every commodity is ranked indifferent to x’.
Two questions immediately arise. First, does there always exist a number u(x)
satisfying (P.1)? Second, is it uniquely determined, so that u(x) is a well-defined function?
To settle the first question, fix x ∈ Rn+ and consider the following two subsets of real
numbers:
A ≡ {t ≥ 0 | te ≿ x}
B ≡ {t ≥ 0 | te ≾ x}.
3 For t ≥ 0, the vector te will be some point in Rn+ each of whose coordinates is equal to the number t,
because te = t(1, . . . , 1) = (t, . . . , t). If t = 0, then te = (0, . . . , 0) coincides with the origin. If t = 1, then
te = (1, . . . , 1) coincides with e. If t > 1, the point te lies farther out from the origin than e. For 0 < t < 1, the
point te lies between the origin and e. It should be clear that for any choice of t ≥ 0, te will be a point in Rn+
somewhere on the ray from the origin through e, i.e., some point on the 45° line in Fig. 1.7.
[Fig. 1.7: the 45° ray from the origin through e = (1, 1); the bundle u(x)e lies where this ray crosses the indifference set ∼(x) through x.]
According to Exercise 1.11, the continuity of ≿ implies that both A and B are closed
in R+. Also, by strict monotonicity, t ∈ A implies t′ ∈ A for all t′ ≥ t. Consequently,
A must be a closed interval of the form [t̲, ∞). Similarly, strict monotonicity and the
closedness of B in R+ imply that B must be a closed interval of the form [0, t̄]. Now
for any t ≥ 0, completeness of ≿ implies that either te ≿ x or te ≾ x, that is, t ∈ A ∪ B. But
this means that R+ = A ∪ B = [0, t̄] ∪ [t̲, ∞). We conclude that t̲ ≤ t̄ so that A ∩ B ≠ ∅.
We now turn to the second question. We must show that there is only one number
t ≥ 0 such that te ∼ x. But this follows easily because if t1 e ∼ x and t2 e ∼ x, then by the
transitivity of ∼ (see Exercise 1.4), t1 e ∼ t2 e. So, by strict monotonicity, it must be the
case that t1 = t2 .
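The existence and uniqueness argument above suggests a direct numerical construction of u(x): grow t until te is at least as good as x, then bisect between the sets B and A. The sketch below assumes, for illustration only, that ≿ is represented by a Cobb-Douglas index v; nothing about that particular index comes from the text.

```python
def v(x):
    """Stand-in preference index: Cobb-Douglas, strictly monotonic on R^2_++."""
    return (x[0] * x[1]) ** 0.5

def u_of(x, tol=1e-10):
    """Find the unique t >= 0 with t*e ~ x, i.e. v(t*e) = v(x), by bisection."""
    n = len(x)
    lo, hi = 0.0, 1.0
    while v((hi,) * n) < v(x):       # grow hi until hi*e is at least as good as x
        hi *= 2.0
    while hi - lo > tol:             # shrink the bracket around the indifferent t
        mid = (lo + hi) / 2.0
        if v((mid,) * n) < v(x):
            lo = mid
        else:
            hi = mid
    return hi

x = (1.0, 4.0)
t = u_of(x)
assert abs(v((t, t)) - v(x)) < 1e-8   # t*e is (numerically) indifferent to x
assert abs(t - 2.0) < 1e-6            # here v(x) = 2, and v(t*e) = t
```

The monotonicity of v is what guarantees the bracket [lo, hi] always straddles the single crossing point, mirroring the roles of the intervals [0, t̄] and [t̲, ∞) in the proof.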
We conclude that for every x ∈ Rn+ , there is exactly one number, u(x), such that (P.1)
is satisfied. Having constructed a utility function assigning each bundle in X a number, we
show next that this utility function represents the preferences ≿.
Consider two bundles x1 and x2, and their associated utility numbers u(x1) and
u(x2), which by definition satisfy u(x1)e ∼ x1 and u(x2)e ∼ x2. Then we have the
following:
x1 ≿ x2 (P.2)
⇐⇒ u(x1)e ∼ x1 ≿ x2 ∼ u(x2)e (P.3)
⇐⇒ u(x1)e ≿ u(x2)e (P.4)
⇐⇒ u(x1) ≥ u(x2). (P.5)
Here (P.2) ⇐⇒ (P.3) follows by definition of u; (P.3) ⇐⇒ (P.4) follows from the transitivity
of ≿, the transitivity of ∼, and the definition of u; and (P.4) ⇐⇒ (P.5) follows from
the strict monotonicity of ≿. Together, (P.2) through (P.5) imply that (P.2) ⇐⇒ (P.5), so
that x1 ≿ x2 if and only if u(x1) ≥ u(x2), as we sought to show.
It remains only to show that the utility function u : Rn+ → R representing ≿ is continuous.
By Theorem A1.6, it suffices to show that the inverse image under u of every
open ball in R is open in Rn+ . Because open balls in R are merely open intervals, this is
equivalent to showing that u−1 ((a, b)) is open in Rn+ for every a<b.
Now,
u−1((a, b)) = {x ∈ Rn+ | a < u(x) < b}
= {x ∈ Rn+ | ae ≺ u(x)e ≺ be}
= {x ∈ Rn+ | ae ≺ x ≺ be}.
The first equality follows from the definition of the inverse image; the second from the
monotonicity of ≿; and the third from u(x)e ∼ x and Exercise 1.4. Rewriting the last set
on the right-hand side gives
u−1((a, b)) = ≻(ae) ∩ ≺(be). (P.6)
By the continuity of ≿, the sets ≾(ae) and ≿(be) are closed in X = Rn+.
Consequently, the two sets on the right-hand side of (P.6), being the complements of these
closed sets, are open in Rn+. Therefore, u−1((a, b)), being the intersection of two open sets
in Rn+, is, by Exercise A1.28, itself open in Rn+.
explicit warning here that no significance whatsoever can be attached to the actual num-
bers assigned by a given utility function to particular bundles – only to the ordering of
those numbers.4 This conclusion, though simple to demonstrate, is nonetheless important
enough to warrant being stated formally. The proof is left as an exercise.
Typically, we will want to make some assumptions on tastes to complete the descrip-
tion of consumer preferences. Naturally enough, any additional structure we impose on
preferences will be reflected as additional structure on the utility function representing
them. By the same token, whenever we assume the utility function to have properties
beyond continuity, we will in effect be invoking some set of additional assumptions on the
underlying preference relation. There is, then, an equivalence between axioms on tastes
and specific mathematical properties of the utility function. We will conclude this section
by briefly noting some of them. The following theorem is exceedingly simple to prove
because it follows easily from the definitions involved. It is worth being convinced, how-
ever, so its proof is left as an exercise. (See Chapter A1 in the Mathematical Appendix for
definitions of strictly increasing, quasiconcave, and strictly quasiconcave functions.)
Later we will want to analyse problems using calculus tools. Until now, we have con-
centrated on the continuity of the utility function and properties of the preference relation
that ensure it. Differentiability, of course, is a more demanding requirement than con-
tinuity. Intuitively, continuity requires there be no sudden preference reversals. It does
not rule out ‘kinks’ or other kinds of continuous, but impolite behaviour. Differentiability
specifically excludes such things and ensures indifference curves are ‘smooth’ as well as
continuous. Differentiability of the utility function thus requires a stronger restriction on
4 Some theorists are so sensitive to the potential confusion between the modern usage of the term ‘utility func-
tion’ and the classical utilitarian notion of ‘utility’ as a measurable quantity of pleasure or pain that they reject
the anachronistic terminology altogether and simply speak of preference relations and their ‘representation
functions’.
preferences than continuity. Like the axiom of continuity, what is needed is just the right
mathematical condition. We shall not develop this condition here, but refer the reader to
Debreu (1972) for the details. For our purposes, we are content to simply assume that the
utility representation is differentiable whenever necessary.
There is a certain vocabulary we use when utility is differentiable, so we should learn
it. The first-order partial derivative of u(x) with respect to xi is called the marginal utility
of good i. For the case of two goods, we defined the marginal rate of substitution of good
2 for good 1 as the absolute value of the slope of an indifference curve. We can derive
an expression for this in terms of the two goods’ marginal utilities. To see this, consider
any bundle x¹ = (x¹₁, x¹₂). Because the indifference curve through x¹ is just a function in
the (x₁, x₂) plane, let x₂ = f(x₁) be the function describing it. Therefore, as x₁ varies, the
bundle (x₁, x₂) = (x₁, f(x₁)) traces out the indifference curve through x¹. Consequently,
for all x₁,
u(x₁, f(x₁)) = constant. (1.1)
Now the marginal rate of substitution of good two for good one at the bundle x¹ = (x¹₁, x¹₂),
denoted MRS₁₂(x¹₁, x¹₂), is the absolute value of the slope of the indifference curve through
(x¹₁, x¹₂). That is,
MRS₁₂(x¹₁, x¹₂) ≡ |f′(x¹₁)| = −f′(x¹₁), (1.2)
because f′ < 0. But by (1.1), u(x₁, f(x₁)) is a constant function of x₁. Hence, its derivative
with respect to x₁ must be zero. That is,
∂u(x₁, x₂)/∂x₁ + [∂u(x₁, x₂)/∂x₂] f′(x₁) = 0. (1.3)
Solving (1.3) for −f′(x₁) and combining with (1.2) gives
MRS₁₂(x¹) = [∂u(x¹)/∂x₁] / [∂u(x¹)/∂x₂].
Similarly, when there are more than two goods we define the marginal rate of
substitution of good j for good i as the ratio of their marginal utilities:
MRSij(x) ≡ [∂u(x)/∂xi] / [∂u(x)/∂xj].
When marginal utilities are strictly positive, the MRSij (x) is again a positive number, and
it tells us the rate at which good j can be exchanged per unit of good i with no change in
the consumer’s utility.
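This ratio formula is easy to check numerically with finite-difference marginal utilities. The utility below is a hypothetical example, not one from the text.

```python
def u(x1, x2):                   # hypothetical utility for the check
    return x1 * x2

def marginal(f, args, i, h=1e-6):
    """Central finite-difference approximation to the i-th partial of f."""
    lo, hi = list(args), list(args)
    lo[i] -= h
    hi[i] += h
    return (f(*hi) - f(*lo)) / (2 * h)

x = (2.0, 3.0)
mrs_12 = marginal(u, x, 0) / marginal(u, x, 1)   # (du/dx1) / (du/dx2)

# Analytically du/dx1 = x2 and du/dx2 = x1, so MRS12 = x2/x1 = 1.5 here.
assert abs(mrs_12 - x[1] / x[0]) < 1e-6
```

For this utility the marginal utilities are simply the other good’s quantity, so the MRS collapses to the quantity ratio, a convenient sanity check on the finite-difference code.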
When u(x) is continuously differentiable on Rn++ and preferences are strictly mono-
tonic, the marginal utility of every good is virtually always strictly positive. That is,
∂u(x)/∂xi > 0 for ‘almost all’ bundles x, and all i = 1, . . . , n.⁵ When preferences are
strictly convex, the marginal rate of substitution between two goods is always strictly
diminishing along any level surface of the utility function. More generally, for any
quasiconcave utility function, its Hessian matrix H(x) of second-order partials will satisfy
yᵀH(x)y ≤ 0 for all y satisfying ∇u(x) · y = 0.
If the inequality is strict, this says that moving from x in a direction y that is tangent to the
indifference surface through x [i.e., ∇u(x) · y = 0] reduces utility (i.e., yᵀH(x)y < 0).
To make further progress, we make the following assumptions that will be maintained
unless stated otherwise.
5 In case the reader is curious, the term ‘almost all’ means all bundles except a set having Lebesgue measure zero.
However, there is no need to be familiar with Lebesgue measure to see that some such qualifier is necessary.
Consider the case of a single good, x, and the utility function u(x) = x + sin(x). Because u is strictly increasing,
Next, we consider the consumer’s circumstances and structure the feasible set. Our
concern is with an individual consumer operating within a market economy. By a market
economy, we mean an economic system in which transactions between agents are mediated
by markets. There is a market for each commodity, and in these markets, a price pi prevails
for each commodity i. We suppose that prices are strictly positive, so pi > 0, i = 1, . . . , n.
Moreover, we assume the individual consumer is an insignificant force on every market. By
this we mean, specifically, that the size of each market relative to the potential purchases
of the individual consumer is so large that no matter how much or how little the consumer
might purchase, there will be no perceptible effect on any market price. Formally, this
means we take the vector of market prices, p ≫ 0, as fixed from the consumer's point of view.
The consumer is endowed with a fixed money income y ≥ 0. Because the purchase of xi units of commodity i at price pi per unit requires an expenditure of pi xi dollars, the requirement that expenditure not exceed income can be stated as ∑ⁿᵢ₌₁ pi xi ≤ y or, more compactly, as p · x ≤ y. We summarise these assumptions on the economic environment
of the consumer by specifying the following structure on the feasible set, B, called the
budget set:
B = {x | x ∈ Rn+ , p · x ≤ y}.
In the two-good case, B consists of all bundles lying inside or on the boundaries of the
shaded region in Fig. 1.9.
If we want to, we can now recast the consumer’s problem in very familiar terms.
Under Assumption 1.2, preferences may be represented by a strictly increasing and strictly
quasiconcave utility function u(x) on the consumption set Rn+ . Under our assumptions on
the feasible set, total expenditure must not exceed income. The consumer's problem (1.4) can thus be cast equivalently as the problem of maximising the utility function subject to the budget constraint:

max u(x) s.t. p · x ≤ y, x ∈ Rn+.    (1.5)

5 (cont.) it represents strictly monotonic preferences. However, although u′(x) = 1 + cos(x) is strictly positive for most values of x, it is zero whenever x = π + 2πk, k = 0, 1, 2, . . .
[Figure 1.9. The budget set B in the two-good case, bounded by the axes and the budget line with slope −p1/p2 and horizontal intercept y/p1.]
Note that if x∗ solves this problem, then u(x∗) ≥ u(x) for all x ∈ B, which means that x∗ ≿ x for all x ∈ B. That is, solutions to (1.5) are indeed solutions to (1.4). The converse is also true.
We should take a moment to examine the mathematical structure of this problem.
As we have noted, under the assumptions on preferences, the utility function u(x) is real-
valued and continuous. The budget set B is a non-empty (it contains 0 ∈ Rn+ ), closed,
bounded (because all prices are strictly positive), and thus compact subset of Rn . By
the Weierstrass theorem, Theorem A1.10, we are therefore assured that a maximum of
u(x) over B exists. Moreover, because B is convex and the objective function is strictly
quasiconcave, the maximiser of u(x) over B is unique. Because preferences are strictly
monotonic, the solution x∗ will satisfy the budget constraint with equality, lying on, rather
than inside, the boundary of the budget set. Thus, when y > 0, because x∗ ≥ 0 but x∗ ≠ 0, we know that xi∗ > 0 for at least one good i. A typical solution to this problem in
the two-good case is illustrated in Fig. 1.10.
Clearly, the solution vector x∗ depends on the parameters to the consumer’s problem.
Because it will be unique for given values of p and y, we can properly view the solution to
(1.5) as a function from the set of prices and income to the set of quantities, X = Rn+ . We
therefore will often write xi∗ = xi (p, y), i = 1, . . . , n, or, in vector notation, x∗ = x(p, y).
When viewed as functions of p and y, the solutions to the utility-maximisation problem are
known as ordinary, or Marshallian demand functions. When income and all prices other
than the good’s own price are held fixed, the graph of the relationship between quantity
demanded of xi and its own price pi is the standard demand curve for good i.
The relationship between the consumer’s problem and consumer demand behaviour
is illustrated in Fig. 1.11. In Fig. 1.11(a), the consumer faces prices p01 and p02 and has
income y0 . Quantities x1 (p01 , p02 , y0 ) and x2 (p01 , p02 , y0 ) solve the consumer’s problem and
maximise utility facing those prices and income.

[Figure 1.10. A typical solution x∗ = (x1∗, x2∗) to the consumer's problem in the two-good case, lying on the budget line with horizontal intercept y/p1.]

[Figure 1.11. (a) Utility maximisation at prices p1^0 and p1^1 (holding p2^0 and income y^0 fixed); (b) plotting the demanded quantities x1(p1^0, p2^0, y^0) and x1(p1^1, p2^0, y^0) against p1^0 and p1^1 traces out the Marshallian demand curve x1(p1, p2^0, y^0).]

Directly below, in Fig. 1.11(b), we measure the price of good 1 on the vertical axis and the quantity demanded of good 1 on
the horizontal axis. If we plot the price p01 against the quantity of good 1 demanded at
that price (given the price p02 and income y0 ), we obtain one point on the consumer’s
Marshallian demand curve for good 1. At the same income and price of good 2, facing
p11 < p01 , the quantities x1 (p11 , p02 , y0 ) and x2 (p11 , p02 , y0 ) solve the consumer’s problem and
maximise utility. If we plot p11 against the quantity of good 1 demanded at that price, we
obtain another point on the Marshallian demand curve for good 1 in Fig. 1.11(b). By con-
sidering all possible values for p1 , we trace out the consumer’s entire demand curve for
good 1 in Fig. 1.11(b). As you can easily verify, different levels of income and different
prices of good 2 will cause the position and shape of the demand curve for good 1 to
change. That position and shape, however, will always be determined by the properties of
the consumer’s underlying preference relation.
If we strengthen the requirements on u(x) to include differentiability, we can use calculus methods to further explore demand behaviour. Recall that the consumer's problem is

max u(x) s.t. p · x ≤ y, x ∈ Rn+.    (1.6)
Assuming that the solution x∗ is strictly positive, we can apply Kuhn-Tucker methods to characterise it. If x∗ ≫ 0 solves (1.6), then by Theorem A2.20, there exists a λ∗ ≥ 0 such that (x∗, λ∗) satisfy the following Kuhn-Tucker conditions:
∂L/∂xi = ∂u(x∗)/∂xi − λ∗pi = 0, i = 1, . . . , n,    (1.7)
p · x∗ − y ≤ 0,    (1.8)
λ∗[p · x∗ − y] = 0.    (1.9)
Now, by strict monotonicity, (1.8) must be satisfied with equality, so that (1.9)
becomes redundant. Consequently, these conditions reduce to
∂L/∂x1 = ∂u(x∗)/∂x1 − λ∗p1 = 0,
⋮
∂L/∂xn = ∂u(x∗)/∂xn − λ∗pn = 0,    (1.10)
p · x∗ − y = 0.
What do these tell us about the solution to (1.6)? There are two possibilities. Either ∇u(x∗) = 0 or ∇u(x∗) ≠ 0. Under strict monotonicity, the first case is possible, but quite unlikely. We shall simply assume therefore that ∇u(x∗) ≠ 0. Thus, by strict monotonicity, ∂u(x∗)/∂xi > 0 for some i = 1, . . . , n. Because pi > 0 for all i, it is clear from (1.7) that the Lagrangian multiplier will be strictly positive at the solution, because λ∗ = [∂u(x∗)/∂xi]/pi > 0. Consequently, for all j, ∂u(x∗)/∂xj = λ∗pj > 0, so marginal utility
and k, we can combine the conditions to conclude that
[∂u(x∗)/∂xj] / [∂u(x∗)/∂xk] = pj / pk.    (1.11)
This says that at the optimum, the marginal rate of substitution between any two goods
must be equal to the ratio of the goods’ prices. In the two-good case, conditions (1.10)
therefore require that the slope of the indifference curve through x∗ be equal to the slope
of the budget constraint, and that x∗ lie on, rather than inside, the budget line, as in Fig. 1.10
and Fig. 1.11(a).
In general, conditions (1.10) are merely necessary conditions for a local optimum
(see the end of Section A2.3). However, for the particular problem at hand, these necessary
first-order conditions are in fact sufficient for a global optimum. This is worthwhile stating formally: suppose that u(x) is continuous and quasiconcave on Rn+ and that (p, y) ≫ 0. If u is differentiable at x∗, and (x∗, λ∗) ≫ 0 solves (1.10), then x∗ solves the consumer's maximisation problem at prices p and income y.
Proof: We shall employ the following fact that you are asked to prove in Exercise 1.28: for all x, x1 ≥ 0, because u is quasiconcave, ∇u(x) · (x1 − x) ≥ 0 whenever u(x1) ≥ u(x) and u is differentiable at x.
Now, suppose that ∇u(x∗) exists and (x∗, λ∗) ≫ 0 solves (1.10). Then

∇u(x∗) = λ∗p,    (P.1)
p · x∗ = y.    (P.2)
Suppose, by way of contradiction, that x∗ is not utility-maximising: then there is some x0 ∈ Rn+ with p · x0 ≤ y and u(x0) > u(x∗). Note that x0 ≠ 0, so p · x0 > 0. By the continuity of u,

u(tx0) > u(x∗)    (P.3)

and

p · (tx0) < y    (P.4)

for some t ∈ [0, 1] close enough to one. Letting x1 = tx0, we then have

∇u(x∗) · (x1 − x∗) = λ∗p · (x1 − x∗) < λ∗(y − y) = 0,    (P.5)

where the first equality follows from (P.1), and the second inequality follows from (P.2) and (P.4). However, because by (P.3) u(x1) > u(x∗), (P.5) contradicts the fact set forth at the beginning of the proof.
EXAMPLE 1.1 Suppose the consumer's preferences are represented by the CES utility function, u(x1, x2) = (x1^ρ + x2^ρ)^(1/ρ), where 0 ≠ ρ < 1, so his problem is to solve

max_{x1,x2} (x1^ρ + x2^ρ)^(1/ρ) s.t. p1x1 + p2x2 ≤ y, x1 ≥ 0, x2 ≥ 0.    (E.1)

Because preferences are monotonic, the budget constraint will hold with equality at the solution. Assuming an interior solution, the Kuhn-Tucker conditions coincide with the ordinary first-order Lagrangian conditions and the following equations must hold at the solution values x1, x2, and λ:
∂L/∂x1 = (x1^ρ + x2^ρ)^((1/ρ)−1) x1^(ρ−1) − λp1 = 0,    (E.2)
∂L/∂x2 = (x1^ρ + x2^ρ)^((1/ρ)−1) x2^(ρ−1) − λp2 = 0,    (E.3)
∂L/∂λ = p1x1 + p2x2 − y = 0.    (E.4)
Rearranging (E.2) and (E.3), then dividing the first by the second and rearranging
some more, we can reduce these three equations in three unknowns to only two equations
in the two unknowns of particular interest, x1 and x2 :
x1 = x2 (p1/p2)^(1/(ρ−1)),    (E.5)
y = p1x1 + p2x2.    (E.6)
First, substitute from (E.5) for x1 in (E.6) to obtain the equation in x2 alone:

y = p1(p1/p2)^(1/(ρ−1)) x2 + p2x2.    (E.7)

Solving (E.7) for x2 gives

x2 = p2^(1/(ρ−1)) y / (p1^(ρ/(ρ−1)) + p2^(ρ/(ρ−1))).    (E.8)

Substituting from (E.8) back into (E.5) gives, similarly,

x1 = p1^(1/(ρ−1)) y / (p1^(ρ/(ρ−1)) + p2^(ρ/(ρ−1))).    (E.9)
Equations (E.8) and (E.9), the solutions to the consumer’s problem (E.1), are the
consumer’s Marshallian demand functions. If we define the parameter r = ρ/(ρ − 1), we
can simplify (E.8) and (E.9) and write the Marshallian demands as
x1(p, y) = p1^(r−1) y / (p1^r + p2^r),    (E.10)
x2(p, y) = p2^(r−1) y / (p1^r + p2^r).    (E.11)
Notice that the solutions to the consumer’s problem depend only on its parameters, p1 , p2 ,
and y. Different prices and income, through (E.10) and (E.11), will give different quantities
of each good demanded. To drive this point home, consider Fig. 1.12. There, at prices
p1 , p̄2 and income ȳ, the solutions to the consumer’s problem will be the quantities of x1
and x2 indicated. The pair (p1 , x1 (p1 , p̄2 , ȳ)) will be a point on (one of) the consumer’s
demand curves for good x1 .
[Figure 1.12. At prices (p̄1, p̄2) and income ȳ, the quantities demanded are x̄1 = p̄1^(r−1)ȳ/(p̄1^r + p̄2^r) and x̄2 = p̄2^(r−1)ȳ/(p̄1^r + p̄2^r); the pair (p̄1, x1(p̄1, p̄2, ȳ)) is one point on the consumer's demand curve for good 1.]
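As a quick numerical sanity check (a sketch, not part of the text; the parameter values ρ = 0.5, p = (2, 3), y = 10 are illustrative), the demands (E.10) and (E.11) can be evaluated and tested against the first-order conditions: the bundle exhausts income, and the MRS equals the price ratio as in (1.11).

```python
# Evaluate the CES Marshallian demands (E.10)-(E.11) and check the
# first-order conditions at the demanded bundle.
rho = 0.5                      # illustrative CES parameter, 0 != rho < 1
r = rho / (rho - 1.0)          # r = rho/(rho - 1), as defined in the text
p1, p2, y = 2.0, 3.0, 10.0     # illustrative prices and income

denom = p1**r + p2**r
x1 = p1**(r - 1) * y / denom   # (E.10)
x2 = p2**(r - 1) * y / denom   # (E.11)

# Strict monotonicity: the budget is exhausted, p.x = y.
budget = p1 * x1 + p2 * x2

# For u = (x1^rho + x2^rho)^(1/rho) the MRS reduces to (x1/x2)^(rho - 1);
# by (1.11) it must equal the price ratio at the optimum.
mrs = (x1 / x2) ** (rho - 1)

print(budget, mrs, p1 / p2)    # budget equals y; mrs equals p1/p2
```

Any other strictly positive prices and income give the same two identities, which is a useful spot-check when deriving demands by hand.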
Finally, a word on the properties of the demand function x(p, y) derived from the
consumer’s maximisation problem. We have made enough assumptions to ensure (by
Theorem A2.21 (the theorem of the maximum)) that x(p, y) will be continuous on Rn++ .
But we shall usually want more than this. We would like to be able to consider the slopes
of demand curves and hence we would like x(p, y) to be differentiable. From this point on,
we shall simply assume that x(p, y) is differentiable whenever we need it to be. But just to let you know what this involves, we state without proof the following result: if x∗ ≫ 0 solves the consumer's maximisation problem at prices p0 ≫ 0 and income y0 > 0, u is twice continuously differentiable on Rn++ with ∂u(x∗)/∂xi > 0 for some i, and the bordered Hessian of u has a non-zero determinant at x∗, then x(p, y) is differentiable at (p0, y0).
The function

v(p, y) ≡ max u(x) s.t. p · x ≤ y, x ∈ Rn+    (1.12)

is called the indirect utility function. It is the maximum-value function corresponding to the consumer's utility maximisation problem. When u(x) is continuous, v(p, y) is well-defined for all p ≫ 0 and y ≥ 0 because a solution to the maximisation problem (1.12) is guaranteed to exist. If, in addition, u(x) is strictly quasiconcave, then the solution is unique and we write it as x(p, y), the consumer's demand function. The maximum level of utility that can be achieved when facing prices p and income y therefore will be that which is realised when x(p, y) is chosen. Hence,

v(p, y) = u(x(p, y)).    (1.13)
Geometrically, we can think of v(p, y) as giving the utility level of the highest indifference
curve the consumer can reach, given prices p and income y, as illustrated in Fig. 1.13.
[Figure 1.13. Indirect utility: facing prices p and income y, the consumer reaches the highest attainable indifference curve, u = v(p, y), at the bundle x(p, y) on the budget line with slope −p1/p2 and intercepts y/p1 and y/p2.]
There are several properties that the indirect utility function will possess. Continuity
of the constraint function in p and y is sufficient to guarantee that v(p, y) will be contin-
uous in p and y on Rn++ ×R+ . (See Section A2.4.) Effectively, the continuity of v(p, y)
follows because at positive prices, ‘small changes’ in any of the parameters (p, y) fixing
the location of the budget constraint will only lead to ‘small changes’ in the maximum
level of utility the consumer can achieve. In the following theorem, we collect together a
number of additional properties of v(p, y).
THEOREM 1.6 Properties of the Indirect Utility Function: If u(x) is continuous and strictly increasing on Rn+, then v(p, y) defined in (1.12) is
1. Continuous on Rn++ × R+,
2. Homogeneous of degree zero in (p, y),
3. Strictly increasing in y,
4. Decreasing in p,
5. Quasiconvex in (p, y).
6. Moreover, it satisfies Roy's identity: if v(p, y) is differentiable at (p0, y0) and ∂v(p0, y0)/∂y ≠ 0, then

xi(p0, y0) = −[∂v(p0, y0)/∂pi] / [∂v(p0, y0)/∂y], i = 1, . . . , n.
Proof: Property 1 follows from Theorem A2.21 (the theorem of the maximum). We shall
not pursue the details.
The second property is easy to prove. We must show that v(p, y) = v(tp, ty) for all
t > 0. But v(tp, ty) = [max u(x) s.t. tp · x ≤ ty], which is clearly equivalent to [max u(x)
s.t. p · x ≤ y] because we may divide both sides of the constraint by t > 0 without affecting
the set of bundles satisfying it. (See Fig. 1.14.) Consequently, v(tp, ty) = [max u(x) s.t.
p · x ≤ y] = v(p, y).
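The argument can be illustrated by brute force: maximise utility over a fine grid of bundles on the budget line and the maximised value is unchanged when (p, y) is scaled to (tp, ty), because the set of affordable bundles is identical. The Cobb-Douglas utility and the parameter values below are illustrative assumptions, not taken from the text.

```python
# Brute-force illustration that v(tp, ty) = v(p, y): scaling all prices
# and income by t > 0 leaves the budget set, hence the maximum, unchanged.
def v(p1, p2, y, u, n=400):
    """Approximate max of u over {x >= 0 : p1*x1 + p2*x2 <= y} by
    sweeping x1 along the budget line (optimal under monotonicity)."""
    best = u(0.0, 0.0)
    for i in range(n + 1):
        x1 = (y / p1) * i / n
        x2 = (y - p1 * x1) / p2   # spend the remaining income on good 2
        best = max(best, u(x1, x2))
    return best

cobb_douglas = lambda x1, x2: x1**0.3 * x2**0.7   # illustrative utility

p1, p2, y, t = 2.0, 5.0, 10.0, 3.0
print(v(p1, p2, y, cobb_douglas), v(t * p1, t * p2, t * y, cobb_douglas))
```

The two printed values coincide (up to floating-point rounding), exactly as the division-by-t argument in the proof predicts.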
Intuitively, properties 3 and 4 simply say that any relaxation of the consumer’s bud-
get constraint can never cause the maximum level of achievable utility to decrease, whereas
any tightening of the budget constraint can never cause that level to increase.
To prove 3 (and to practise Lagrangian methods), we shall make some additional assumptions, although property 3 can be shown to hold without them. To keep things simple, we'll assume for the moment that the solution to (1.12) is strictly positive and differentiable whenever (p, y) ≫ 0, and that u(·) is differentiable with ∂u(x)/∂xi > 0, for all i, whenever x ≫ 0.
As we have remarked before, because u(·) is strictly increasing, the constraint in (1.12) must bind at the optimum. Consequently, (1.12) is equivalent to

v(p, y) = max u(x) s.t. p · x = y, x ∈ Rn+,    (P.1)

with associated Lagrangian

L(x, λ) = u(x) + λ(y − p · x).    (P.2)
[Figure 1.14. Homogeneity of the indirect utility function in prices and income: scaling (p, y) to (tp, ty) leaves the budget line, with intercepts ty/tp1 = y/p1 and ty/tp2 = y/p2, unchanged.]
Now, for (p, y) ≫ 0, let x∗ = x(p, y) solve (P.1). By our additional assumption, x∗ ≫ 0, so we may apply Lagrange's theorem to conclude that there is a λ∗ ∈ R such that
∂L(x∗, λ∗)/∂xi = ∂u(x∗)/∂xi − λ∗pi = 0, i = 1, . . . , n.    (P.3)
Note that because both pi and ∂u(x∗ )/∂xi are positive, so, too, is λ∗ .
Our additional differentiability assumptions allow us to now apply Theorem A2.22,
the Envelope theorem, to establish that v(p, y) is strictly increasing in y. According to
the Envelope theorem, the partial derivative of the maximum value function v(p, y) with
respect to y is equal to the partial derivative of the Lagrangian with respect to y evaluated
at (x∗ , λ∗ ),
∂v(p, y)/∂y = ∂L(x∗, λ∗)/∂y = λ∗ > 0.    (P.4)
Thus, v(p, y) is strictly increasing in y > 0. So, because v is continuous, it is then strictly
increasing on y ≥ 0.
For property 4, one can also employ the Envelope theorem. However, we shall
give a more elementary proof that does not rely on any additional hypotheses. So con-
sider p0 ≥ p1 and let x0 solve (1.12) when p = p0 . Because x0 ≥ 0, (p0 − p1 ) · x0 ≥ 0.
Hence, p1 ·x0 ≤ p0 ·x0 ≤ y, so that x0 is feasible for (1.12) when p = p1 . We conclude that
v(p1 , y) ≥ u(x0 ) = v(p0 , y), as desired.
Property 5 says that a consumer would prefer one of any two extreme budget sets to
any average of the two. Our concern is to show that v(p, y) is quasiconvex in the vector of
prices and income (p, y). The key to the proof is to concentrate on the budget sets.
Let B1, B2, and Bt be the budget sets available when prices and income are (p1, y1), (p2, y2), and (pt, yt), respectively, where pt ≡ tp1 + (1 − t)p2 and yt ≡ ty1 + (1 − t)y2.
Then,
B1 = {x | p1 · x ≤ y1 },
B2 = {x | p2 · x ≤ y2 },
Bt = {x | pt · x ≤ yt }.
Suppose we could show that every choice the consumer can possibly make when he
faces budget Bt is a choice that could have been made when he faced either budget B1 or
budget B2 . It then would be the case that every level of utility he can achieve facing Bt is a
level he could have achieved either when facing B1 or when facing B2 . Then, of course, the
maximum level of utility that he can achieve over Bt could be no larger than at least one
of the following: the maximum level of utility he can achieve over B1 , or the maximum
level of utility he can achieve over B2 . But if this is the case, then the maximum level of
utility achieved over Bt can be no greater than the largest of these two. If our supposition
is correct, therefore, we would know that

v(pt, yt) ≤ max[v(p1, y1), v(p2, y2)],

which is equivalent to the quasiconvexity of v(p, y). To verify the supposition, we must show that every bundle in Bt lies in B1 or in B2. Suppose, to the contrary, that some x ∈ Bt belongs to neither B1 nor B2. Then x must violate the constraints defining B1 and B2, so that

p1 · x > y1

and

p2 · x > y2,
respectively. Because t ∈ (0, 1), we can multiply the first of these by t, the second by (1 − t), and preserve the inequalities to obtain

tp1 · x > ty1 and (1 − t)p2 · x > (1 − t)y2.

Adding, we obtain

(tp1 + (1 − t)p2) · x > ty1 + (1 − t)y2,

or

pt · x > yt.

But this contradicts x ∈ Bt, so no such x exists: every bundle in Bt lies in B1 or in B2, as we supposed.
To complete the proof, we derive Roy's identity (property 6). Applying the Envelope theorem once more, but now differentiating with respect to pi, gives

∂v(p, y)/∂pi = ∂L(x∗, λ∗)/∂pi = −λ∗xi∗.    (P.5)

Dividing (P.5) by (P.4) and negating, we obtain

−[∂v(p, y)/∂pi] / [∂v(p, y)/∂y] = λ∗xi∗/λ∗ = xi∗ = xi(p, y),

as desired.
EXAMPLE 1.2 In Example 1.1, the direct utility function is the CES form, u(x1, x2) = (x1^ρ + x2^ρ)^(1/ρ), where 0 ≠ ρ < 1. There we found the Marshallian demands:
x1(p, y) = p1^(r−1) y / (p1^r + p2^r),
x2(p, y) = p2^(r−1) y / (p1^r + p2^r),    (E.1)
for r ≡ ρ/(ρ − 1). By (1.13), we can form the indirect utility function by sub-
stituting these back into the direct utility function. Doing that and rearranging, we
obtain
v(p, y) = [x1(p, y)^ρ + x2(p, y)^ρ]^(1/ρ)
= y[(p1^r + p2^r) / (p1^r + p2^r)^ρ]^(1/ρ)
= y(p1^r + p2^r)^(−1/r).    (E.2)
We should verify that (E.2) satisfies all the properties of an indirect utility function
detailed in Theorem 1.6. It is easy to see that v(p, y) is homogeneous of degree zero in
prices and income, because for any t > 0,

v(tp, ty) = ty[(tp1)^r + (tp2)^r]^(−1/r) = ty t^(−1)(p1^r + p2^r)^(−1/r) = v(p, y).

To see that it is increasing in y and decreasing in p, differentiate (E.2) to obtain

∂v(p, y)/∂y = (p1^r + p2^r)^(−1/r) > 0,    (E.3)
∂v(p, y)/∂pi = −(p1^r + p2^r)^((−1/r)−1) y pi^(r−1) < 0, i = 1, 2.    (E.4)
To verify Roy’s identity, form the required ratio of (E.4) to (E.3) and recall (E.1) to obtain
−[∂v(p, y)/∂pi] / [∂v(p, y)/∂y] = (p1^r + p2^r)^((−1/r)−1) y pi^(r−1) / (p1^r + p2^r)^(−1/r)
= y pi^(r−1) / (p1^r + p2^r) = xi(p, y), i = 1, 2.
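Roy's identity can also be confirmed numerically: approximate the partials of (E.2) by central differences and compare with the demand in (E.1). The parameter values are illustrative, and the agreement is up to finite-difference error only.

```python
# Finite-difference check of Roy's identity for the CES indirect utility.
rho = 0.5                      # illustrative parameters
r = rho / (rho - 1.0)
p1, p2, y, h = 2.0, 3.0, 10.0, 1e-6

def v(p1, p2, y):
    return y * (p1**r + p2**r) ** (-1.0 / r)   # (E.2)

dv_dp1 = (v(p1 + h, p2, y) - v(p1 - h, p2, y)) / (2 * h)  # central differences
dv_dy = (v(p1, p2, y + h) - v(p1, p2, y - h)) / (2 * h)

roy = -dv_dp1 / dv_dy                          # Roy's identity
x1 = p1**(r - 1) * y / (p1**r + p2**r)         # Marshallian demand (E.1)
print(roy, x1)                                 # agree up to finite-difference error
```

The same check works for any differentiable indirect utility function, which makes it a handy way to catch algebra slips after a derivation like the one above.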
[Figure 1.15. Expenditure minimisation: isoexpenditure lines with slope −p1/p2 for levels e3 < e∗ < e1 < e2; the lowest such line meeting the indifference curve u does so at the least-cost bundle xh = (x1h(p, u), x2h(p, u)), with expenditure e∗.]
In constructing the indirect utility function, we fixed prices and income, and sought the maximum level of utility the consumer could achieve. To construct
the expenditure function, we again fix prices, but we ask a different sort of question about
the level of utility the consumer achieves. Specifically, we ask: what is the minimum level of
money expenditure the consumer must make facing a given set of prices to achieve a given
level of utility? In this construction, we ignore any limitations imposed by the consumer’s
income and simply ask what the consumer would have to spend to achieve some particular
level of utility.
To better understand the type of problem we are studying, consider Fig. 1.15 and
contrast it with Fig. 1.13. Each of the parallel straight lines in Fig. 1.15 depicts all bundles x
that require the same level of total expenditure to acquire when facing prices p = (p1 , p2 ).
Each of these isoexpenditure curves is defined implicitly by e = p1x1 + p2x2, for a different level of total expenditure e > 0. Each therefore will have the same slope, −p1/p2,
but different horizontal and vertical intercepts, e/p1 and e/p2 , respectively. Isoexpenditure
curves farther out contain bundles costing more; those farther in give bundles costing less.
If we fix the level of utility at u, then the indifference curve u(x) = u gives all bundles
yielding the consumer that same level of utility.
The isoexpenditure curve for level e3 has no point in common with the indifference curve u, indicating that e3 dollars is insufficient at these prices to achieve utility u.
However, each of the curves e1 , e2 , and e∗ has at least one point in common with u, indi-
cating that any of these levels of total expenditure is sufficient for the consumer to achieve
utility u. In constructing the expenditure function, however, we seek the minimum expen-
diture the consumer requires to achieve utility u, or the lowest possible isoexpenditure
curve that still has at least one point in common with indifference curve u. Clearly, that
will be level e∗ , and the least cost bundle that achieves utility u at prices p will be the bun-
dle xh = (x1h (p, u), x2h (p, u)). If we denote the minimum expenditure necessary to achieve
utility u at prices p by e(p, u), that level of expenditure will simply be equal to the cost of
bundle xh , or e(p, u) = p1 x1h (p, u) + p2 x2h (p, u) = e∗ .
The expenditure function is defined as the minimum-value function,

e(p, u) ≡ min p · x s.t. u(x) ≥ u, x ∈ Rn+    (1.14)

for all p ≫ 0 and all attainable utility levels u. For future reference, let U = {u(x) | x ∈ Rn+} denote the set of attainable utility levels. Thus, the domain of e(·) is Rn++ × U.
Note that e(p, u) is well-defined because for p ∈ Rn++ , x ∈ Rn+ , p · x ≥ 0. Hence,
the set of numbers {e|e = p · x for some x with u(x) ≥ u} is bounded below by zero.
Moreover, because p ≫ 0, this set can be shown to be closed. Hence, it contains a smallest
number. The value e(p, u) is precisely this smallest number. Note that any solution vector
for this minimisation problem will be non-negative and will depend on the parameters p
and u. Notice also that if u(x) is continuous and strictly quasiconcave, the solution will be
unique, so we can denote the solution as the function xh (p, u) ≥ 0. As we have seen, if
xh(p, u) solves this problem, the lowest expenditure necessary to achieve utility u at prices p will be exactly equal to the cost of the bundle xh(p, u), or

e(p, u) = p · xh(p, u).    (1.15)
We have seen how the consumer’s utility maximisation problem is intimately related
to his observable market demand behaviour. Indeed, the very solutions to that problem –
the Marshallian demand functions – tell us just how much of every good we should observe
the consumer buying when he faces different prices and income. We shall now interpret the
solution, xh (p, u), of the expenditure-minimisation problem as another kind of ‘demand
function’ – but one that is not directly observable.
Consider the following mental experiment. If we fix the level of utility the consumer
is permitted to achieve at some arbitrary level u, how will his purchases of each good
behave as we change the prices he faces? The kind of ‘demand functions’ we are imagin-
ing here are thus utility-constant ones. We completely ignore the level of the consumer’s
money income and the utility levels he actually can achieve. In fact, we know that when a
consumer has some level of income and we change the prices he faces, there will ordinarily
be some change in his purchases and some corresponding change in the level of utility he
achieves. To imagine how we might then construct our hypothetical demand functions, we
must imagine a process by which whenever we lower some price, and so confer a utility
gain on the consumer, we compensate by reducing the consumer’s income, thus conferring
a corresponding utility loss sufficient to bring the consumer back to the original level of
utility. Similarly, whenever we increase some price, causing a utility loss, we must imag-
ine compensating for this by increasing the consumer’s income sufficiently to give a utility
gain equal to the loss. Because they reflect the net effect of this process by which we
match any utility change due to a change in prices by a compensating utility change from
a hypothetical adjustment in income, the hypothetical demand functions we are describing
are often called compensated demand functions. However, because John Hicks (1939)
was the first to write about them in quite this way, these hypothetical demand functions
are most commonly known as Hicksian demand functions. As we illustrate below, each hypothetical budget constraint the
[Figure 1.16. (a) Holding utility at u and p2^0 fixed, a fall in the price of good 1 from p1^0 to p1^1 moves the least-cost bundle from x1h(p1^0, p2^0, u) to x1h(p1^1, p2^0, u); (b) plotting these quantities against p1^0 and p1^1 traces out the Hicksian demand curve x1h(p1, p2^0, u).]
consumer faces in Fig. 1.16 involves a level of expenditure exactly equal to the minimum
level necessary at the given prices to achieve the utility level in question.
Thus, the expenditure function defined in (1.14) contains within it some impor-
tant information on the consumer’s Hicksian demands. Although the analytic importance
of this construction will only become evident a bit later, we can take note here of the
remarkable ease with which that information can be extracted from a knowledge of
the expenditure function. The consumer’s Hicksian demands can be extracted from the
expenditure function by means of simple differentiation. We detail this and other important
properties of the expenditure function in the following theorem.
THEOREM 1.7 Properties of the Expenditure Function: If u(·) is continuous and strictly increasing, then e(p, u) defined in (1.14) is
1. Zero when u takes on the lowest level of utility in U,
2. Continuous on its domain Rn++ × U,
3. For all p ≫ 0, strictly increasing and unbounded above in u,
4. Increasing in p,
5. Homogeneous of degree 1 in p,
6. Concave in p.
If, in addition, u(·) is strictly quasiconcave, we have
7. Shephard's lemma: e(p, u) is differentiable in p at (p0, u0) with p0 ≫ 0, and

∂e(p0, u0)/∂pi = xih(p0, u0), i = 1, . . . , n.
Proof: To prove property 1, note that the lowest value in U is u(0) because u(·) is strictly
increasing on Rn+ . Consequently, e(p, u(0)) = 0 because x = 0 attains utility u(0) and
requires an expenditure of p · 0 = 0.
Property 2, continuity, follows once again from Theorem A2.21 (the theorem of the
maximum).
Although property 3 holds without any further assumptions, we shall be content to demonstrate it under the additional hypotheses that xh(p, u) ≫ 0 is differentiable ∀ p ≫ 0, u > u(0), and that u(·) is differentiable with ∂u(x)/∂xi > 0, ∀ i, on Rn++.
Now, because u(·) is continuous and strictly increasing, and p ≫ 0, the constraint in (1.14) must be binding. For if u(x1) > u, there is a t ∈ (0, 1) close enough to 1 such that u(tx1) > u. Moreover, u ≥ u(0) implies u(x1) > u(0), so that x1 ≠ 0. Therefore, p · (tx1) < p · x1, because p · x1 > 0. Consequently, when the constraint is not binding, there is a strictly cheaper bundle that also satisfies the constraint. Hence, at the optimum, the constraint must bind. Consequently, we may write (1.14) instead as

e(p, u) = min p · x s.t. u(x) = u, x ∈ Rn+.    (P.1)
Now, for p ≫ 0 and u > u(0), we have that x∗ = xh(p, u) ≫ 0 solves (P.1). Forming the Lagrangian,

L(x, λ) = p · x + λ[u − u(x)],    (P.2)

Lagrange's theorem tells us there is a λ∗ such that
∂L(x∗, λ∗)/∂xi = pi − λ∗ ∂u(x∗)/∂xi = 0, i = 1, . . . , n.    (P.3)
Note then that because pi and ∂u(x∗ )/∂xi are positive, so, too, is λ∗ . Under our addi-
tional hypotheses, we can now use the Envelope theorem to show that e(p, u) is strictly
increasing in u.
By the Envelope theorem, the partial derivative of the minimum-value function
e(p, u) with respect to u is equal to the partial derivative of the Lagrangian with respect to
u, evaluated at (x∗ , λ∗ ). Hence,
∂e(p, u)/∂u = ∂L(x∗, λ∗)/∂u = λ∗ > 0.
Because this holds for all u > u(0), and because e(·) is continuous, we may conclude that for all p ≫ 0, e(p, u) is strictly increasing in u on U (which includes u(0)).
That e is unbounded in u can be shown to follow from the fact that u(x) is continuous
and strictly increasing. You are asked to do so in Exercise 1.34.
Because property 4 follows from property 7, we shall defer it for the moment.
Property 5 will be left as an exercise.
For property 6, we must prove that e(p, u) is a concave function of prices. We begin
by recalling the definition of concavity. Let p1 and p2 be any two positive price vectors,
let t ∈ [0, 1], and let pt = tp1 + (1 − t)p2 be any convex combination of p1 and p2 . Then
the expenditure function will be concave in prices if

te(p1, u) + (1 − t)e(p2, u) ≤ e(pt, u).    (P.5)
To see that this is indeed the case, simply focus on what it means for expenditure to be
minimised at given prices. Suppose in particular that x1 minimises expenditure to achieve
u when prices are p1 , that x2 minimises expenditure to achieve u when prices are p2 , and
that x∗ minimises expenditure to achieve u when prices are pt . Then the cost of x1 at prices
p1 must be no more than the cost at prices p1 of any other bundle x that achieves utility
u. Similarly, the cost of x2 at prices p2 must be no more than the cost at p2 of any other
bundle x that achieves utility u. Now, if, as we have said,
p1 ·x1 ≤ p1 ·x
and
p2 ·x2 ≤ p2 ·x
for all x that achieve u, then these relations must also hold for x∗ , because x∗ achieves u
as well. Therefore, simply by virtue of what it means to minimise expenditure to achieve
u at given prices, we know that
p1 ·x1 ≤ p1 ·x∗
and
p2 ·x2 ≤ p2 ·x∗ .
But now we are home free. Because t ≥ 0 and (1 − t) ≥ 0, we can multiply the first of these by t, the second by (1 − t), and add them. If we then substitute from the definition of pt, we obtain

tp1 · x1 + (1 − t)p2 · x2 ≤ pt · x∗.

The left-hand side is just the convex combination of the minimum levels of expenditure necessary at prices p1 and p2 to achieve utility u, and the right-hand side is the minimum expenditure needed to achieve utility u at the convex combination of those prices. In short, this is just the same as (P.5), and tells us that

te(p1, u) + (1 − t)e(p2, u) ≤ e(pt, u),

as we intended to show.
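Because the argument uses nothing beyond the definition of a minimum, it survives discretisation: minimising p · x over any fixed finite set of bundles that achieve utility u yields a function of p that is a pointwise minimum of linear functions, hence concave. The utility function u(x) = x1x2 and the prices below are illustrative assumptions.

```python
# Concavity of the (discretised) expenditure function in prices:
# t*e(pa) + (1-t)*e(pb) <= e(t*pa + (1-t)*pb).
ubar = 4.0
# Candidate bundles on the indifference curve x1 * x2 = ubar.
bundles = [(0.25 * k, ubar / (0.25 * k)) for k in range(1, 81)]

def e(p):
    """Minimum cost over the fixed candidate set: a minimum of
    functions linear in p, and therefore concave in p."""
    p1, p2 = p
    return min(p1 * x1 + p2 * x2 for x1, x2 in bundles)

pa, pb, t = (1.0, 4.0), (5.0, 2.0), 0.3
pt = (t * pa[0] + (1 - t) * pb[0], t * pa[1] + (1 - t) * pb[1])
lhs = t * e(pa) + (1 - t) * e(pb)
print(lhs <= e(pt) + 1e-12)    # the concavity inequality holds
```

Note that no fine grid is needed for the inequality to hold exactly: the proof above applies verbatim to minimisation over any fixed feasible set.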
To prove property 7, we again appeal to the Envelope theorem but now differentiate
with respect to pi . This gives
∂e(p, u)/∂pi = ∂L(x∗, λ∗)/∂pi = xi∗ ≡ xih(p, u),
as required. Because xh (p, u) ≥ 0, this also proves property 4. (See Exercise 1.37 for a
proof of 7 that does not require any additional assumptions. Try to prove property 4 without
additional assumptions as well.)
EXAMPLE 1.3 Suppose the direct utility function is again the CES form, u(x1, x2) = (x1^ρ + x2^ρ)^(1/ρ), where 0 ≠ ρ < 1. We want to derive the corresponding expenditure function in this case. Because preferences are monotonic, we can formulate the expenditure minimisation problem (1.14) as

min_{x1,x2} p1x1 + p2x2 s.t. (x1^ρ + x2^ρ)^(1/ρ) − u = 0, x1 ≥ 0, x2 ≥ 0,
with associated Lagrangian

L(x1, x2, λ) = p1x1 + p2x2 − λ[(x1^ρ + x2^ρ)^(1/ρ) − u].    (E.1)
Assuming an interior solution in both goods, the first-order conditions for a minimum
subject to the constraint ensure that the solution values x1 , x2 , and λ satisfy the equations
∂L/∂x1 = p1 − λ(x1^ρ + x2^ρ)^((1/ρ)−1) x1^(ρ−1) = 0,    (E.2)
∂L/∂x2 = p2 − λ(x1^ρ + x2^ρ)^((1/ρ)−1) x2^(ρ−1) = 0,    (E.3)
∂L/∂λ = (x1^ρ + x2^ρ)^(1/ρ) − u = 0.    (E.4)
Eliminating λ by dividing (E.2) by (E.3) and rearranging, the system reduces to two equations in x1 and x2:

x1 = x2 (p1/p2)^(1/(ρ−1)),    (E.5)
u = (x1^ρ + x2^ρ)^(1/ρ).    (E.6)
Substituting from (E.5) for x1 in (E.6) gives

u = [x2^ρ (p1/p2)^(ρ/(ρ−1)) + x2^ρ]^(1/ρ) = x2 [(p1/p2)^(ρ/(ρ−1)) + 1]^(1/ρ).

Solving for x2, we obtain

x2 = u[(p1/p2)^(ρ/(ρ−1)) + 1]^(−1/ρ) = u(p1^(ρ/(ρ−1)) + p2^(ρ/(ρ−1)))^(−1/ρ) p2^(1/(ρ−1))
= u(p1^r + p2^r)^((1/r)−1) p2^(r−1).    (E.7)

Substituting from (E.7) back into (E.5) gives, by the same steps,

x1 = u(p1^r + p2^r)^((1/r)−1) p1^(r−1).    (E.8)
The solutions (E.7) and (E.8) depend on the parameters of the minimisation problem, p
and u. These are the Hicksian demands, so we can denote (E.7) and (E.8)
x1h(p, u) = u(p1^r + p2^r)^((1/r)−1) p1^(r−1),    (E.9)
x2h(p, u) = u(p1^r + p2^r)^((1/r)−1) p2^(r−1).    (E.10)
To form the expenditure function, we invoke equation (1.15) and substitute from (E.9) and (E.10) into the objective function in (E.1) to obtain

e(p, u) = p1x1h(p, u) + p2x2h(p, u) = u(p1^r + p2^r)^(1/r).    (E.11)

Equation (E.11) is the expenditure function we sought. We leave as an exercise the task of verifying that it possesses the usual properties.
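As a numerical cross-check (with illustrative parameter values again), the Hicksian bundle from (E.9)-(E.10) should cost exactly e(p, u) from (E.11), attain utility u exactly, and a finite-difference derivative of e with respect to p1 should recover x1h, as Shephard's lemma in Theorem 1.7 asserts.

```python
# Check (E.9)-(E.11) and Shephard's lemma for the CES case.
rho = 0.5                       # illustrative parameters
r = rho / (rho - 1.0)
p1, p2, u = 2.0, 3.0, 5.0

def hicksian(p1, p2, u):
    s = p1**r + p2**r
    x1 = u * s ** ((1.0 / r) - 1) * p1**(r - 1)   # (E.9)
    x2 = u * s ** ((1.0 / r) - 1) * p2**(r - 1)   # (E.10)
    return x1, x2

def e(p1, p2, u):
    return u * (p1**r + p2**r) ** (1.0 / r)       # (E.11)

x1, x2 = hicksian(p1, p2, u)
print(p1 * x1 + p2 * x2, e(p1, p2, u))     # cost of x^h equals e(p, u)
print((x1**rho + x2**rho) ** (1 / rho))    # the bundle attains utility u

h = 1e-6
de_dp1 = (e(p1 + h, p2, u) - e(p1 - h, p2, u)) / (2 * h)
print(de_dp1, x1)                          # Shephard's lemma: de/dp1 = x1^h
```

All three checks are identities for any strictly positive prices and attainable u, so a failure immediately flags an algebra error in the derivation.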
Next, fix (p, u) and let y = e(p, u). By the definition of e, this says that at prices
p, income y is the smallest income that allows the consumer to attain at least the level of
utility u. Consequently, at prices p, if the consumer’s income were in fact y, then he could
attain at least the level of utility u. Because v(p, y) is the largest utility level attainable at
prices p and with income y, this implies that v(p, y) ≥ u. Consequently, the definitions of
v and e also imply that

v(p, e(p, u)) ≥ u.

A parallel argument shows that, fixing (p, y) and letting u = v(p, y), income y is sufficient at prices p to achieve utility v(p, y), so that

e(p, v(p, y)) ≤ y.

The next theorem demonstrates that under certain familiar conditions on preferences, both of these inequalities, in fact, must be equalities.
Until now, if we wanted to derive a consumer’s indirect utility and expenditure func-
tions, we would have had to solve two separate constrained optimisation problems: one
a maximisation problem and the other a minimisation problem. This theorem, however,
points to an easy way to derive either one from knowledge of the other, thus requiring us
to solve only one optimisation problem and giving us the choice of which one we care to
solve.
To see how this would work, let us suppose first that we have solved the utility-
maximisation problem and formed the indirect utility function. One thing we know about
the indirect utility function is that it is strictly increasing in its income variable. But then,
holding prices constant and viewing it only as a function of income, it must be possible to
invert the indirect utility function in its income variable. From before,

v(p, e(p, u)) = u,

so we can apply that inverse function (call it v−1(p : t)) to both sides of this and obtain

e(p, u) = v−1(p : u).    (1.18)

Whatever that expression on the right-hand side of (1.18) turns out to be, we know it will correspond exactly to the expression for the consumer's expenditure function – the expression we would eventually obtain if we solved the expenditure-minimisation problem, then substituted back into the objective function.
Suppose, instead, that we had chosen to solve the expenditure-minimisation problem
and form the expenditure function, e(p, u). In this case, we know that e(p, u) is strictly
increasing in u. Again supposing prices constant, there will be an inverse of the expenditure function in its utility variable, which we can denote e−1(p : t). Applying this inverse to both sides of the first item in Theorem 1.8, we find that the indirect utility function can be solved for directly and will be that expression in p and y that results when we evaluate the utility inverse of the expenditure function at any level of income y,

v(p, y) = e−1(p : y).    (1.19)
Equations (1.18) and (1.19) illustrate again the close relationship between utility
maximisation and expenditure minimisation. The two are conceptually just opposite sides
of the same coin. Mathematically, both the indirect utility function and the expenditure
function are simply the appropriately chosen inverses of each other.
EXAMPLE 1.4 We can illustrate these procedures by drawing on findings from the pre-
vious examples. In Example 1.2, we found that the CES direct utility function gives the
indirect utility function,
v(p, y) = y(p_1^r + p_2^r)^{-1/r}   (E.1)
for any p and income level y. For an income level equal to e(p, u) dollars, therefore, we
must have
v(p, e(p, u)) = e(p, u)(p_1^r + p_2^r)^{-1/r}.   (E.2)
Next, from the second item in Theorem 1.8, we know that for any p and u,

v(p, e(p, u)) = u.   (E.3)

Combining (E.2) and (E.3) gives

e(p, u)(p_1^r + p_2^r)^{-1/r} = u,   (E.4)

and solving for e(p, u) yields

e(p, u) = u(p_1^r + p_2^r)^{1/r}   (E.5)

for the expenditure function. A quick look back at Example 1.3 confirms this is the
same expression for the expenditure function obtained by directly solving the consumer’s
expenditure-minimisation problem.
Suppose, instead, we begin with knowledge of the expenditure function and want
to derive the indirect utility function. For the CES direct utility function, we know from
Example 1.3 that
e(p, u) = u(p_1^r + p_2^r)^{1/r}   (E.6)
for any p and utility level u. Then for utility level v(p, y), we will have
e(p, v(p, y)) = v(p, y)(p_1^r + p_2^r)^{1/r}.   (E.7)

From the first item in Theorem 1.8, we also know that for any p and y,

e(p, v(p, y)) = y.   (E.8)

Combining (E.7) and (E.8) gives

v(p, y)(p_1^r + p_2^r)^{1/r} = y,   (E.9)

and solving for v(p, y) yields

v(p, y) = y(p_1^r + p_2^r)^{-1/r}   (E.10)

for the indirect utility function. A glance at Example 1.2 confirms that (E.10) is what we
obtained by directly solving the consumer’s utility-maximisation problem.
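Because Example 1.4 supplies closed forms for both functions, the inverse relationships are easy to check numerically. The following Python sketch is not part of the original text; the helper names `v` and `e` and the values r = 0.5, p = (2, 3) are purely illustrative choices.

```python
# Numerical check that the CES indirect utility and expenditure functions
# are inverses of each other in the income/utility variable:
# e(p, v(p, y)) = y and v(p, e(p, u)) = u.

def v(p1, p2, y, r=0.5):
    """CES indirect utility: v(p, y) = y(p1^r + p2^r)^(-1/r)."""
    return y * (p1**r + p2**r) ** (-1.0 / r)

def e(p1, p2, u, r=0.5):
    """CES expenditure function: e(p, u) = u(p1^r + p2^r)^(1/r)."""
    return u * (p1**r + p2**r) ** (1.0 / r)

p1, p2, y, u = 2.0, 3.0, 10.0, 1.5
assert abs(e(p1, p2, v(p1, p2, y)) - y) < 1e-12   # e(p, v(p, y)) = y
assert abs(v(p1, p2, e(p1, p2, u)) - u) < 1e-12   # v(p, e(p, u)) = u
```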
We can pursue this relationship between utility maximisation and expenditure min-
imisation a bit further by shifting our attention to the respective solutions to these two
problems. The solutions to the utility-maximisation problem are the Marshallian demand
functions. The solutions to the expenditure-minimisation problem are the Hicksian demand
functions. In view of the close relationship between the two optimisation problems them-
selves, it is natural to suspect there is some equally close relationship between their
respective solutions. The following theorem clarifies the links between Hicksian and Marshallian demands.

THEOREM 1.9 Duality Between Marshallian and Hicksian Demand Functions

Under Assumption 1.2, we have the following relations between the Hicksian and Marshallian demand functions for all p ≫ 0, y ≥ 0, any attainable utility level u, and i = 1, …, n:
1. x_i(p, y) = x_i^h(p, v(p, y)).
2. x_i^h(p, u) = x_i(p, e(p, u)).
The first relation says that the Marshallian demand at prices p and income y is equal
to the Hicksian demand at prices p and the utility level that is the maximum that can be
achieved at prices p and income y. The second says that the Hicksian demand at any prices
p and utility level u is the same as the Marshallian demand at those prices and an income
level equal to the minimum expenditure necessary at those prices to achieve that utility
level.
Roughly, Theorem 1.9 says that solutions to (1.12) are also solutions to (1.14), and
vice versa. More precisely, if x∗ solves (1.12) at (p, y), the theorem says that x∗ solves
(1.14) at (p, u), where u = u(x∗ ). Conversely, if x∗ solves (1.14) at (p, u), then x∗ solves
(1.12) at (p, y), where y = p · x∗ . Fig. 1.17 illustrates the theorem. There, it is clear that
x∗ can be viewed either as the solution to (1.12) or the solution to (1.14). It is in this sense
that x∗ has a dual nature.
Proof: We will complete the proof of the first, leaving the second as an exercise.
Note that by Assumption 1.2, u(·) is continuous and strictly quasiconcave, so that
the solutions to (1.12) and (1.14) exist and are unique. Consequently, the Marshallian and
Hicksian demand functions are well-defined.
To prove the first relation, let x^0 = x(p^0, y^0), and let u^0 = u(x^0). Then v(p^0, y^0) = u^0 by definition of v(·), and p^0 · x^0 = y^0 because, by Assumption 1.2, u(·) is strictly increasing.
[Figure 1.17. The bundle x* solves both the utility-maximisation problem (1.12) and the expenditure-minimisation problem (1.14), with u(x*) = u.]
By Theorem 1.8, the minimum expenditure required to reach utility u^0 = v(p^0, y^0) at prices p^0 is e(p^0, v(p^0, y^0)) = y^0. Because u(x^0) = u^0 and p^0 · x^0 = y^0, this implies that x^0 solves (1.14) when (p, u) = (p^0, u^0). Hence, x^0 = x^h(p^0, u^0) and so x(p^0, y^0) = x^h(p^0, v(p^0, y^0)).
EXAMPLE 1.5 Let us confirm Theorem 1.9 for a CES consumer. From Example 1.3, the
Hicksian demands are
x_i^h(p, u) = u(p_1^r + p_2^r)^{(1/r)-1} p_i^{r-1},  i = 1, 2.   (E.1)

Substituting the maximum utility achievable at p and y, namely v(p, y) = y(p_1^r + p_2^r)^{-1/r}, for u gives

x_i^h(p, v(p, y)) = y(p_1^r + p_2^r)^{-1/r}(p_1^r + p_2^r)^{(1/r)-1} p_i^{r-1}   (E.2)

= y p_i^{r-1}/(p_1^r + p_2^r),  i = 1, 2.   (E.3)
The final expression on the right-hand side of (E.3) gives the Marshallian demands we
derived in Example 1.1 by solving the consumer’s utility-maximisation problem. This
confirms the first item in Theorem 1.9.
To confirm the second, suppose we know the Marshallian demands from
Example 1.1,
x_i(p, y) = y p_i^{r-1}/(p_1^r + p_2^r),  i = 1, 2.   (E.4)

Evaluating these at an income level equal to e(p, u) = u(p_1^r + p_2^r)^{1/r}, we obtain

x_i(p, e(p, u)) = e(p, u) p_i^{r-1}/(p_1^r + p_2^r)   (E.5)

= u(p_1^r + p_2^r)^{1/r} p_i^{r-1}/(p_1^r + p_2^r)

= u p_i^{r-1}(p_1^r + p_2^r)^{(1/r)-1},  i = 1, 2.   (E.6)
[Figure 1.18. (a) At prices p and income y, utility maximisation and expenditure minimisation pick out the same bundle (x_1^*, x_2^*), and y = e(p, v(p, y)). (b) The point (p_1, x_1^*) lies on both the Marshallian and the Hicksian demand curves for good 1.]
The final expression on the right-hand side of (E.6) gives the Hicksian demands derived in
Example 1.3 by directly solving the consumer's expenditure-minimisation problem.
To conclude this section, we can illustrate the four relations in Theorems 1.8 and
1.9. In Fig. 1.18(a), a consumer with income y facing prices p achieves maximum utility
u by choosing x1∗ and x2∗ . That same u-level indifference curve therefore can be viewed
as giving the level of utility v(p, y), and, in Fig. 1.18(b), point (p1 , x1∗ ) will be a point
on the Marshallian demand curve for good 1. Consider next the consumer’s expenditure-
minimisation problem, and suppose we seek to minimise expenditure to achieve utility u.
Then, clearly, the lowest isoexpenditure curve that achieves u at prices p will coincide with
the budget constraint in the previous utility-maximisation problem, and the expenditure
minimising choices will again be x1∗ and x2∗ , giving the point (p1 , x1∗ ) in Fig. 1.18(b) as a
point on the consumer’s Hicksian demand for good 1.
Considering the two problems together, we can easily see from the coincident inter-
cepts of the budget constraint and isoexpenditure line that income y is an amount of
money equal to the minimum expenditure necessary to achieve utility v(p, y) or that
y = e(p, v(p, y)). Utility level u is both the maximum achievable at prices p and income
y, so that u = v(p, y), and the maximum achievable at prices p and an income equal to the
minimum expenditure necessary to achieve u, so that u = v(p, e(p, u)). Finally, notice that
(p1 , x1∗ ) must be a point on all three of the following: (1) the Hicksian demand for good 1
at prices p and utility level u, (2) the Hicksian demand for good 1 at prices p and utility
level v(p, y), and (3) the Marshallian demand for good 1 at prices p and income y. Thus,
x1 (p, y) = x1h (p, v(p, y)) and x1h (p, u) = x1 (p, e(p, u)), as we had hoped.
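The two demand-side relations can also be verified numerically for the CES consumer. This sketch is not part of the original text; the helper names `x_m` and `x_h` and the parameter values are illustrative assumptions, while the formulas themselves are the closed forms from Examples 1.1 through 1.3.

```python
# Check x(p, y) = x^h(p, v(p, y)) and x^h(p, u) = x(p, e(p, u))
# for the CES consumer.

def x_m(p1, p2, y, r=0.5):
    """CES Marshallian demands: x_i = y p_i^(r-1) / (p1^r + p2^r)."""
    s = p1**r + p2**r
    return (y * p1**(r - 1) / s, y * p2**(r - 1) / s)

def x_h(p1, p2, u, r=0.5):
    """CES Hicksian demands: x_i^h = u (p1^r + p2^r)^((1/r)-1) p_i^(r-1)."""
    s = p1**r + p2**r
    return (u * s**(1.0 / r - 1) * p1**(r - 1), u * s**(1.0 / r - 1) * p2**(r - 1))

def v(p1, p2, y, r=0.5):
    """CES indirect utility."""
    return y * (p1**r + p2**r) ** (-1.0 / r)

def e(p1, p2, u, r=0.5):
    """CES expenditure function."""
    return u * (p1**r + p2**r) ** (1.0 / r)

p1, p2, y, u = 2.0, 3.0, 10.0, 1.5
for a, b in zip(x_m(p1, p2, y), x_h(p1, p2, v(p1, p2, y))):
    assert abs(a - b) < 1e-12       # x(p, y) = x^h(p, v(p, y))
for a, b in zip(x_h(p1, p2, u), x_m(p1, p2, e(p1, p2, u))):
    assert abs(a - b) < 1e-12       # x^h(p, u) = x(p, e(p, u))
```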
By real income, we mean the maximum number of units of some commodity the
consumer could acquire if he spent his entire money income. Real income is intended
to reflect the consumer’s total command over all resources by measuring his potential
command over a single real commodity. If y is the consumer’s money income, then the
ratio y/pj is called his real income in terms of good j and will be measured in units of
good j, because
y/p_j = $/($/unit of j) = units of j.
The simplest deduction we can make from our model of the utility-maximising con-
sumer is that only relative prices and real income affect behaviour. This is sometimes
expressed by saying that the consumer’s demand behaviour displays an absence of money
illusion. To see this, simply recall the discussion of Fig. 1.14. There, equiproportionate
changes in money income and the level of all prices leave the slope (relative prices) and
both intercepts of the consumer’s budget constraint (real income measured in terms of any
good) unchanged, and so call for no change in demand behaviour. Mathematically, this
amounts to saying that the consumer’s demand functions are homogeneous of degree zero
in prices and income. Because the only role that money has played in constructing our
model is as a unit of account, it would indeed be strange if this were not the case.
For future reference, we bundle this together with the observation that consumer spending will typically exhaust income, and we give names to both results.

THEOREM 1.10 Homogeneity and Budget Balancedness

Under Assumption 1.2, the consumer demand function x(p, y) is homogeneous of degree zero in all prices and income,

x(p, y) = x(tp, ty) for all (p, y) and t > 0,

and it satisfies budget balancedness,

p · x(p, y) = y for all (p, y).

Proof: We prove homogeneity; budget balancedness follows directly from the strict monotonicity of u(·) guaranteed by Assumption 1.2.
Now, because the budget sets at (p, y) and (tp, ty) are the same, the maximum utility attainable over them must be the same, so that u(x(p, y)) = u(x(tp, ty)). Moreover, each of x(p, y) and x(tp, ty) was feasible when the other was chosen. Hence, this equality and the strict quasiconcavity of u imply that

x(p, y) = x(tp, ty),

or that the demand for every good, x_i(p, y), i = 1, …, n, is homogeneous of degree zero in prices and income.
In particular, setting t = 1/p_n, we can write

x(p, y) = x(tp, ty) = x(p_1/p_n, …, p_{n-1}/p_n, 1, y/p_n).
In words, demand for each of the n goods depends only on n − 1 relative prices and the
consumer’s real income.
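Homogeneity of degree zero is easy to confirm numerically for the CES demands. This sketch is not from the text; the helper name `x_m`, the scale factor t = 7, and the other parameter values are illustrative assumptions, while the demand formula is the closed form from Example 1.1.

```python
# Degree-zero homogeneity of CES Marshallian demands:
# scaling all prices and income by t > 0 leaves demand unchanged.

def x_m(p1, p2, y, r=0.5):
    """CES Marshallian demands: x_i = y p_i^(r-1) / (p1^r + p2^r)."""
    s = p1**r + p2**r
    return (y * p1**(r - 1) / s, y * p2**(r - 1) / s)

p1, p2, y, t = 2.0, 3.0, 10.0, 7.0
for a, b in zip(x_m(p1, p2, y), x_m(t * p1, t * p2, t * y)):
    assert abs(a - b) < 1e-12   # x(p, y) = x(tp, ty)
```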
[Figure 1.19. Possible responses of the quantity of good 1 demanded to a decline in its own price: quantity demanded may increase (a, b) or decrease (c).]
Each of these cases is fully consistent with our model. What, then – if anything –
does the theory predict about how someone’s demand behaviour responds to changes in
(relative) prices?
Let us approach it intuitively first. When the price of a good declines, there are
at least two conceptually separate reasons why we expect some change in the quantity
demanded. First, that good becomes relatively cheaper compared to other goods. Because
all goods are desirable, even if the consumer’s total command over goods were unchanged,
we would expect him to substitute the relatively cheaper good for the now relatively more
expensive ones. This is the substitution effect (SE). At the same time, however, whenever
a price changes, the consumer’s command over goods in general is not unchanged. When
the price of any one good declines, the consumer’s total command over all goods is
effectively increased, allowing him to change his purchases of all goods in any way he
sees fit. The effect on quantity demanded of this generalised increase in purchasing power
is called the income effect (IE).
Although intuition tells us we can in some sense decompose the total effect (TE) of a
price change into these two separate conceptual categories, we will have to be a great deal
more precise if these ideas are to be of any analytical use. Different ways to formalise the
intuition of the income and substitution effects have been proposed. We shall follow that
proposed by Hicks (1939).
The Hicksian decomposition of the total effect of a price change starts with the
observation that the consumer achieves some level of utility at the original prices before
any change has occurred. The formalisation given to the intuitive notion of the substitution
effect is the following: the substitution effect is that (hypothetical) change in consumption
that would occur if relative prices were to change to their new levels but the maximum util-
ity the consumer can achieve were kept the same as before the price change. The income
effect is then defined as whatever is left of the total effect after the substitution effect.
Notice that because the income effect is defined as a residual, the total effect is always
completely explained by the sum of the substitution and the income effect. At first, this
might seem a strange way to do things, but a glance at Fig. 1.20 should convince you of at
least two things: its reasonable correspondence to the intuitive concepts of the income and
substitution effects, and its analytical ingenuity.
Look first at Fig. 1.20(a), and suppose the consumer originally faces prices p_1^0 and p_2^0 and has income y. He originally buys quantities x_1^0 and x_2^0 and achieves utility level u^0. Suppose the price of good 1 falls to p_1^1 < p_1^0 and that the total effect of this price change on good 1 consumption is an increase to x_1^1, and the total effect on good 2 is a decrease to x_2^1. To apply the Hicksian decomposition, we first perform the hypothetical experiment of allowing the price of good 1 to fall to the new level p_1^1 while holding the consumer to the original u^0 level indifference curve. It is as if we allowed the consumer to face the new relative prices but reduced his income so that he faced the dashed hypothetical budget constraint and asked him to maximise against it. Under these circumstances, the consumer would increase his consumption of good 1 – the now relatively cheaper good – from x_1^0 to x_1^s, and would decrease his consumption of good 2 – the now relatively more expensive good – from x_2^0 to x_2^s. These hypothetical changes in consumption are the Hicksian
substitution effects on good 1 and good 2, and we regard them as due ‘purely’ to the change in relative prices with no change whatsoever in the well-being of the consumer.

[Figure 1.20. The Hicksian decomposition of a fall in the price of good 1. (a) The total effect (TE) on good 1, from x_1^0 to x_1^1, splits into the substitution effect (SE), from x_1^0 to x_1^s along the original indifference curve u^0, and the income effect (IE), from x_1^s to x_1^1. (b) The Marshallian demand curve for good 1 records the total effect of the own-price change; the Hicksian demand curve through (p_1^0, x_1^0) records only the substitution effect.]
Look now at what is left of the total effect to explain. After hypothetical changes from
x10 and x20 to x1s and x2s , the changes from x1s and x2s to x11 and x21 remain to be explained.
Notice, however, that these are precisely the consumption changes that would occur if, at
the new prices and the original level of utility u0 , the consumer were given an increase in
real income shifting his budget constraint from the hypothetical dashed one out to the final,
post price-change line tangent to u1 . It is in this sense that the Hicksian income effect cap-
tures the change in consumption due ‘purely’ to the income-like change that accompanies
a price change.
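The decomposition can be computed explicitly for the CES consumer. This sketch is not part of the original text; the helper names and the price and income values are illustrative assumptions, while the demand and indirect utility formulas are the closed forms from the earlier examples.

```python
# Hicksian decomposition of a fall in p1 for the CES consumer:
# TE = SE + IE, where SE holds utility fixed at the original level u0 = v(p, y)
# and IE is defined as the residual TE - SE.

def x1_m(p1, p2, y, r=0.5):
    """CES Marshallian demand for good 1."""
    return y * p1**(r - 1) / (p1**r + p2**r)

def x1_h(p1, p2, u, r=0.5):
    """CES Hicksian demand for good 1."""
    return u * (p1**r + p2**r)**(1.0 / r - 1) * p1**(r - 1)

def v(p1, p2, y, r=0.5):
    """CES indirect utility."""
    return y * (p1**r + p2**r)**(-1.0 / r)

p1_old, p1_new, p2, y = 4.0, 1.0, 3.0, 12.0
u0 = v(p1_old, p2, y)                              # utility at original prices

TE = x1_m(p1_new, p2, y) - x1_m(p1_old, p2, y)
SE = x1_h(p1_new, p2, u0) - x1_m(p1_old, p2, y)    # x1_m(p_old, y) = x1_h(p_old, u0)
IE = TE - SE

assert SE > 0                   # own-substitution effect of a price fall is positive
assert IE > 0                   # good 1 is normal at these values, so IE reinforces SE
assert abs(SE + IE - TE) < 1e-12
```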
Look now at Fig. 1.20(b), which ignores what is happening with good 2 and focuses
exclusively on good 1. Clearly, (p_1^0, x_1^0) and (p_1^1, x_1^1) are points on the Marshallian demand curve for good 1. Similarly, (p_1^0, x_1^0) and (p_1^1, x_1^s) are points on the Hicksian demand curve for good 1, relative to the original utility level u^0. We can see that the Hicksian demand
curve picks up precisely the pure Hicksian substitution effect of an own-price change. (Get
it?) The Marshallian demand curve picks up the total effect of an own-price change. The
two diverge from one another precisely because of, and in an amount equal to, the Hicksian
income effect of an own-price change.
The Hicksian decomposition gives us a neat analytical way to isolate the two distinct
forces working to change demand behaviour following a price change. We can take these
same ideas and express them much more precisely, much more generally, and in a form
that will prove more analytically useful. The relationships between total effect, substitution
effect, and income effect are summarised in the Slutsky equation. The Slutsky equation is
sometimes called the ‘Fundamental Equation of Demand Theory’, so what follows merits
thinking about rather carefully.
Throughout the remainder of this chapter, Assumption 1.2 will be in effect, and,
moreover, we will freely differentiate whenever necessary.
THEOREM 1.11 The Slutsky Equation

Let x(p, y) be the consumer's Marshallian demand system. Let u* be the level of utility the consumer achieves at prices p and income y. Then,

∂x_i(p, y)/∂p_j = ∂x_i^h(p, u*)/∂p_j − x_j(p, y)(∂x_i(p, y)/∂y),  i, j = 1, …, n.
Proof: The proof of this remarkable theorem is quite easy, though you must follow it quite
carefully to avoid getting lost. We begin by recalling one of the links between Hicksian
and Marshallian demand functions. From Theorem 1.9, we know that

x_i^h(p, u*) = x_i(p, e(p, u*))

for any prices and level of utility u*. Because this holds for all p ≫ 0, we can differentiate
both sides with respect to pj and the equality is preserved. The Hicksian demand on the
left-hand side, because it depends only directly on prices, is straightforward to differenti-
ate. The Marshallian demand on the right-hand side, however, depends directly on prices
through its price argument, but it also depends indirectly on prices through the expenditure
function in its income argument. We will have to apply the chain rule to differentiate the
right-hand side. Keeping this in mind, we obtain

∂x_i^h(p, u*)/∂p_j = ∂x_i(p, e(p, u*))/∂p_j + (∂x_i(p, e(p, u*))/∂y)(∂e(p, u*)/∂p_j).   (P.1)
Now if we look at (P.1) carefully, and remember the significance of the original level
of utility u∗ , we can make some critical substitutions. By assumption, u∗ is the utility the
consumer achieves facing p and y. Therefore, u∗ = v(p, y). The minimum expenditure at
prices p and utility u∗ therefore will be the same as the minimum expenditure at prices p
and utility v(p, y). From Theorem 1.8, however, we know that the minimum expenditure
at prices p and maximum utility that can be achieved at prices p and income y is equal to
income y. We have, therefore, that

e(p, u*) = e(p, v(p, y)) = y.   (P.2)
In addition, Theorem 1.7 tells us that the partial with respect to pj of the expenditure
function in (P.1) is just the Hicksian demand for good j at utility u∗ . Because u∗ = v(p, y),
this must also be the Hicksian demand for good j at utility v(p, y), or
∂e(p, u*)/∂p_j = x_j^h(p, u*) = x_j^h(p, v(p, y)).
But look at the right-most term here. We know from Theorem 1.9 that the Hicksian demand
at p and the maximum utility achieved at p and y is in turn equal to the Marshallian
demand at p and y! Thus, we have that
∂e(p, u*)/∂p_j = x_j(p, y).   (P.3)
[Beware here. Take note that we have shown the price partial of the expenditure function
in (P.1) to be the Marshallian demand for good j, not good i.]
To complete the proof, substitute from (P.2) and (P.3) into (P.1) to obtain

∂x_i^h(p, u*)/∂p_j = ∂x_i(p, y)/∂p_j + x_j(p, y)(∂x_i(p, y)/∂y),

and rearrange to get the Slutsky equation in the statement of the theorem.
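The Slutsky equation can be checked numerically by approximating each partial derivative with a central finite difference. This sketch is not from the text; the helper names, step size h, and parameter values are illustrative assumptions, while the demand formulas are the CES closed forms from the earlier examples.

```python
# Finite-difference check of the Slutsky equation for the CES consumer:
# dx_i/dp_j = dx_i^h/dp_j - x_j * dx_i/dy, evaluated at u* = v(p, y),
# shown here for i = j = 1.

def x_m(p1, p2, y, r=0.5):
    """CES Marshallian demands."""
    s = p1**r + p2**r
    return [y * p1**(r - 1) / s, y * p2**(r - 1) / s]

def x_h(p1, p2, u, r=0.5):
    """CES Hicksian demands."""
    s = p1**r + p2**r
    return [u * s**(1.0 / r - 1) * p1**(r - 1), u * s**(1.0 / r - 1) * p2**(r - 1)]

def v(p1, p2, y, r=0.5):
    """CES indirect utility."""
    return y * (p1**r + p2**r)**(-1.0 / r)

p1, p2, y, h = 2.0, 3.0, 10.0, 1e-6
ustar = v(p1, p2, y)

dx1_dp1  = (x_m(p1 + h, p2, y)[0] - x_m(p1 - h, p2, y)[0]) / (2 * h)
dx1h_dp1 = (x_h(p1 + h, p2, ustar)[0] - x_h(p1 - h, p2, ustar)[0]) / (2 * h)
dx1_dy   = (x_m(p1, p2, y + h)[0] - x_m(p1, p2, y - h)[0]) / (2 * h)
x1 = x_m(p1, p2, y)[0]

assert abs(dx1_dp1 - (dx1h_dp1 - x1 * dx1_dy)) < 1e-6
```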
Slutsky equations provide neat analytical expressions for substitution and income
effects. They also give us an ‘accounting framework’, detailing how these must combine
to explain any total effect of a given price change. Yet by themselves, the Slutsky relations
do not answer any of the questions we set out to address. In fact, you might think that
all this has only made it harder to deduce implications for observable behaviour from our
theory. After all, the only thing we have done so far is decompose an observable total
effect into (1) an observable income effect and (2) an unobservable substitution effect.
For example, consider what Slutsky tells us about the special case of an own-price change.
From Theorem 1.11, we have that

∂x_i(p, y)/∂p_i = ∂x_i^h(p, u*)/∂p_i − x_i(p, y)(∂x_i(p, y)/∂y).
The term on the left is the slope of the Marshallian demand curve for good i – the response
of quantity demanded to a change in own price – and this is what we want to explain. To
do that, however, we apparently need to know something about the first term on the right.
This, however, is the slope of a Hicksian demand curve, and Hicksian demand curves
are not directly observable. What can we know about Hicksian demand curves when we
cannot even see them?
Surprisingly, our theory tells us quite a bit about Hicksian demands, and so quite a
bit about substitution terms – whether we can see them or not. Whatever we learn about
substitution terms then can be translated into knowledge about observable Marshallian
demands via the Slutsky equations. This is how the Slutsky equations will help us, and this
will be our strategy. We begin with a preliminary result on own-price effects that gives a
hint of what is to come.

THEOREM 1.12 Negative Own-Substitution Terms

Let x_i^h(p, u) be the Hicksian demand for good i. Then

∂x_i^h(p, u)/∂p_i ≤ 0,  i = 1, …, n.
Proof: This theorem tells us that Hicksian demand curves must always be as we have shown
them in Fig. 1.16 and elsewhere: namely, negatively (non-positively) sloped with respect
to their own price. The proof is easy.
The derivative property of the expenditure function, Theorem 1.7, part 7, tells us that
for any p and u,
∂e(p, u)/∂p_i = x_i^h(p, u).

Differentiating again with respect to p_i gives

∂x_i^h(p, u)/∂p_i = ∂²e(p, u)/∂p_i² ≤ 0,

where the inequality follows because e(p, u) is concave in p (Theorem 1.7).
We now have everything we need to spell out a modern version of the so-called
Law of Demand. Classical economists like Edgeworth and Marshall assumed ‘utility’
was something measurable, and they believed in the Principle of Diminishing Marginal
Utility. Classical statements of the Law of Demand were therefore rather emphatic: ‘If
price goes down, quantity demanded goes up.’ This seemed generally to conform to obser-
vations of how people behave, but there were some troubling exceptions. The famous
Giffen’s paradox was the most outstanding of these. Although few in number, it seemed
as though there were at least some goods for which a decrease in price was followed by a
decrease in quantity demanded. This violated accepted doctrine, and classical theory could
not explain it.
Modern theory makes fewer assumptions on preferences than classical theory did.
In this sense, it is less restrictive and more widely applicable. Indeed, it is even capable
of resolving Giffen’s paradox. Look back at Fig. 1.19(c) and notice that the quantity of x1
demanded does indeed decline as its own price declines. Nothing rules this out, so there is
nothing paradoxical about Giffen’s paradox in the context of modern theory. However, we
do pay a price for greater generality: the modern Law of Demand must be more equivocal
than its classical precursor.
In stating the law, we use some familiar terminology. A good is called normal if
consumption of it increases as income increases, holding prices constant. A good is called
inferior if consumption of it declines as income increases, holding prices constant.
THEOREM 1.13 The Law of Demand

A decrease in the own price of a normal good will cause quantity demanded to increase. If an own price decrease causes a decrease in quantity demanded, the good must be inferior.
Proof: This follows easily from Theorem 1.12, if you use Theorem 1.11. You should do it
yourself, so we leave it as an exercise.
We actually know a great deal more about the Hicksian substitution terms than is
contained in Theorem 1.12 and more about the Marshallian demands than is contained
in Theorem 1.13. To move beyond the simple statements we have already made, we will
have to probe the system of substitution terms a bit deeper. We first establish that ‘cross-
substitution terms’ are symmetric.
THEOREM 1.14 Symmetric Substitution Terms

Let x^h(p, u) be the consumer's system of Hicksian demands. Then

∂x_i^h(p, u)/∂p_j = ∂x_j^h(p, u)/∂p_i,  i, j = 1, …, n.

Proof:
In proving Theorem 1.12, we noted that the first-order price partial derivatives of
the expenditure function give us the Hicksian demand functions, so the second-order price
partials of the expenditure function will give us the first-order price partials of the Hicksian
demands. Thus,
∂/∂p_j (∂e(p, u)/∂p_i) = ∂x_i^h(p, u)/∂p_j,

or

∂²e(p, u)/∂p_j ∂p_i = ∂x_i^h(p, u)/∂p_j

for all i and j. By Young's theorem, the order of differentiation of the expenditure function makes no difference, so

∂²e(p, u)/∂p_i ∂p_j = ∂²e(p, u)/∂p_j ∂p_i.

Symmetry of the substitution terms follows at once:

∂x_i^h(p, u)/∂p_j = ∂x_j^h(p, u)/∂p_i,  i, j = 1, …, n.
THEOREM 1.15 Negative Semidefinite Substitution Matrix

Let x^h(p, u) be the consumer's system of Hicksian demands, and let

σ(p, u) ≡ [∂x_i^h(p, u)/∂p_j]

be the n × n matrix of price partials, called the substitution matrix, containing all the Hicksian substitution terms. Then the matrix σ(p, u) is negative semidefinite.
Proof: The proof of this is immediate when we recall from the proof of the previous
theorem that each term in this matrix is equal to one of the second-order price partial
derivatives of the expenditure function. In particular, we have seen that ∂xih (p, u)/∂pj =
∂ 2 e(p, u)/∂pj ∂pi for all i and j, so in matrix form we must have
⎛ ∂x_1^h(p, u)/∂p_1  ⋯  ∂x_1^h(p, u)/∂p_n ⎞   ⎛ ∂²e(p, u)/∂p_1²      ⋯  ∂²e(p, u)/∂p_n ∂p_1 ⎞
⎜         ⋮           ⋱          ⋮          ⎟ = ⎜         ⋮             ⋱           ⋮           ⎟.
⎝ ∂x_n^h(p, u)/∂p_1  ⋯  ∂x_n^h(p, u)/∂p_n ⎠   ⎝ ∂²e(p, u)/∂p_1 ∂p_n  ⋯  ∂²e(p, u)/∂p_n²      ⎠
The matrix on the right is simply the Hessian matrix of second-order price partials of the
expenditure function. From Theorem 1.7, the expenditure function is concave in prices.
From Theorem A2.4, the Hessian matrix of a concave function is negative semidefinite.
Because the two matrices are equal, the substitution matrix will therefore also be negative
semidefinite.
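Both symmetry and negative semidefiniteness of the substitution matrix can be checked numerically in the two-good CES case. This sketch is not from the text; the helper names, the finite-difference step, and the parameter values are illustrative assumptions.

```python
# Build the 2x2 substitution matrix sigma for the CES consumer by
# finite differences and check: symmetry, non-positive own terms, and
# (for a 2x2 symmetric matrix) negative semidefiniteness via the determinant.

def x_h(p1, p2, u, r=0.5):
    """CES Hicksian demands."""
    s = p1**r + p2**r
    return [u * s**(1.0 / r - 1) * p1**(r - 1), u * s**(1.0 / r - 1) * p2**(r - 1)]

p1, p2, u, h = 2.0, 3.0, 1.5, 1e-6

def d(i, j):
    """Finite-difference substitution term dx_i^h/dp_j."""
    pp = [(p1 + h, p2), (p1, p2 + h)][j]
    pm = [(p1 - h, p2), (p1, p2 - h)][j]
    return (x_h(*pp, u)[i] - x_h(*pm, u)[i]) / (2 * h)

sigma = [[d(0, 0), d(0, 1)], [d(1, 0), d(1, 1)]]

assert abs(sigma[0][1] - sigma[1][0]) < 1e-6   # symmetry (Theorem 1.14)
assert sigma[0][0] <= 0 and sigma[1][1] <= 0   # own terms non-positive (Theorem 1.12)
det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
assert det >= -1e-9                            # 2x2 NSD: non-negative determinant
```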
Having spent so much time exploring the properties of the unobservable Hicksian
demand system, we are finally in a position to use that knowledge to say something rather
concrete about the consumer’s observable demand behaviour. We had a glimpse of how
this might be done when we considered the ‘Law of Demand’. There, we asked what the
model of consumer behaviour implied for the unobservable, own-substitution effects, and
then used the Slutsky relation to translate that into a statement on the relations that must
hold between own-price and income responses in the consumer’s observable Marshallian
demand functions. In view of what we have now learned about the entire system of sub-
stitution terms, we need not limit ourselves to statements about own-price and income
changes. We can, in fact, use our knowledge of the substitution matrix to make a compre-
hensive deduction about the effects of all price and income changes on the entire system
of observable Marshallian demands.
THEOREM 1.16 Symmetric and Negative Semidefinite Slutsky Matrix

Let x(p, y) be the consumer's Marshallian demand system. Define the ijth Slutsky term as

∂x_i(p, y)/∂p_j + x_j(p, y)(∂x_i(p, y)/∂y),

and form the entire n × n Slutsky matrix of price and income responses as follows:

          ⎛ ∂x_1(p, y)/∂p_1 + x_1(p, y)∂x_1(p, y)/∂y   ⋯   ∂x_1(p, y)/∂p_n + x_n(p, y)∂x_1(p, y)/∂y ⎞
s(p, y) = ⎜                     ⋮                        ⋱                       ⋮                      ⎟.
          ⎝ ∂x_n(p, y)/∂p_1 + x_1(p, y)∂x_n(p, y)/∂y   ⋯   ∂x_n(p, y)/∂p_n + x_n(p, y)∂x_n(p, y)/∂y ⎠

Then s(p, y) is symmetric and negative semidefinite.
Proof: The proof of this is very simple. Let u∗ be the maximum utility the consumer
achieves at prices p and income y, so u∗ = v(p, y). Solving for the ijth substitution term
from the Slutsky equation in Theorem 1.11, we obtain

∂x_i^h(p, u*)/∂p_j = ∂x_i(p, y)/∂p_j + x_j(p, y)(∂x_i(p, y)/∂y).
If now we form the matrix s(p, y), it is clear from this that each element of that matrix is
exactly equal to the corresponding element of the Hicksian substitution matrix σ (p, u∗ ).
By Theorem 1.14, the substitution matrix is symmetric for all u, and by Theorem 1.15 it is
negative semidefinite for all u, so it will be both symmetric and negative semidefinite at u∗ ,
too. Because the two matrices are equal, the Slutsky matrix s(p, y) must also be symmetric
and negative semidefinite.
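Unlike the substitution matrix, the Slutsky matrix is built entirely from observable Marshallian responses, so the theorem can be checked directly from the demand side. This sketch is not from the text; the helper names, step size, and parameter values are illustrative assumptions.

```python
# Build the 2x2 Slutsky matrix s(p, y) from CES Marshallian demands by
# finite differences and check symmetry and non-positive diagonal terms.

def x_m(p1, p2, y, r=0.5):
    """CES Marshallian demands."""
    s = p1**r + p2**r
    return [y * p1**(r - 1) / s, y * p2**(r - 1) / s]

p1, p2, y, h = 2.0, 3.0, 10.0, 1e-6

def dp(i, j):
    """Finite-difference price partial dx_i/dp_j."""
    pp = [(p1 + h, p2), (p1, p2 + h)][j]
    pm = [(p1 - h, p2), (p1, p2 - h)][j]
    return (x_m(*pp, y)[i] - x_m(*pm, y)[i]) / (2 * h)

def dy(i):
    """Finite-difference income partial dx_i/dy."""
    return (x_m(p1, p2, y + h)[i] - x_m(p1, p2, y - h)[i]) / (2 * h)

x = x_m(p1, p2, y)
s_mat = [[dp(i, j) + x[j] * dy(i) for j in range(2)] for i in range(2)]

assert abs(s_mat[0][1] - s_mat[1][0]) < 1e-6   # symmetry
assert s_mat[0][0] <= 0 and s_mat[1][1] <= 0   # non-positive diagonal
```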
Theorems 1.10 and 1.16 can be used as the starting point for testing the theory we
have developed, or for applying it empirically. The requirements that consumer demand
satisfy homogeneity and budget balancedness, and that the associated Slutsky matrix be
symmetric and negative semidefinite, provide a set of restrictions on allowable values for
the parameters in any empirically estimated Marshallian demand system – if that system is
to be viewed as belonging to a price-taking, utility-maximising consumer. Are there other
testable restrictions implied by the theory? This is a question we shall take up in the next
chapter, but first we consider some important elasticity relations.
Budget balancedness tells us that for all prices and income,

y = Σ_{i=1}^{n} p_i x_i(p, y).
Because this equality holds for all p and y, we know that if any single price or the con-
sumer’s income changes, it must hold both before and after the change. All consumer
demand responses to price and income changes therefore must add up, or aggregate, in a
way that preserves the equality of the budget constraint after the change. There are many
such comparative statics experiments that we can perform on the budget constraint to
determine how demand responses must aggregate together. Sometimes, these are expressed
directly in terms of relations that must hold among various derivatives of the demand sys-
tem. We will instead present them here in terms of relations that must hold among various
price and income elasticities of demand. This will enable us to cast the results we obtain
in an equivalent, but perhaps more intuitive and more readily useful way. We begin with
some definitions for the record.
Let x_i(p, y) be the consumer's Marshallian demand for good i. Then let

η_i ≡ (∂x_i(p, y)/∂y)(y/x_i(p, y)),

ε_ij ≡ (∂x_i(p, y)/∂p_j)(p_j/x_i(p, y)),

and let

s_i ≡ p_i x_i(p, y)/y,  so that s_i ≥ 0 and Σ_{i=1}^{n} s_i = 1.
The symbol ηi denotes the income elasticity of demand for good i, and measures
the percentage change in the quantity of i demanded per 1 per cent change in income. The
symbol ε_ij denotes the price elasticity of demand for good i, and measures the percentage change in the quantity of i demanded per 1 per cent change in the price p_j. If j = i, ε_ii is called the own-price elasticity of demand for good i.6 If j ≠ i, ε_ij is called the cross-price elasticity of demand for good i with respect to p_j. The symbol s_i denotes the income
share, or proportion of the consumer’s income, spent on purchases of good i. These must
of course be non-negative and sum to 1.
6 Note that this has not been defined here, as is sometimes done, to guarantee that the own-price elasticity will be
a positive number whenever demand is negatively sloped with respect to own price.
THEOREM 1.17 Aggregation in Consumer Demand

Let x(p, y) be the consumer's Marshallian demand system. Let η_i, ε_ij, and s_i, for i, j = 1, …, n, be as defined before. Then the following relations must hold among income shares, price, and income elasticities of demand:
1. Engel aggregation: Σ_{i=1}^{n} s_i η_i = 1.
2. Cournot aggregation: Σ_{i=1}^{n} s_i ε_ij = −s_j, j = 1, …, n.

Proof: We begin by recalling that budget balancedness holds for all prices and income:

y = p · x(p, y).   (P.1)

To prove 1, differentiate both sides with respect to income y and get
1 = Σ_{i=1}^{n} p_i (∂x_i/∂y).
Multiply and divide each element in the summation by xi y, rearrange, and get
1 = Σ_{i=1}^{n} (p_i x_i/y)(∂x_i/∂y)(y/x_i).
Recognising the income shares and income elasticities, this is just

1 = Σ_{i=1}^{n} s_i η_i,

which proves Engel aggregation.
Cournot aggregation says that the share-weighted own- and cross-price elasticities
must always sum in a particular way. To prove 2, we examine the effect of changing a
single price, pj . Differentiating both sides of (P.1) with respect to pj , we obtain
0 = Σ_{i≠j} p_i (∂x_i/∂p_j) + x_j + p_j (∂x_j/∂p_j),
where we have differentiated the jth term separately from the others to emphasise that the
product rule must be used when differentiating the term pj xj (p, y). That noted, we can
combine terms and rearrange to get
−x_j = Σ_{i=1}^{n} p_i (∂x_i/∂p_j).
Multiply both sides by p_j/y to get

−p_j x_j/y = Σ_{i=1}^{n} (p_i/y) p_j (∂x_i/∂p_j);

multiplying and dividing each element in the summation by x_i gives

−p_j x_j/y = Σ_{i=1}^{n} (p_i x_i/y)(∂x_i/∂p_j)(p_j/x_i).
Figure 1.21. Properties of Marshallian and Hicksian demands.

Marshallian Demands
• Homogeneity: x(p, y) = x(tp, ty) for all (p, y) and t > 0.
• Symmetry: ∂x_i(p, y)/∂p_j + x_j(p, y)(∂x_i(p, y)/∂y) = ∂x_j(p, y)/∂p_i + x_i(p, y)(∂x_j(p, y)/∂y) for all (p, y) and i, j = 1, …, n.
• Negative semidefiniteness: zᵀ s(p, y) z ≤ 0 for all (p, y) and z.
• Budget balancedness: p · x(p, y) = y for all (p, y).
• Engel aggregation: Σ_{i=1}^{n} s_i η_i = 1.
• Cournot aggregation: Σ_{i=1}^{n} s_i ε_ij = −s_j for j = 1, …, n.

Hicksian Demands
• Homogeneity: x^h(tp, u) = x^h(p, u) for all (p, u) and t > 0.
• Symmetry: ∂x_i^h(p, u)/∂p_j = ∂x_j^h(p, u)/∂p_i for i, j = 1, …, n.
• Negative semidefiniteness: zᵀ σ(p, u) z ≤ 0 for all (p, u) and z.

Relating the Two
• Slutsky equation: ∂x_i(p, y)/∂p_j = ∂x_i^h(p, u)/∂p_j − x_j(p, y)(∂x_i(p, y)/∂y) for all (p, y), u = v(p, y), and i, j = 1, …, n.
In terms of income shares and price elasticities, this becomes

−s_j = Σ_{i=1}^{n} s_i ε_ij,  j = 1, …, n,

which proves Cournot aggregation.
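Both aggregation relations can be confirmed numerically with finite-difference elasticities for the CES consumer. This sketch is not from the text; the helper names, step size, and parameter values are illustrative assumptions.

```python
# Engel and Cournot aggregation for the CES consumer, with elasticities
# computed by central finite differences.

def x_m(p1, p2, y, r=0.5):
    """CES Marshallian demands."""
    s = p1**r + p2**r
    return [y * p1**(r - 1) / s, y * p2**(r - 1) / s]

p1, p2, y, h = 2.0, 3.0, 10.0, 1e-6
p = [p1, p2]
x = x_m(p1, p2, y)
s_share = [p[i] * x[i] / y for i in range(2)]       # income shares s_i

def eta(i):
    """Income elasticity of demand for good i."""
    d = (x_m(p1, p2, y + h)[i] - x_m(p1, p2, y - h)[i]) / (2 * h)
    return d * y / x[i]

def eps(i, j):
    """Price elasticity of demand for good i with respect to p_j."""
    pp = [(p1 + h, p2), (p1, p2 + h)][j]
    pm = [(p1 - h, p2), (p1, p2 - h)][j]
    d = (x_m(*pp, y)[i] - x_m(*pm, y)[i]) / (2 * h)
    return d * p[j] / x[i]

# Engel aggregation: sum_i s_i * eta_i = 1
assert abs(sum(s_share[i] * eta(i) for i in range(2)) - 1) < 1e-6
# Cournot aggregation: sum_i s_i * eps_ij = -s_j, for each j
for j in range(2):
    assert abs(sum(s_share[i] * eps(i, j) for i in range(2)) + s_share[j]) < 1e-6
```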
Theorems 1.10 through 1.17, together, give us an accounting of some of the logi-
cal implications of utility-maximising behaviour. Homogeneity tells us how demand must
respond to an overall, equiproportionate change in all prices and income simultaneously,
and budget balancedness requires that demand always exhaust the consumer’s income. The
Slutsky equations give us qualitative information, or ‘sign restrictions’, on how the system
of demand functions must respond to very general kinds of price changes, as well as giv-
ing us analytical insight into the unobservable components of the demand response to a
price change: the income and substitution effects. Finally, the aggregation relations provide
information on how the quantities demanded – first in response to an income change alone,
then in response to a single price change – must all ‘hang together’ across the system of
demand functions. In the next chapter, we will ask whether there are other implications of
the theory we have developed. We end by pulling together all we have learned so far into
Fig. 1.21.
1.6 EXERCISES
1.1 Let X = R2+ . Verify that X satisfies all five properties required of a consumption set in Assumption
1.1.
1.2 Let ≿ be a preference relation. Prove the following:
(a) ≻ ⊂ ≿
(b) ∼ ⊂ ≿
(c) ≻ ∪ ∼ = ≿
(d) ≻ ∩ ∼ = ∅
1.3 Give a proof or convincing argument for each of the following claims made in the text.
(a) Neither ≻ nor ∼ is complete.
(b) For any x1 and x2 in X, only one of the following holds: x1 ≻ x2, or x2 ≻ x1, or x1 ∼ x2.
1.4 Prove that if ≿ is a preference relation, then the relation ≻ is transitive and the relation ∼ is transitive. Also show that if x1 ∼ x2 ≻ x3, then x1 ≻ x3.
1.5 If ≿ is a preference relation, prove the following: For any x0 ∈ X,
(a) ∼(x0) = ≿(x0) ∩ ≾(x0)
(b) ≿(x0) = ∼(x0) ∪ ≻(x0)
(c) ∼(x0) ∩ ≻(x0) = ∅
(d) ∼(x0) ∩ ≺(x0) = ∅
1.25 A consumer with convex, monotonic preferences consumes non-negative amounts of x1 and x2 .
(a) If u(x1, x2) = x_1^α x_2^{(1/2)−α} represents those preferences, what restrictions must there be on the
value of parameter α? Explain.
(b) Given those restrictions, calculate the Marshallian demand functions.
1.26 A consumer of two goods faces positive prices and has a positive income. His utility function is
u(x1 , x2 ) = x1 .
(a) Prove that if u(x) ≥ u(y), then the quasiconcavity of u(·) and its differentiability at x imply that the
derivative of u((1 − t)x + ty) with respect to t must be non-negative at t = 0.
(b) Compute the derivative of u((1 − t)x + ty) with respect to t evaluated at t = 0 and show that it
is ∇u(x) · (y − x).
1.29 An infinitely lived agent owns 1 unit of a commodity that he consumes over his lifetime. The com-
modity is perfectly storable and he will receive no more than he has now. Consumption of the
commodity in period t is denoted xt , and his lifetime utility function is given by
u(x_0, x_1, x_2, …) = Σ_{t=0}^{∞} β^t ln(x_t), where 0 < β < 1.
1.38 Verify that the expenditure function obtained from the CES direct utility function in Example 1.3
satisfies all the properties given in Theorem 1.7.
1.39 Complete the proof of Theorem 1.9 by showing that xh (p, u) = x(p, e(p, u)).
1.40 Use Roy’s identity and Theorem A2.6 to give an alternative proof that xi (p, y) is homogeneous of
degree zero in prices and income.
1.41 Prove that Hicksian demands are homogeneous of degree zero in prices.
1.42 Prove the modern Law of Demand given in Theorem 1.13. Prove that the converse of each statement
in the Law of Demand is not true.
1.43 For expositional purposes, we derived Theorems 1.14 and 1.15 separately, but really the second
one implies the first. Show that when the substitution matrix σ (p, u) is negative semidefinite, all
own-substitution terms will be non-positive.
1.44 In a two-good case, show that if one good is inferior, the other good must be normal.
1.45 Fix x0 ∈ Rn+ . Define the Slutsky-compensated demand function at x0 , xs (p, x0 ), by xs (p, x0 ) =
x(p, p · x0 ). Thus, Slutsky-compensated demand at x0 is that which would be made as prices
change and the consumer’s income is compensated so that he can always afford bundle x0 . Let
x0 = x(p0, y0). Show that
∂x^s_j(p0, x0)/∂pi = ∂x^h_j(p0, u0)/∂pi, i, j = 1, . . . , n,
where u0 = u(x0). Thus, the slopes of Hicksian and Slutsky-compensated demands are the same.
Consequently, the Slutsky matrix is the matrix of slopes of Slutsky-compensated demands, and this
is how it originally received its name.
1.46 We can derive yet another set of relations that must hold between price and income elasticities in
the consumer’s demand system. This one follows directly from homogeneity, and in fact can be
considered simply a restatement of that principle. Prove that ∑_{j=1}^{n} ε_ij + ηi = 0, i = 1, . . . , n.
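The relation in Exercise 1.46 can be checked numerically. The sketch below is ours, not the text's: it assumes a Cobb-Douglas demand system xi(p, y) = αi y/pi (for which εii = −1, εij = 0 for j ≠ i, and ηi = 1) and estimates the elasticities by central differences.

```python
# Hypothetical demand system chosen for illustration: Cobb-Douglas
# Marshallian demands x_i(p, y) = alpha_i * y / p_i.
alpha = [0.2, 0.3, 0.5]

def demand(i, p, y):
    return alpha[i] * y / p[i]

def price_elasticity(i, j, p, y, h=1e-6):
    # epsilon_ij = (dx_i/dp_j) * (p_j / x_i), via a central difference
    p_hi = list(p); p_hi[j] += h
    p_lo = list(p); p_lo[j] -= h
    dxi_dpj = (demand(i, p_hi, y) - demand(i, p_lo, y)) / (2 * h)
    return dxi_dpj * p[j] / demand(i, p, y)

def income_elasticity(i, p, y, h=1e-6):
    dxi_dy = (demand(i, p, y + h) - demand(i, p, y - h)) / (2 * h)
    return dxi_dy * y / demand(i, p, y)

p, y = [1.0, 2.0, 4.0], 10.0
for i in range(3):
    total = sum(price_elasticity(i, j, p, y) for j in range(3)) + income_elasticity(i, p, y)
    assert abs(total) < 1e-6   # sum of price elasticities plus income elasticity vanishes
```

Here the sum is −1 + 0 + 0 + 1 = 0 for each good; the exercise asks for the general proof from homogeneity.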
1.47 Suppose that u(x) is a linear homogeneous utility function.
(a) Show that the expenditure function is multiplicatively separable in p and u and can be written in
the form e(p, u) = e(p, 1)u.
(b) Show that the marginal utility of income depends on p, but is independent of y.
1.48 Suppose that the expenditure function is multiplicatively separable in p and u so that e(p, u) =
k(u)g(p), where k(·) is some positive monotonic function of a single variable, and g : Rn+ →R+ .
Show that the income elasticity of (Marshallian) demand for every good is equal to unity.
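A quick numerical sketch of the claim in Exercise 1.48, under an assumed price index g(p) = (√p1 + √p2)² (our choice; any linear homogeneous g would do). Note that k(·) cancels from the Marshallian demands, which is exactly why the income elasticities come out to unity.

```python
# Sketch, not from the text: g(p) = (sqrt(p1) + sqrt(p2))**2 is homogeneous
# of degree 1, playing the role of g in e(p, u) = k(u) g(p).
def g(p1, p2):
    return (p1**0.5 + p2**0.5)**2

def marshallian(i, p1, p2, y, h=1e-7):
    # Shephard's lemma gives the Hicksian h_i = k(u) * dg/dp_i; inverting
    # y = k(u) g(p) gives k(u) = y / g(p), so x_i = (y / g(p)) * dg/dp_i,
    # with k(.) cancelling entirely.
    if i == 1:
        dg = (g(p1 + h, p2) - g(p1 - h, p2)) / (2 * h)
    else:
        dg = (g(p1, p2 + h) - g(p1, p2 - h)) / (2 * h)
    return y / g(p1, p2) * dg

def income_elasticity(i, p1, p2, y, h=1e-6):
    dx = (marshallian(i, p1, p2, y + h) - marshallian(i, p1, p2, y - h)) / (2 * h)
    return dx * y / marshallian(i, p1, p2, y)

for i in (1, 2):
    assert abs(income_elasticity(i, 1.0, 3.0, 10.0) - 1.0) < 1e-6
```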
1.49 You are given the following information about the demand functions and expenditure patterns of a
consumer who spends all his income on two goods: (1) At current prices, the same amount is spent
on both goods; (2) at current prices, the own-price elasticity of demand for good 1 is equal to −3.
(a) At current prices, what is the elasticity of demand for good 2 with respect to the price of good 1?
(b) Can statements (1) and (2) both hold at all prices? Why or why not?
1.50 Someone consumes a single good x, and his indirect utility function is
v(p, y) = G( A(p) + ȳ^η y^{1−η}/(1 − η) ), where A(p) = ∫_p^{p0} x(ξ, ȳ) dξ,
1.51 Consider the utility function, u(x1 , x2 ) = (x1 )1/2 + (x2 )1/2 .
(a) Compute the demand functions, xi (p1 , p2 , y), i = 1, 2.
(b) Compute the substitution term in the Slutsky equation for the effects on x1 of changes in p2 .
(c) Classify x1 and x2 as (gross) complements or substitutes.
1.52 Suppose η̲ and η̄ are lower and upper bounds, respectively, on the income elasticity of demand for
good xi over all prices and incomes. Then
η̲ ≤ (∂xi(p, y)/∂y)(y/xi(p, y)) ≤ η̄
over all prices and incomes. Show that for any y and y0,
(y/y0)^η̲ ≤ xi(p, y)/xi(p, y0) ≤ (y/y0)^η̄.
1.53 Agents A and B have the following expenditure functions. In each case, state whether or not the
observable market behaviour of the two agents will be identical. Justify your answers.
(a) eA (p, u) and eB (p, u) = eA (p, 2u).
(b) eA(p, u) = k(u)g(p), where k′(u) > 0, and eB(p, u) = 2eA(p, u).
1.54 The n-good Cobb-Douglas utility function is
u(x) = A ∏_{i=1}^{n} xi^{αi},
where A > 0 and ∑_{i=1}^{n} αi = 1.
(a) Derive the Marshallian demand functions.
(b) Derive the indirect utility function.
(c) Compute the expenditure function.
(d) Compute the Hicksian demands.
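For part (a), a numerical spot-check of the standard candidate solution xi = αi y/pi (assumed here, not derived): it exhausts the budget and yields at least as much utility as randomly drawn budget-exhausting bundles.

```python
import random

# Candidate interior solution for the Cobb-Douglas consumer (our parameter
# values): x_i(p, y) = alpha_i * y / p_i.
A, alpha = 1.0, [0.2, 0.3, 0.5]

def u(x):
    out = A
    for xi, ai in zip(x, alpha):
        out *= xi ** ai
    return out

p, y = [1.0, 2.0, 4.0], 12.0
x_star = [ai * y / pi for ai, pi in zip(alpha, p)]
assert abs(sum(pi * xi for pi, xi in zip(p, x_star)) - y) < 1e-9  # budget balancedness

random.seed(0)
for _ in range(1000):
    shares = [random.random() for _ in range(3)]
    s = sum(shares)
    x = [y * (sh / s) / pi for sh, pi in zip(shares, p)]  # exhausts the budget
    assert u(x) <= u(x_star) + 1e-9                       # candidate is never beaten
```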
CONSUMER THEORY 69
1.55 Suppose
u(x) = ∑_{i=1}^{n} fi(xi)
is strictly quasiconcave with fi′(xi) > 0 for all i. The consumer faces fixed prices p ≫ 0 and has
income y > 0. Assume x(p, y) ≫ 0.
(a) Show that if one good displays increasing marginal utility at x(p, y), all other goods must display
diminishing marginal utility there.
(b) Prove that if one good displays increasing marginal utility and all others diminishing marginal
utility at x(p, y), then one good is normal and all other goods are inferior.
(c) Show that if all goods display diminishing marginal utility at x(p, y), then all goods are normal.
1.56 What restrictions must the αi , f (y), w(p1 , p2 ), and z(p1 , p2 ) satisfy if each of the following is to be a
legitimate indirect utility function?
(a) v(p1, p2, p3, y) = f(y) p1^{α1} p2^{α2} p3^{α3}
(b) v(p1 , p2 , y) = w(p1 , p2 ) + z(p1 , p2 )/y
1.57 The Stone-Geary utility function has the form
u(x) = ∏_{i=1}^{n} (xi − ai)^{bi},
where bi ≥ 0 and ∑_{i=1}^{n} bi = 1. The ai ≥ 0 are often interpreted as ‘subsistence’ levels of the
respective commodities.
(a) Derive the associated expenditure and indirect utility functions. Note that the former is linear in
utility, whereas the latter is proportional to the amount of ‘discretionary income’, y − ∑_{i=1}^{n} pi ai.
(b) Show that bi measures the share of this ‘discretionary income’ that will be spent on ‘discre-
tionary’ purchases of good xi in excess of the subsistence level ai .
1.58 The Stone-Geary expenditure function you derived in part (a) of the preceding exercise is a special
case of the Gorman polar form,
e(p, u) = a(p) + b(p)u,
where a(p) and b(p) are both linear homogeneous and concave. Show that for a consumer with this
expenditure function, the income elasticity of demand for every good approaches zero as y→0, and
approaches unity as y→∞.
1.59 If e(p, u) = z(p1, p2) p3^m u, where m > 0, what restrictions must z(p1, p2) satisfy for this to be a
legitimate expenditure function?
1.60 Suppose x1 (p, y) and x2 (p, y) have equal income elasticity at (p0 , y0 ). Show that ∂x1 /∂p2 =
∂x2 /∂p1 at (p0 , y0 ).
1.61 Show that the Slutsky relation can be expressed in elasticity form as
ε_ij = ε^h_ij − sj ηi,
where ε^h_ij is the elasticity of the Hicksian demand for xi with respect to price pj, and all other terms
are as defined in Definition 1.6.
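The elasticity form of the Slutsky equation can be verified numerically. The sketch below assumes a two-good Cobb-Douglas system (our example, with shares 0.3 and 0.7), computing every elasticity by central differences at a common point and evaluating the Hicksian side at u∗ = v(p, y).

```python
a1, a2 = 0.3, 0.7  # illustrative Cobb-Douglas shares (our assumption)

def x(i, p1, p2, y):                       # Marshallian demand
    return (a1 if i == 1 else a2) * y / (p1 if i == 1 else p2)

def e(p1, p2, u):                          # expenditure function
    return u * (p1 / a1)**a1 * (p2 / a2)**a2

def xh(i, p1, p2, u):                      # Hicksian demand (Shephard's lemma)
    return (a1 if i == 1 else a2) * e(p1, p2, u) / (p1 if i == 1 else p2)

def v(p1, p2, y):                          # indirect utility (inverse of e in u)
    return y / ((p1 / a1)**a1 * (p2 / a2)**a2)

def elas(f, args, k, h=1e-6):
    # elasticity of f with respect to its k-th argument, by central difference
    hi = list(args); hi[k] += h
    lo = list(args); lo[k] -= h
    return (f(*hi) - f(*lo)) / (2 * h) * args[k] / f(*args)

p1, p2, y = 1.0, 2.0, 10.0
u_star = v(p1, p2, y)
for i in (1, 2):
    eta_i = elas(lambda q1, q2, m: x(i, q1, q2, m), [p1, p2, y], 2)
    for j in (1, 2):
        eps = elas(lambda q1, q2, m: x(i, q1, q2, m), [p1, p2, y], j - 1)
        eps_h = elas(lambda q1, q2, w: xh(i, q1, q2, w), [p1, p2, u_star], j - 1)
        s_j = (p1 if j == 1 else p2) * x(j, p1, p2, y) / y
        assert abs(eps - (eps_h - s_j * eta_i)) < 1e-5   # eps_ij = eps^h_ij - s_j*eta_i
```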
1.62 According to Hicks’ Third Law:
∑_{j=1}^{n} (∂x^h_i(p, u)/∂pj) pj = 0, i = 1, . . . , n,
or, equivalently, in elasticity form,
∑_{j=1}^{n} ε^h_ij = 0, i = 1, . . . , n.
Prove this and verify it for a consumer with the n-good Cobb-Douglas utility function in Exercise
1.54.
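Hicks’ Third Law can be verified numerically for the two-good Cobb-Douglas case of Exercise 1.54 (parameter values below are ours), using the expenditure function e(p, u) = u(p1/a1)^{a1}(p2/a2)^{a2} and Shephard’s lemma.

```python
# Hicksian demands for a two-good Cobb-Douglas consumer (illustrative shares).
a1, a2 = 0.4, 0.6

def e(p1, p2, u):
    return u * (p1 / a1)**a1 * (p2 / a2)**a2

def xh(i, p1, p2, u):
    # Shephard's lemma: xh_i = de/dp_i = a_i * e(p, u) / p_i
    return (a1 if i == 1 else a2) * e(p1, p2, u) / (p1 if i == 1 else p2)

def hicks_third_law(i, p1, p2, u, h=1e-6):
    # sum_j p_j * dxh_i/dp_j, with derivatives by central differences
    d1 = (xh(i, p1 + h, p2, u) - xh(i, p1 - h, p2, u)) / (2 * h)
    d2 = (xh(i, p1, p2 + h, u) - xh(i, p1, p2 - h, u)) / (2 * h)
    return p1 * d1 + p2 * d2

for i in (1, 2):
    assert abs(hicks_third_law(i, 1.5, 2.5, 3.0)) < 1e-5
```

The law here is just Euler’s theorem applied to the Hicksian demands, which are homogeneous of degree zero in p.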
1.63 The substitution matrix of a utility-maximising consumer’s demand system at prices (8, p) is
[  a      b
   2   −1/2  ].
Find a, b, and p.
1.64 True or false?
(a) When the ratio of goods consumed, xi /xj , is independent of the level of income for all i and j,
then all income elasticities are equal to 1.
(b) When income elasticities are all constant and equal, they must all be equal to 1.
(c) If the utility function is homothetic, the marginal utility of income is independent of prices and
depends only on income.
1.65 Show that the utility function is homothetic if and only if all demand functions are multiplicatively
separable in prices and income and of the form x(p, y) = φ(y)x(p, 1).
1.66 A consumer with income y0 faces prices p0 and enjoys utility u0 = v(p0 , y0 ). When prices change
to p1 , the cost of living is affected. To gauge the impact of these price changes, we may define a cost
of living index as the ratio
I(p0, p1, u0) ≡ e(p1, u0) / e(p0, u0).
(a) Show that I(p0 , p1 , u0 ) is greater than (less than) unity as the outlay necessary to maintain base
utility, u0 , rises (falls).
(b) Suppose consumer income also changes from y0 to y1 . Show that the consumer will be better off
(worse off) in the final period whenever y1 /y0 is greater (less) than I(p0 , p1 , u0 ).
1.67 A cost of living index is introduced in the previous exercise. Suppose the consumer’s direct utility
function is u(x1, x2) = √x1 + x2.
(a) Let base prices be p0 = (1, 2), base income be y0 = 10, and suppose p1 = (2, 1). Compute the
index I.
(b) Let base and final period prices be as in part (a), but now let base utility be u0 . Show that the
value of the index I will vary with the base utility.
(c) It can be shown that when consumer preferences are homothetic, I will be independent of the
base utility for any prices p0 and p1 . Can you show it?
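For part (a), a numerical sketch (ours, under the assumption of interior solutions, which can be checked to hold at these prices and incomes): expenditure minimisation for u = √x1 + x2 gives x1 = (p2/(2p1))² and hence e(p, u) = p2 u − p2²/(4p1).

```python
# Assumed closed forms for u(x1, x2) = sqrt(x1) + x2, interior case only.
def e(p1, p2, u):
    return p2 * u - p2**2 / (4 * p1)

p0, y0, p1 = (1.0, 2.0), 10.0, (2.0, 1.0)

# Base utility from utility maximisation at (p0, y0): x1 = (p2/(2*p1))**2 = 1,
# x2 = (y0 - p1*x1)/p2 = 4.5, so u0 = 1 + 4.5 = 5.5.
x1_0 = (p0[1] / (2 * p0[0]))**2
u0 = x1_0**0.5 + (y0 - p0[0] * x1_0) / p0[1]

assert abs(e(p0[0], p0[1], u0) - y0) < 1e-9     # consistency: e(p0, u0) = y0
I = e(p1[0], p1[1], u0) / e(p0[0], p0[1], u0)
print(I)   # 0.5375: this price change makes the base utility level cheaper
```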
1.68 Show that the share of income spent on good xi can always be measured by ∂ ln[e(p, u∗ )]/∂ ln(pi ),
where u∗ ≡ v(p, y).
CHAPTER 2
TOPICS IN CONSUMER THEORY
In this chapter, we explore some additional topics in consumer theory. We begin with
duality theory and investigate more completely the links among utility, indirect utility, and
expenditure functions. Then we consider the classic ‘integrability problem’ and ask what
conditions a function of prices and income must satisfy in order that it qualify as a demand
function for some utility-maximising consumer. The answer to this question will provide
a complete characterisation of the restrictions our theory places on observable demand
behaviour. We then examine ‘revealed preference’, an alternative approach to demand
theory. Finally, we conclude our treatment of the individual consumer by looking at the
problem of choice under uncertainty.
illustrated in Fig. 2.1(a). Notice that A(p0 , u0 ) is a closed convex set containing all points
on and above the hyperplane, p0 · x = E(p0, u0). Now choose different prices p1, keep u0
fixed, and construct the closed convex set,
A(p1, u0) = {x ∈ R^n_+ | p1 · x ≥ E(p1, u0)}.
Imagine proceeding like this for all prices p ≫ 0 and forming the infinite intersection,
A(u0) ≡ ⋂_{p≫0} A(p, u0) = {x ∈ R^n_+ | p · x ≥ E(p, u0) for all p ≫ 0}. (2.1)
The shaded area in Fig. 2.1(b) illustrates the intersection of a finite number of the
A(p, u0 ), and gives some intuition about what A(u0 ) will look like. It is easy to imagine
that as more and more prices are considered and more sets are added to the intersection, the
shaded area will more closely resemble a superior set for some quasiconcave real-valued
function. One might suspect, therefore, that these sets can be used to construct something
[Figure 2.1. (a) The closed half-space A(p0, u0). (b) The intersection of a finite collection of the sets A(p, u0).]
very much like a direct utility function representing nice convex, monotonic preferences.
This is indeed the case and is demonstrated by the following theorem.
THEOREM 2.1 Constructing a Utility Function from an Expenditure Function
Let E : R^n_++ × R_+ → R_+ satisfy properties 1 through 7 of an expenditure function given in Theorem 1.7, and let A(u) be as in (2.1). Then the function u : R^n_+ → R_+ given by
u(x) ≡ max{u ≥ 0 | x ∈ A(u)}
is increasing, unbounded above, and quasiconcave.
You might be wondering why we have chosen to define u(x) the way we have. After
all, there are many ways one can employ E(p, u) to assign numbers to each x ∈ Rn+ . To
understand why, forget this definition of u(x) and for the moment suppose that E(p, u)
is in fact the expenditure function generated by some utility function u(x). How might
we recover u(x) from knowledge of E(p, u)? Note that by the definition of an expendi-
ture function, p · x ≥ E(p, u(x)) for all prices p 0, and, typically, there will be equality
for some price. Therefore, because E is strictly increasing in u, u(x) is the largest value
of u such that p · x ≥ E(p, u) for all p 0. That is, u(x) is the largest value of u such
that x ∈ A(u). Consequently, the construction we have given is just right for recovering
the utility function that generated E(p, u) when in fact E(p, u) is an expenditure function.
But the preceding considerations give us a strategy for showing that it is: first, show that
u(x) defined as in the statement of Theorem 2.1 is a utility function satisfying our axioms.
(This is the content of Theorem 2.1.) Second, show that E is in fact the expenditure func-
tion generated by u(x). (This is the content of Theorem 2.2.) We now give the proof of
Theorem 2.1.
Proof: Note that by the definition of A(u), we may write u(x) as
u(x) = max{u ≥ 0 | p · x ≥ E(p, u) ∀ p ≫ 0}.
The first thing that must be established is that u(x) is well-defined. That is, it must be
shown that the set {u ≥ 0 | p · x ≥ E(p, u) ∀ p ≫ 0} contains a largest element. We shall
sketch the argument. First, this set, call it B(x), must be bounded above because E(p, u)
is unbounded above and increasing in u. Thus, B(x) possesses an upper bound and hence
also a least upper bound, û. It must be shown that û ∈ B(x). But this follows because B(x)
is closed, which we will not show.
Having argued that u(x) is well-defined, let us consider the claim that it is increasing.
Consider x1 ≥ x2. Then
p · x1 ≥ p · x2 ∀ p ≫ 0. (P.1)
Because x2 ∈ A(u(x2)), we have p · x2 ≥ E(p, u(x2)) for all p ≫ 0, so (P.1) implies that
p · x1 ≥ E(p, u(x2)) for all p ≫ 0. Consequently, x1 ∈ A(u(x2)). But u(x1) is the largest u
satisfying x1 ∈ A(u). Hence, u(x1) ≥ u(x2), which shows that u(x) is increasing.
The unboundedness of u(·) on R^n_+ can be shown by appealing to the increasing,
concavity, homogeneity, and differentiability properties of E(·) in p, and to the fact that its
domain in u is all of R_+. We shall not give the proof here (although it can be gleaned from
the proof of Theorem 2.2 below).
To show that u(·) is quasiconcave, we must show that for all x1, x2, and
convex combinations xt ≡ tx1 + (1 − t)x2, t ∈ [0, 1], we have u(xt) ≥ min[u(x1), u(x2)]. To see this, suppose that u(x1) =
min[u(x1), u(x2)]. Because E is strictly increasing in u, we know that E(p, u(x1)) ≤
E(p, u(x2)), and that therefore
p · x1 ≥ E(p, u(x1)) ∀ p ≫ 0,
p · x2 ≥ E(p, u(x2)) ≥ E(p, u(x1)) ∀ p ≫ 0.
Multiplying the first inequality by t, the second by (1 − t), and summing gives p · xt ≥ E(p, u(x1)) for all p ≫ 0. Hence, xt ∈ A(u(x1)), so that u(xt) ≥ u(x1) = min[u(x1), u(x2)], as desired.
Theorem 2.1 tells us we can begin with an expenditure function and use it to
construct a direct utility function representing some convex, monotonic preferences. We
actually know a bit more about those preferences. If we begin with them and derive the
associated expenditure function, we end up with the function E(·) we started with!
That is, E(p, u) is the expenditure function generated by derived utility u(x).
Proof: Fix p0 ≫ 0 and u0 ≥ 0 and suppose x ∈ R^n_+ satisfies u(x) ≥ u0. Note that because
u(·) is derived from E as in Theorem 2.1, we must then have
p · x ≥ E(p, u(x)) ∀ p ≫ 0.
Moreover, because E is increasing in u and u(x) ≥ u0, this implies
p · x ≥ E(p, u0) ∀ p ≫ 0. (P.1)
In particular, p0 · x ≥ E(p0, u0). (P.2)
Because (P.2) holds for every x ∈ R^n_+ satisfying u(x) ≥ u0, it follows that
E(p0, u0) ≤ min_{x : u(x) ≥ u0} p0 · x ≡ e(p0, u0). (P.3)
We would like to show that the first inequality in (P.3) is an equality. To do so, it suffices
to find a single x0 ∈ R^n_+ with u(x0) ≥ u0 such that
p0 · x0 ≤ E(p0, u0), (P.4)
because this would clearly imply that the minimum on the right-hand side of (P.3) could
not be greater than E(p0, u0).
To establish (P.4), note that by Euler’s theorem (Theorem A2.7), because E is
differentiable and homogeneous of degree 1 in p,
E(p, u) = (∂E(p, u)/∂p) · p ∀ p ≫ 0, (P.5)
where we use ∂E(p, u)/∂p ≡ (∂E(p, u)/∂p1 , . . . , ∂E(p, u)/∂pn ) to denote the vector of
price-partial derivatives of E. Also, because E(p, u) is concave in p, Theorem A2.4 implies
that for all p ≫ 0,
E(p, u0) ≤ E(p0, u0) + (∂E(p0, u0)/∂p) · (p − p0). (P.6)
But evaluating (P.5) at (p0, u0) and combining this with (P.6) implies that
E(p, u0) ≤ (∂E(p0, u0)/∂p) · p ∀ p ≫ 0. (P.7)
Letting x0 ≡ ∂E(p0, u0)/∂p, which is non-negative because E is increasing in p, (P.7) says precisely that
p · x0 ≥ E(p, u0) ∀ p ≫ 0. (P.8)
So, by the definition of u(·), we must have u(x0) ≥ u0. Furthermore, evaluating (P.5) at
(p0, u0) yields E(p0, u0) = p0 · x0. Thus, we have established (P.4) for this choice of x0,
and therefore we have shown that
E(p0, u0) = min_{x : u(x) ≥ u0} p0 · x = e(p0, u0).
Because p0 ≫ 0 and u0 ≥ 0 were arbitrary, E is the expenditure function generated by u(x).
The last two theorems tell us that any time we can write down a function of prices
and utility that satisfies properties 1 to 7 of Theorem 1.7, it will be a legitimate expenditure
function for some preferences satisfying many of the usual axioms. We can of course then
differentiate this function with respect to product prices to obtain the associated system
of Hicksian demands. If the underlying preferences are continuous and strictly increasing,
we can invert the function in u, obtain the associated indirect utility function, apply Roy’s
identity, and derive the system of Marshallian demands as well. Every time, we are assured
that the resulting demand systems possess all properties required by utility maximisation.
For theoretical purposes, therefore, a choice can be made. One can start with a direct
utility function and proceed by solving the appropriate optimisation problems to derive
the Hicksian and Marshallian demands. Or one can begin with an expenditure function
and proceed to obtain consumer demand systems by the generally easier route of inversion
and simple differentiation.
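The ‘easier route’ can be sketched numerically. The block below assumes a CES expenditure function with r = −1 (our parameter choice, not the text’s), recovers the Hicksian demands by Shephard’s lemma and the Marshallian demands by Roy’s identity, all via central differences, and compares with the known CES closed forms.

```python
# Illustration with the CES expenditure function e(p, u) = u*(p1**r + p2**r)**(1/r),
# r < 1, r != 0; the value r = -1 is ours.
r = -1.0

def e(p1, p2, u):
    return u * (p1**r + p2**r)**(1 / r)

def hicksian(i, p1, p2, u, h=1e-7):           # Shephard's lemma: xh_i = de/dp_i
    if i == 1:
        return (e(p1 + h, p2, u) - e(p1 - h, p2, u)) / (2 * h)
    return (e(p1, p2 + h, u) - e(p1, p2 - h, u)) / (2 * h)

def v(p1, p2, y):                             # invert e(p, u) = y in u
    return y * (p1**r + p2**r)**(-1 / r)

def marshallian(i, p1, p2, y, h=1e-7):        # Roy's identity
    if i == 1:
        dv_dp = (v(p1 + h, p2, y) - v(p1 - h, p2, y)) / (2 * h)
    else:
        dv_dp = (v(p1, p2 + h, y) - v(p1, p2 - h, y)) / (2 * h)
    dv_dy = (v(p1, p2, y + h) - v(p1, p2, y - h)) / (2 * h)
    return -dv_dp / dv_dy

p1, p2, y = 1.0, 2.0, 9.0
for i, p in ((1, p1), (2, p2)):
    closed_form = p**(r - 1) * y / (p1**r + p2**r)   # known CES Marshallian demand
    assert abs(marshallian(i, p1, p2, y) - closed_form) < 1e-4
# consistency: Marshallian at y = e(p, u) equals Hicksian at u
u = v(p1, p2, y)
assert abs(marshallian(1, p1, p2, e(p1, p2, u)) - hicksian(1, p1, p2, u)) < 1e-4
```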
Let e(p, u) be the expenditure function generated by u(x). As we know, the con-
tinuity of u(x) is enough to guarantee that e(p, u) is well-defined. Moreover, e(p, u) is
continuous.
Going one step further, consider the utility function, call it w(x), generated by e(·)
in the now familiar way, that is,
w(x) ≡ max{u ≥ 0 | p · x ≥ e(p, u) ∀ p ≫ 0}.
A look at the proof of Theorem 2.1 will convince you that w(x) is increasing and
quasiconcave. Thus, regardless of whether or not u(x) is quasiconcave or increasing, w(x)
will be both quasiconcave and increasing. Clearly, then, u(x) and w(x) need not coincide.
How then are they related?
It is easy to see that w(x) ≥ u(x) for all x ∈ R^n_+. This follows because by the defini-
tion of e(·), we have e(p, u(x)) ≤ p · x ∀ p ≫ 0. The desired inequality now follows from
the definition of w(x).
Thus, for any u ≥ 0, the level-u superior set for u(x), say S(u), will be contained in
the level-u superior set for w(x), say, T(u). Moreover, because w(x) is quasiconcave, T(u)
is convex.
Now consider Fig. 2.2. If u(x) happens to be increasing and quasiconcave, then
the boundary of S(u) yields the negatively sloped, convex indifference curve u(x) = u
in Fig. 2.2(a). Note then that each point on that boundary is the expenditure-minimising
bundle to achieve utility u at some price vector p ≫ 0. Consequently, if u(x0) = u, then for
some p0 ≫ 0, we have e(p0, u) = p0 · x0. But because e(·) is strictly increasing in u, this
means that w(x0 ) ≤ u = u(x0 ). But because w(x0 ) ≥ u(x0 ) always holds, we must then
have w(x0 ) = u(x0 ). Because u was arbitrary, this shows that in this case, w(x) = u(x) for
all x. But this is not much of a surprise in light of Theorems 2.1 and 2.2 and the assumed
quasiconcavity and increasing properties of u(x).
The case depicted in Fig. 2.2(b) is more interesting. There, u(x) is neither increasing
nor quasiconcave. Again, the boundary of S(u) yields the indifference curve u(x) = u.
Note that some bundles on the indifference curve never minimise the expenditure
required to obtain utility level u regardless of the price vector. The thick lines in Fig. 2.2(c)
show those bundles that do minimise expenditure at some positive price vector. For those
bundles x on the thick line segments in Fig. 2.2(c), we therefore have as before that w(x) =
u(x) = u. But because w(x) is quasiconcave and increasing, the w(x) = u indifference
curve must be as depicted in Fig. 2.2(d). Thus, w(x) differs from u(x) only as much as is
required to become strictly increasing and quasiconcave.
Given the relationship between their indifference curves, it is clear that if some
bundle maximises u(x) subject to p · x ≤ y, then the same bundle maximises w(x) sub-
ject to p · x ≤ y. (Careful, the converse is false.) Consequently, any observable demand
behaviour that can be generated by a non-increasing, non-quasiconcave utility function,
like u(x), can also be generated by an increasing, quasiconcave utility function, like w(x).
[Figure 2.2. Panels (a)–(e): (a) the superior set S(u) and indifference curve u(x) = u; (b) the convex superior set T(u); (c) the bundles on u(x) = u that minimise expenditure at some positive price vector (thick segments); (d) the indifference curve w(x) = u; (e) a budget line with a bundle x∗ that is optimal for u(x) but not for w(x).]
It is in this sense that the assumptions of monotonicity and convexity of preferences have
no observable implications for our theory of consumer demand.1
Thus, (2.2) provides a means for recovering the utility function u(x) from knowledge of
only the indirect utility function it generates. The following theorem gives one version of
this result, although the assumptions are not the weakest possible.
THEOREM 2.3 Duality Between Direct and Indirect Utility
Suppose that u(x) is quasiconcave and differentiable on R^n_++ with strictly positive partial derivatives there. Then for all x ≫ 0, v(p, p · x), the indirect utility function generated by u(x), achieves a minimum in p on R^n_++, and
u(x) = min_{p ∈ R^n_++} v(p, p · x). (T.1)
Proof: According to the discussion preceding Theorem 2.3, the left-hand side of (T.1) never
exceeds the right-hand side. Therefore, it suffices to show that for each x ≫ 0, there is
some p ≫ 0 such that
u(x) = v(p, p · x). (P.1)
1 Before ending this discussion, we give a cautionary note on the conclusion regarding monotonicity. The fact
that the demand behaviour generated by u(x) in the preceding second case could be captured by the increasing
function w(x) relies on the assumption that the consumer only faces non-negative prices. For example, if with
two goods, one of the prices, say, p2 were negative, then we may have a situation such as that in Fig. 2.2(e),
where x∗ is optimal for the utility function u(x) but not for the increasing function w(x). Thus, if prices can be
negative, monotonicity is not without observable consequences.
∂u(x0)/∂xi − λ0 p0_i = 0, i = 1, . . . , n, (P.2)
and
p0 · x0 = y0 . (P.3)
Consequently, (x0 , λ0 ) satisfy the first-order conditions for the consumer’s maximisation
problem max u(x) s.t. p0 · x = y0 . Moreover, by Theorem 1.4, because u(x) is quasicon-
cave, these conditions are sufficient to guarantee that x0 solves the consumer’s problem
when p = p0 and y = y0 . Therefore, u(x0 ) = v(p0 , y0 ) = v(p0 , p0 · x0 ). Consequently,
(P.1) holds for (p0 , x0 ), but because x0 was arbitrary, we may conclude that for every
x 0, (P.1) holds for some p 0.
As in the case of expenditure functions, one can show by using (T.1) that if some
function V(p, y) has all the properties of an indirect utility function given in Theorem 1.6,
then V(p, y) is in fact an indirect utility function. We will not pursue this result here,
however. The interested reader may consult Diewert (1974).
Finally, we note that (T.1) can be written in another form, which is sometimes more
convenient. Note that because v(p, y) is homogeneous of degree zero in (p, y), we have
v(p, p · x) = v(p/(p · x), 1) whenever p · x > 0. Consequently, if x ≫ 0 and p∗ ≫ 0 min-
imises v(p, p · x) for p ∈ R^n_++, then p̄ ≡ p∗/(p∗ · x) ≫ 0 minimises v(p, 1) for p ∈ R^n_++
such that p · x = 1. Moreover, v(p∗, p∗ · x) = v(p̄, 1). Thus, we may rewrite (T.1) as
u(x) = min_{p ∈ R^n_++} v(p, 1) s.t. p · x = 1. (T.1′)
Whether we use (T.1) or (T.1′) to recover u(x) from v(p, y) does not matter. Simply
choose whichever is more convenient. One disadvantage of (T.1) is that it always possesses
multiple solutions because of the homogeneity of v (i.e., if p∗ solves (T.1), then so does
tp∗ for all t > 0). Consequently, we could not, for example, apply Theorem A2.22 (the
Envelope theorem) as we shall have occasion to do in what follows. For purposes such as
these, (T.1′) is distinctly superior.
EXAMPLE 2.1 Let us take a particular case and derive the direct utility function. Suppose
that v(p, y) = y(pr1 + pr2 )−1/r . From the latter part of Example 1.2, we know this satisfies
all necessary properties of an indirect utility function. We will use (T.1′) to recover u(x).
Setting y = 1 yields v(p, 1) = (p1^r + p2^r)^{−1/r}. The direct utility function therefore will be
the minimum-value function,
u(x1, x2) = min_{p1, p2} (p1^r + p2^r)^{−1/r} s.t. p1 x1 + p2 x2 = 1.
First, solve the minimisation problem and then evaluate the objective function
at the solution to form the minimum-value function. The first-order conditions for the
Lagrangian require that the optimal p∗1 and p∗2 satisfy
Substituting from (E.4) into (E.3) and using (E.4) again, after a bit of algebra, gives the
solutions
p1∗ = x1^{1/(r−1)} / (x1^{r/(r−1)} + x2^{r/(r−1)}), (E.5)
p2∗ = x2^{1/(r−1)} / (x1^{r/(r−1)} + x2^{r/(r−1)}). (E.6)
Substituting these into the objective function and forming u(x1, x2), we obtain
u(x1, x2) = [ (x1^{r/(r−1)} + x2^{r/(r−1)}) / (x1^{r/(r−1)} + x2^{r/(r−1)})^r ]^{−1/r}
= [ (x1^{r/(r−1)} + x2^{r/(r−1)})^{1−r} ]^{−1/r}
= (x1^{r/(r−1)} + x2^{r/(r−1)})^{(r−1)/r},
which, upon substituting r ≡ ρ/(ρ − 1), becomes (x1^ρ + x2^ρ)^{1/ρ}. This is the CES direct utility function we started with in Example 1.2, as it should be.
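The recovery in Example 2.1 can be confirmed numerically by brute force: minimise v(p, 1) over prices normalised so that p · x = 1, and compare with the CES value. The choices r = −1 (so that the implied CES parameter is ρ = r/(r − 1) = 1/2) and the bundle are ours.

```python
# Grid-search minimisation of v(p, 1) = (p1**r + p2**r)**(-1/r) subject to
# p1*x1 + p2*x2 = 1, for the illustrative case r = -1.
r = -1.0
x1, x2 = 1.0, 2.0

def v_unit(p1, p2):                      # v(p, 1)
    return (p1**r + p2**r)**(-1 / r)

best = float("inf")
n = 200000
for k in range(1, n):
    p1 = k / n / x1                      # p1 ranges over (0, 1/x1)
    p2 = (1 - p1 * x1) / x2              # enforces p1*x1 + p2*x2 = 1
    best = min(best, v_unit(p1, p2))

rho = r / (r - 1)                        # inverting r = rho/(rho - 1)
u_ces = (x1**rho + x2**rho)**(1 / rho)   # the CES direct utility
assert abs(best - u_ces) < 1e-3
```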
The last duality result we take up concerns the consumer’s inverse demand func-
tions. Throughout the chapter, we have concentrated on the ordinary Marshallian demand
functions, where quantity demanded is expressed as a function of prices and income.
THEOREM 2.4 (Hotelling, Wold) Duality and the System of Inverse Demands
Let u(x) be the consumer’s direct utility function. Then the inverse demand function for
good i associated with income y = 1 is given by
pi(x) = (∂u(x)/∂xi) / ∑_{j=1}^{n} xj (∂u(x)/∂xj).
Proof: By the definition of p(x), we have u(x) = v(p(x), 1) and [p(x)] · x = 1 for all x.
Consequently, by the discussion preceding Theorem 2.3 and the normalisation argument,
u(x) = min_{p ∈ R^n_++} v(p, 1) s.t. p · x = 1, (P.1)
and p∗ = p(x) solves this problem. Consider now the Lagrangian associated with the minimisation problem in (P.1), L(p, λ) = v(p, 1) + λ(p · x − 1). Applying the Envelope theorem (Theorem A2.22) to differentiate the minimum value u(x) with respect to xi yields
∂u(x)/∂xi = ∂L(p∗, λ∗)/∂xi = λ∗ p∗_i, i = 1, . . . , n, (P.2)
where p∗ = p(x), and λ∗ is the optimal value of the Lagrange multiplier. Assuming
∂u(x)/∂xi > 0, we have then that λ∗ > 0.
Multiplying (P.2) by xi and summing over i gives
∑_{i=1}^{n} xi (∂u(x)/∂xi) = λ∗ ∑_{i=1}^{n} p∗_i xi
= λ∗ ∑_{i=1}^{n} pi(x) xi
= λ∗, (P.3)
because [p(x)] · x = 1. Combining (P.2) and (P.3) and recalling that p∗i = pi (x) yields the
desired result.
EXAMPLE 2.2 Let us take the case of the CES utility function once again. If u(x1, x2) =
(x1^ρ + x2^ρ)^{1/ρ}, then
p1 = x1^{ρ−1} (x1^ρ + x2^ρ)^{−1},
p2 = x2^{ρ−1} (x1^ρ + x2^ρ)^{−1}.
Notice carefully that these are precisely the solutions (E.5) and (E.6) to the first-order
conditions in Example 2.1, after substituting for r ≡ ρ/(ρ − 1). This is no coincidence.
In general, the solutions to the consumer’s utility-maximisation problem give Marshallian
demand as a function of price, and the solutions to its dual, the (normalised) indirect utility-
minimisation problem, give inverse demands as functions of quantity.
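The coincidence noted in Example 2.2 can be checked numerically: apply the Hotelling-Wold formula to the CES utility function (ρ = 1/2 and the bundle below are our choices), compare with the stated inverse demands, and confirm the normalisation p(x) · x = 1.

```python
# Hotelling-Wold inverse demands p_i(x) = u_i(x) / sum_j x_j * u_j(x)
# for u(x) = (x1**rho + x2**rho)**(1/rho), rho = 0.5 (our choice).
rho = 0.5
x1, x2 = 1.0, 4.0

def u(a, b):
    return (a**rho + b**rho)**(1 / rho)

h = 1e-7
u1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)    # marginal utilities
u2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
denom = x1 * u1 + x2 * u2
p1, p2 = u1 / denom, u2 / denom

s = x1**rho + x2**rho
assert abs(p1 - x1**(rho - 1) / s) < 1e-6         # matches Example 2.2's formula
assert abs(p2 - x2**(rho - 1) / s) < 1e-6
assert abs(p1 * x1 + p2 * x2 - 1.0) < 1e-9        # income-1 normalisation holds
```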
2.2 INTEGRABILITY
In Chapter 1, we showed that a utility-maximising consumer’s demand function must sat-
isfy homogeneity of degree zero, budget balancedness, symmetry, and negative semidefi-
niteness, along with Cournot and Engel aggregation. But, really, there is some redundancy
in these conditions. In particular, we know from Theorem 1.17 that both aggregation results
follow directly from budget balancedness. There is another redundancy as well. Of the
remaining four conditions, only budget balancedness, symmetry, and negative semidef-
initeness are truly independent: homogeneity of degree zero is implied by the others. In
fact, homogeneity is implied by budget balancedness and symmetry alone, as the following
theorem demonstrates.
THEOREM 2.5 Budget Balancedness and Symmetry Imply Homogeneity
If x(p, y) satisfies budget balancedness and its Slutsky matrix is symmetric, then it is homogeneous of degree zero in p and y.
Proof: Recall from the proof of Theorem 1.17 that when budget balancedness holds, we
may differentiate the budget equation with respect to prices and income to obtain for,
i = 1, . . . , n,
∑_{j=1}^{n} pj (∂xj(p, y)/∂pi) = −xi(p, y), (P.1)
and
∑_{j=1}^{n} pj (∂xj(p, y)/∂y) = 1. (P.2)
Fix p and y, then let fi(t) = xi(tp, ty) for all t > 0. We must show that fi(t) is constant
in t, or, equivalently, that fi′(t) = 0 for all t > 0.
Differentiating fi with respect to t gives
fi′(t) = ∑_{j=1}^{n} (∂xi(tp, ty)/∂pj) pj + (∂xi(tp, ty)/∂y) y. (P.3)
Now by budget balancedness, tp · x(tp, ty) = ty, so that dividing by t > 0, we may
write
y = ∑_{j=1}^{n} pj xj(tp, ty). (P.4)
Substituting from (P.4) for y in (P.3) gives
fi′(t) = ∑_{j=1}^{n} pj [ ∂xi(tp, ty)/∂pj + (∂xi(tp, ty)/∂y) xj(tp, ty) ].
But the term in square brackets is the ijth entry of the Slutsky matrix, which, by
assumption, is symmetric. Consequently we may interchange i and j within those brackets
and maintain equality. Therefore,
fi′(t) = ∑_{j=1}^{n} pj [ ∂xj(tp, ty)/∂pi + (∂xj(tp, ty)/∂y) xi(tp, ty) ]
= ∑_{j=1}^{n} pj (∂xj(tp, ty)/∂pi) + xi(tp, ty) ∑_{j=1}^{n} pj (∂xj(tp, ty)/∂y)
= (1/t) ∑_{j=1}^{n} tpj (∂xj(tp, ty)/∂pi) + xi(tp, ty) (1/t) ∑_{j=1}^{n} tpj (∂xj(tp, ty)/∂y)
= (1/t)[−xi(tp, ty)] + xi(tp, ty)(1/t)[1]
= 0,
where the second-to-last equality follows from (P.1) and (P.2) evaluated at (tp, ty).
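A numerical reading of the theorem: any demand system satisfying budget balancedness and Slutsky symmetry should have fi(t) = xi(tp, ty) constant in t. The Cobb-Douglas system below (our example; it satisfies both hypotheses) illustrates this.

```python
# Cobb-Douglas demands x_i(p, y) = a_i * y / p_i satisfy budget balancedness
# (sum_i p_i x_i = y since a_0 + a_1 = 1) and Slutsky symmetry.
a = [0.25, 0.75]

def x(i, p, y):
    return a[i] * y / p[i]

p, y = [1.0, 3.0], 12.0
for i in range(2):
    base = x(i, p, y)
    for t in (0.5, 1.0, 2.0, 7.0):
        ft = x(i, [t * p[0], t * p[1]], t * y)   # f_i(t) = x_i(tp, ty)
        assert abs(ft - base) < 1e-12            # constant in t: degree-zero homogeneity
```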
We would like to know whether or not this list is exhaustive. That is, are these the
only implications for observable behaviour that flow from our utility-maximisation model
of consumer behaviour? Are there perhaps other, additional implications that we have so
far not discovered? Remarkably, it can be shown that this list is in fact complete – there
are no other independent restrictions imposed on demand behaviour by the theory of the
utility-maximising consumer.
But how does one even begin to prove such a result? The solution method is inge-
nious, and its origins date back to Antonelli (1886). The idea is this: suppose we are given
a vector-valued function of prices and income, and that we are then somehow able to con-
struct a utility function that generates precisely this same function as its demand function.
Then, clearly, that original function must be consistent with our theory of the utility-
maximising consumer because it is in fact the demand function of a consumer with the
utility function we constructed. Antonelli’s insight was to realise that if the vector-valued
function of prices and income we start with satisfies just the three preceding conditions,
then there must indeed exist a utility function that generates it as its demand function. The
problem of recovering a consumer’s utility function from his demand function is known as
the integrability problem.
The implications of this are significant. According to Antonelli’s insight, if a func-
tion of prices and income satisfies the three preceding conditions, it is the demand function
for some utility-maximising consumer. We already know that only if a function of prices
and income satisfies those same conditions will it be the demand function for a utility-
maximising consumer. Putting these two together, we must conclude that those three
conditions – and those three conditions alone – provide a complete and definitive test
of our theory of consumer behaviour. That is, demand behaviour is consistent with the
theory of utility maximisation if and only if it satisfies budget balancedness, negative
semidefiniteness, and symmetry. This impressive result warrants a formal statement.
THEOREM 2.6 Integrability Theorem
A continuously differentiable function x : R^n_++ × R_++ → R^n_+ is the demand function generated
by some increasing, quasiconcave utility function if (and only if, when utility is continuous,
strictly increasing, and strictly quasiconcave) it satisfies budget balancedness, symmetry,
and negative semidefiniteness.
We now sketch a proof of Antonelli’s result. However, we shall take the modern
approach to this problem as developed by Hurwicz and Uzawa (1971). Their strategy of
proof is a beautiful illustration of the power of duality theory.
Proof: (Sketch) Since we have already demonstrated the ‘only if ’ part, it suffices to prove
the ‘if’ part of the statement. So suppose some function x(p, y) satisfies budget balanced-
ness, symmetry, and negative semidefiniteness. We must somehow show that there is a
utility function that generates x(·) as its demand function.
Consider an arbitrary expenditure function, e(p, u), generated by some increas-
ing quasiconcave utility function u(x), and suppose that u(x) generates the Marshallian
demand function xm (p, y). At this stage, there need be no relation between x(·) and
e(·), x(·) and u(·), or x(·) and xm (·).
But just for the sake of argument, suppose that x(·) and e(·) happen to be related as
follows:
∂e(p, u)/∂pi = xi(p, e(p, u)), ∀ (p, u), i = 1, . . . , n. (P.1)
Can we then say anything about the relationship between x(p, y) and the utility function
u(x) from which e(p, u) was derived? In fact, we can. If (P.1) holds, then x(p, y) is the
demand function generated by the utility function u(x). That is, x(p, y) = xm (p, y).
We now sketch why this is so. Note that if Shephard’s lemma were applicable, the
left-hand side of (P.1) would be equal to x^h_i(p, u), so that (P.1) would imply
x^h_i(p, u) = xi(p, e(p, u)), ∀ (p, u), i = 1, . . . , n. (P.2)
Moreover, if Theorem 1.9 were applicable, the Hicksian and Marshallian demand func-
tions would be related as
x^m_i(p, e(p, u)) = x^h_i(p, u), ∀ (p, u), i = 1, . . . , n. (P.3)
Combining (P.2) and (P.3) would then give xi(p, e(p, u)) = x^m_i(p, e(p, u)) for all (p, u). (P.4)
But now recall that, as an expenditure function, for each fixed p, e(p, u) assumes every
non-negative number as u varies over its domain. Consequently, (P.4) is equivalent to
xi(p, y) = x^m_i(p, y), ∀ (p, y), i = 1, . . . , n,
as claimed. (Despite the fact that perhaps neither Shephard’s lemma nor Theorem 1.9 can
be applied, the preceding conclusion can be established.)
Thus, if the function x(p, y) is related to an expenditure function according to (P.1),
then x(p, y) is the demand function generated by some increasing, quasiconcave utility
function (i.e., that which, according to Theorem 2.1, generates the expenditure function).
We therefore have reduced our task to showing that there exists an expenditure function
e(p, u) related to x(p, y) according to (P.1).
Now, finding an expenditure function so that (P.1) holds is no easy task. Indeed,
(P.1) is known in the mathematics literature as a system of partial differential equa-
tions. Although such systems are often notoriously difficult to actually solve, there is an
important result that tells us precisely when a solution is guaranteed to exist. And, for our
purposes, existence is enough.
However, before stating this result, note the following. If (P.1) has a solution e(p, u),
then upon differentiating both sides by pj, we would get
∂²e(p, u)/∂pj∂pi = ∂xi(p, e(p, u))/∂pj + (∂e(p, u)/∂pj)(∂xi(p, e(p, u))/∂y).
By Shephard’s lemma, using (P.2), and letting y = e(p, u), this can be written as
∂²e(p, u)/∂pj∂pi = ∂xi(p, y)/∂pj + xj(p, y)(∂xi(p, y)/∂y). (P.5)
Now note that the left-hand side of (P.5) is symmetric in i and j by Young’s theorem.
Consequently, (P.5) implies that the right-hand side must be symmetric in i and j as well.
Therefore, symmetry of the right-hand side in i and j is a necessary condition for the
existence of a solution to (P.1).
Remarkably, it turns out that this condition is also sufficient for the existence of
a solution. According to Frobenius’ theorem, a solution to (P.1) exists if and only if
the right-hand side of (P.5) is symmetric in i and j. Take a close look at the right-hand
side of (P.5). It is precisely the ijth term of the Slutsky matrix associated with x(p, y).
Consequently, because that Slutsky matrix satisfies symmetry, a function e(p, u) satisfying
(P.1) is guaranteed to exist.
But will this function be a true expenditure function? Frobenius’ theorem is silent
on this issue. However, by Theorem 2.2, it will be an expenditure function if it has all the
properties of an expenditure function listed in Theorem 1.7. We now attempt to verify each
of those properties.
First, note that because e(p, u) satisfies (P.1), and because x(p, y) is non-negative,
e(p, u) is automatically increasing in p, and Shephard’s lemma is guaranteed by construc-
tion. Moreover, one can ensure it is continuous in (p, u), strictly increasing and unbounded
in u ∈ R+ , and that e(·, u) = 0 when u = 0. As you are asked to show in Exercise 2.4,
because (P.1) and budget balancedness are satisfied, e(·) must be homogeneous of degree 1
in p. Thus, the only remaining property of an expenditure function that must be established
is concavity in p.
By Theorem A2.4, e(·) will be concave in p if and only if its Hessian matrix
with respect to p is negative semidefinite. But according to (P.5), this will be the case
if and only if the Slutsky matrix associated with x(p, y) is negative semidefinite, which, by
assumption, it is.
Altogether we have established the following: A solution e(·) to (P.1) exists and is
an expenditure function if and only if x(p, y) satisfies budget balancedness, symmetry, and
negative semidefiniteness. This is precisely what we set out to show.
Although we have stressed the importance of this result for the theory itself, there
are practical benefits as well. For example, if one wishes to estimate a consumer’s
demand function based on a limited amount of data, and one wishes to impose as a
restriction that the demand function be utility-generated, one is now free to specify any
functional form for demand as long as it satisfies budget balancedness, symmetry, and
negative semidefiniteness. As we now know, any such demand function is guaranteed to
be utility-generated.
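The three conditions just listed can be checked numerically for any candidate functional form. The sketch below is not from the text: the helper function, the Cobb-Douglas demand system, the price–income point, and the tolerances are all illustrative. It builds a finite-difference Slutsky matrix and tests budget balancedness, symmetry, and negative semidefiniteness.

```python
import numpy as np

def slutsky(demand, p, y, h=1e-5):
    """Finite-difference Slutsky matrix: s_ij = dx_i/dp_j + x_j * dx_i/dy."""
    n = len(p)
    x = demand(p, y)
    dx_dy = (demand(p, y + h) - demand(p, y - h)) / (2 * h)
    S = np.empty((n, n))
    for j in range(n):
        dp = np.zeros(n); dp[j] = h
        dx_dpj = (demand(p + dp, y) - demand(p - dp, y)) / (2 * h)
        S[:, j] = dx_dpj + x[j] * dx_dy
    return S

# Illustrative candidate demand system: x_i = alpha_i * y / p_i
alpha = np.array([0.5, 0.3, 0.2])
demand = lambda p, y: alpha * y / p

p, y = np.array([2.0, 1.0, 4.0]), 10.0
x = demand(p, y)
S = slutsky(demand, p, y)

assert abs(p @ x - y) < 1e-8                           # budget balancedness
assert np.allclose(S, S.T, atol=1e-4)                  # symmetry
assert np.linalg.eigvalsh((S + S.T) / 2).max() < 1e-6  # negative semidefinite
```

Any functional form passing these three checks throughout the price–income region of interest is, by the integrability result, consistent with utility maximisation there.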
To give you a feel for how one can actually recover an expenditure function from a
demand function, we consider an example involving three goods.
EXAMPLE 2.3 Suppose there are three goods and that a consumer’s demand behaviour is
summarised by the functions

xi (p1 , p2 , p3 , y) = αi y/pi , i = 1, 2, 3,

where αi > 0, i = 1, 2, 3, and α1 + α2 + α3 = 1. This system satisfies budget balancedness, and you can check that its Slutsky matrix is symmetric and negative semidefinite. By (P.1), an expenditure function generating this demand behaviour must satisfy

∂e(p1 , p2 , p3 , u)/∂pi = αi e(p1 , p2 , p3 , u)/pi , i = 1, 2, 3.

Dividing both sides by e(p1 , p2 , p3 , u), this is equivalent to

∂ ln(e(p1 , p2 , p3 , u))/∂pi = αi /pi , i = 1, 2, 3. (E.1)
Now, if you were asked to find f (x) when told that f ′ (x) = α/x, you would have no trouble
deducing that f (x) = α ln(x) + constant. But (E.1) says just that, where f = ln(e). The only
additional element to keep in mind is that when partially differentiating with respect to,
say, p1 , all the other variables – p2 , p3 , and u – are treated as constants. With this in mind,
it is easy to see that the three equations (E.1) imply the following three:

ln(e(p1 , p2 , p3 , u)) = α1 ln(p1 ) + c1 (p2 , p3 , u),
ln(e(p1 , p2 , p3 , u)) = α2 ln(p2 ) + c2 (p1 , p3 , u), (E.2)
ln(e(p1 , p2 , p3 , u)) = α3 ln(p3 ) + c3 (p1 , p2 , u),

where the ci (·) functions are like the constant added before to f (x). But we must choose
the ci (·) functions so that all three of these equalities hold simultaneously. With a little
thought, you will convince yourself that (E.2) then implies
e(p, u) = c(u) p1^α1 p2^α2 p3^α3 .
Because we must ensure that e(·) is strictly increasing in u, we may choose c(u) to
be any strictly increasing function. It does not matter which, because the implied demand
behaviour will be independent of such strictly increasing transformations. For example,
we may choose c(u) = u, so that our final solution is

e(p, u) = u p1^α1 p2^α2 p3^α3 .
We leave it to you to check that this function satisfies the original system of par-
tial differential equations and that it has all the properties required of an expenditure
function.
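As a quick numerical companion to Example 2.3 (an illustrative check; the values chosen for the αi, the prices, and u are made up), one can verify by finite differences that e(p, u) = u p1^α1 p2^α2 p3^α3 satisfies the system (E.1) and, via Shephard's lemma, reproduces the original demands at income y = e(p, u):

```python
import numpy as np

alpha = np.array([0.2, 0.3, 0.5])   # illustrative shares summing to one

def e(p, u):
    # the recovered expenditure function: e(p, u) = u * p1^a1 * p2^a2 * p3^a3
    return u * np.prod(p ** alpha)

p, u, h = np.array([1.5, 2.0, 0.5]), 3.0, 1e-6
y = e(p, u)
for i in range(3):
    dp = np.zeros(3); dp[i] = h
    de_dpi = (e(p + dp, u) - e(p - dp, u)) / (2 * h)
    # (E.1): de/dp_i = alpha_i * e / p_i; by Shephard's lemma this is also
    # the original demand x_i = alpha_i * y / p_i evaluated at y = e(p, u)
    assert abs(de_dpi - alpha[i] * y / p[i]) < 1e-6
```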
2.3 REVEALED PREFERENCE

The basic idea is simple: if the consumer buys one bundle instead of another affordable bundle, then the first bundle is considered to be revealed preferred to the second. The
presumption is that by actually choosing one bundle over another, the consumer conveys
important information about his tastes. Instead of laying down axioms on a person’s pref-
erences as we did before, we make assumptions about the consistency of the choices that
are made. We make this all a bit more formal in the following.
DEFINITION 2.1 Weak Axiom of Revealed Preference (WARP)
A consumer’s choice behaviour satisfies WARP if whenever distinct bundles x0 and x1 are chosen at prices p0 and p1 , respectively, then

p0 · x1 ≤ p0 · x0 ⇒ p1 · x0 > p1 · x1 .
To better understand the implications of this definition, look at Fig. 2.3. In both parts,
the consumer facing p0 chooses x0 , and facing p1 chooses x1 . In Fig. 2.3(a), the consumer’s
choices satisfy WARP. There, x0 is chosen when x1 could have been, but was not, and when
x1 is chosen, the consumer could not have afforded x0 . By contrast, in Fig. 2.3(b), x0 is
again chosen when x1 could have been, yet when x1 is chosen, the consumer could have
chosen x0 , but did not, violating WARP.
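The pairwise test pictured in Fig. 2.3 is easy to mechanise. The following sketch is our own illustration (the function name and the numerical observations are not from the text): it flags a WARP violation between any two observed price–choice pairs.

```python
from itertools import combinations

def satisfies_warp(observations):
    """observations: list of (price vector, chosen bundle) pairs of tuples.
    WARP fails if two distinct chosen bundles are each affordable
    at the prices at which the other was chosen."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    for (p0, x0), (p1, x1) in combinations(observations, 2):
        if x0 != x1 and dot(p0, x1) <= dot(p0, x0) and dot(p1, x0) <= dot(p1, x1):
            return False
    return True

# Made-up numbers in the spirit of Fig. 2.3(a) and (b):
ok = [((1.0, 2.0), (4.0, 1.0)), ((2.0, 1.0), (1.0, 4.0))]
bad = [((1.0, 1.0), (3.0, 1.0)), ((1.0, 2.0), (2.0, 2.0))]
assert satisfies_warp(ok)
assert not satisfies_warp(bad)
```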
Now, suppose a consumer’s choice behaviour satisfies WARP. Let x(p, y) denote the
choice made by this consumer when faced with prices p and income y. Note well that this
is not a demand function because we have not mentioned utility or utility maximisation –
it just denotes the quantities the consumer chooses facing p and y. To keep this point clear
in our minds, we refer to x(p, y) as a choice function. In addition to WARP, we make one
[Figure 2.3. Two pairs of price–choice observations: in (a) the choices x0 and x1 satisfy WARP; in (b) they violate it.]
other assumption concerning the consumer’s choice behaviour, namely, that for p ≫ 0,
the choice x(p, y) satisfies budget balancedness, i.e., p · x(p, y) = y. The implications of
these two apparently mild requirements on the consumer’s choice behaviour are rather
remarkable.
The first consequence of WARP and budget balancedness is that the choice function
x(p, y) must be homogeneous of degree zero in (p, y). To see this, suppose x0 is chosen
when prices are p0 and income is y0 , and suppose x1 is chosen when prices are p1 = tp0
and income is y1 = ty0 for t > 0. Because y1 = ty0 , when all income is spent, we must
have p1 · x1 = tp0 · x0 . First, substitute tp0 for p1 in this, divide by t, and get
p0 · x1 = p0 · x0 . (2.3)

Then, multiplying (2.3) through by t and again substituting p1 for tp0 gives

p1 · x1 = p1 · x0 . (2.4)
If x0 and x1 are distinct bundles for which (2.3) holds, then WARP implies that the left-
hand side in (2.4) must be strictly less than the right-hand side – a contradiction. Thus,
these bundles cannot be distinct, and the consumer’s choice function therefore must be
homogeneous of degree zero in prices and income.
Thus, the choice function x(p, y) must display one of the additional properties of
a demand function. In fact, as we now show, x(p, y) must display yet another of those
properties as well.
In Exercise 1.45, the notion of Slutsky-compensated demand was introduced. Let us
consider the effect here of Slutsky compensation for the consumer’s choice behaviour. In
case you missed the exercise, the Slutsky compensation is relative to some pre-specified
bundle, say x0 . The idea is to consider the choices the consumer makes as prices vary
arbitrarily while his income is compensated so that he can just afford the bundle x0 . (See
Fig. 2.4.) Consequently, at prices p, his income will be p · x0 . Under these circumstances,
his choice behaviour will be given by x(p, p · x0 ).
[Figure 2.4. Slutsky compensation: as prices change from p0 to p1 , income is adjusted so that the original bundle x0 remains just affordable.]
Now fix p0 ≫ 0, y0 > 0, and let x0 = x(p0 , y0 ). Then if p1 is any other price vector
and x1 = x(p1 , p1 · x0 ), WARP implies that

p0 · x0 ≤ p0 · x1 . (2.5)
Indeed, if x1 = x0 , then (2.5) holds with equality. And if x1 ≠ x0 , then because x1 was
chosen when x0 was affordable (i.e., at prices p1 and income p1 · x0 ), WARP implies that
x1 is not affordable whenever x0 is chosen. Consequently, the inequality in (2.5) would be
strict.
Now, note that by budget balancedness:
p1 · x0 = p1 · x(p1 , p1 · x0 ). (2.6)
Subtracting (2.5) from (2.6) then implies that for all prices p1 ,

(p1 − p0 ) · x(p1 , p1 · x0 ) ≤ (p1 − p0 ) · x0 . (2.7)

Because (2.7) holds for all prices p1 , let p1 = p0 + tz, where t > 0, and z ∈ Rn is arbitrary.
Then (2.7) becomes

tz · x(p0 + tz, (p0 + tz) · x0 ) ≤ tz · x0 . (2.8)

Noting that, by budget balancedness, z · x(p0 , p0 · x0 ) = z · x0 , dividing (2.8) by t > 0 shows that the function

f (t) ≡ z · x(p0 + tz, (p0 + tz) · x0 ) (2.9)

is maximised on [0, t̄ ) at t = 0, where t̄ > 0 is chosen so that p0 + tz ≫ 0 for all t ∈ [0, t̄ ).
Thus, we must have f ′ (0) ≤ 0. But taking the derivative of
f (t) and evaluating at t = 0 gives (assuming that x(·) is differentiable):

f ′ (0) = Σi Σj zi [∂xi (p0 , y0 )/∂pj + (∂xi (p0 , y0 )/∂y)xj (p0 , y0 )]zj ≤ 0. (2.10)

Now, because z ∈ Rn was arbitrary, (2.10) says that the matrix whose ijth entry is

∂xi (p0 , y0 )/∂pj + (∂xi (p0 , y0 )/∂y)xj (p0 , y0 )

must be negative semidefinite. But this matrix is precisely the Slutsky matrix associated
with the choice function x(p, y)!
Thus, we have demonstrated that if a choice function satisfies WARP and budget
balancedness, then it must satisfy two other properties implied by utility maximisation,
namely, homogeneity of degree zero and negative semidefiniteness of the Slutsky matrix.
If we could show, in addition, that the choice function’s Slutsky matrix was sym-
metric, then by our integrability result, that choice function would actually be a demand
function because we would then be able to construct a utility function generating it.
Before pursuing this last point further, it is worthwhile to point out that if x(p, y)
happens to be a utility-generated demand function then x(p, y) must satisfy WARP. To
see this, suppose a utility-maximising consumer has strictly monotonic and strictly convex
preferences. Then we know there will be a unique bundle demanded at every set of prices,
and that bundle will always exhaust the consumer’s income. (See Exercise 1.16.) So let x0
maximise utility facing prices p0 , let x1 maximise utility facing p1 , and suppose p0 · x1 ≤
p0 · x0 . Because x1 , though affordable, is not chosen, it must be because u(x0 ) > u(x1 ).
Therefore, when x1 is chosen facing prices p1 , it must be that x0 is not available or that
p1 · x0 > p1 · x1 . Thus, p0 · x1 ≤ p0 · x0 implies p1 · x0 > p1 · x1 , so WARP is satisfied.
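This direction of the argument can also be seen numerically: generate choices from a Cobb-Douglas demand system (an illustrative stand-in for any demand generated by strictly monotonic, strictly convex preferences) at random prices and incomes, and confirm that no pair of observations violates WARP. The small tolerance below only guards against floating-point rounding; the inequality itself is strict.

```python
import random

random.seed(0)
alpha = (0.3, 0.7)   # illustrative Cobb-Douglas expenditure shares

def choice(p, y):
    # utility-generated demand: x_i = alpha_i * y / p_i
    return tuple(a * y / pi for a, pi in zip(alpha, p))

dot = lambda a, b: sum(u * v for u, v in zip(a, b))
obs = [((random.uniform(1, 5), random.uniform(1, 5)), random.uniform(5, 20))
       for _ in range(50)]
data = [(p, choice(p, y)) for p, y in obs]

for p0, x0 in data:
    for p1, x1 in data:
        if x0 != x1 and dot(p0, x1) <= dot(p0, x0):
            # x0 is revealed preferred to x1, so x0 must not be
            # affordable when x1 is chosen
            assert dot(p1, x0) > dot(p1, x1) - 1e-9
```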
But again what about the other way around? What if a consumer’s choice function
always satisfies WARP? Must that behaviour have been generated by utility maximisation?
Put another way, must there exist a utility function that would yield the observed choices
as the outcome of the utility-maximising process? If the answer is yes, we say the utility
function rationalises the observed behaviour.
As it turns out, the answer is yes – and no. If there are only two goods, then WARP
implies that there will exist some utility function that rationalises the choices; if, however,
there are more than two goods, then even if WARP holds there need not be such a function.
The reason for the two-good exception is related to the symmetry of the Slutsky
matrix and to transitivity.
It turns out that in the two-good case, budget balancedness together with homogene-
ity imply that the Slutsky matrix must be symmetric. (See Exercise 2.9.) Consequently,
because WARP and budget balancedness imply homogeneity as well as negative semidef-
initeness, then in the case of two goods, they also imply symmetry of the Slutsky matrix.
Therefore, for two goods, our integrability theorem tells us that the choice function must
be utility-generated.
An apparently distinct, yet ultimately equivalent, explanation for the two-good
exception is that with two goods, the pairwise ranking of bundles implied through revealed
preference turns out to have no intransitive cycles. (You are, in fact, asked to show this
in Exercise 2.9.) And when this is so, there will be a utility representation generating
the choice function. Thus, as we mentioned earlier in the text, there is a deep con-
nection between the symmetry of the Slutsky matrix and the transitivity of consumer
preferences.
For more than two goods, WARP and budget balancedness imply neither symmetry
of the Slutsky matrix nor the absence of intransitive cycles in the revealed preferred to
relation. Consequently, for more than two goods, WARP and budget balancedness are not
equivalent to the utility-maximisation hypothesis.
This leads naturally to the question: how must we strengthen WARP to obtain a
theory of revealed preference that is equivalent to the theory of utility maximisation? The
answer lies in the ‘Strong Axiom of Revealed Preference’.
The Strong Axiom of Revealed Preference (SARP) is satisfied if, for every
sequence of distinct bundles x0 , x1 , . . . , xk , where x0 is revealed preferred to x1 , and x1 is
revealed preferred to x2 , . . . , and xk−1 is revealed preferred to xk , it is not the case that xk
is revealed preferred to x0 . SARP rules out intransitive revealed preferences and therefore
can be used to induce a complete and transitive preference relation, ≿, for which there will
then exist a utility function that rationalises the observed behaviour. We omit the proof of
this and instead refer the reader to Houthakker (1950) for the original argument, and to
Richter (1966) for an elegant proof.
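SARP lends itself to a simple computational check over a finite set of observations: build the direct revealed preference relation, take its transitive closure, and look for a cycle. The sketch below is our own illustration (function name and data are not from the text) and uses Warshall's algorithm for the closure.

```python
def violates_sarp(observations):
    """observations: list of (price vector, chosen bundle) pairs.
    Bundle i is directly revealed preferred to bundle j if x_j was
    affordable when x_i was chosen. SARP fails iff the transitive
    closure of this relation contains a cycle."""
    n = len(observations)
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    R = [[i != j
          and observations[i][1] != observations[j][1]
          and dot(observations[i][0], observations[j][1])
              <= dot(observations[i][0], observations[i][1])
          for j in range(n)] for i in range(n)]
    for k in range(n):              # Warshall's algorithm: transitive closure
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    return any(R[i][i] for i in range(n))

# Illustrative data: the second set contains a two-bundle cycle.
consistent = [((1.0, 2.0), (4.0, 1.0)), ((2.0, 1.0), (1.0, 4.0))]
cyclic = [((1.0, 1.0), (3.0, 1.0)), ((1.0, 2.0), (2.0, 2.0))]
assert not violates_sarp(consistent)
assert violates_sarp(cyclic)
```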
It is not difficult to show that if a consumer chooses bundles to maximise a strictly
quasiconcave and strictly increasing utility function, his demand behaviour must satisfy
SARP (see Exercise 2.11). Thus, a theory of demand built only on SARP, a restriction
on observable choice, is essentially equivalent to the theory of demand built on util-
ity maximisation. Under both SARP and the utility-maximisation hypothesis, consumer
demand will be homogeneous and the Slutsky matrix will be negative semidefinite and
symmetric.
In our analysis so far, we have focused on revealed preference axioms and consumer
choice functions. In effect, we have been acting as though we had an infinitely large collec-
tion of price and quantity data with which to work. To many, the original allure of revealed
preference theory was the promise it held of being able to begin with actual data and work
from the implied utility functions to predict consumer behaviour. Because real-world data
sets will never contain more than a finite number of sample points, more recent work on
revealed preference has attempted to grapple directly with some of the problems that arise
when the number of observations is finite.
To that end, Afriat (1967) introduced the Generalised Axiom of Revealed
Preference (GARP), a slightly weaker requirement than SARP, and proved an analogue
of the integrability theorem (Theorem 2.6). According to Afriat’s theorem, a finite set of
observed price and quantity data satisfy GARP if and only if there exists a continuous,
increasing, and concave utility function that rationalises the data. (Exercise 2.12 explores
a weaker version of Afriat’s theorem.) However, with only a finite amount of data, the
consumer’s preferences are not completely pinned down at bundles ‘out-of-sample’. Thus,
there can be many different utility functions that rationalise the (finite) data.
But, in some cases, revealed preference does allow us to make certain ‘out-of-
sample’ comparisons. For instance, consider Fig. 2.5. There we suppose we have observed
the consumer to choose x0 at prices p0 and x1 at prices p1 . It is easy to see that x0 is
revealed preferred to x1 . Thus, for any utility function that rationalises these data, we
must have u(x0 ) > u(x1 ), by definition. Now suppose we want to compare two bun-
dles such as x and y, which do not appear in our sample. Because y costs less than x1
when x1 was chosen, we may deduce that u(x0 ) > u(x1 ) > u(y). Also, if more is pre-
ferred to less, the utility function must be increasing, so we have u(x) ≥ u(x0 ). Thus,
we have u(x) ≥ u(x0 ) > u(x1 ) > u(y) for any increasing utility function that rationalises
the observed data, and so we can compare our two out-of-sample bundles directly and
[Figure 2.5. Observed choices x0 at prices p0 and x1 at prices p1 , together with the out-of-sample bundles x and y.]
conclude u(x) > u(y) for any increasing utility function that could possibly have generated
the data we have observed.
But things do not always work out so nicely. To illustrate, say we observe the consumer to buy the single bundle x1 = (1, 1) at prices p1 = (2, 1). The utility function
u(x) = x1²x2 rationalises the choice we observe because the indifference curve through
x1 is tangent there to the budget constraint 2x1 + x2 = 3, as you can easily verify. At the
same time, the utility function v(x) = x1 (x2 + 1) will also rationalise the choice of x1 at p1
as this utility function’s indifference curve through x1 will also be tangent at x1 to the same
budget constraint. This would not be a problem if u(x) and v(x) were merely monotonic
transforms of one another – but they are not. For when we compare the out-of-sample bun-
dles x = (3, 1) and y = (1, 7), in the one case, we get u(3, 1) > u(1, 7), telling us the con-
sumer prefers x to y, and in the other, we get v(3, 1) < v(1, 7), telling us he prefers y to x.
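The disagreement between u and v is easy to confirm numerically. The following check (the grid search over the budget line is our own illustrative device) verifies that both utilities are maximised near x1 = (1, 1) on the budget 2x1 + x2 = 3, yet rank the two out-of-sample bundles oppositely:

```python
# The two rationalising utility functions from the text:
u = lambda x1, x2: x1**2 * x2
v = lambda x1, x2: x1 * (x2 + 1)

# grid search over the budget line 2*x1 + x2 = 3
grid = [(a, 3 - 2 * a) for a in (i / 1000 for i in range(1, 1500))]
best_u = max(grid, key=lambda x: u(*x))
best_v = max(grid, key=lambda x: v(*x))
assert abs(best_u[0] - 1.0) < 1e-2   # both are maximised at x1 = (1, 1)
assert abs(best_v[0] - 1.0) < 1e-2

assert u(3, 1) > u(1, 7)   # u ranks x = (3, 1) above y = (1, 7)
assert v(3, 1) < v(1, 7)   # v ranks them the other way
```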
So for a given bundle y, can we find all bundles x such that u(x) > u(y) for every
utility function that rationalises the data set? A partial solution has been provided by Varian
(1982). Varian described a set of bundles such that every x in the set satisfies u(x) > u(y)
for every u(·) that rationalises the data. Knoblauch (1992) then showed that Varian’s set is
a complete solution – that is, it contains all such bundles.
Unfortunately, consumption data usually contain violations of GARP. Thus, the
search is now on for criteria to help decide when those violations of GARP are unim-
portant enough to ignore and for practical algorithms that will construct appropriate utility
functions on data sets with minor violations of GARP.
2.4 UNCERTAINTY
Until now, we have assumed that decision makers act in a world of absolute certainty. The
consumer knows the prices of all commodities and knows that any feasible consumption
bundle can be obtained with certainty. Clearly, economic agents in the real world cannot
always operate under such pleasant conditions. Many economic decisions contain some
element of uncertainty. When buying a car, for example, the consumer must consider the
future price of petrol, expenditure on repairs, and the resale value of the car several years
later – none of which is known with certainty at the time of the decision. Decisions like this
involve uncertainty about the outcome of the choice that is made. Whereas the decision
maker may know the probabilities of different possible outcomes, the final result of the
decision cannot be known until it occurs.
At first glance, uncertainty may seem an intractable problem, yet economic theory
has much to contribute. The principal analytical approach to uncertainty is based on the
pathbreaking work of von Neumann and Morgenstern (1944).
2.4.1 PREFERENCES
Earlier in the text, the consumer was assumed to have a preference relation over all con-
sumption bundles x in a consumption set X. To allow for uncertainty we need only shift
perspective slightly. We will maintain the notion of a preference relation but, instead of
consumption bundles, the individual will be assumed to have a preference relation over
gambles.
To formalise this, let A = {a1 , . . . , an } denote a finite set of outcomes. The ai ’s
might well be consumption bundles, amounts of money (positive or negative), or anything
at all. The main point is that the ai ’s themselves involve no uncertainty. On the other hand,
we shall use the set A as the basis for creating gambles.
For example, let A = {1, −1}, where 1 is the outcome ‘win one dollar’, and −1 is
the outcome ‘lose one dollar’. Suppose that you have entered into the following bet with
a friend. If the toss of a fair coin comes up heads, she pays you one dollar, and you pay
her one dollar if it comes up tails. From your point of view, this gamble will result in one
of the two outcomes in A: 1 (win a dollar) or −1 (lose a dollar), and each of these occurs
with a probability of one-half because the coin is fair.
More generally, a simple gamble assigns a probability, pi , to each of the outcomes
ai , in A. Of course, because the pi ’s are probabilities, they must be non-negative, and
because the gamble must result in some outcome in A, the pi ’s must sum to one. We denote
this simple gamble by (p1 ◦ a1 , . . . , pn ◦ an ). We define the set of simple gambles GS as
follows.
GS ≡ {(p1 ◦ a1 , . . . , pn ◦ an ) | pi ≥ 0, i = 1, . . . , n, and p1 + · · · + pn = 1}.
When one or more of the pi ’s is zero, we shall drop those components from the
expression when it is convenient to do so. For example, the simple gamble (α ◦ a1 , 0 ◦
a2 , . . . , 0 ◦ an−1 , (1 − α) ◦ an ) would be written as (α ◦ a1 , (1 − α) ◦ an ). Note that GS
contains A because for each i, (1 ◦ ai ), the gamble yielding ai with probability one, is in
GS . To simplify the notation further, we shall write ai instead of (1 ◦ ai ) to denote this
gamble yielding outcome ai with certainty.
Returning to our coin-tossing example where A = {1, −1}, each individual, then,
was faced with the simple gamble (1/2 ◦ 1, 1/2 ◦ −1). Of course, not all gambles are simple.
For example, it is quite common for state lotteries to give as prizes tickets for the next
lottery! Gambles whose prizes are themselves gambles are called compound gambles.
Note that there is no limit to the level of compounding that a compound gamble
might involve. Indeed, the example of the state lottery is a particularly extreme case in
point. Because each state lottery ticket might result in another lottery ticket as a prize,
each ticket involves infinitely many levels of compounding. That is, by continuing to win
lottery tickets as prizes, it can take any number of plays of the state lottery before the
outcome of your original ticket is realised.
For simplicity only, we shall rule out infinitely layered compound gambles like the
state lottery. The compound gambles we shall consider must result in an outcome in A after
finitely many randomisations.
Let G , then, denote the set of all gambles, both simple and compound. Although it is
possible to give a more formal description of the set of compound gambles, and therefore
of G , for our purposes this is not necessary. Quite simply, a gamble can be viewed as a
lottery ticket, which itself might result in one of a number of other (perhaps quite distinct)
lottery tickets, and so on. But ultimately, after finitely many lotteries have been played,
some outcome in A must result. So, if g is any gamble in G , then g = (p1 ◦ g1 , . . . , pk ◦ gk ),
for some k ≥ 1 and some gambles gi ∈ G , where the gi ’s might be compound gambles,
simple gambles, or outcomes. Of course, the pi ’s must be non-negative and they must sum
to one.2
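The reduction of a compound gamble to the simple gamble it induces can be sketched as a short recursion. The representation of gambles as nested lists of (probability, subgamble) pairs is our own illustrative convention, not the text's notation:

```python
def induced_simple_gamble(g):
    """Effective probabilities of the simple gamble induced by g.
    A gamble is either an outcome label (yielded with certainty) or a
    list of (probability, subgamble) pairs whose probabilities sum to one."""
    if not isinstance(g, list):        # an outcome in A
        return {g: 1.0}
    effective = {}
    for p, sub in g:
        for outcome, q in induced_simple_gamble(sub).items():
            effective[outcome] = effective.get(outcome, 0.0) + p * q
    return effective

# Win 1 with probability 1/2; otherwise play a further 50-50 gamble on {1, -1}.
g = [(0.5, 1), (0.5, [(0.5, 1), (0.5, -1)])]
assert induced_simple_gamble(g) == {1: 0.75, -1: 0.25}
```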
The objects of choice in decision making under uncertainty are gambles. Analogous
to the case of consumer theory, we shall suppose that the decision maker has preferences,
, over the set of gambles, G . We shall proceed by positing a number of axioms, called
axioms of choice under uncertainty, for the decision maker’s preference relation, . As
before, ∼ and denote the indifference and strict preference relations induced by . The
first few axioms will look very familiar and so require no discussion.
AXIOM 1: Completeness. For any two gambles, g and g′ in G , either g ≿ g′ , or g′ ≿ g.
2 For a formal definition of G , proceed as follows. Let G0 = A, and for each j = 1, 2, . . ., let Gj = {(p1 ◦ g1 , . . . ,
pk ◦ gk ) | k ≥ 1; pi ≥ 0 and gi ∈ Gj−1 ∀ i = 1, . . . , k; and p1 + · · · + pk = 1}. Then G = G0 ∪ G1 ∪ G2 ∪ · · ·.
AXIOM 2: Transitivity. For any three gambles g, g′ , and g″ in G , if g ≿ g′ and g′ ≿ g″, then g ≿ g″.

Axioms G1 and G2 together imply that the decision maker can completely rank the gambles in G and, in particular, the outcomes in A. (See Exercise 2.16.) So let us assume without loss of generality that the elements of A have
been indexed so that a1 ≿ a2 ≿ · · · ≿ an .
It seems plausible then that no gamble is better than that giving a1 with certainty, and
no gamble is worse than that giving an with certainty (although we are not directly assuming this). That is, for any gamble g, it seems plausible that (α ◦ a1 , (1 − α) ◦ an ) ≿ g
when α = 1, and g ≿ (α ◦ a1 , (1 − α) ◦ an ) when α = 0. The next axiom says that if
indifference does not hold at either extreme, then it must hold for some intermediate
value of α.
AXIOM 3: Continuity. For any gamble g in G , there is some probability, α ∈ [0, 1], such
that g ∼ (α ◦ a1 , (1 − α) ◦ an ).
Axiom G3 has implications that at first glance might appear unreasonable. For exam-
ple, suppose that A = {$1000, $10, ‘death’}. For most of us, these outcomes are strictly
ordered as follows: $1000 ≻ $10 ≻ ‘death’. Now consider the simple gamble giving $10
with certainty. According to G3, there must be some probability α rendering the gamble
(α◦ $1000, (1 − α) ◦ ‘death’) equally attractive as $10. Thus, if there is no probability α
at which you would find $10 with certainty and the gamble (α◦ $1000, (1 − α) ◦ ‘death’)
equally attractive, then your preferences over gambles do not satisfy G3.
Is, then, Axiom G3 an unduly strong restriction to impose on preferences? Do not
be too hasty in reaching a conclusion. If you would drive across town to collect $1000 –
an action involving some positive, if tiny, probability of death – rather than accept a $10
payment to stay at home, you would be declaring your preference for the gamble over the
small sum with certainty. Presumably, we could increase the probability of a fatal traffic
accident until you were just indifferent between the two choices. When that is the case, we
will have found the indifference probability whose existence G3 assumes.
The next axiom expresses the idea that if two simple gambles each potentially yield
only the best and worst outcomes, then that which yields the best outcome with the higher
probability is preferred.
AXIOM 4: Monotonicity. For all probabilities α, β ∈ [0, 1],
(α ◦ a1 , (1 − α) ◦ an ) ≿ (β ◦ a1 , (1 − β) ◦ an )
if and only if α ≥ β.
Note that monotonicity implies a1 ≻ an , and so the case in which the decision maker
is indifferent among all the outcomes in A is ruled out.
Although most people will usually prefer gambles that give better outcomes higher
probability, as monotonicity requires, it need not always be so. For example, to a safari
hunter, death may be the worst outcome of an outing, yet the possibility of death adds to
the excitement of the venture. An outing with a small probability of death would then be
preferred to one with zero probability, a clear violation of monotonicity.
The next axiom states that the decision maker is indifferent between one gamble and
another if he is indifferent between their realisations, and their realisations occur with the
same probabilities.
AXIOM 5: Substitution. If g = (p1 ◦ g1 , . . . , pk ◦ gk ) and h = (p1 ◦ h1 , . . . , pk ◦ hk ) are in G , and if hi ∼ gi for every i, then h ∼ g.3
3 In some treatments, Axioms G5 and G6 are combined into a single ‘independence’ axiom. (See Exercise 2.20.)
AXIOM 6: Reduction to Simple Gambles. For any gamble g ∈ G , if (p1 ◦ a1 , . . . , pn ◦ an ) is the simple gamble induced by g (i.e., the simple gamble whose probability pi for each outcome ai is the effective probability with which g finally yields ai ), then (p1 ◦ a1 , . . . , pn ◦ an ) ∼ g.

Note how much weight this axiom places on the effective probabilities alone: it is insensitive to the gambling process itself. A compulsive gambler, then, need not be indifferent between
playing the slot machines many times during their stay and taking the single once and
for all gamble defined by the effective probabilities over winnings and losses. On the other
hand, many decisions under uncertainty are undertaken outside of Las Vegas, and for many
of these, Axiom G6 is reasonable.
DEFINITION 2.2 Expected Utility Property
The utility function u : G → R has the expected utility property if, for every g ∈ G ,

u(g) = Σⁿᵢ₌₁ pi u(ai ),

where (p1 ◦ a1 , . . . , pn ◦ an ) is the simple gamble induced by g.
Thus, to say that u has the expected utility property is to say that it assigns to each
gamble the expected value of the utilities that might result, where each utility that might
result is assigned its effective probability.5 Of course, the effective probability that g yields
utility u(ai ) is simply the effective probability that it yields outcome ai , namely, pi .
4 The function u(·) represents ≿ whenever g ≿ g′ if and only if u(g) ≥ u(g′ ). See Definition 1.5.
5 The expected value of a function x taking on the values x1 , . . . , xn with probabilities p1 , . . . , pn , respectively, is
defined to be equal to p1 x1 + · · · + pn xn . Here, the u(ai )’s play the role of the xi ’s, so that we are considering the expected
value of utility.
THEOREM 2.7 Existence of a VNM Utility Function on G
Let preferences ≿ over gambles in G satisfy Axioms G1 to G6. Then there exists a utility function u : G → R representing ≿ on G , such that u has the expected utility property:

u(p1 ◦ a1 , . . . , pn ◦ an ) = Σⁿᵢ₌₁ pi u(ai ), ∀ probability vectors (p1 , . . . , pn ).
Proof: As in our proof of the existence of a utility function representing the consumer’s
preferences in Chapter 1, the proof here will be constructive.
So, consider an arbitrary gamble, g, from G . Define u(g) to be the number satisfying
g ∼ (u(g) ◦ a1 , (1 − u(g)) ◦ an ).
By G3, such a number must exist, and you are asked to show in Exercise 2.19 that by
G4 this number is unique. This then defines a real-valued function, u, on G . (Incidentally,
by definition, u(g) ∈ [0, 1] for all g.)
It remains to show that u represents ≿, and that it has the expected utility property.
We shall begin with the first of these.
So let g, g′ ∈ G be arbitrary gambles. We claim that the following equivalences hold:

g ≿ g′ (P.1)

if and only if

(u(g) ◦ a1 , (1 − u(g)) ◦ an ) ≿ (u(g′ ) ◦ a1 , (1 − u(g′ )) ◦ an ) (P.2)

if and only if

u(g) ≥ u(g′ ). (P.3)
To see this, note that (P.1) iff (P.2) because ≿ is transitive, and g ∼ (u(g) ◦ a1 , (1 − u(g)) ◦
an ), and g′ ∼ (u(g′ ) ◦ a1 , (1 − u(g′ )) ◦ an ), both by the definition of u. Also, (P.2) iff (P.3)
follows directly from monotonicity (Axiom G4).
Consequently, g ≿ g′ if and only if u(g) ≥ u(g′ ), so that u represents ≿ on G .
To complete the proof, we must show that u has the expected utility property. So
let g ∈ G be an arbitrary gamble, and let gs ≡ (p1 ◦ a1 , . . . , pn ◦ an ) ∈ GS be the simple
gamble it induces. We must show that
u(g) = Σⁿᵢ₌₁ pi u(ai ).

Now, by the reduction axiom, G6, g ∼ gs , and because u represents ≿ on G , u(g) = u(gs ). It therefore suffices to show that

u(gs ) = Σⁿᵢ₌₁ pi u(ai ). (P.4)
By definition of u, for each i = 1, . . . , n,

ai ∼ (u(ai ) ◦ a1 , (1 − u(ai )) ◦ an ). (P.5)

Let qi denote the simple gamble on the right in (P.5). That is, qi ≡ (u(ai ) ◦ a1 , (1−
u(ai )) ◦ an ) for every i = 1, . . . , n. Consequently, qi ∼ ai for every i, so that by the
substitution axiom, G5,

g′ ≡ (p1 ◦ q1 , . . . , pn ◦ qn ) ∼ (p1 ◦ a1 , . . . , pn ◦ an ) = gs . (P.6)
We now wish to derive the simple gamble induced by the compound gamble g′ .
Note that because each qi can result only in one of the two outcomes a1 or an , g′ must
result only in one of those two outcomes as well. What is the effective probability that g′
assigns to a1 ? Well, a1 results if for any i, qi occurs (probability pi ) and a1 is the result
of gamble qi (probability u(ai )). Thus, for each i, there is a probability of pi u(ai ) that
a1 will result. Because the occurrences of the qi ’s are mutually exclusive, the effective
probability that a1 results is the sum Σⁿᵢ₌₁ pi u(ai ). Similarly, the effective probability of
an is Σⁿᵢ₌₁ pi (1 − u(ai )), which is equal to 1 − Σⁿᵢ₌₁ pi u(ai ), because the pi ’s sum to one.
Therefore, the simple gamble induced by g′ is

g′s ≡ ((Σⁿᵢ₌₁ pi u(ai )) ◦ a1 , (1 − Σⁿᵢ₌₁ pi u(ai )) ◦ an ).
By the reduction axiom, G6, it must be the case that g′ ∼ g′s . But the transitivity of ∼
together with (P.6) then imply that

gs ∼ ((Σⁿᵢ₌₁ pi u(ai )) ◦ a1 , (1 − Σⁿᵢ₌₁ pi u(ai )) ◦ an ). (P.7)
However, by definition (and Exercise 2.19), u(gs ) is the unique number α satisfying
gs ∼ (α ◦ a1 , (1 − α) ◦ an ). Therefore, (P.7) implies that

u(gs ) = Σⁿᵢ₌₁ pi u(ai ),
as desired.
The careful reader might have noticed that Axiom G1 was not invoked in the process
of proving Theorem 2.7. Indeed, it is redundant given the other axioms. In Exercise 2.22,
you are asked to show that G2, G3, and G4 together imply G1. Consequently, we could
have proceeded without explicitly mentioning completeness at all. On the other hand,
assuming transitivity and not completeness would surely have raised unnecessary ques-
tions in the reader’s mind. To spare you that kind of stress, we opted for the approach
presented here.
The upshot of Theorem 2.7 is this: if an individual’s preferences over gambles satisfy
Axioms G1 through G6, then there are utility numbers that can be assigned to the outcomes
in A so that the individual prefers one gamble over another if and only if the one has a
higher expected utility than the other.
The proof of Theorem 2.7 not only establishes the existence of a utility function with
the expected utility property, but it also shows us the steps we might take in constructing
such a function in practice. To determine the utility of any outcome ai , we need only ask
the individual for the probability of the best outcome that would make him indifferent
between a best–worst gamble of the form (α ◦ a1 , (1 − α) ◦ an ) and the outcome ai with
certainty. By repeating this process for every ai ∈ A, we then could calculate the utility
associated with any gamble g ∈ G as simply the expected utility it generates. And if the
individual’s preferences satisfy G1 through G6, Theorem 2.7 guarantees that the utility
function we obtain in this way represents her preferences.
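The elicitation procedure just described can be sketched in code. The following Python fragment is an illustration only; the outcome names, the elicited indifference probabilities, and the sample gambles are hypothetical, not taken from the text:

```python
# Sketch of the construction following Theorem 2.7 (all data hypothetical).
# Outcomes are listed best to worst; by construction u(best) = 1, u(worst) = 0.
# For an intermediate outcome a_i, the elicited probability alpha making the
# agent indifferent between (alpha ∘ best, (1 - alpha) ∘ worst) and a_i with
# certainty is itself the utility number u(a_i).
u = {"best": 1.0, "middle": 0.6, "worst": 0.0}

def expected_utility(gamble):
    """Expected utility of a simple gamble given as {outcome: probability}."""
    assert abs(sum(gamble.values()) - 1.0) < 1e-9
    return sum(p * u[a] for a, p in gamble.items())

# Two hypothetical gambles over the three outcomes:
g1 = {"best": 0.5, "middle": 0.5}
g2 = {"best": 0.4, "middle": 0.4, "worst": 0.2}

# The agent prefers whichever gamble has the higher expected utility.
print(round(expected_utility(g1), 3))   # 0.8
print(round(expected_utility(g2), 3))   # 0.64
```

Once the utilities of the finitely many outcomes are elicited, ranking any of the infinitely many gambles over them reduces to this one-line expected-utility computation.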
EXAMPLE 2.4 Suppose A = {$10, $4, −$2}, where each of these represents thousands of
dollars. We can reasonably suppose that the best outcome is $10 and the worst is −$2.
To construct the VNM utility function used in the proof of Theorem 2.7, we first have
to come up with indifference probabilities associated with each of the three outcomes. We
accomplish this by composing best–worst gambles that offer $10 and −$2 with as yet
unknown probabilities summing to 1. Finally, we ask the individual the following question
for each of the three outcomes: ‘What probability for the best outcome will make you
indifferent between the best–worst gamble we have composed and the outcome ai with
certainty?’ The answers we get will be the utility numbers we assign to each of the three
ultimate outcomes. Suppose we find that

$10 ∼ (1 ◦ $10, 0 ◦ −$2), so that u($10) = 1, (E.1)
$4 ∼ (.60 ◦ $10, .40 ◦ −$2), so that u($4) = .60, (E.2)
−$2 ∼ (0 ◦ $10, 1 ◦ −$2), so that u(−$2) = 0. (E.3)
Note carefully that under this mapping, the utility of the best outcome must always be
1 and that of the worst outcome must always be zero. However, the utility assigned to
intermediate outcomes, such as $4 in this example, will depend on the individual’s attitude
towards taking risks.
Having obtained the utility numbers for each of the three possible outcomes, we now
have every bit of information we need to rank all gambles involving them. Consider, for
instance,
Which of these will the individual prefer? Assuming that his preferences over gambles sat-
isfy G1 through G6, we may appeal to Theorem 2.7. It tells us that we need only calculate
the expected utility of each gamble, using the utility numbers generated in (E.1) through
(E.3), to find out. Doing that, we find
Because g1 has the greater expected utility, it must be the preferred gamble! In similar
fashion, using only the utility numbers generated in (E.1) through (E.3), we can rank any
of the infinite number of gambles that could be constructed from the three outcomes in A.
Just think some more about the information we have uncovered in this example.
Look again at the answer given when asked to compare $4 with certainty to the best–worst
gamble in (E.2). The best–worst gamble g offered there has an expected value of E(g) =
(.6)($10)+ (.4)(−$2) = $5.2. This exceeds the expected value $4 he obtains under the
simple gamble offering $4 with certainty, yet the individual is indifferent between these
two gambles. Because we assume that his preferences are monotonic, we can immediately
conclude that he would strictly prefer the same $4 with certainty to every best–worst gam-
ble offering the best outcome with probability less than .6. This of course includes the one
offering $10 and −$2 with equal probabilities of .5, even though that gamble and $4 with
certainty have the same expected value of $4. Thus, in some sense, this individual prefers
to avoid risk. This same tendency is reflected in his ranking of g1 and g2 in (E.4) and (E.5),
as well. There he prefers g1 to g2 , even though the former’s expected value, E(g1 ) = $8.80,
is less than the latter’s, E(g2 ) = $8.98. Here, g2 is avoided because, unlike g1 , it includes
too much risk of the worst outcome. Later, we will get more precise about risk avoidance
and its measurement, but this example should help you see that a VNM utility function
summarises important aspects about an individual’s willingness to take risks.
Let us step back a moment to consider what this VNM utility function really does
and how it relates to the ordinary utility function under certainty. In the standard case, if
the individual is indifferent between two commodity bundles, both receive the same utility
number, whereas if one bundle is strictly preferred to another, its utility number must be
larger. This is true, too, of the VNM utility function u(g), although we must substitute the
word ‘gamble’ for ‘commodity bundle’.
However, in the consumer theory case, the utility numbers themselves have only
ordinal meaning. Any strictly monotonic transformation of one utility representation
yields another one. On the other hand, the utility numbers associated with a VNM utility
representation of preferences over gambles have content beyond ordinality.
To see this, suppose that A = {a, b, c}, where a ≻ b ≻ c, and that ≿ satisfies G1
through G6. By G3 and G4, there is an α ∈ (0, 1) satisfying
b ∼ (α ◦ a, (1 − α) ◦ c).
Note well that the probability number α is determined by, and is a reflection of,
the decision maker’s preferences. It is a meaningful number. One cannot double it, add a
constant to it, or transform it in any way without also changing the preferences with which
it is associated.
Now, let u be some VNM utility representation of ≿. Then the preceding indifference
relation implies that
u(b) = u(α ◦ a, (1 − α) ◦ c)
= αu(a) + (1 − α)u(c),
where the second equality follows from the expected utility property of u. But this equality
can be rearranged to yield

$$\frac{u(a) - u(b)}{u(b) - u(c)} = \frac{1 - \alpha}{\alpha}.$$

Because the number α is uniquely determined by the decision maker's preferences, so, too, then is the preceding ratio of
utility differences.
We conclude that the ratio of utility differences has inherent meaning regarding
the individual's preferences, and it must take on the same value for every VNM utility
representation of ≿. Therefore, VNM utility representations provide distinctly more
than ordinal information about the decision maker’s preferences, for otherwise, through
suitable monotone transformations, such ratios could assume many different values.
Clearly, then, a strictly increasing transformation of a VNM utility representation
might not yield another VNM utility representation. (Of course, it still yields a utility
representation, but that representation need not have the expected utility property.) This
then raises the following question: what is the class of VNM utility representations of
a given preference ordering? From the earlier considerations, these must preserve the
ratios of utility differences. As the next result shows, this property provides a complete
characterisation.
THEOREM 2.8 VNM Utility Functions are Unique up to Positive Affine Transformations
Suppose that the VNM utility function u(·) represents ≿. Then the VNM utility function,
v(·), represents those same preferences if and only if for some scalar α and some scalar
β > 0,

v(g) = α + βu(g), for every g ∈ G.
Proof: Sufficiency is obvious (but do convince yourself), so we only prove necessity here.
Moreover, we shall suppose that g is a simple gamble. You are asked to show that if u and
v are linearly related for all simple gambles, then they are linearly related for all gambles.
As before, we let g ≡ (p1 ◦ a1, . . . , pn ◦ an), where a1 ≿ · · · ≿ an, and a1 ≻ an.
Because u(·) represents ≿, we have u(a1) ≥ · · · ≥ u(ai) ≥ · · · ≥ u(an), and u(a1) >
u(an). So, for every i = 1, . . . , n, there is a unique αi ∈ [0, 1] such that

u(ai) = αi u(a1) + (1 − αi)u(an). (P.1)

Because u(·) has the expected utility property, (P.1) implies that u(ai) = u(αi ◦ a1, (1 − αi) ◦ an), and because u(·) represents ≿, this in turn implies that

ai ∼ (αi ◦ a1, (1 − αi) ◦ an). (P.2)

Similarly, because v(·) represents those same preferences, (P.2) implies that

v(ai) = v(αi ◦ a1, (1 − αi) ◦ an). (P.3)

And, because v(·) has the expected utility property, this implies that

v(ai) = αi v(a1) + (1 − αi)v(an). (P.4)

Solving (P.1) for αi and substituting into (P.4) yields

$$\frac{v(a_i) - v(a_n)}{v(a_1) - v(a_n)} = \frac{u(a_i) - u(a_n)}{u(a_1) - u(a_n)}$$  (P.5)

whenever ai ≻ an. However, (P.5) holds even when ai ∼ an because in this case u(ai) =
u(an) and v(ai) = v(an). Hence, (P.5) holds for all i = 1, . . . , n.
Rearranging, (P.5) can be expressed in the form

v(ai) = α + βu(ai), i = 1, . . . , n, (P.6)

where

$$\alpha \equiv \frac{u(a_1)v(a_n) - v(a_1)u(a_n)}{u(a_1) - u(a_n)} \qquad\text{and}\qquad \beta \equiv \frac{v(a_1) - v(a_n)}{u(a_1) - u(a_n)}.$$
Notice that both α and β are constants (i.e., independent of i), and that β is strictly positive.
So, for any gamble g, if (p1 ◦ a1, . . . , pn ◦ an) is the simple gamble induced by g,
then

$$v(g) = \sum_{i=1}^{n} p_i v(a_i) = \sum_{i=1}^{n} p_i\big(\alpha + \beta u(a_i)\big) = \alpha + \beta\sum_{i=1}^{n} p_i u(a_i) = \alpha + \beta u(g),$$

where the first and last equalities follow because v(·) and u(·) have the expected utility
property and the second equality follows from (P.6).
Just before the statement of Theorem 2.8, we stated that the class of VNM utility
representations of a single preference relation is characterised by the constancy of ratios
of utility differences. This in fact follows from Theorem 2.8, as you are asked to show in
an exercise.
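A quick numerical check of this characterisation, using hypothetical utility numbers and gambles: a positive affine transformation preserves both the expected-utility ranking and the ratios of utility differences.

```python
# Check (hypothetical numbers): a positive affine transformation v = a + b*u,
# with b > 0, ranks gambles identically and leaves ratios of utility
# differences unchanged, as Theorem 2.8 and the surrounding discussion require.
u = [1.0, 0.6, 0.0]                    # u(a1), u(a2), u(a3)
a, b = 7.0, 3.0                        # hypothetical affine transformation
v = [a + b * x for x in u]             # another VNM representation

def eu(util, probs):
    """Expected utility of a simple gamble (p1, p2, p3) over (a1, a2, a3)."""
    return sum(p * x for p, x in zip(probs, util))

g1, g2 = (0.5, 0.5, 0.0), (0.4, 0.4, 0.2)
assert (eu(u, g1) > eu(u, g2)) == (eu(v, g1) > eu(v, g2))   # same ranking

ratio_u = (u[0] - u[1]) / (u[1] - u[2])   # (u(a1)-u(a2)) / (u(a2)-u(a3))
ratio_v = (v[0] - v[1]) / (v[1] - v[2])
assert abs(ratio_u - ratio_v) < 1e-9      # ratio of differences preserved
```

A non-affine monotone transformation (say, cubing each utility number) would preserve the ordering of outcomes but generally break both the expected-utility property and the ratio equality.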
Theorem 2.8 tells us that VNM utility functions are not completely unique, nor are
they entirely ordinal. We can still find an infinite number of them that will rank gambles in
precisely the same order and also possess the expected utility property. But unlike ordinary
utility functions from which we demand only an order-preserving numerical scaling, here
we must limit ourselves to transformations that multiply by a positive number and/or add
a constant term if we are to preserve the expected utility property as well. Yet the less
than complete ordinality of the VNM utility function must not tempt us into attaching
undue significance to the absolute level of a gamble’s utility, or to the difference in utility
between one gamble and another. With what little we have required of the agent’s binary
comparisons between gambles in the underlying preference relation, we still cannot use
VNM utility functions for interpersonal comparisons of well-being, nor can we measure
the ‘intensity’ with which one gamble is preferred to another.
6 With this framework, it is possible to prove an expected utility theorem along the lines of Theorem 2.7 by
suitably modifying the axioms to take care of the fact that A is no longer a finite set.
alternatives as follows:

$$u(g) = \sum_{i=1}^{n} p_i u(w_i),$$

$$u(E(g)) = u\Big(\sum_{i=1}^{n} p_i w_i\Big).$$
The first of these is the VNM utility of the gamble, and the second is the VNM utility of
the gamble’s expected value. If preferences satisfy Axioms G1 to G6, we know the agent
prefers the alternative with the higher expected utility. When someone would rather receive
the expected value of a gamble with certainty than face the risk inherent in the gamble
itself, we say they are risk averse. Of course, people may exhibit a complete disregard of
risk, or even an attraction to risk, and still be consistent with Axioms G1 through G6. We
catalogue these various possibilities, and define terms precisely, in what follows.
As remarked after Definition 2.3, a VNM utility function on G is completely deter-
mined by the values it assumes on the set of outcomes, A. Consequently, the characteristics
of an individual’s VNM utility function over the set of simple gambles alone provide a
complete description of the individual’s preferences over all gambles. Because of this, it is
enough to focus on the behaviour of u on GS to capture an individual’s attitudes towards
risk. This, and the preceding discussion, motivate the following definition.
Each of these attitudes toward risk is equivalent to a particular property of the VNM
utility function. In the exercises, you are asked to show that the agent is risk averse, risk
neutral, or risk loving over some subset of gambles if and only if his VNM utility function
is strictly concave, linear, or strictly convex, respectively, over the appropriate domain of
wealth.
7 A simple gamble is non-degenerate if it assigns strictly positive probability to at least two distinct wealth levels.
To help see the first of these claims, let us consider a simple gamble involving two
outcomes:
g ≡ (p ◦ w1 , (1 − p) ◦ w2 ).
Now suppose the individual is offered a choice between receiving wealth equal to
E(g) = pw1 + (1 − p)w2 with certainty or receiving the gamble g itself. We can assess
the alternatives as follows:

u(g) = pu(w1) + (1 − p)u(w2) and u(E(g)) = u(pw1 + (1 − p)w2).
Now look at Fig. 2.6. There we have drawn a chord between the two points
R = (w1 , u(w1 )) and S = (w2 , u(w2 )), and located their convex combination, T = pR +
(1 − p)S. The abscissa of T must be E(g) and its ordinate must be u(g). (Convince your-
self of this.) We can then locate u(E(g)) on the vertical axis using the graph of the function
u(w) as indicated. The VNM utility function in Fig. 2.6 has been drawn strictly concave in
wealth over the relevant region. As you can see, u(E(g)) > u(g), so the individual is risk
averse.
In Fig. 2.6, the individual prefers E(g) with certainty to the gamble g itself. But there
will be some amount of wealth we could offer with certainty that would make him indif-
ferent between accepting that wealth with certainty and facing the gamble g. We call this
amount of wealth the certainty equivalent of the gamble g. When a person is risk averse
and strictly prefers more money to less, it is easy to show that the certainty equivalent is
less than the expected value of the gamble, and you are asked to do this in the exercises.
In effect, a risk-averse person will ‘pay’ some positive amount of wealth to avoid the gam-
ble’s inherent risk. This willingness to pay to avoid risk is measured by the risk premium.
[Figure 2.6. Risk aversion and strict concavity of the VNM utility function. The graph plots u(w) against wealth w; the chord joining R = (w1, u(w1)) and S = (w2, u(w2)) contains T = pR + (1 − p)S, and CE and E(g) are marked on the horizontal axis, with u(g) and u(E(g)) on the vertical axis.]
The certainty equivalent and the risk premium, both illustrated in Fig. 2.6, are defined in
what follows.
EXAMPLE 2.5 Suppose u(w) ≡ ln(w). Because this is strictly concave in wealth, the indi-
vidual is risk averse. Let g offer 50–50 odds of winning or losing some amount of wealth,
h, so that if the individual's initial wealth is w0,

g ≡ ((1/2) ◦ (w0 + h), (1/2) ◦ (w0 − h)),

where we note that E(g) = w0. The certainty equivalent for g must satisfy

$$\ln(CE) = (1/2)\ln(w_0 + h) + (1/2)\ln(w_0 - h) = \ln\big(w_0^2 - h^2\big)^{1/2}.$$

Thus,

$$CE = \big(w_0^2 - h^2\big)^{1/2} < E(g) \qquad\text{and}\qquad P = w_0 - \big(w_0^2 - h^2\big)^{1/2} > 0.$$
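The arithmetic of Example 2.5 can be checked numerically. In this sketch the wealth w0 = 10 and stake h = 6 are hypothetical values, chosen only to make the answer clean:

```python
import math

# Numerical check of Example 2.5: u(w) = ln(w) and a 50-50 gamble over
# w0 + h and w0 - h. Parameter values are hypothetical.
w0, h = 10.0, 6.0
expected_u = 0.5 * math.log(w0 + h) + 0.5 * math.log(w0 - h)
ce = math.exp(expected_u)          # certainty equivalent: ln(CE) = E[ln(w)]
premium = w0 - ce                  # risk premium P = E(g) - CE, and E(g) = w0

assert abs(ce - math.sqrt(w0**2 - h**2)) < 1e-9   # CE = (w0^2 - h^2)^(1/2)
assert ce < w0 and premium > 0                    # CE < E(g) for a risk averter
print(ce, premium)                                # ce is approximately 8
```

With these numbers the individual would surrender a risk premium of about 2 (thousand dollars) to shed the gamble's risk, even though the gamble is actuarially fair.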
Many times, we not only want to know whether someone is risk averse, but also
how risk averse they are. Ideally, we would like a summary measure that allows us both
to compare the degree of risk aversion across individuals and to gauge how the degree of
risk aversion for a single individual might vary with the level of their wealth. Because risk
aversion and concavity of the VNM utility function in wealth are equivalent, the seemingly
most natural candidate for such a measure would be the second derivative, u''(w), a basic
measure of a function’s ‘curvature’. We might think that the greater the absolute value of
this derivative, the greater the degree of risk aversion.
But this will not do. Although the sign of the second derivative does tell us whether
the individual is risk averse, risk loving, or risk neutral, its size is entirely arbitrary.
Theorem 2.8 showed that VNM utility functions are unique up to affine transformations.
This means that for any given preferences, we can obtain virtually any size second deriva-
tive we wish through multiplication of u(·) by a properly chosen positive constant. With
this and other considerations in mind, Arrow (1970) and Pratt (1964) have proposed the
following measure of risk aversion.
$$R_a(w) \equiv \frac{-u''(w)}{u'(w)}.$$
Note that the sign of this measure immediately tells us the basic attitude towards
risk: Ra (w) is positive, negative, or zero as the agent is risk averse, risk loving, or risk
neutral, respectively. In addition, any positive affine transformation of utility will leave the
measure unchanged: adding a constant affects neither the numerator nor the denominator;
multiplication by a positive constant affects both numerator and denominator but leaves
their ratio unchanged.
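This invariance is easy to confirm numerically. The sketch below estimates Ra(w) by central finite differences for u(w) = ln(w) and for a hypothetical affine transformation v(w) = 5 + 3 ln(w):

```python
import math

# Sketch: the Arrow-Pratt measure Ra(w) = -u''(w)/u'(w), estimated here by
# central finite differences, is unchanged by positive affine transformations.
# u(w) = ln(w); v is a hypothetical transformation v(w) = 5 + 3*ln(w).
def arrow_pratt(util, w, eps=1e-4):
    d1 = (util(w + eps) - util(w - eps)) / (2 * eps)             # ~ u'(w)
    d2 = (util(w + eps) - 2 * util(w) + util(w - eps)) / eps**2  # ~ u''(w)
    return -d2 / d1

u = math.log
v = lambda w: 5 + 3 * math.log(w)

for w in (1.0, 2.0, 10.0):
    assert abs(arrow_pratt(u, w) - 1 / w) < 1e-4     # for ln(w), Ra(w) = 1/w
    assert abs(arrow_pratt(u, w) - arrow_pratt(v, w)) < 1e-4
```

Note that a non-affine transformation, such as u³, would change the finite-difference estimate of Ra, which is exactly why raw second derivatives are not a usable measure of risk aversion.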
To demonstrate the effectiveness of the Arrow-Pratt measure of risk aversion, we
now show that consumers with larger Arrow-Pratt measures are indeed more risk averse in
a behaviourally significant respect: they have lower certainty equivalents and are willing
to accept fewer gambles.
To see this, suppose there are two consumers, 1 and 2, and that consumer 1 has VNM
utility function u(w), and consumer 2’s VNM utility function is v(w). Wealth, w, can take
on any non-negative number. Let us now suppose that at every wealth level, w, consumer
1’s Arrow-Pratt measure of risk aversion is larger than consumer 2’s. That is,
$$R_a^1(w) = -\frac{u''(w)}{u'(w)} > -\frac{v''(w)}{v'(w)} = R_a^2(w) \qquad \text{for all } w \ge 0,$$  (2.12)

where we are assuming that both u' and v' are always strictly positive.
For simplicity, assume that v(w) takes on all values in [0, ∞). Consequently, we may
define h : [0, ∞)−→R as follows:

h(x) ≡ u(v⁻¹(x)) for every x ∈ [0, ∞). (2.13)

Differentiating h(·), we obtain

$$h'(x) = \frac{u'(v^{-1}(x))}{v'(v^{-1}(x))} > 0, \quad\text{and}$$

$$h''(x) = \frac{u'(v^{-1}(x))\big[u''(v^{-1}(x))/u'(v^{-1}(x)) - v''(v^{-1}(x))/v'(v^{-1}(x))\big]}{\big[v'(v^{-1}(x))\big]^2} < 0$$

for all x > 0, where the first inequality follows because u', v' > 0, and the second follows
from (2.12). Therefore, h is a strictly increasing, strictly concave function.
Consider now a gamble (p1 ◦ w1 , . . . , pn ◦ wn ) over wealth levels. We can use (2.13)
and the fact that h is strictly concave to show that consumer 1’s certainty equivalent for
this gamble is lower than consumer 2’s.
To see this, let ŵi denote consumer i’s certainty equivalent for the gamble. That is,
$$\sum_{i=1}^{n} p_i u(w_i) = u(\hat w_1),$$  (2.14)

$$\sum_{i=1}^{n} p_i v(w_i) = v(\hat w_2).$$  (2.15)
Then, using (2.14) and (2.13),

$$u(\hat w_1) = \sum_{i=1}^{n} p_i h(v(w_i)) < h\Big(\sum_{i=1}^{n} p_i v(w_i)\Big) = h(v(\hat w_2)) = u(\hat w_2),$$
where the inequality, called Jensen’s inequality, follows because h is strictly concave,
and the final two equalities follow from (2.15) and (2.13), respectively. Consequently,
u(ŵ1 ) < u(ŵ2 ), so that because u is strictly increasing, ŵ1 < ŵ2 as desired.
We may conclude that consumer 1’s certainty equivalent for any given gamble is
lower than 2’s. And from this it easily follows that if consumers 1 and 2 have the same
initial wealth, then consumer 2 (the one with the globally lower Arrow-Pratt measure)
will accept any gamble that consumer 1 will accept. (Convince yourself of this.) That is,
consumer 1 is willing to accept fewer gambles than consumer 2.
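A numerical illustration of this comparison, using two hypothetical utility functions whose Arrow-Pratt measures are ranked pointwise (Ra = 2/w for consumer 1 versus Ra = 1/w for consumer 2) and an arbitrary wealth gamble:

```python
import math

# Illustration (hypothetical utilities and gamble): consumer 1 with
# u(w) = -1/w has Ra(w) = 2/w, everywhere larger than consumer 2's
# Ra(w) = 1/w from v(w) = ln(w), so 1's certainty equivalent is lower.
probs  = (0.5, 0.5)
wealth = (4.0, 16.0)

eu_1 = sum(p * (-1.0 / w) for p, w in zip(probs, wealth))
eu_2 = sum(p * math.log(w) for p, w in zip(probs, wealth))

ce_1 = -1.0 / eu_1        # invert u(w) = -1/w to recover the CE
ce_2 = math.exp(eu_2)     # invert v(w) = ln(w) to recover the CE
ev = sum(p * w for p, w in zip(probs, wealth))

assert ce_1 < ce_2 < ev   # more risk averse => lower certainty equivalent
print(ce_1, ce_2, ev)     # approximately 6.4, 8.0, 10.0
```

Consumer 2 would accept any payment above 10 − 8 = 2 to take on this gamble's risk, while consumer 1 demands more than 3.6, consistent with consumer 1 accepting a strictly smaller set of gambles.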
Finally, note that in passing, we have also shown that (2.12) implies that consumer
1's VNM utility function is more concave than consumer 2's in the sense that (once again
putting x = v(w) in (2.13))

u(w) = h(v(w)) for all w ≥ 0,

where h is a strictly increasing, strictly concave function.
EXAMPLE 2.6 Consider an investor who must decide how much of his initial wealth w
to put into a risky asset. The risky asset can have any of the positive or negative rates of
return ri with probabilities pi , i = 1, . . . , n. If β is the amount of wealth to be put into
the risky asset, final wealth under outcome i will be (w − β) + (1 + ri )β = w + βri . The
investor’s problem is to choose β to maximise the expected utility of wealth. We can write
this formally as the single-variable optimisation problem
$$\max_{\beta}\; \sum_{i=1}^{n} p_i u(w + \beta r_i) \quad \text{s.t.} \quad 0 \le \beta \le w.$$  (E.1)
We first determine under what conditions a risk-averse investor will decide to put
no wealth into the risky asset. In this case, we would have a corner solution where the
objective function in (E.1) reaches a maximum at β ∗ = 0, so its first derivative must
be non-increasing there. Differentiating expected utility in (E.1) with respect to β, then
evaluating at β ∗ = 0, we therefore must have
$$\sum_{i=1}^{n} p_i u'(w + \beta^* r_i)\, r_i = u'(w)\sum_{i=1}^{n} p_i r_i \le 0.$$
The sum on the right-hand side is just the expected return on the risky asset. Because u'(w)
must be positive, the expected return must be non-positive. Because you can easily verify
that the concavity of u in wealth is sufficient to ensure the concavity of (E.1) in β, we
conclude that a risk-averse individual will abstain completely from the risky asset if and
only if that asset has a non-positive expected return. Alternatively, we can say that a risk-
averse investor will always prefer to place some wealth into a risky asset with a strictly
positive expected return.
Now assume that the risky asset has a positive expected return. As we have seen, this
means we can rule out β ∗ = 0. Let us also suppose that β ∗ < w. The first- and second-
order conditions for an interior maximum of (E.1) tell us that
$$\sum_{i=1}^{n} p_i u'(w + \beta^* r_i)\, r_i = 0$$  (E.2)

and

$$\sum_{i=1}^{n} p_i u''(w + \beta^* r_i)\, r_i^2 < 0,$$  (E.3)
‘inferior’ goods. We will show that this is so under DARA: the amount invested in the risky asset then rises with wealth. Viewing β∗ as a function of w and
differentiating (E.2) with respect to w, we find that
$$\frac{d\beta^*}{dw} = \frac{-\sum_{i=1}^{n} p_i u''(w + \beta^* r_i)\, r_i}{\sum_{i=1}^{n} p_i u''(w + \beta^* r_i)\, r_i^2}.$$  (E.4)
Risk aversion ensures that the denominator in (E.4) will be negative, so risky assets will
be ‘normal’ only when the numerator is also negative. DARA is sufficient to ensure this.
To see this, note that the definition of Ra(w + β∗ri) implies that

u''(w + β∗ri) = −Ra(w + β∗ri)u'(w + β∗ri). (E.5)

Under DARA, Ra(w) > Ra(w + β∗ri) whenever ri > 0, and Ra(w) < Ra(w + β∗ri) when-
ever ri < 0. Multiplying both sides of these inequalities by ri, we obtain in both cases

Ra(w + β∗ri)ri < Ra(w)ri. (E.6)

Multiplying (E.6) by pi u'(w + β∗ri) > 0, summing over i, and using (E.5) and (E.2), we obtain
$$-\sum_{i=1}^{n} p_i u''(w + \beta^* r_i)\, r_i < R_a(w)\sum_{i=1}^{n} p_i r_i\, u'(w + \beta^* r_i) = 0,$$  (E.7)

where the final equality follows from (E.2). Hence, the numerator in (E.4) is negative, so that dβ∗/dw > 0, as we sought to show.
EXAMPLE 2.7 A risk-averse individual with initial wealth w0 and VNM utility function
u(·) must decide whether and for how much to insure his car. The probability that he
will have an accident and incur a dollar loss of L in damages is α ∈ (0, 1). How much
insurance, x, should he purchase?
Of course, the answer depends on the price at which insurance is available. Let us
suppose that insurance is available at an actuarially fair price, i.e., one that yields insurance
companies zero expected profits. Now, if ρ denotes the rate at which each dollar of insur-
ance can be purchased, the insurance company’s expected profits per dollar of insurance
sold (assuming zero costs) will be α(ρ − 1) + (1 − α)ρ. Setting this equal to zero implies
that ρ = α.
So, with the price per dollar of insurance equal to α, how much insurance should
our risk-averse individual purchase? Because he is an expected utility maximiser, he will
choose that amount of insurance, x, to maximise his expected utility,

αu(w0 − αx − L + x) + (1 − α)u(w0 − αx). (E.1)

Differentiating (E.1) with respect to x and setting the result to zero yields

α(1 − α)u'(w0 − αx − L + x) − (1 − α)αu'(w0 − αx) = 0,

or

u'(w0 − αx − L + x) = u'(w0 − αx).
But because the individual is risk averse, u'' < 0, so that the marginal utility of wealth is
strictly decreasing in wealth. Consequently, equality of the preceding marginal utilities of
wealth implies equality of the wealth levels themselves, i.e.,
w0 − αx − L + x = w0 − αx,

or x = L. That is, at the actuarially fair price, the individual fully insures against the loss.
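The full-insurance conclusion can be verified numerically. The log utility function and the parameter values below are hypothetical choices, not from the text:

```python
import math

# Numerical check of Example 2.7: with actuarially fair insurance (rho =
# alpha), a risk-averse expected-utility maximiser insures fully (x = L).
# The log utility and all parameter values here are hypothetical.
w0, L, alpha = 100.0, 60.0, 0.2
rho = alpha                        # actuarially fair price per dollar of cover

def eu(x):
    """Expected utility of buying x dollars of insurance."""
    return (alpha * math.log(w0 - rho * x - L + x)
            + (1 - alpha) * math.log(w0 - rho * x))

# Grid search over coverage levels x in [0, L].
steps = 100_000
x_star = max((i * L / steps for i in range(steps + 1)), key=eu)
assert abs(x_star - L) < 1e-6      # full insurance is optimal
```

Raising rho above alpha in this sketch would push the optimum below L, which is the content of Exercise 2.24.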
2.5 EXERCISES
2.1 Show that budget balancedness and homogeneity of x(p, y) are unrelated conditions in the sense that
neither implies the other.
2.2 Suppose that x(p, y) ∈ $\mathbb{R}^n_+$ satisfies budget balancedness and homogeneity on $\mathbb{R}^{n+1}_{++}$. Show that for
all (p, y) ∈ $\mathbb{R}^{n+1}_{++}$, s(p, y) · p = 0, where s(p, y) denotes the Slutsky matrix associated with x(p, y).
2.3 Derive the consumer's direct utility function if his indirect utility function has the form $v(p, y) = y p_1^{\alpha} p_2^{\beta}$ for negative α and β.
2.4 Suppose that the function e(p, u) ∈ R+ , not necessarily an expenditure function, and x(p, y) ∈ Rn+ ,
not necessarily a demand function, satisfy the system of partial differential equations given in
Section 2.2. Show the following:
(a) If x(p, y) satisfies budget balancedness, then e(p, u) must be homogeneous of degree one in p.
(b) If e(p, u) is homogeneous of degree one in p and for each p, it assumes every non-negative value
as u varies, then x(p, y) must be homogeneous of degree zero in (p, y).
2.5 Consider the solution, $e(p, u) = u p_1^{\alpha_1} p_2^{\alpha_2} p_3^{\alpha_3}$, at the end of Example 2.3.
(a) Derive the indirect utility function through the relation e(p, v(p, y)) = y and verify Roy’s
identity.
(b) Use the construction given in the proof of Theorem 2.1 to recover a utility function generat-
ing e(p, u). Show that the utility function you derive generates the demand functions given in
Example 2.3.
2.6 A consumer has expenditure function e(p1 , p2 , u) = up1 p2 /(p1 + p2 ). Find a direct utility function,
u(x1 , x2 ), that rationalises this person’s demand behaviour.
2.7 Derive the consumer’s inverse demand functions, p1 (x1 , x2 ) and p2 (x1 , x2 ), when the utility function
is of the Cobb-Douglas form, $u(x_1, x_2) = A x_1^{\alpha} x_2^{1-\alpha}$ for 0 < α < 1.
2.8 The consumer buys bundle xi at prices pi , i = 0, 1. Separately for parts (a) to (d), state whether these
indicated choices satisfy WARP:
(a) p0 = (1, 3), x0 = (4, 2); p1 = (3, 5), x1 = (3, 1).
(b) p0 = (1, 6), x0 = (10, 5); p1 = (3, 5), x1 = (8, 4).
(c) p0 = (1, 2), x0 = (3, 1); p1 = (2, 2), x1 = (1, 2).
(d) p0 = (2, 6), x0 = (20, 10); p1 = (3, 5), x1 = (18, 4).
2.9 Suppose there are only two goods and that a consumer’s choice function x(p, y) satisfies budget
balancedness, p · x(p, y) = y ∀ (p, y). Show the following:
(a) If x(p, y) is homogeneous of degree zero in (p, y), then the Slutsky matrix associated with x(p, y)
is symmetric.
(b) If x(p, y) satisfies WARP, then the ‘revealed preferred to’ relation, R, has no intransitive cycles.
(By definition, x1 Rx2 if and only if x1 is revealed preferred to x2 .)
2.10 Hicks (1956) offered the following example to demonstrate how WARP can fail to result in transitive
revealed preferences when there are more than two goods. The consumer chooses bundle xi at prices
pi , i = 0, 1, 2, where
p0 = (1, 1, 2),  x0 = (5, 19, 9);
p1 = (1, 1, 1),  x1 = (12, 12, 12);
p2 = (1, 2, 1),  x2 = (27, 11, 1).
(a) Show that these data satisfy WARP. Do it by considering all possible pairwise comparisons of
the bundles and showing that in each case, one bundle in the pair is revealed preferred to the
other.
(b) Find the intransitivity in the revealed preferences.
2.11 Show that if a consumer chooses bundles to maximise a strictly quasiconcave and strictly increasing
utility function, his demand behaviour satisfies SARP.
2.12 This exercise guides you through a proof of a simplified version of Afriat’s Theorem. Suppose that
a consumer is observed to demand bundle x1 when the price vector is p1 , and bundle x2 when the
price vector is p2 , . . ., and bundle xK when the price vector is pK . This produces the finite data set
D = {(x1 , p1 ), (x2 , p2 ), . . . , (xK , pK )}. We say that the consumer’s choice behaviour satisfies GARP
on the finite data set D if for every finite sequence, (xk1 , pk1 ), (xk2 , pk2 ) . . . , (xkm , pkm ), of points in
D, if pk1 · xk1 ≥ pk1 · xk2 , pk2 · xk2 ≥ pk2 · xk3 , . . . , pkm−1 · xkm−1 ≥ pkm−1 · xkm , then pkm · xkm ≤
pkm · xk1 .
In other words, GARP holds if whenever xk1 is revealed preferred to xk2 , and xk2 is revealed
preferred to xk3 , . . . , and xkm−1 is revealed preferred to xkm , then xk1 is at least as expensive as
xkm when xkm is chosen. (Note that SARP is stronger, requiring that xk1 be strictly more expensive
than xkm .)
Assume throughout this question that the consumer’s choice behaviour satisfies GARP on the
data set D = {(x1 , p1 ), (x2 , p2 ), . . . , (xK , pK )} and that pk ∈ Rn++ for every k = 1, . . . , K.
For each k = 1, 2, . . . , K, define
$$\varphi(x^k) = \min_{k_1,\ldots,k_m}\; \Big[ p^{k_1}\cdot(x^{k_2} - x^{k_1}) + p^{k_2}\cdot(x^{k_3} - x^{k_2}) + \cdots + p^{k_m}\cdot(x^k - x^{k_m}) \Big],$$
where the minimum is taken over all sequences k1 , . . . , km of distinct elements of {1, 2, . . . , K}
such that pkj · (xkj+1 − xkj ) ≤ 0 for every j = 1, 2, . . . , m − 1, and such that pkm · (xk − xkm ) ≤ 0.
Note that at least one such sequence always exists, namely the ‘sequence’ consisting of one number,
k1 = k. Note also that there are only finitely many such sequences because their elements are distinct.
Hence, the minimum above always exists.
(a) Prove that for all k, j ∈ {1, . . . , K}, φ(xk ) ≤ φ(xj ) + pj · (xk − xj ) whenever pj · xk ≤ pj · xj .
We next use the non-positive function φ(·) to define a utility function u : Rn+ → R.
For every x ∈ Rn+ such that pk · (x − xk ) ≤ 0 for at least one k ∈ {1, . . . , K}, define u(x) ≤ 0
as follows:
u(x) = x1 + . . . + xn .
maximises the consumer’s utility among all bundles that are no more expensive than the chosen
bundle at the prices at which it was chosen. (Afriat’s Theorem proves that utility function can, in
addition, be chosen to be continuous and concave.)
(f) Prove a converse. That is, suppose that a strictly increasing utility function rationalises a finite
data set. Prove that the consumer’s behaviour satisfies GARP on that data set.
2.13 (a) Suppose that a choice function x(p, y) ∈ $\mathbb{R}^n_+$ is homogeneous of degree zero in (p, y). Show that
WARP is satisfied ∀ (p, y) iff it is satisfied on {(p, 1) | p ∈ Rn++ }.
(b) Suppose that a choice function x(p, y) ∈ Rn+ satisfies homogeneity and budget balancedness.
Suppose further that whenever p1 is not proportional to p0 , we have (p1 )T s(p0 , y)p1 < 0. Show
that x(p, y) satisfies WARP.
2.14 Consider the problem of insuring an asset against theft. The value of the asset is $D, the insurance
cost is $I per year, and the probability of theft is p. List the four outcomes in the set A associated
with this risky situation. Characterise the choice between insurance and no insurance as a choice
between two gambles, each involving all four outcomes in A, where the gambles differ only in the
probabilities assigned to each outcome.
2.15 We have assumed that an outcome set A has a finite number of elements, n. Show that as long as
n ≥ 2, the space G will always contain an infinite number of gambles.
2.16 Using Axioms G1 and G2, prove that at least one best and at least one worst outcome must exist in
any finite set of outcomes, A = {a1 , . . . , an } whenever n ≥ 1.
2.17 Let A = {a1, a2, a3}, where a1 ≻ a2 ≻ a3. The gamble g offers a2 with certainty. Prove that if g ∼
(α ◦ a1, (1 − α) ◦ a3), then α must be strictly between zero and 1.
2.18 In the text, it was asserted that, to a safari hunter, death may be the worst outcome of an outing, yet
an outing with the possibility of death is preferred to one where death is impossible. Characterise
the outcome set associated with a hunter’s choice of outings, and prove this behaviour violates the
combined implications of Axioms G3 and G4.
2.19 Axiom G3 asserts the existence of an indifference probability for any gamble in G . For a given
gamble g ∈ G , prove that the indifference probability is unique using G4.
2.20 Consider the following ‘Independence Axiom’ on a consumer’s preferences, , over gambles: If
(p1 ◦ a1 , . . . , pn ◦ an ) ∼ (q1 ◦ a1 , . . . , qn ◦ an ),
then for every α ∈ [0, 1], and every simple gamble (r1 ◦ a1, . . . , rn ◦ an),

(αp1 + (1 − α)r1 ◦ a1, . . . , αpn + (1 − α)rn ◦ an) ∼ (αq1 + (1 − α)r1 ◦ a1, . . . , αqn + (1 − α)rn ◦ an).
(Note this axiom says that when we combine each of two gambles with a third in the same way, the
individual’s ranking of the two new gambles is independent of which third gamble we used.) Show
that this axiom follows from Axioms G5 and G6.
2.21 Using the definition of risk aversion given in the text, prove that an individual is risk averse over
gambles involving non-negative wealth levels if and only if his VNM utility function is strictly
concave on R+ .
2.22 Suppose that ≿ is a binary relation over gambles in G satisfying Axioms G2, G3, and G4. Show that
≿ satisfies G1 as well.
2.23 Let u and v be utility functions (not necessarily VNM) representing on G . Show that v is a positive
affine transformation of u if and only if for all gambles g1 , g2 , g3 ∈ G , with no two indifferent, we
have

$$\frac{u(g_1) - u(g_2)}{u(g_2) - u(g_3)} = \frac{v(g_1) - v(g_2)}{v(g_2) - v(g_3)}.$$
2.24 Reconsider Example 2.7 and show that the individual will less than fully insure if the price per unit
of insurance, ρ, exceeds the probability of incurring an accident, α.
2.25 Consider the quadratic VNM utility function U(w) = a + bw + cw².
(a) What restrictions if any must be placed on parameters a, b, and c for this function to display risk
aversion?
(b) Over what domain of wealth can a quadratic VNM utility function be defined?
(c) Given the gamble
2.31 Prove that for any VNM utility function, the condition u'''(w) > 0 is necessary but not sufficient for
DARA.
2.32 If a VNM utility function displays constant absolute risk aversion, so that Ra (w) = α for all w, what
functional form must it have?
2.33 Suppose a consumer’s preferences over wealth gambles can be represented by a twice differentiable
VNM utility function. Show that the consumer’s preferences over gambles are independent of his
initial wealth if and only if his utility function displays constant absolute risk aversion.
2.34 Another measure of risk aversion offered by Arrow and Pratt is their relative risk aversion mea-
sure, Rr (w) ≡ Ra (w)w. In what sense is Rr (w) an ‘elasticity’? If u(w) displays constant relative risk
aversion, what functional form must it have?
2.35 An investor must decide how much of initial wealth w to allocate to a risky asset with unknown rate
of return r, where each outcome ri occurs with probability pi , i = 1, . . . , n. Using the framework of
Example 2.6, prove that if the investor’s preferences display increasing absolute risk aversion, the
risky asset must be an ‘inferior’ good.
2.36 Let S_i be the set of all probabilities of winning such that individual i will accept a gamble of winning
or losing a small amount of wealth, h. Show that for any two individuals i and j, where R_a^i(w) >
R_a^j(w), it must be that S_i ⊂ S_j. Conclude that the more risk averse the individual, the smaller the set
of gambles he will accept.
2.37 An infinitely lived agent must choose his lifetime consumption plan. Let xt denote consumption
spending in period t, yt denote income expected in period t, and r > 0, the market rate of interest
at which the agent can freely borrow or lend. The agent’s intertemporal utility function takes the
additively separable form
u*(x_0, x_1, x_2, . . .) = Σ_{t=0}^∞ β^t u(x_t),
where u(x) is increasing and strictly concave, and 0 < β < 1. The intertemporal budget constraint
requires that the present value of expenditures not exceed the present value of income:
Σ_{t=0}^∞ (1/(1+r))^t x_t ≤ Σ_{t=0}^∞ (1/(1+r))^t y_t.
(a) If y0 = 1, y1 = 1, and β = 1/(1 + r), solve for optimal consumption in each period and
calculate the level of lifetime utility the agent achieves.
Suppose, now, that the agent again knows that income in the initial period will be y0 = 1.
However, there is uncertainty about what next period’s income will be. It could be high, y_1^H =
3/2; or it could be low, y_1^L = 1/2. He knows it will be high with probability 1/2. His problem
now is to choose the initial period consumption, x_0; the future consumption if income is high, x_1^H;
and the future consumption if income is low, x_1^L, to maximise (intertemporal) expected utility.
(b) Again, assuming that β = 1/(1 + r), formulate the agent’s optimisation problem and solve for
the optimal consumption plan and the level of lifetime utility.
(c) How do you account for any difference or similarity in your answers to parts (a) and (b)?
CHAPTER 3
THEORY OF THE FIRM
The second important actor on the microeconomic stage is the individual firm. We begin
this chapter with aspects of production and cost that are common to all firms. Then we
consider the behaviour of perfectly competitive firms – a very special but very important
class. You will see we can now move rather quickly through much of this material because
there are many formal similarities between producer theory and the consumer theory we
just completed.
There are good reasons for the tenacity of the profit-maximisation hypothesis. From an empirical
point of view, assuming that firms maximise profit leads to predictions of firm behaviour which are time and again
borne out by the evidence. From a theoretical point of view, there is first the virtue of
simplicity and consistency with the hypothesis of self-interested utility maximisation on
the part of consumers. Also, many alternative hypotheses, such as sales or market-share
maximisation, may be better viewed as short-run tactics in a long-run, profit-maximising
strategy, rather than as ultimate objectives in themselves. Finally, there are identifiable
market forces that coerce the firm towards profit maximisation even if its owners or
managers are not themselves innately inclined in that direction. Suppose that some
firm did not maximise profit. Then if the fault lies with the managers, and if at least a
working majority of the firm’s owners are non-satiated consumers, those owners have
a clear common interest in ridding themselves of that management and replacing it with a
profit-maximising one. If the fault lies with the owners, then there is an obvious incentive
for any non-satiated entrepreneur outside the firm to acquire it and change its ways.
Like the hypothesis of utility maximisation for consumers, profit maximisation is
the single most robust and compelling assumption we can make as we begin to examine
and ultimately predict firm behaviour. In any choice the firm must make, we therefore
will always suppose its decision is guided by the objective of profit maximisation. Which
course of action best serves that goal will depend on the circumstances the firm faces –
first, with respect to what is technologically possible; second, with respect to conditions
on its input markets; and, finally, with respect to conditions on its product market. Clear
thinking on firm behaviour will depend on carefully distinguishing between the firm’s
objective, which always remains the same, and its constraints, which are varied and depend
on market realities beyond its control.
3.2 PRODUCTION
Production is the process of transforming inputs into outputs. The fundamental reality
firms must contend with in this process is technological feasibility. The state of tech-
nology determines and restricts what is possible in combining inputs to produce output,
and there are several ways we can represent this constraint. The most general way is to
think of the firm as having a production possibility set, Y ⊂ Rm , where each vector
y = (y1 , . . . , ym ) ∈ Y is a production plan whose components indicate the amounts of
the various inputs and outputs. A common convention is to write elements of y ∈ Y so that
yi < 0 if resource i is used up in the production plan, and yi > 0 if resource i is produced
in the production plan.
The production possibility set is by far the most general way to characterise the firm’s
technology because it allows for multiple inputs and multiple outputs. Often, however, we
will want to consider firms producing only a single product from many inputs. For that, it
is more convenient to describe the firm’s technology in terms of a production function.
When there is only one output produced by many inputs, we shall denote the amount
of output by y, and the amount of input i by xi , so that with n inputs, the entire vector of
inputs is denoted by x = (x1 , . . . , xn ). Of course, the input vector as well as the amount of
output must be non-negative, so we require x ≥ 0, and y ≥ 0.
A production function simply describes for each vector of inputs the amount of out-
put that can be produced. The production function, f , is therefore a mapping from Rn+
into R+ . When we write y = f (x), we mean that y units of output (and no more) can be
produced using the input vector x. We shall maintain the following assumption on the
production function f.

ASSUMPTION 3.1 Properties of the Production Function
The production function, f : R_+^n → R_+, is continuous, strictly increasing, and strictly quasiconcave on R_+^n, and f(0) = 0.
Continuity of f ensures that small changes in the vector of inputs lead to small
changes in the amount of output produced. We require f to be strictly increasing to ensure
that employing strictly more of every input results in strictly more output. The strict quasi-
concavity of f is assumed largely for reasons of simplicity. Similar to our assumption that
the consumer’s preferences were strictly convex (so that u is strictly quasiconcave), we could
do without it here without much change in the results we will present. Nonetheless, we
can interpret its meaning. One interpretation is that strict quasiconcavity implies the
presence of at least some complementarities in production. Intuitively, two inputs, labour
and capital say, are to some degree complementary if very little production can take place
if one of the inputs is low, even if the other input is high. In this sense, both inputs together
are important for production. In such a situation, the average of two extreme production
vectors, one with high labour and low capital and the other with low labour and high capi-
tal, will produce strictly more output than at least one of the two extreme input vectors, and
perhaps even both. The assumption of strict quasiconcavity extends this idea to strict aver-
ages of all distinct pairs of input vectors. The last condition states that a positive amount
of output requires positive amounts of some of the inputs.
When the production function is differentiable, its partial derivative, ∂f (x)/∂xi , is
called the marginal product of input i and gives the rate at which output changes per
additional unit of input i employed. If f is strictly increasing and everywhere continuously
differentiable, then ∂f (x)/∂xi > 0 for ‘almost all’ input vectors. We will often assume for
simplicity that the strict inequality always holds.
For any fixed level of output, y, the set of input vectors producing y units of output
is called the y-level isoquant. An isoquant is then just a level set of f . We shall denote this
set by Q(y). That is,

Q(y) ≡ {x ≥ 0 | f(x) = y}.
For an input vector x, the isoquant through x is the set of input vectors producing the
same output as x, namely, Q(f (x)).
Figure 3.1. The slope of the isoquant Q(f(x^1)) at the point x^1 is −[∂f(x^1)/∂x_1]/[∂f(x^1)/∂x_2]; MRTS_12(x^1) is its absolute value.

When f is differentiable, the marginal rate of technical substitution between inputs i and j at x is defined as

MRTS_ij(x) ≡ [∂f(x)/∂x_i] / [∂f(x)/∂x_j].
In the two-input case, as depicted in Fig. 3.1, MRTS12 (x1 ) is the absolute value of the slope
of the isoquant through x1 at the point x1 .
In general, the MRTS between any two inputs depends on the amounts of all inputs
employed. However, it is quite common, particularly in empirical work, to suppose that
inputs can be classified into a relatively small number of types, with degrees of substi-
tutability between those of a given type differing systematically from the degree of substi-
tutability between those of different types. Production functions of this variety are called
separable, and there are at least two major forms of separability. In the following defini-
tion, we use fi (x) as a shorthand for the marginal product of input i, i.e., for ∂f (x)/∂xi .
where fi and fj are the marginal products of inputs i and j. When S > 2, the production
function is called strongly separable if the MRTS between two inputs from any two groups,
including from the same group, is independent of all inputs outside those two groups:

∂[f_i(x)/f_j(x)]/∂x_k = 0 for all i ∈ N_s, j ∈ N_t, and k ∉ N_s ∪ N_t.
σ_ij(x^0) ≡ [ d ln MRTS_ij(x(r)) / d ln r |_{r = x_j^0 / x_i^0} ]^{−1},
where x(r) is the unique vector of inputs x = (x_1, . . . , x_n) such that (i) x_j/x_i = r, (ii) x_k =
x_k^0 for k ≠ i, j, and (iii) f(x) = f(x^0).
The elasticity of substitution σij (x0 ) is a measure of the curvature of the i-j isoquant
through x0 at x0 . When the production function is quasiconcave, the elasticity of substitu-
tion can never be negative, so σij ≥ 0. In general, the closer it is to zero, the more ‘difficult’
is substitution between the inputs; the larger it is, the ‘easier’ is substitution between them.
When there are only two inputs we will write σ rather than σ12 . Let us consider a few
two-input examples. In Fig. 3.2(a), the isoquant is linear and there is perfect substitutability
between the inputs. There, σ is infinite. In Fig. 3.2(c), the two inputs are productive only in
fixed proportions with one another – substitution between them is effectively impossible,
and σ is zero. In Fig. 3.2(b), we have illustrated an intermediate case where σ is neither
zero nor infinite, and the isoquants are neither straight lines nor right angles. In general,
the closer σ is to zero, the more L-shaped the isoquants are and the more ‘difficult’ substi-
tution between inputs; the larger σ is, the flatter the isoquants and the ‘easier’ substitution
between them.
Figure 3.2. (a) σ is infinite and there is perfect substitutability between inputs. (b) σ
is finite but larger than zero, indicating less than perfect substitutability. (c) σ is zero
and there is no substitutability between inputs.
EXAMPLE 3.1 We are familiar with the CES utility function from demand theory.
Perhaps it is time we see where this name comes from by considering the CES production
function,
y = (x_1^ρ + x_2^ρ)^{1/ρ}

for 0 ≠ ρ < 1.
To calculate the elasticity of substitution, σ, note first that the marginal rate of
technical substitution at an arbitrary point (x1 , x2 ) is
MRTS_12(x_1, x_2) = (x_2/x_1)^{1−ρ}.
Hence, in this example the ratio of the two inputs alone determines MRTS, regardless of
the quantity of output produced. Consequently, setting r = x2 /x1 ,
σ = 1/(1 − ρ),
which is a constant. This explains the initials CES, which stand for constant elasticity of
substitution.
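As a quick numerical sketch (not part of the text; ρ = 0.5 is an arbitrary illustrative value), we can compute MRTS_12 = (x_2/x_1)^{1−ρ} at several input ratios and recover σ by a finite-difference log-derivative; each estimate should come out near 1/(1 − ρ) = 2.

```python
import math

RHO = 0.5  # arbitrary illustrative value, 0 != rho < 1

def mrts12(x1, x2, rho=RHO):
    # MRTS_12 = (x2/x1)^(1-rho) for the CES form y = (x1^rho + x2^rho)^(1/rho)
    return (x2 / x1) ** (1.0 - rho)

def sigma_numeric(r, rho=RHO, h=1e-6):
    # sigma = [d ln MRTS_12 / d ln r]^(-1) evaluated at r = x2/x1;
    # MRTS depends only on the ratio r, so we differentiate in log space.
    lnm = lambda lr: math.log(mrts12(1.0, math.exp(lr), rho))
    d = (lnm(math.log(r) + h) - lnm(math.log(r) - h)) / (2 * h)
    return 1.0 / d

for r in (0.5, 1.0, 2.0):
    print(round(sigma_numeric(r), 6))  # ~ 1/(1 - 0.5) = 2 at every ratio
```

The estimate is the same at every input ratio, which is exactly the constant-elasticity property the CES name refers to.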
With the CES form, the degree of substitutability between inputs is always the same,
regardless of the level of output or input proportions. It is therefore a somewhat restrictive
characterisation of the technology. On the other hand, different values of the parameter
ρ, and so different values of the parameter σ , can be used to represent technologies with
vastly different (though everywhere constant) substitutability between inputs. The closer
ρ is to unity, the larger is σ ; when ρ is equal to 1, σ is infinite and the production function
is linear, with isoquants resembling those in Fig. 3.2(a).
Other popular production functions also can be seen as special cases of specific CES
forms. In particular, it is easy to verify that
y = ( Σ_{i=1}^n α_i x_i^ρ )^{1/ρ},   where Σ_{i=1}^n α_i = 1,
is a CES form with σ_ij = 1/(1 − ρ) for all i ≠ j. It can be shown that as ρ → 0, σ_ij → 1,
and this CES form reduces to the linear homogeneous Cobb-Douglas form,
y = ∏_{i=1}^n x_i^{α_i}.
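The ρ → 0 limit can be checked numerically (a sketch with arbitrary weights and an arbitrary input bundle, neither from the text): for small ρ, the weighted CES output should approach the Cobb-Douglas value.

```python
import math

alphas = [0.3, 0.2, 0.5]   # arbitrary weights summing to 1
x = [2.0, 5.0, 1.5]        # arbitrary input bundle

def ces(x, alphas, rho):
    # weighted CES: (sum_i alpha_i * x_i^rho)^(1/rho)
    return sum(a * xi**rho for a, xi in zip(alphas, x)) ** (1.0 / rho)

def cobb_douglas(x, alphas):
    # linear homogeneous Cobb-Douglas: prod_i x_i^alpha_i
    return math.prod(xi**a for a, xi in zip(alphas, x))

cd = cobb_douglas(x, alphas)
for rho in (0.5, 0.1, 0.001):
    print(round(ces(x, alphas, rho), 4), "vs", round(cd, 4))
# the CES value approaches the Cobb-Douglas value as rho -> 0
```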
Similarly, as ρ → −∞, σ_ij → 0, and the CES form converges to the Leontief form,

y = min{x_1, . . . , x_n},

with the right-angled isoquants of Fig. 3.2(c).
Proof: Suppose first that α = 1, i.e., that f is homogeneous of degree one. Take any x^1 ≫ 0
and x^2 ≫ 0 and let y^1 = f(x^1) and y^2 = f(x^2). Then y^1, y^2 > 0 because f(0) = 0 and f is
strictly increasing. Therefore, because f is homogeneous of degree one,

f(x^1/y^1) = f(x^2/y^2) = 1.
as desired.
Suppose now that f is homogeneous of degree α ∈ (0, 1]. Then f^{1/α} is homogeneous
of degree one and satisfies Assumption 3.1. Hence, by what we have just proven, f^{1/α} is
concave. But then f = (f^{1/α})^α is concave since α ≤ 1.
Figure 3.3. Isoquants Q(y^0), Q(y^1), . . . , Q(y^4). Returns to varying proportions are read along a horizontal line at x̄_2; returns to scale are read along a ray through the origin O.
increased or decreased proportionally. In the two-input case, the distinction between these
two attributes of the production function is best grasped by considering Fig. 3.3. Returns
to varying proportions concern how output behaves as we move through the isoquant map
along the horizontal at x̄2 , keeping x2 constant and varying the amount of x1 . Returns to
scale have to do with how output behaves as we move through the isoquant map along a
ray such as OA, where the levels of x1 and x2 are changed simultaneously, always staying
in the proportion x2 /x1 = α.
Elementary measures of returns to varying proportions include the marginal product,
MPi (x) ≡ fi (x), and the average product, APi (x) ≡ f (x)/xi , of each input. The output
elasticity of input i, measuring the percentage response of output to a 1 per cent change in
input i, is given by μi (x) ≡ fi (x)xi /f (x) = MPi (x)/APi (x). Each of these is a local mea-
sure, defined at a point. The scale properties of the technology may be defined either
locally or globally. A production function is said to have globally constant, increasing, or
decreasing returns to scale according to the following definitions.
measure of returns to scale. One such measure, defined at a point, tells us the instantaneous
percentage change in output that occurs with a 1 per cent increase in all inputs. It is vari-
ously known as the elasticity of scale or the (overall) elasticity of output, and is defined as
follows.
Returns to scale are locally constant, increasing, or decreasing as μ(x) is equal to, greater
than, or less than one. The elasticity of scale and the output elasticities of the inputs are
related as follows:
μ(x) = Σ_{i=1}^n μ_i(x).
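This additivity is easy to verify numerically for any differentiable production function (a sketch using a hypothetical two-input function, not one from the text): compute each output elasticity by a multiplicative finite difference and compare the sum with the scale elasticity obtained by perturbing both inputs proportionally.

```python
def f(x1, x2):
    # hypothetical production function, homogeneous of degree 0.5 * 1.6 = 0.8
    return (x1**0.5 + x2**0.5) ** 1.6

def output_elasticity(i, x1, x2, h=1e-6):
    # mu_i = f_i * x_i / f, via a central difference in the multiplier of x_i
    if i == 1:
        d = (f(x1 * (1 + h), x2) - f(x1 * (1 - h), x2)) / (2 * h)
    else:
        d = (f(x1, x2 * (1 + h)) - f(x1, x2 * (1 - h))) / (2 * h)
    return d / f(x1, x2)

def scale_elasticity(x1, x2, h=1e-6):
    # mu = d f(tx)/dt at t = 1, divided by f(x)
    d = (f(x1 * (1 + h), x2 * (1 + h)) - f(x1 * (1 - h), x2 * (1 - h))) / (2 * h)
    return d / f(x1, x2)

x1, x2 = 3.0, 7.0
mu = scale_elasticity(x1, x2)
mu_sum = output_elasticity(1, x1, x2) + output_elasticity(2, x1, x2)
print(round(mu, 6), round(mu_sum, 6))  # both ~ 0.8 for this homogeneous example
```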
EXAMPLE 3.2 Let us examine a production function with variable returns to scale:
y = k(1 + x_1^{−α} x_2^{−β})^{−1},    (E.1)
where α > 0, β > 0, and k is an upper bound on the level of output, so that 0 ≤ y < k.
Calculating the output elasticities for each input, we obtain
μ_1(x) = α(1 + x_1^{−α} x_2^{−β})^{−1} x_1^{−α} x_2^{−β},
μ_2(x) = β(1 + x_1^{−α} x_2^{−β})^{−1} x_1^{−α} x_2^{−β},
each of which clearly varies with both scale and input proportions. Adding the two gives
the following expression for the elasticity of scale:
μ(x) = (α + β)(1 + x_1^{−α} x_2^{−β})^{−1} x_1^{−α} x_2^{−β}.

To express the elasticity of scale in terms of output alone, invert (E.1) to obtain

x_1^{−α} x_2^{−β} = k/y − 1.    (E.2)

Substituting (E.2) into the expression for μ(x) gives

μ*(y) = (α + β)(1 − y/k).
Here it is clear that returns to each input, and overall returns to scale, decline mono-
tonically as output increases. At y = 0, μ∗ (y) = (α + β) > 0, and as y → k, μ∗ (y) → 0.
If α + β > 1, the production function exhibits increasing returns to scale for low levels of
output, 0 ≤ y < k[1 − 1/(α + β)], locally constant returns at y = k[1 − 1/(α + β)], and
decreasing returns for high levels of output, k[1 − 1/(α + β)] < y < k.
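A numerical sketch of Example 3.2 (with illustrative parameter values chosen here, not taken from the text): pick an input bundle, compute output and the scale elasticity directly by finite differences, and compare with (α + β)(1 − y/k).

```python
ALPHA, BETA, K = 0.6, 0.9, 10.0   # illustrative parameter values

def f(x1, x2):
    # the variable-returns technology of Example 3.2
    return K / (1.0 + x1**(-ALPHA) * x2**(-BETA))

def scale_elasticity(x1, x2, h=1e-6):
    # mu = d f(tx)/dt at t = 1, divided by f(x)
    d = (f(x1 * (1 + h), x2 * (1 + h)) - f(x1 * (1 - h), x2 * (1 - h))) / (2 * h)
    return d / f(x1, x2)

x1, x2 = 2.0, 3.0
y = f(x1, x2)
mu_direct = scale_elasticity(x1, x2)
mu_formula = (ALPHA + BETA) * (1.0 - y / K)
print(round(mu_direct, 6), round(mu_formula, 6))  # the two should agree
```

Raising the input bundle (and hence y toward k) drives both numbers toward zero, as the text asserts.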
3.3 COST
The firm’s cost of output is precisely the expenditure it must make to acquire the inputs
used to produce that output. In general, the technology will permit every level of output
to be produced by a variety of input vectors, and all such possibilities can be summarised
by the level sets of the production function. The firm must decide, therefore, which of the
possible production plans it will use. If the object of the firm is to maximise profits, it will
necessarily choose the least costly, or cost-minimising, production plan for every level of
output. Note this will be true for all firms, whether monopolists, perfect competitors, or
anything in between.
To determine the least costly method of production, the firm must consider the terms
at which it can acquire inputs as well as the technological possibilities in production. These
in turn depend on the circumstances it faces on its input markets. For example, the firm
may face upward-sloping supply curves for some or all of its inputs, where the more it
hires, the higher the per-unit price it must pay. Alternatively, the firm may be a small,
insignificant force on its input markets, and so be able to hire as much or as little as it
wants without affecting the prevailing market prices. In this case, we say that the firm is
perfectly competitive on its input markets, because it has no power individually to affect
prices on those markets. In either case, these circumstances must be taken into account in
the firm’s decisions.
We will assume throughout that firms are perfectly competitive on their input mar-
kets and that therefore they face fixed input prices. Let w = (w_1, . . . , w_n) ≫ 0 be a vector
of prevailing market prices at which the firm can buy inputs x = (x1 , . . . , xn ). Because the
firm is a profit maximiser, it will choose to produce some level of output while using that
input vector requiring the smallest money outlay. One can speak therefore of ‘the’ cost
of output y – it will be the cost at prices w of the least costly vector of inputs capable of
producing y.
If x* ≫ 0 solves the cost-minimisation problem

c(w, y) ≡ min_{x ∈ R_+^n} w · x  s.t.  f(x) ≥ y,    (3.1)

then by Lagrange’s theorem there is a λ* such that

w_i = λ* ∂f(x*)/∂x_i,  i = 1, . . . , n.
Because wi > 0, i = 1, . . . , n, we may divide the preceding ith equation by the jth to obtain
[∂f(x*)/∂x_i] / [∂f(x*)/∂x_j] = w_i / w_j.    (3.2)
Thus, cost minimisation implies that the marginal rate of technical substitution between any
two inputs is equal to the ratio of their prices.
From the first-order conditions, it is clear the solution depends on the parameters
w and y. Moreover, because w ≫ 0 and f is strictly quasiconcave, the solution to (3.1) is
unique. In Exercise 3.18, you are asked to show this, as well as that (3.1) always possesses
a solution. So we can write x∗ ≡ x(w, y) to denote the vector of inputs minimising the cost
of producing y units of output at the input prices w. The solution x(w, y) is referred to as
the firm’s conditional input demand, because it is conditional on the level of output y,
which at this point is arbitrary and so may or may not be profit maximising.
The solution to the cost-minimisation problem is illustrated in Fig. 3.4. With two
inputs, an interior solution corresponds to a point of tangency between the y-level isoquant
and an isocost line of the form w · x = α for some α > 0. If x1 (w, y) and x2 (w, y) are
solutions, then c(w, y) = w1 x1 (w, y) + w2 x2 (w, y).
EXAMPLE 3.3 Suppose the firm’s technology is the two-input CES form. Its cost-
minimisation problem (3.1) is then
min_{x_1 ≥ 0, x_2 ≥ 0} w_1 x_1 + w_2 x_2  s.t.  (x_1^ρ + x_2^ρ)^{1/ρ} ≥ y.
Assuming y > 0 and an interior solution, the first-order Lagrangian conditions reduce to
the two conditions
w_1 / w_2 = (x_1 / x_2)^{ρ−1},    (E.1)

y = (x_1^ρ + x_2^ρ)^{1/ρ}.    (E.2)
Solving this for x_2 and performing similar calculations to solve for x_1, we obtain the
conditional input demands:

x_1(w, y) = y w_1^{1/(ρ−1)} (w_1^{ρ/(ρ−1)} + w_2^{ρ/(ρ−1)})^{−1/ρ},    (E.3)
x_2(w, y) = y w_2^{1/(ρ−1)} (w_1^{ρ/(ρ−1)} + w_2^{ρ/(ρ−1)})^{−1/ρ}.    (E.4)

To obtain the cost function, we substitute the solutions (E.3) and (E.4) back into the
objective function for the minimisation problem. Doing that yields

c(w, y) = w_1 x_1(w, y) + w_2 x_2(w, y) = y (w_1^{ρ/(ρ−1)} + w_2^{ρ/(ρ−1)})^{(ρ−1)/ρ}.    (E.5)
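The closed-form CES solutions can be sanity-checked numerically (a sketch with arbitrary prices, output, and ρ, none taken from the text): the conditional demands below should satisfy the tangency condition (E.1) and the output constraint (E.2), and their cost should match the closed-form cost function.

```python
RHO = 0.5  # arbitrary illustrative value, 0 != rho < 1

def demands(w1, w2, y, rho=RHO):
    # conditional input demands for the two-input CES technology
    r = rho / (rho - 1.0)
    s = (w1**r + w2**r) ** (-1.0 / rho)
    x1 = y * w1 ** (1.0 / (rho - 1.0)) * s
    x2 = y * w2 ** (1.0 / (rho - 1.0)) * s
    return x1, x2

def cost(w1, w2, y, rho=RHO):
    # closed-form cost function: y * (w1^r + w2^r)^(1/r), r = rho/(rho-1)
    r = rho / (rho - 1.0)
    return y * (w1**r + w2**r) ** (1.0 / r)

w1, w2, y = 2.0, 3.0, 5.0
x1, x2 = demands(w1, w2, y)
print(round((x1 / x2) ** (RHO - 1.0), 6), round(w1 / w2, 6))   # tangency (E.1)
print(round((x1**RHO + x2**RHO) ** (1 / RHO), 6))              # equals y = 5
print(round(w1 * x1 + w2 * x2, 6), round(cost(w1, w2, y), 6))  # costs match
```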
You may have noticed some similarities here with consumer theory. These similari-
ties are in fact exact when one compares the cost function with the expenditure function.
Indeed, consider their definitions.
Mathematically, the two functions are identical. Consequently, for every theorem we
proved about expenditure functions, there is an equivalent theorem for cost functions. We
shall state these theorems here, but we do not need to prove them. Their proofs are identical
to those given for the expenditure function.
∂c(w^0, y^0)/∂w_i = x_i(w^0, y^0),  i = 1, . . . , n.
EXAMPLE 3.4 Consider a cost function with the Cobb-Douglas form, c(w, y) =
A w_1^α w_2^β y. From property 7 of Theorem 3.2, the conditional input demands are obtained
by differentiating with respect to input prices:

x_1(w, y) = ∂c(w, y)/∂w_1 = αA w_1^{α−1} w_2^β y = αc(w, y)/w_1,    (E.1)
x_2(w, y) = ∂c(w, y)/∂w_2 = βA w_1^α w_2^{β−1} y = βc(w, y)/w_2.    (E.2)

Dividing (E.1) by (E.2) gives

x_1(w, y) / x_2(w, y) = (α/β)(w_2/w_1).
This tells us that the proportions in which a firm with this cost function will use its inputs
depend only on relative input prices and are completely independent of the level or scale
of output.
Now define the input share, si ≡ wi xi (w, y)/c(w, y) as the proportion of total expen-
diture spent by the firm on input i. From (E.1) and (E.2), these are always constant and
s_1 = α and s_2 = β.
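Both Shephard’s lemma and the constant shares can be checked by finite differences (a sketch; the values of A, α, and β below are illustrative choices, not from the text):

```python
A, ALPHA, BETA = 2.0, 0.4, 0.6    # illustrative parameters

def c(w1, w2, y):
    # Cobb-Douglas cost function of Example 3.4
    return A * w1**ALPHA * w2**BETA * y

def x1(w1, w2, y, h=1e-6):
    # Shephard's lemma: x1 = dc/dw1, via central difference
    return (c(w1 + h, w2, y) - c(w1 - h, w2, y)) / (2 * h)

def x2(w1, w2, y, h=1e-6):
    # Shephard's lemma: x2 = dc/dw2
    return (c(w1, w2 + h, y) - c(w1, w2 - h, y)) / (2 * h)

w1, w2, y = 3.0, 5.0, 7.0
s1 = w1 * x1(w1, w2, y) / c(w1, w2, y)
s2 = w2 * x2(w1, w2, y) / c(w1, w2, y)
print(round(s1, 6), round(s2, 6))  # shares equal alpha = 0.4 and beta = 0.6
```

Repeating the calculation at different prices and output levels leaves the shares unchanged, as the text claims.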
associated with these technologies have some special properties. Some of these are
collected in what follows.
THEOREM 3.4 Cost and Conditional Input Demands when Production is Homothetic
(a) the cost function is multiplicatively separable in input prices and output and
can be written c(w, y) = h(y)c(w, 1), where h(y) is strictly increasing and
c(w, 1) is the unit cost function, or the cost of 1 unit of output;
(b) the conditional input demands are multiplicatively separable in input prices
and output and can be written x(w, y) = h(y)x(w, 1), where h (y) > 0 and
x(w, 1) is the conditional input demand for 1 unit of output.
Proof: Part 2 can be proved by mimicking the proof of part 1, so this is left as an exercise.
Part 1(b) follows from Shephard’s lemma, so we need only prove part 1(a).
Let F denote the production function. Because it is homothetic, it can be written as
F(x) = f (g(x)), where f is strictly increasing, and g is homogeneous of degree one.
For simplicity, we shall assume that the image of F is all of R+ . Consequently,
as you are asked to show in Exercise 3.5, f −1 (y) > 0 for all y > 0. So, for some
y > 0, let t = f −1 (1)/f −1 (y) > 0. Note then that f (g(x)) ≥ y ⇐⇒ g(x) ≥ f −1 (y) ⇐⇒
g(tx) ≥ tf^{−1}(y) = f^{−1}(1) ⇐⇒ f(g(tx)) ≥ 1. Therefore, we may express the cost function
associated with F as follows:

c(w, y) = min_{x ≥ 0} w · x  s.t.  F(x) ≥ y
        = (1/t) min_{x ≥ 0} w · tx  s.t.  F(tx) ≥ 1
        = (1/t) min_{z ≥ 0} w · z  s.t.  F(z) ≥ 1
        = [f^{−1}(y)/f^{−1}(1)] c(w, 1).
Because f strictly increasing implies that f −1 is as well, the desired result holds
for all y > 0. To see that it also holds for y = 0, recall that c(w, 0) = 0, and note that
g(0) = 0, where the first equality follows from F(0) = 0, and the second from the linear
homogeneity of g.
The general form of the cost function that we have been considering until now is
most properly viewed as giving the firm’s long-run costs. This is because we have sup-
posed throughout that in choosing its production plan to minimise cost, the firm may freely
choose the amount of every input it uses. In the short run, the firm does not have this luxury.
It must usually contend with the fact that it has made fixed commitments, say, in leasing
a plant of a particular size or machines of a particular type. When the firm is ‘stuck’ with
fixed amounts of certain inputs in the short run, rather than being free to choose those
inputs as optimally as it can in the long run, we should expect its costs in the short run to
differ from its costs in the long run. To examine the relation between these two types of
cost, let us begin by defining the firm’s short-run, or restricted, cost function.
The optimised cost of the variable inputs, w · x(w, w̄, y; x̄), is called total variable cost.
The cost of the fixed inputs, w̄ · x̄, is called total fixed cost.
Study the definition of short-run costs carefully. Notice it differs from the definition
of generalised or long-run costs only in that the fixed inputs enter as parameters rather
than as choice variables. It should be clear therefore that for a given level of output, long-
run costs, where the firm is free to choose all inputs optimally, can never be greater than
short-run costs, where the firm may choose some but not all inputs optimally.
This point is illustrated in Fig. 3.5 using isoquants and isocost curves. For sim-
plicity we suppose that w1 = 1, so that the horizontal intercepts measure the indicated
costs, and the unnecessary parameters of the cost functions have been suppressed. If in
the short run, the firm is stuck with x̄2 units of the fixed input, it must use input combi-
nations A, C, and E, to produce output levels y1 , y2 , and y3 , and incur short-run costs of
sc(y1 ), sc(y2 ), and sc(y3 ), respectively. In the long run, when the firm is free to choose
Figure 3.5. sc(w, w̄, y; x̄) ≥ c(w, w̄, y) for all output levels y.
both inputs optimally, it will use input combinations B, C, and D, and be able to achieve
long-run costs of c(y1 ), c(y2 ), and c(y3 ), respectively. Notice that sc(y1 ) and sc(y3 ) are
strictly greater than c(y1 ) and c(y3 ), respectively, and sc(y2 ) = c(y2 ).
Look again at Fig. 3.5. Is the coincidence of long-run and short-run costs at output
y2 really a coincidence? No, not really. Why are the two costs equal there? A quick glance
at the figure is enough to see it is because x̄2 units are exactly the amount of x2 the firm
would choose to use in the long run to produce y2 at the prevailing input prices – that x̄2
units is, in effect, the cost-minimising amount of the fixed input to produce y2. Thus, there
can be no difference between long-run and short-run costs at that level of output. Notice
further that there is nothing peculiar about this relationship of x̄2 and y2. Long-run and
short-run costs of y1 would coincide if the firm were instead stuck with the smaller amount of
the fixed input it would choose for y1 in the long run (point B in Fig. 3.5), and long-run and
short-run costs of y3 would coincide if it were stuck with the larger amount it would choose
for y3 (point D), rather than with x̄2 units of
the fixed input. Each different level of the fixed input would give rise to a different short-
run cost function, yet in each case, short-run and long-run costs would coincide for some
particular level of output.
To explore this relationship a bit further, let x̄(y) denote the optimal choice of the
fixed inputs to minimise short-run cost of output y at the given input prices. Then we have
argued that

sc(w, w̄, y; x̄(y)) = c(w, w̄, y)    (3.3)
must hold for any y. Further, because we have chosen the fixed inputs to minimise short-
run costs, the optimal amounts x̄(y) must satisfy (identically) the first-order conditions for
a minimum:
∂sc(w, w̄, y; x̄(y))/∂x̄_i = 0    (3.4)
for all fixed inputs i. Now differentiate the identity (3.3) and use (3.4) to see that
dc(w, w̄, y)/dy = ∂sc(w, w̄, y; x̄(y))/∂y + Σ_i [∂sc(w, w̄, y; x̄(y))/∂x̄_i][∂x̄_i(y)/∂y]
               = ∂sc(w, w̄, y; x̄(y))/∂y,    (3.5)

where the second equality follows because each term in the sum is zero by (3.4).
Let us tie the pieces together and see what we have managed to show. First, the
short-run cost-minimisation problem involves more constraints on the firm than the long-
run problem, so we know that sc(w, w̄, y; x̄) ≥ c(w, w̄, y) for all levels of output and levels
of the fixed inputs. Second, for every level of output, (3.3) tells us that short-run and long-
run costs will coincide for some short-run cost function associated with some level of
the fixed inputs. Finally, (3.5) tells us that the slope of this short-run cost function will
be equal to the slope of the long-run cost function in the cost–output plane. (Indeed, we
could have derived this directly by appealing to Theorem A2.22, the Envelope theorem.)
Now, if two functions take the same value at the same point in the plane, and if their
slopes are equal, then they are tangent. This, then, establishes a familiar proposition from
intermediate theory: the long-run total cost curve is the lower envelope of the entire family
of short-run total cost curves! This is illustrated in Fig. 3.6.
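The envelope relationship can be illustrated numerically with a hypothetical technology, not one from the text: take y = x_1^{1/2} x_2^{1/2} at unit input prices, so fixing x̄2 gives sc(y; x̄2) = y²/x̄2 + x̄2, while minimising over x̄2 gives the long-run cost c(y) = 2y, with tangency exactly where x̄2 is the long-run optimal choice.

```python
W1 = W2 = 1.0  # unit input prices, an assumption made for simplicity

def sc(y, x2bar):
    # short-run cost with x2 fixed: y = sqrt(x1 * x2bar) forces x1 = y^2 / x2bar
    return W1 * y**2 / x2bar + W2 * x2bar

def c(y):
    # long-run cost: minimising sc over x2bar gives x2bar = y, hence cost 2y
    return 2.0 * y

x2bar = 4.0
for y in (2.0, 4.0, 6.0):
    print(y, round(sc(y, x2bar), 4), round(c(y), 4))
# sc >= c everywhere, with equality at y = x2bar = 4 (the tangency point)
```

Repeating this for other values of x̄2 traces out the whole family of short-run curves, each tangent to c(y) = 2y from above at its own output level.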
The principles are identical. If we begin with a production function and derive its cost
function, we can take that cost function and use it to generate a production function. If the
original production function is quasiconcave, the derived production function will be iden-
tical to it. If the original production function is not quasiconcave, the derived production
function is a ‘concavication’ of it. Moreover, any function with all the properties of a cost
function generates some production function for which it is the cost function.
This last fact marks one of the most significant developments in modern theory and
has had important implications for applied work. Applied researchers need no longer begin
their study of the firm with detailed knowledge of the technology and with access to rel-
atively obscure engineering data. Instead, they can estimate the firm’s cost function by
employing observable market input prices and levels of output. They can then ‘recover’
the underlying production function from the estimated cost function.
Again, we can make good use of the equivalence between cost functions and expen-
diture functions by stating the following theorem, which combines the analogues of
Theorems 2.1 and 2.2, and whose proof follows from theirs.
is increasing, unbounded above, and quasiconcave. Moreover, the cost function generated
by f is c.
Finally, we can also state an integrability-type theorem for input demand. The basic
question here is this: if x(w, y) summarises the conditional input demand behaviour of
some firm, under what conditions can we conclude that this behaviour is consistent with the
hypothesis that each level of output produced by the firm was produced at minimum cost?
As in the case of demand, the answer will depend on being able to recover a cost
function that generates the given input demands. That is, those demands will be consistent
with cost minimisation at each output level if and only if there is a cost function c satisfying
∂c(w, y)/∂w_i = x_i(w, y),  i = 1, . . . , n.
The following result should come as no surprise, and you are invited to convince
yourself of it by mimicking the sketched proof of the integrability theorem in the case of
consumer demand.
As usual, we suppose the overriding objective is to maximise profits. The firm there-
fore will choose that level of output and that combination of factors that solve the following
problem:

max_{x ≥ 0, y ≥ 0} py − w · x  s.t.  y ≤ f(x),    (3.6)
where f (x) is a production function satisfying Assumption 3.1. The solutions to this
problem tell us how much output the firm will sell and how much of which inputs it
will buy.
Once again, however, we may replace the inequality in the constraint by an equality,
because the production function is strictly increasing. Consequently, because y = f (x),
we may rewrite the maximisation problem in terms of a choice over the input vector
alone as

max_{x ≥ 0} pf(x) − w · x.    (3.7)
p ∂f(x*)/∂x_i = w_i,  for every i = 1, . . . , n.
The term on the left-hand side, the product of the output price with the marginal
product of input i, is often referred to as the marginal revenue product of input i. It
gives the rate at which revenue increases per additional unit of input i employed. At the
optimum, this must equal the cost per unit of input i, namely, wi .
Assuming further that all the wi are positive, we may use the previous first-order
conditions to yield the following equality between ratios:
[∂f(x*)/∂x_i] / [∂f(x*)/∂x_j] = w_i / w_j,  for all i, j,
or that the MRTS between any two inputs is equated to the ratio of their prices. This is
precisely the same as the necessary condition for cost-minimising input choice we obtained
in (3.2).
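The two optimality conditions just derived are easy to verify numerically. The sketch below assumes a decreasing-returns Cobb-Douglas technology and illustrative prices (none of these values come from the text), solves the first-order conditions in closed form, and confirms that p · MPi = wi and that the MRTS equals the input price ratio:

```python
# Hypothetical parameters: f(x1, x2) = x1^a * x2^b with a + b < 1 so a maximum exists.
a, b = 0.3, 0.4
p, w1, w2 = 10.0, 2.0, 3.0

def f(x1, x2):
    return x1**a * x2**b

# Closed-form solution of the first-order conditions p * df/dxi = wi:
# their ratio gives x2/x1 = (b*w1)/(a*w2); substituting back into the
# first condition and solving yields x1*.
ratio = (b * w1) / (a * w2)
x1_star = ((p * a / w1) * ratio**b) ** (1.0 / (1.0 - a - b))
x2_star = ratio * x1_star

def mp(g, x1, x2, i, h=1e-6):
    """Marginal product of input i by central finite differences."""
    if i == 1:
        return (g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h)
    return (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)

# At the optimum, the marginal revenue product of each input equals its price...
assert abs(p * mp(f, x1_star, x2_star, 1) - w1) < 1e-4
assert abs(p * mp(f, x1_star, x2_star, 2) - w2) < 1e-4

# ...so the MRTS equals the input price ratio, exactly as in the
# cost-minimisation condition (3.2).
mrts = mp(f, x1_star, x2_star, 1) / mp(f, x1_star, x2_star, 2)
assert abs(mrts - w1 / w2) < 1e-4
```

Because the objective is strictly concave here, the point satisfying the first-order conditions is the global profit maximum.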
Indeed, it is possible to recast the firm’s profit-maximisation problem in a manner
that emphasises the necessity of cost minimisation. Instead of thinking about maximising
profits in one step as was done above, consider the following two-step procedure. First,
calculate for each possible level of output the (least) cost of producing it. Then choose that
output that maximises the difference between the revenues it generates and its cost.
THEORY OF THE FIRM 147
[Fig. 3.7. Output choice for the competitive firm: output y∗ satisfies p = mc(y∗), with dmc(y∗)/dy ≥ 0 at y∗.]
The first step in this procedure is a familiar one. The least cost of producing y units
of output is given by the cost function, c(w, y). The second step then amounts to solving the following maximisation problem:

max_y py − c(w, y).    (3.8)

In Exercise 3.51, you are asked to verify that (3.7) and (3.8) are in fact equivalent. If y∗ > 0 is the optimal output, it therefore satisfies the first-order condition

p − dc(w, y∗)/dy = 0,
or output is chosen so that price equals marginal cost. Second-order conditions require that
marginal cost be non-decreasing at the optimum, or that d²c(w, y∗)/dy² ≥ 0. Output choice
is illustrated in Fig. 3.7.
The usefulness of the profit function depends on certain preconditions being fulfilled.
Not the least among these is that a maximum of profits actually exists. This is not as
nitpicky as it may sound. To see this, let the technology exhibit increasing returns and suppose that x′ and y′ = f(x′) maximise profits at p and w. With increasing returns,

f(tx′) > t f(x′)   for all t > 1.

Multiplying by p > 0, subtracting w · tx′ from both sides, rearranging, and using t > 1 and the non-negativity of profits gives

p f(tx′) − w · tx′ > t(p f(x′) − w · x′) ≥ p f(x′) − w · x′.

This says higher profit can always be had by increasing inputs in proportion t > 1 – contradicting our assumption that x′ and f(x′) maximised profit. Notice that in the special case of constant returns, no such problem arises if the maximal level of profit happens to be zero. In that case, though, the scale of the firm's operation is indeterminate because (y′, x′) and (ty′, tx′) give the same level of zero profits for all t > 0.
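The contradiction is easy to see numerically. A minimal sketch, assuming the increasing-returns technology f(x) = x^1.5 and p = w = 1 (illustrative values, not from the text):

```python
# With increasing returns, any bundle earning non-negative profit is dominated
# by a scaled-up bundle t*x, t > 1, so profit is unbounded and no maximum exists.
p, w = 1.0, 1.0

def f(x):
    return x ** 1.5          # f(t*x) = t^1.5 * f(x) > t * f(x) for t > 1

def profit(x):
    return p * f(x) - w * x

x0 = 1.0                      # here profit(x0) = 0, so the premise holds
assert profit(x0) >= 0.0

# Profit rises without bound along the ray t*x0.
profits = [profit(t * x0) for t in (1, 2, 4, 8, 16)]
assert all(later > earlier for earlier, later in zip(profits, profits[1:]))
```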
When the profit function is well-defined, it possesses several useful properties. Each will by now seem quite sensible and familiar: π(p, w) is continuous, increasing in p, decreasing in w, homogeneous of degree one in (p, w), and convex in (p, w). Moreover, when it is differentiable, it satisfies Hotelling's lemma:

∂π(p, w)/∂p = y(p, w),   and   −∂π(p, w)/∂wi = xi(p, w),   i = 1, 2, . . . , n.
Proof: Proofs of each property follow familiar patterns and so most are left as exercises.
Here we just give a quick proof of convexity.
Let y and x maximise profits at p and w, and let y′ and x′ maximise profits at p′ and w′. Define p^t ≡ tp + (1 − t)p′ and w^t ≡ tw + (1 − t)w′ for 0 ≤ t ≤ 1, and let y∗ and x∗ maximise profits at p^t and w^t. Then

π(p, w) = py − w · x ≥ py∗ − w · x∗,
π(p′, w′) = p′y′ − w′ · x′ ≥ p′y∗ − w′ · x∗.

So, for 0 ≤ t ≤ 1,

tπ(p, w) + (1 − t)π(p′, w′) ≥ [tp + (1 − t)p′]y∗ − [tw + (1 − t)w′] · x∗
= p^t y∗ − w^t · x∗
= π(p^t, w^t),

proving convexity.
Note that by Hotelling’s lemma, output supply and input demands can be obtained
directly by simple differentiation. From this we can deduce restrictions on firm behaviour
following from the hypothesis of profit maximisation. These are collected together in the
following theorem.
THEOREM 3.8   Properties of Output Supply and Input Demand Functions

Suppose that π is a twice continuously differentiable profit function for some competitive firm. Then, for all p > 0 and w ≫ 0 where it is well-defined:

1. Homogeneity of degree zero:

y(tp, tw) = y(p, w)   and   xi(tp, tw) = xi(p, w)   for all t > 0 and i = 1, . . . , n.

2. Own-price effects:

∂y(p, w)/∂p ≥ 0,
∂xi(p, w)/∂wi ≤ 0   for all i = 1, . . . , n.

3. The substitution matrix, with first row (∂y/∂p, ∂y/∂w1, . . . , ∂y/∂wn) and remaining rows (−∂xi/∂p, −∂xi/∂w1, . . . , −∂xi/∂wn), i = 1, . . . , n, is symmetric and positive semidefinite.
Proof: Homogeneity of output supply and input demand follows from Hotelling’s lemma
and homogeneity of the profit function. Property 2 says output supply is increasing in
product price and input demands are decreasing in their own input price. To see this, invoke
Hotelling’s lemma and express the supply and demand functions as
y(p, w) = ∂π(p, w)/∂p,
xi(p, w) = −∂π(p, w)/∂wi,   i = 1, . . . , n.
Because these hold for all p and w, differentiate both sides to obtain
∂y(p, w)/∂p = ∂²π(p, w)/∂p² ≥ 0,
∂xi(p, w)/∂wi = −∂²π(p, w)/∂wi² ≤ 0,   i = 1, . . . , n.
Each derivative on the right is a (signed) second-order own partial of π(p, w). Because
π(p, w) is convex in p and w, its second-order own partials are all non-negative, so the
indicated signs obtain, proving 2.
It should be clear by now that the substitution matrix in item 3 is equal to the Hessian
matrix of second-order partials of the profit function. This must be symmetric by Young’s
theorem and positive semidefinite by convexity of the profit function. (Beware: Note the
sign of every term involving an input demand function.)
Just as in the case of consumer demand, and conditional input demand, there is an
integrability theorem for input demand and output supply. The reader is invited to explore
this in an exercise.
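Hotelling's lemma and the sign restrictions can be checked by finite differences. A small sketch, using the assumed one-input technology f(x) = √x (not from the text), for which x∗(p, w) = (p/2w)², y∗(p, w) = p/2w, and π(p, w) = p²/4w:

```python
# Closed forms for the one-input example f(x) = sqrt(x).
def x_star(p, w):
    return (p / (2 * w)) ** 2

def y_star(p, w):
    return p / (2 * w)

def profit_fn(p, w):
    return p ** 2 / (4 * w)

p, w, h = 5.0, 2.0, 1e-6

# Hotelling's lemma: d(pi)/dp = y and -d(pi)/dw = x.
dpi_dp = (profit_fn(p + h, w) - profit_fn(p - h, w)) / (2 * h)
dpi_dw = (profit_fn(p, w + h) - profit_fn(p, w - h)) / (2 * h)
assert abs(dpi_dp - y_star(p, w)) < 1e-6
assert abs(-dpi_dw - x_star(p, w)) < 1e-6

# Own-price effects: supply slopes up in p, input demand slopes down in w,
# reflecting the non-negative second-order own partials of the convex pi.
assert y_star(p + 1, w) >= y_star(p, w)
assert x_star(p, w + 1) <= x_star(p, w)
```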
In Exercise 3.13, you will be asked to show that, when β < 1, this function exhibits
decreasing returns to scale. Suppose, therefore, that β < 1 and that 0 ≠ ρ < 1.
Form the Lagrangian for the profit-maximisation problem in (3.6). By assuming an
interior solution, the first-order conditions reduce to
−w1 + pβ(x1^ρ + x2^ρ)^((β−ρ)/ρ) x1^(ρ−1) = 0,    (E.1)
−w2 + pβ(x1^ρ + x2^ρ)^((β−ρ)/ρ) x2^(ρ−1) = 0,    (E.2)
(x1^ρ + x2^ρ)^(β/ρ) − y = 0.    (E.3)
Taking the ratio of (E.1) to (E.2) gives x1 = x2 (w1 /w2 )1/(ρ−1) . Substituting in (E.3) gives
xi = y^(1/β) (w1^(ρ/(ρ−1)) + w2^(ρ/(ρ−1)))^(−1/ρ) wi^(1/(ρ−1)),   i = 1, 2.    (E.4)
Substituting these into (E.1) and solving for y gives the supply function,
y = (pβ)^(−β/(β−1)) (w1^(ρ/(ρ−1)) + w2^(ρ/(ρ−1)))^(β(ρ−1)/(ρ(β−1))).    (E.5)
To form the profit function, substitute from these last two equations into the objective function to obtain

π(p, w) = p^(−1/(β−1)) (w1^r + w2^r)^(β/(r(β−1))) β^(−β/(β−1)) (1 − β),    (E.7)

where r ≡ ρ/(ρ − 1).
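Since (E.4), (E.5), and the profit function must be mutually consistent, a numerical cross-check is straightforward. The sketch below uses assumed values ρ = β = 1/2, p = 2, w = (1, 4), none of which come from the text, and confirms that the closed-form profit equals profit evaluated directly at the inputs from (E.4):

```python
# Assumed CES parameters: rho = beta = 1/2, so r = rho/(rho - 1) = -1.
rho, beta = 0.5, 0.5
r = rho / (rho - 1.0)

def f(x1, x2):
    return (x1 ** rho + x2 ** rho) ** (beta / rho)

def supply(p, w1, w2):
    """Output supply (E.5)."""
    W = w1 ** (rho / (rho - 1)) + w2 ** (rho / (rho - 1))
    return (p * beta) ** (-beta / (beta - 1)) * W ** (beta * (rho - 1) / (rho * (beta - 1)))

def x_i(p, w1, w2, i):
    """Input demand (E.4), after substituting (E.5) for y."""
    y = supply(p, w1, w2)
    wi = (w1, w2)[i - 1]
    W = w1 ** (rho / (rho - 1)) + w2 ** (rho / (rho - 1))
    return y ** (1 / beta) * W ** (-1 / rho) * wi ** (1 / (rho - 1))

def profit_closed(p, w1, w2):
    """The profit function (E.7), with r = rho/(rho - 1)."""
    W = w1 ** r + w2 ** r
    return p ** (-1 / (beta - 1)) * W ** (beta / (r * (beta - 1))) \
        * beta ** (-beta / (beta - 1)) * (1 - beta)

p, w1, w2 = 2.0, 1.0, 4.0
x1, x2 = x_i(p, w1, w2, 1), x_i(p, w1, w2, 2)

# The closed form agrees with profit evaluated directly at the optimal inputs...
direct = p * f(x1, x2) - w1 * x1 - w2 * x2
assert abs(profit_closed(p, w1, w2) - direct) < 1e-9

# ...and no small perturbation of the inputs does better (the objective is concave).
for d1 in (-0.01, 0.0, 0.01):
    for d2 in (-0.01, 0.0, 0.01):
        assert p * f(x1 + d1, x2 + d2) - w1 * (x1 + d1) - w2 * (x2 + d2) <= direct + 1e-12
```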
The profit function we have defined so far is really best thought of as the long-run
profit function, because we have supposed the firm is free to choose its output and all
input levels as it sees fit. As we did for the cost function, we can construct a short-run or
restricted profit function to describe firm behaviour when some of its inputs are variable
and some are fixed.
The restricted profit function can be a powerful tool for several reasons. First, in
many applications, it is most reasonable to suppose that at least some of the firm’s inputs
are in fixed supply. Under usual assumptions on technology, existence of these fixed
inputs generally eliminates the indeterminacy and unboundedness of maximum firm prof-
its. Finally, most properties of the general profit function with respect to output and input
prices are preserved with respect to output price and prices of the variable inputs.
Letting x̄ denote the vector of fixed inputs, w̄ their prices, and w the prices of the variable inputs x, the short-run, or restricted, profit function is defined as

π(p, w, w̄, x̄) ≡ max_{x, y} py − w · x − w̄ · x̄   subject to   f(x, x̄) ≥ y.

The solutions y(p, w, w̄, x̄) and x(p, w, w̄, x̄) are called the short-run, or restricted, output supply and variable input demand functions, respectively.
For all p > 0 and w ≫ 0, π(p, w, w̄, x̄), where well-defined, is continuous in p and w, increasing in p, decreasing in w, and convex in (p, w). If π(p, w, w̄, x̄) is twice continuously differentiable, y(p, w, w̄, x̄) and x(p, w, w̄, x̄) possess all three properties listed in Theorem 3.8 with respect to output and variable input prices.
Proof: The properties of π(p, w, w̄, x̄) can be established simply by mimicking the proof
of corresponding properties of π(p, w) in Theorem 3.7. The only one there that does not
carry over is homogeneity in variable input prices. The properties of short-run supply and
demand functions can be established by mimicking the proof of Theorem 3.8, except in
the case of homogeneity. To prove that requires a slight modification and is left as an
exercise.
EXAMPLE 3.6 Let us derive the short-run profit function for the constant-returns Cobb-Douglas technology. Supposing that x2 is fixed at x̄2, our problem is to solve

max_{x1, y} py − w1x1 − w̄2x̄2   subject to   x1^α x̄2^(1−α) ≥ y,

where 0 < α < 1. Assuming an interior solution, the constraint holds with equality, so we can substitute from the constraint for y in the objective function. The problem reduces to choosing the single variable x1 to solve

max_{x1} p x1^α x̄2^(1−α) − w1x1 − w̄2x̄2.    (E.1)

The first-order condition requires that

αp x1^(α−1) x̄2^(1−α) − w1 = 0.    (E.2)

Solving (E.2) for x1, substituting into (E.1), and simplifying gives the short-run profit function,

π(p, w1, w̄2, x̄2) = p^(1/(1−α)) w1^(α/(α−1)) α^(α/(1−α)) (1 − α) x̄2 − w̄2 x̄2.    (E.3)
Notice that because α < 1, short-run profits are well-defined even though the production
function exhibits (long-run) constant returns to scale.
By Hotelling's lemma, short-run supply can be found by differentiating (E.3) with respect to p:

y(p, w1, x̄2) = ∂π(p, w1, w̄2, x̄2)/∂p = (αp/w1)^(α/(1−α)) x̄2,

as expected.
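Both (E.3) and the supply function just derived can be verified against direct one-variable maximisation. A sketch with assumed values α = 1/2, p = 8, w1 = 4, w̄2 = 1, x̄2 = 2 (illustrative only):

```python
# Assumed parameter values for the short-run Cobb-Douglas example.
alpha, p, w1, w2bar, x2bar = 0.5, 8.0, 4.0, 1.0, 2.0

def short_profit_closed():
    """Short-run profit function (E.3)."""
    return (p ** (1 / (1 - alpha)) * w1 ** (alpha / (alpha - 1))
            * alpha ** (alpha / (1 - alpha)) * (1 - alpha) * x2bar
            - w2bar * x2bar)

def objective(x1):
    """Short-run profit as a function of the single variable input x1."""
    return p * x1 ** alpha * x2bar ** (1 - alpha) - w1 * x1 - w2bar * x2bar

# The interior first-order condition (E.2) gives x1* = (alpha*p/w1)^(1/(1-alpha)) * x2bar.
x1_star = (alpha * p / w1) ** (1 / (1 - alpha)) * x2bar
assert abs(short_profit_closed() - objective(x1_star)) < 1e-9

# A crude grid search over x1 finds no better choice.
best = max(objective(0.001 * k) for k in range(1, 20001))
assert best <= objective(x1_star) + 1e-9

# Hotelling's lemma: d(pi)/dp recovers the supply (alpha*p/w1)^(alpha/(1-alpha)) * x2bar.
def pi_of_p(pp):
    return (pp ** (1 / (1 - alpha)) * w1 ** (alpha / (alpha - 1))
            * alpha ** (alpha / (1 - alpha)) * (1 - alpha) * x2bar - w2bar * x2bar)

h = 1e-6
supply = (pi_of_p(p + h) - pi_of_p(p - h)) / (2 * h)
assert abs(supply - (alpha * p / w1) ** (alpha / (1 - alpha)) * x2bar) < 1e-4
```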
For one last perspective on the firm's short-run behaviour, let us abstract from input demand behaviour and focus on output supply. We can subsume the input choice problem into the short-run cost function and express short-run profits as

π(p, w, w̄, x̄) = max_y py − sc(w, w̄, y; x̄).

If y∗ > 0 is the profit-maximising output, it satisfies the first-order condition

p = dsc(y∗)/dy,

so price again equals (short-run) marginal cost. Now write short-run cost as the sum of total variable cost and total fixed cost, sc(y) = tvc(y) + tfc, the former the cost of variable inputs and the latter the cost of fixed inputs. Ignoring unnecessary parameters, short-run profits can be expressed as

π1 ≡ py1 − tvc(y1) − tfc,

where y1 > 0 satisfies the first-order condition above.
What if π 1 is negative? Is it still best for the firm to produce y1 even though it is making
a loss? The preceding first-order condition tells us that if the firm is going to produce a
positive level of output, then the profit-maximising (or loss-minimising) one would be y1 ,
where price equals marginal cost. However, the firm always has the option of shutting
down and producing nothing. If it produces y = 0, it will have no revenues and need to
buy no variable inputs, so variable costs are zero. However, the firm must still pay fixed
costs, so profit (loss) if it shuts down would be

π0 ≡ −tfc.

Clearly, a profit maximiser will choose between producing y1 > 0 at a loss or 'producing' y = 0 at a loss according to which gives greater profit (smaller loss). The firm will produce y1 > 0, therefore, only if π1 − π0 ≥ 0, or only if

py1 − tvc(y1) ≥ 0,

which, dividing by y1 > 0, is equivalent to

p ≥ tvc(y1)/y1 ≡ avc(y1).
We now have a complete description of output choice in the short run. If the firm produces
a positive amount of output, then it will produce an amount of output where price equals
marginal cost (and marginal cost is non-decreasing) and price is not below the average
variable cost at that level of output. If price is less than the average variable cost where
price equals marginal cost, the firm will shut down and produce no output.
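The shutdown rule is easy to implement. The sketch below assumes the illustrative variable cost tvc(y) = y³ − 2y² + 2y (not from the text), for which smc(y) = 3y² − 4y + 2, avc(y) = y² − 2y + 2, and minimum avc is 1 at y = 1:

```python
import math

# Assumed short-run cost structure (fixed cost is irrelevant to the supply rule).
def tvc(y):
    return y ** 3 - 2 * y ** 2 + 2 * y

def smc(y):
    return 3 * y ** 2 - 4 * y + 2

def avc(y):
    return y ** 2 - 2 * y + 2

def short_run_supply(p):
    """Produce where p = smc(y) on the rising branch, unless p < avc there."""
    disc = 3 * p - 2                      # from solving 3y^2 - 4y + (2 - p) = 0
    if disc < 0:
        return 0.0                        # p below the minimum of smc: no candidate output
    y = (2 + math.sqrt(disc)) / 3         # larger root: the upward-sloping branch
    return y if p >= avc(y) else 0.0      # shut down whenever p < avc(y)

# Minimum avc is 1 (at y = 1), so the firm shuts down for any p < 1.
assert short_run_supply(0.8) == 0.0
assert abs(short_run_supply(1.0) - 1.0) < 1e-12
assert abs(short_run_supply(2.0) - 4.0 / 3.0) < 1e-12

# Above the shutdown point, producing beats shutting down for any fixed cost.
y = short_run_supply(2.0)
assert 2.0 * y - tvc(y) >= 0.0
```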
One final comment on profit functions. Just as with cost functions, there is a full set
of duality relations between profit functions and production functions. In both its long-run
and short-run forms, every function with the required properties is the profit function for
some production function with the usual properties. The analyst may choose therefore
to begin with either a specification of the firm’s technology or with a specification of
the relevant profit function. See Diewert (1974) for details and Exercise 3.53 for an
integrability result.
3.6 EXERCISES
3.1 The elasticity of average product is defined as (∂APi (x)/∂xi )(xi /APi (x)). Show that this is equal
to μi (x) − 1. Show that average product is increasing, constant, or decreasing as marginal product
exceeds, is equal to, or is less than average product.
3.2 Let y = f (x1 , x2 ) be a constant returns-to-scale production function. Show that if the average product
of x1 is rising, the marginal product of x2 is negative.
3.3 Prove that when the production function is homogeneous of degree one, it may be written as the sum f(x) = Σ_{i=1}^n MPi(x)xi, where MPi(x) is the marginal product of input i.
3.4 Suppose the production function F(x) is homothetic so that F(x) = f(g(x)) for some strictly increasing function f and some linear homogeneous function g. Take any point x0 on the unit isoquant so that F(x0) = 1. Let x1 be any point on the ray through x0, and suppose that F(x1) = y so that x1 is on the y-level isoquant. Show that x1 = t∗x0, where t∗ = f^−1(y)/f^−1(1).
3.5 Suppose that F is a homothetic function so that it can be written as F(x) = f(g(x)), where f is strictly increasing and g is homogeneous of degree one. Show that if the image of F is all of R+, then f^−1(y) > 0 for all y > 0.
3.6 Let f (x1 , x2 ) be a production function satisfying Assumption 3.1, and suppose it is homogeneous
of degree one. Show that the isoquants of f are radially parallel, with equal slope at all points
along any given ray from the origin. Use this to demonstrate that the marginal rate of technical
substitution depends only on input proportions. Further, show that MP1 is non-decreasing and MP2
is non-increasing in input proportions, R ≡ x2 /x1 . Show that the same is true when the production
function is homothetic.
3.7 Goldman and Uzawa (1964) have shown that the production function is weakly separable with
respect to the partition {N1 , . . . , NS } if and only if it can be written in the form
f(x) = g(f^1(x^(1)), . . . , f^S(x^(S))),
where g is some function of S variables, and, for each i, f i (x(i) ) is a function of the subvector x(i)
of inputs from group i alone. They have also shown that the production function will be strongly
separable if and only if it is of the form
f(x) = G(f^1(x^(1)) + · · · + f^S(x^(S))),
where G is a strictly increasing function of one variable, and the same conditions on the subfunctions
and subvectors apply. Verify their results by showing that each is separable as they claim.
3.8 (a) Letting fi (x) = ∂f (x)/∂xi , show that,
(b) Using the formula in (a), show that σij (x) ≥ 0 whenever f is increasing and concave. (The elas-
ticity of substitution is non-negative when f is merely quasiconcave but you need not show
this.)
3.9 Suppose that the production function f : R^n_+ → R_+ satisfies Assumption 3.1 and is twice continuously differentiable. Further, suppose that MRTSij(x) depends only upon the ratio xi/xj and is independent of xk for all k distinct from i and j. For every vector of input prices w ∈ R^n_++, suppose that the input vector ζ(w) ∈ R^n_++ minimises the cost of producing f(ζ(w)) units of output. Prove
where you must show that the right-hand side is well-defined by showing that ζj(w)/ζi(w) depends only on wj/wi and is independent of wk for k ≠ i, j. The above formula for the firm's elasticity of
substitution is useful in empirical applications because the right-hand side can be computed from
data on input prices and quantities, alone, without any direct information on the firm’s production
technology. Because only cost-minimisation is assumed, the firm need not be a perfect competitor
in its output market since even a monopolist seeks to minimise the cost of producing output. (That
is, when w is the observed vector of input prices and x is the observed vector of input demands, the
above formula assumes that x minimises the cost of producing y = f (x) units of output – a necessary
condition for profit maximisation.)
3.10 A Leontief production function has the form
y = min{αx1 , βx2 }
for α > 0 and β > 0. Carefully sketch the isoquant map for this technology and verify that the
elasticity of substitution σ = 0, where defined.
3.11 Calculate σ for the Cobb-Douglas production function y = Ax1^α x2^β, where A > 0, α > 0, and β > 0.
3.12 The CMS (constant marginal shares) production function is the form y = Ax1^α x2^(1−α) − mx2. Calculate σ for this function and show that, for m ≠ 0 and α ≠ 1, AP2 rises as σ → 1. Under what conditions does this function reduce to a linear production function?
3.13 A generalisation of the CES production function is given by

y = A(α0 + Σ_{i=1}^n αi xi^ρ)^(β/ρ)

for A > 0, α0 ≥ 0, αi ≥ 0, and 0 ≠ ρ < 1. Calculate σij for this function. Show that when α0 = 0, the elasticity of scale is measured by the parameter β.
3.14 Calculate the elasticity of substitution for the production function in Example 3.2.
3.15 Show that the elasticity of substitution for any homothetic production function is equal to the
elasticity of substitution for its linear homogeneous part alone.
3.16 Let

y = (Σ_{i=1}^n αi xi^ρ)^(1/ρ),   where Σ_{i=1}^n αi = 1 and 0 ≠ ρ < 1.

(a) Show that lim_{ρ→0} y = Π_{i=1}^n xi^(αi).
3.30 Firm 1 has cost function c1 (w, y). Firm 2 has the following cost function. Will the input demand and
output supply behaviour of the two firms be identical when
[Figure: two cost functions, c^A(w1, w2, y) and c^B(w1, w2, y), plotted against w1, with the points w1^0 and w′1 marked.]
ε_iy(w, y) ≡ (∂xi(w, y)/∂y)(y/xi(w, y)).
If γij = γji and Σ_j γij = 0 for i = 1, . . . , n, the substitution matrix is symmetric, as required.
(a) What restrictions on the parameters αi are required to ensure homogeneity?
(b) For what values of the parameters does the translog reduce to the Cobb-Douglas form?
(c) Show that input shares in the translog cost function are linear in the logs of input prices and
output.
3.35 Calculate the cost function and the conditional input demands for the linear production function, y = Σ_{i=1}^n αi xi.
3.36 Derive the cost function for the two-input, constant-returns, Cobb-Douglas technology. Fix one input
and derive the short-run cost function. Show that long-run average and long-run marginal cost are
constant and equal. Show that for every level of the fixed input, short-run average cost and long-run
average cost are equal at the minimum level of short-run average cost. Illustrate your results in the
cost-output plane.
3.37 Prove each of the results you obtained in the preceding exercise for the general case of any constant
returns-to-scale technology.
3.38 Show that when the production function is homothetic, the proportions in which the firm will
combine any given pair of inputs is the same for every level of output.
3.39 Show that when the production function is homothetic, the conditional demand for every input must
be non-increasing in its own price.
3.40 If the firm faces an upward-sloping supply curve for one input k, we can write the wage it must pay each unit of the input as wk = wk(xk), where w′k(xk) > 0.

(a) Define the firm's cost function in this case and write down the first-order conditions for its optimal choice of each input.

(b) Define the elasticity of supply for input k as εk ≡ (dxk(wk)/dwk)(wk/xk), and suppose that the firm uses a positive amount of input k in equilibrium. Show that Shephard's lemma applies only if εk → ∞.
3.41 Suppose the production function satisfies Assumption 3.1. Prove that the cost function is the linear-
in-output form c(w, y) = yφ(w) if and only if the production function has constant returns to scale.
3.42 We have seen that every Cobb-Douglas production function, y = Ax1^α x2^(1−α), gives rise to a Cobb-Douglas cost function, c(w, y) = yA w1^α w2^(1−α), and every CES production function, y = A(x1^ρ + x2^ρ)^(1/ρ), gives rise to a CES cost function, c(w, y) = yA(w1^r + w2^r)^(1/r). For each pair of functions, show that the converse is also true. That is, starting with the respective cost functions, 'work backward' to the underlying production function and show that it is of the indicated form. Justify your approach.
3.43 Show that long-run average cost, lac(y) ≡ c(w, w̄, y)/y, is the lower envelope of short-run aver-
age cost sac(y) ≡ sc(w, w̄, y; x̄)/y, in the cost-output plane. Sketch your result in that plane, and
be sure to include an accurate demonstration of the necessary relationship that must hold between
long-run marginal cost, lmc(y) ≡ dc(w, w̄, y)/dy, and short-run marginal cost, smc(y; x̄) ≡
dsc(w, w̄, y; x̄)/dy.
3.44 Derive the profit function for a firm with the Cobb-Douglas technology, y = x1^α x2^β. What restrictions
on α and β are required to ensure that the profit function is well-defined? Explain.
3.45 Suppose the production function is additively separable so that f (x1 , x2 ) = g(x1 ) + h(x2 ). Find con-
ditions on the functions g and h so that input demands x1 (p, w) and x2 (p, w) are homogeneous of
degree 1/2 in w.
3.46 Verify Theorem 3.7 for the profit function obtained in Example 3.5. Verify Theorem 3.8 for the
associated output supply and input demand functions.
3.47 In deriving the firm’s short-run supply function in Example 3.6, we ignored the shutdown condi-
tion by supposing an interior solution to the firm’s profit-maximisation problem. Give a complete
description of short-run supply behaviour in that Cobb-Douglas case.
c(w1 , w2 , y) = y2 (w1 + w2 ).
(a) On the same diagram, sketch the firm’s marginal and average total cost curves and its output
supply function.
(b) On a separate diagram, sketch the input demand for input x1 against its own price w1 .
(c) On both diagrams, illustrate the effects of an increase in the price of input x2 .
3.55 A utility produces electricity to meet the demands of a city. The price it can charge for electricity is
fixed and it must meet all demand at that price. It turns out that the amount of electricity demanded
is always the same over every 24-hour period, but demand differs from day (6:00 A . M . to 6:00 P. M .)
to night (6:00 P. M . to 6:00 A . M .). During the day, 4 units are demanded, whereas during the night
only 3 units are demanded. Total output for each 24-hour period is thus always equal to 7 units. The
utility produces electricity according to the production function
where K is the size of the generating plant, and Fi is tons of fuel. The firm must build a single plant;
it cannot change plant size from day to night. If a unit of plant size costs wk per 24-hour period and
a ton of fuel costs wf , what size plant will the utility build?
PART II
MARKETS AND
WELFARE
CHAPTER 4
PARTIAL EQUILIBRIUM
In previous chapters we studied the behaviour of individual consumers and firms, describ-
ing optimal behaviour when market prices were fixed and beyond the agent’s control. Here
we begin to explore the consequences of that behaviour when consumers and firms come
together in markets. First, we shall consider price and quantity determination in a single
market or group of closely related markets. Then we shall assess those markets from a
social point of view. Along the way, we pay special attention to the close relationship
between a market’s competitive structure and its social ‘performance’.
q^d(p) ≡ Σ_{i∈I} q^i(p, p̄, y^i),    (4.1)

where p̄ is the vector of prices of all other goods.
There are several things worth noting in the definition of market demand. First, qd (p)
gives the total amount of q demanded by all buyers in the market. Second, because each
buyer’s demand for q depends not only on the price of q, but on the prices of all other goods
as well, so, too, does the market demand for q, though we will generally suppress explicit
mention of this. Third, whereas a single buyer’s demand depends on the level of his own
income, market demand depends both on the aggregate level of income in the market and
on its distribution among buyers. Finally, because individual demand is homogeneous of
degree zero in all prices and the individual’s income, market demand will be homogeneous
of degree zero in all prices and the vector of buyers’ incomes. Although several restrictions
on an individual’s demand system follow from utility maximisation, homogeneity is the
only such restriction on the market demand for a single good.
The supply side of the market is made up of all potential sellers of q. However, we
sometimes distinguish between firms that are potential sellers in the short run and those
that are potential sellers in the long run. Earlier, we defined the short run as that period of
time in which at least one input (for example, plant size) is fixed to the firm. Consistent
with that definition, in the short-run market period, the number of potential sellers is fixed,
finite, and limited to those firms that ‘already exist’ and are in some sense able to be up
and running simply by acquiring the necessary variable inputs. If we let J ≡ {1, . . . , J}
index those firms, the short-run market supply function is the sum of individual firm
short-run supply functions qj (p, w):
q^s(p) ≡ Σ_{j∈J} q^j(p, w).    (4.2)
Market demand and market supply together determine the price and total quantity
traded. We say that a competitive market is in short-run equilibrium at price p∗ when
qd (p∗ ) = qs (p∗ ). Geometrically, this corresponds to the familiar intersection of market
supply and market demand curves drawn in the (p, q) plane. Note that by construction of
market demand and market supply, market equilibrium is characterised by some interesting
and important features: each price-taking buyer is buying his optimal amount of the good
at the prevailing price, and each price-taking firm is selling its profit-maximising output
at the same prevailing price. Thus, we have a true equilibrium in the sense that no agent
in the market has any incentive to change his behaviour – each is doing the best he can
under the circumstances he faces.
EXAMPLE 4.1 Consider a competitive industry composed of J identical firms. Firms produce output according to the Cobb-Douglas technology, q = x^α k^(1−α), where x is some
variable input such as labour, k is some input such as plant size, which is fixed in the short
run, and 0 < α < 1. In Example 3.6, we derived the firm’s short-run profit and supply
functions with this technology. At prices p, wx , and wk , maximum profits are
π^j = p^(1/(1−α)) wx^(α/(α−1)) α^(α/(1−α)) (1 − α)k − wk k,    (E.1)

and short-run output supply is

q^j = p^(α/(1−α)) wx^(α/(α−1)) α^(α/(1−α)) k.    (E.2)
Suppose α = 1/2, wx = 4, wk = 1, and that each of J = 48 firms operates a plant of size k = 1. Then, from (E.2), short-run market supply is

q^s = Σ_{j=1}^{48} q^j = 6p,    (E.3)

and let market demand be

q^d = 294/p.    (E.4)
We can use (E.1) through (E.4) to solve for the short-run equilibrium price, market
quantity, output per firm, and firm profits:
p∗ = 7,   q∗ = 42,   q^j = 7/8,   π^j = 2.0625 > 0.
This equilibrium, at both market and individual firm levels, is illustrated in Fig. 4.1.
(Note that short-run cost curves for firms with this technology can be derived from
Exercise 3.36.)
[Fig. 4.1. Short-run market equilibrium: market demand q^d(p) = 294/p and market supply intersect at p∗ = 7, q∗ = 42; at p = 7, each firm supplies 7/8, where price exceeds short-run average cost sac(q), so π^j > 0.]
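These equilibrium values can be reproduced directly. The sketch below assumes, as the reported numbers imply, J = 48 firms each operating a plant of size k = 1, with α = 1/2, wx = 4, and wk = 1:

```python
import math

# Parameters consistent with the equilibrium reported in Example 4.1.
J, k, wk = 48, 1.0, 1.0

def firm_supply(p):          # q^j = p*k/8 when alpha = 1/2 and w_x = 4
    return p * k / 8

def firm_profit(p):          # pi^j = p^2*k/16 - w_k*k
    return p ** 2 * k / 16 - wk * k

def market_supply(p):
    return J * firm_supply(p)

def market_demand(p):
    return 294 / p

# Market clearing: 6p = 294/p  =>  p^2 = 49  =>  p* = 7.
p_star = math.sqrt(294 / (J * k / 8))
assert abs(p_star - 7.0) < 1e-12
assert abs(market_supply(p_star) - market_demand(p_star)) < 1e-9

q_star = market_demand(p_star)
assert abs(q_star - 42.0) < 1e-9
assert abs(firm_supply(p_star) - 7.0 / 8.0) < 1e-12
assert abs(firm_profit(p_star) - 2.0625) < 1e-12
```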
In the long run, no inputs are fixed for the firm. Incumbent firms – those already
producing – are free to choose optimal levels of all inputs, including, for example, the size
of their plant. They are also free to leave the industry entirely. Moreover, in the long run,
new firms may decide to begin producing the good in question. Thus, in the long run, there
are possibilities of entry and exit of firms. Firms will enter the industry in response to
positive long-run economic profits and will exit in response to negative long-run profits
(losses).
In a long-run equilibrium, we shall require not only that the market clears but also
that no firm has an incentive to enter or exit the industry. Clearly, then, long-run profits
must be non-negative; otherwise, firms in the industry will wish to exit. On the other hand,
because all firms have free access to one another’s technology (in particular, firms currently
not producing have access to the technology of every firm that is producing), no firm can
be earning positive profits in the long run. Otherwise, firms outside the industry will adopt
the technology of the firm earning positive profits and enter the industry themselves.
Thus, two conditions characterise long-run equilibrium in a competitive market:
q^d(p̂) = Σ_{j=1}^{Ĵ} q^j(p̂),
π^j(p̂) = 0,   j = 1, . . . , Ĵ.    (4.3)
The first condition simply says the market must clear. The second says long-run profits for
all firms in the industry must be zero so that no firm wishes to enter or exit the industry.
In contrast to the short run, where the number of firms is given and the market-
clearing condition determines the short-run equilibrium price, the number of firms is not
given in the long run. In the long run therefore, both the long-run equilibrium price p̂
and the long-run equilibrium number of firms Ĵ must be determined jointly. Any such pair (p̂, Ĵ) satisfying the market-clearing and zero-profit conditions in (4.3) constitutes a long-run market equilibrium.
The next two examples demonstrate that the long-run number of firms is uniquely
determined when long-run supply is upward-sloping but not when it is horizontal. On the
other hand, because market demand is downward-sloping the long-run equilibrium price
is uniquely determined in both cases.
EXAMPLE 4.2 Suppose that (inverse) market demand is linear,

p = 39 − 0.009q.    (E.1)
Technology for producing q is identical for all firms, and all firms face identical input
prices. The long-run profit function for a representative firm is given by
π(p) = p² − 2p − 399.    (E.2)

By Hotelling's lemma, the firm's output supply function is therefore

y^j = dπ(p)/dp = 2p − 2.    (E.3)

[Fig. 4.2. Long-run equilibrium: market demand p = 39 − 0.009q and long-run market supply q^s = 100p − 100 intersect at p̂ = 21 and q = 2,000, with each of the Ĵ = 50 firms producing 40 units at minimum average cost.]
From the zero-profit condition, we obtain p̂ = 21. Substituting into the market-clearing
condition gives Ĵ = 50. From (E.3), each firm produces an output of 40 units in long-run
equilibrium. This market equilibrium is illustrated in Fig. 4.2.
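The computation can be replicated as follows; the profit function π(p) = p² − 2p − 399 used here is the one consistent with the supply function (E.3) and zero profit at p̂ = 21:

```python
import math

def profit(p):
    return p ** 2 - 2 * p - 399

def firm_supply(p):          # Hotelling's lemma: y^j = d(pi)/dp
    return 2 * p - 2

def market_demand(p):        # from p = 39 - 0.009q
    return (39 - p) / 0.009

# Zero profit pins down the long-run price (positive root of the quadratic).
p_hat = (2 + math.sqrt(4 + 4 * 399)) / 2
assert p_hat == 21.0 and abs(profit(p_hat)) < 1e-9

# Market clearing then pins down the number of firms.
q_hat = market_demand(p_hat)          # 2,000 units
J_hat = q_hat / firm_supply(p_hat)    # 2,000 / 40 = 50 firms
assert abs(q_hat - 2000.0) < 1e-9
assert abs(J_hat - 50.0) < 1e-9
```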
EXAMPLE 4.3 Let us examine long-run equilibrium in the market of Example 4.1. There,
technology was the constant-returns-to-scale form, q = x^α k^(1−α), for x variable and k fixed in the short run. For α = 1/2, wx = 4, and wk = 1, the short-run profit and short-run supply functions reduce to
π^j(p, k) = p²k/16 − k,    (E.1)
q^j = pk/8.    (E.2)

Market demand is again

q^d = 294/p.    (E.3)
From (E.1), the zero-profit condition requires p̂²k/16 − k = 0, or p̂ = 4. Substituting p̂ = 4 into the market-clearing condition q^d = Ĵq^j then gives

294/4 = Ĵ(4k̂/8),    (E.4)

so that

Ĵk̂ = 147.    (E.5)
Because at p̂ = 4 firm profits are zero regardless of plant size k̂, long-run equilibrium is
consistent with a wide range of market structures indeed. From (E.4) and (E.5), long-run
equilibrium may involve a single firm operating a plant of size k̂ = 147, two firms each
with plants k̂ = 147/2, three firms with plants k̂ = 147/3, all the way up to any number J
of firms, each with a plant of size 147/J. This indeterminacy in the long-run equilibrium
number of firms is a phenomenon common to all constant-returns industries. You are asked
to show this in the exercises.
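The indeterminacy is easy to confirm: every pair (Ĵ, k̂) with Ĵk̂ = 147 satisfies both long-run equilibrium conditions. A quick check:

```python
# Short-run profit and supply from Example 4.3 (alpha = 1/2, w_x = 4, w_k = 1).
def firm_profit(p, k):
    return p ** 2 * k / 16 - k          # (E.1)

def firm_supply(p, k):
    return p * k / 8                    # (E.2)

def market_demand(p):
    return 294 / p                      # (E.3)

p_hat = 4.0
for J in (1, 2, 3, 7, 21, 49, 147):
    k = 147.0 / J
    # Zero profit holds at p_hat = 4 regardless of plant size...
    assert abs(firm_profit(p_hat, k)) < 1e-9
    # ...and the market clears: J * p*k/8 = 294/4 = 73.5 in every case.
    assert abs(J * firm_supply(p_hat, k) - market_demand(p_hat)) < 1e-9
```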
If q∗ > 0 maximises the monopolist's profit, Π(q) ≡ r(q) − c(q), it must satisfy the first-order condition Π′(q∗) ≡ r′(q∗) − c′(q∗) = 0. This, in turn, is the same as the requirement that marginal revenue equal marginal cost:

mr(q∗) = mc(q∗).    (4.4)

Equilibrium price will be p∗ = p(q∗), where p(q) is the inverse market demand function.
Let us explore the monopolist’s output choice a bit further. Because r(q) ≡ p(q)q,
differentiating to obtain marginal revenue gives
mr(q) = p(q) + q dp(q)/dq
= p(q)[1 + (dp(q)/dq)(q/p(q))]
= p(q)[1 + 1/ε(q)],    (4.5)
where ε(q) ≡ (dq/dp)(p/q) is the elasticity of market demand at output q. We assume that ε(q) is less than zero, i.e., that market demand is negatively sloped. By combining (4.4) and (4.5), q∗ will satisfy

p(q∗)[1 + 1/ε(q∗)] = mc(q∗) ≥ 0    (4.6)
because marginal cost is always non-negative. Price is also non-negative, so we must have |ε(q∗)| ≥ 1. Thus, the monopolist never chooses an output in the inelastic range of market demand, and this is illustrated in Fig. 4.3, where mr(q) is positive when |ε(q)| > 1 and negative when |ε(q)| < 1.
Rearranging (4.6), we can obtain an expression for the percentage deviation of price
from marginal cost in the monopoly equilibrium:
[p(q∗) − mc(q∗)]/p(q∗) = 1/|ε(q∗)|.    (4.7)

When market demand is less than infinitely elastic, |ε(q∗)| will be finite and the monopolist's price will exceed marginal cost in equilibrium. Moreover, other things being equal, price will exceed marginal cost by a greater amount the more inelastic is market demand.
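A concrete instance, assuming linear demand p(q) = 10 − q and constant marginal cost of 2 (illustrative values, not from the text), confirms (4.4) through (4.7):

```python
# Assumed linear inverse demand p(q) = a - b*q and constant marginal cost c.
a, b, c = 10.0, 1.0, 2.0

def p_of_q(q):
    return a - b * q

def mr(q):                   # r(q) = (a - b*q)*q  =>  mr(q) = a - 2*b*q
    return a - 2 * b * q

# mr(q*) = mc gives q* = (a - c)/(2b); price is read off the demand curve.
q_star = (a - c) / (2 * b)
p_star = p_of_q(q_star)
assert abs(mr(q_star) - c) < 1e-12
assert (q_star, p_star) == (4.0, 6.0)

# Demand elasticity at q*: eps = (dq/dp)(p/q) = (-1/b)(p*/q*).
eps = (-1 / b) * (p_star / q_star)
assert abs(eps) >= 1.0                        # never in the inelastic range

# Lerner markup (4.7): (p* - mc)/p* = 1/|eps|.
assert abs((p_star - c) / p_star - 1 / abs(eps)) < 1e-12
```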
As we have remarked, pure competition and pure monopoly are opposing extreme
forms of market structure. Nonetheless, they share one important feature: Neither the pure
competitor nor the pure monopolist needs to pay any attention to the actions of other firms
in formulating its own profit-maximising plans. The perfect competitor individually cannot
affect market price, nor therefore the actions of other competitors, and so only concerns
itself with the effects of its own actions on its own profits. The pure monopolist completely
controls market price and output, and need not even be concerned about the possibility of
entry because entry is effectively blocked.
Many markets display a blend of monopoly and competition simultaneously. Firms
become more interdependent the smaller the number of firms in the industry, the easier
entry, and the closer the substitute goods available to consumers. When firms perceive
their interdependence, they have an incentive to take account of their rivals’ actions and
to formulate their own plans strategically. In Chapter 7, we shall have a great deal more
to say about strategic behaviour and how to analyse it, but here we can take a first look at
some of the most basic issues involved.
When firms are behaving strategically, one of the first things we need to do is ask
ourselves how we should characterise equilibrium in situations like this. On the face of it,
one might be tempted to reason as follows: because firms are aware of their interdepen-
dence, and because the actions of one firm may reduce the profits of others, will they not
simply work together or collude to extract as much total profit as they can from the market
and then divide it between themselves? After all, if they can work together to make the
profit ‘pie’ as big as possible, will they not then be able to divide the pie so that each has
at least as big a slice as they could otherwise obtain? Putting the legality of such collu-
sion aside, there is something tempting in the idea of a collusive equilibrium such as this.
However, there is also a problem.
Let us consider a simple market consisting of J firms, each producing output q^j.
Suppose each firm’s profit is adversely affected by an increase in the output of any other
firm, so that

Π^j = Π^j(q^1, . . . , q^J),  j = 1, . . . , J,  and  ∂Π^j/∂q^k < 0  for  k ≠ j.   (4.8)
Now suppose firms cooperate to maximise joint profits. If q̄ maximises Σ_{j=1}^J Π^j, it must
satisfy the first-order conditions

∂Π^k(q̄)/∂q^k = − Σ_{j≠k} ∂Π^j(q̄)/∂q^k > 0,  k = 1, . . . , J,   (4.9)

where the inequality follows because each firm’s profit is decreasing in the output of every other firm.
Think what this means. Because each firm’s profit is increasing in its own output at q̄,
each can increase its own profit by increasing output away from its assignment under q̄ –
provided, of course, that everyone else continues to produce their assignment under q̄! If
even one firm succumbs to this temptation, q̄ will not be the output vector that prevails in
the market.
Virtually all collusive solutions give rise to incentives such as these for the agents
involved to cheat on the collusive agreement they fashion. Any appeal there may be
in the idea of a collusive outcome as the likely ‘equilibrium’ in a market context is
therefore considerably reduced. It is perhaps more appropriate to think of self-interested
firms as essentially non-cooperative. To be compelling, any description of equilibrium in
imperfectly competitive markets must take this into account.
The most common concept of non-cooperative equilibrium is due to John Nash
(1951). In a Nash equilibrium, every agent must be doing the very best he or she can,
given the actions of all other agents. It is easy to see that when all agents have reached
such a point, none has any incentive to change unilaterally what he or she is doing, so the
situation is sensibly viewed as an equilibrium.
In a market situation like the ones we have been discussing, the agents concerned
are firms. There, we will not have a Nash equilibrium until every firm is maximising its
own profit, given the profit-maximising actions of all other firms. Clearly, the joint profit-
maximising output vector q̄ in (4.9) does not satisfy the requirements of a Nash equilibrium
because, as we observed, no firm’s individual profit is maximised at q̄ given the output
choices of the other firms. Indeed, if q∗ is to be a Nash equilibrium, each firm’s output
must maximise its own profit given the other firms’ output choices. Thus, q∗ must satisfy
the first-order conditions:
∂Π^k(q∗)/∂q^k = 0,  k = 1, . . . , J.   (4.10)
Clearly, there is a difference between (4.9) and (4.10). In general, they will determine quite
different output vectors.
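For a concrete illustration of the difference, take a linear duopoly with inverse demand p = a − b(q1 + q2) and common marginal cost c; the parameter values below are illustrative assumptions of this sketch, not the text’s:

```python
# Collusive versus Nash first-order conditions in a linear Cournot duopoly.
# The parameter values a, b, c are illustrative assumptions of this sketch.
a, b, c = 10.0, 1.0, 1.0

def profit(q_own, q_other):
    """One firm's profit when inverse demand is p = a - b*(q_own + q_other)."""
    return (a - b * (q_own + q_other)) * q_own - c * q_own

def d_profit(q_own, q_other, h=1e-6):
    """Central-difference derivative of own profit with respect to own output."""
    return (profit(q_own + h, q_other) - profit(q_own - h, q_other)) / (2 * h)

# Joint-profit maximisation: total output (a - c)/(2b), split equally.
q_collusive = (a - c) / (4 * b)   # 2.25 per firm
# Cournot-Nash output per firm: (a - c)/(3b).
q_nash = (a - c) / (3 * b)        # 3.0 per firm

print(d_profit(q_collusive, q_collusive))  # positive, as in (4.9): an incentive to cheat
print(d_profit(q_nash, q_nash))            # approximately zero, as in (4.10)
```

At the collusive outputs each firm’s own-profit derivative is strictly positive, so each gains by unilaterally expanding output; at the Nash outputs no such unilateral gain exists.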
In what follows, we shall employ the Nash equilibrium concept in a number of
different settings in which firms’ decisions are interdependent.
Firms sell output on a common market, so market price depends on the total output sold
by all firms in the market. Let inverse market demand be the linear form,
p = a − b Σ_{j=1}^J q^j,   (4.12)
where a > 0, b > 0, and we require a > c. From (4.11) and (4.12), profit for firm j is
j
(q , . . . , q ) = a − b
j 1 j
q k
q j − cq j . (4.13)
k=1
We seek a vector of outputs (q̄1 , . . . , q̄J ) such that each firm’s output choice is profit-
maximising given the output choices of the other firms. Such a vector of outputs is called
a Cournot-Nash equilibrium. This name gives due credit to Cournot, who introduced
this solution to the oligopoly problem, and to Nash, who later developed the idea more
generally.
So, if (q̄^1, . . . , q̄^J) is a Cournot-Nash equilibrium, q̄^j must maximise (4.13) when
q^k = q̄^k for all k ≠ j. Consequently, the derivative of (4.13) with respect to q^j must be zero
when q^k = q̄^k for all k = 1, . . . , J. Thus,
a − 2bq̄^j − b Σ_{k≠j} q̄^k − c = 0,

or, rearranging,

bq̄^j = a − c − b Σ_{k=1}^J q̄^k.   (4.14)
Noting that the right-hand side of (4.14) is independent of which firm j we are
considering, we conclude that all firms must produce the same amount of output in
equilibrium. By letting q̄ denote this common equilibrium output, (4.14) reduces to
PARTIAL EQUILIBRIUM 175
q̄ = (a − c)/(b(J + 1)).   (4.15)
By using (4.15) and doing a few calculations, the full set of market equilibrium values,
namely, firm output, total output, market price, and firm profits, are as follows:

q̄^j = (a − c)/(b(J + 1)),  j = 1, . . . , J,
Σ_{j=1}^J q̄^j = J(a − c)/(b(J + 1)),
p̄ = a − J(a − c)/(J + 1) < a,
Π̄^j = (a − c)²/(b(J + 1)²).

From these, we can calculate the markup of price over marginal cost,

p̄ − c = (a − c)/(J + 1) > 0,   (4.16)
and observe that equilibrium price will typically exceed the marginal cost of each identical
firm. When J = 1, so that the single firm is a pure monopolist, the deviation of price from
marginal cost is greatest. At the other extreme, as the number of firms J → ∞, (4.16)
gives

p̄ − c → 0.   (4.17)
Equation (4.17) tells us that price will approach marginal cost as the number of competitors
becomes large. Indeed, this limiting outcome corresponds precisely to what would obtain
if any finite number of these firms behaved as perfect competitors. Thus, this simple model
provides another interpretation of perfect competition. It suggests that perfect competition
can be viewed as a limiting case of imperfect competition, as the number of firms becomes
large.
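The convergence in (4.17) is easy to see numerically; the short sketch below (with illustrative, assumed parameter values) tabulates the Cournot markup as J grows:

```python
# The Cournot markup (4.16), p - c = (a - c)/(J + 1), for growing J.
# Parameter values are illustrative assumptions of this sketch.
a, b, c = 10.0, 1.0, 1.0

def cournot(J):
    """Per-firm output, market price, and markup in the J-firm Cournot equilibrium."""
    q_bar = (a - c) / (b * (J + 1))   # equation (4.15)
    p_bar = a - b * J * q_bar         # price at total output J * q_bar
    return q_bar, p_bar, p_bar - c

for J in (1, 2, 10, 100, 10000):
    q_bar, p_bar, markup = cournot(J)
    print(J, round(p_bar, 4), round(markup, 4))
# The markup falls monotonically toward zero, so price approaches marginal
# cost c as the number of firms grows, as in (4.17).
```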
has identical marginal costs c > 0, and no fixed cost. Though not at all crucial, for easy
comparison with the Cournot case, we can again suppose that market demand is linear in
total output, Q, and write
Q = α − βp,
Note that firm 1’s profit is positive as long as its price exceeds marginal cost. Other
things being equal, it will be largest, of course, if firm 1 has the lowest price, and only half
as large if the two firms charge the same price. Its profit need never be negative, however,
because the firm can always charge a price equal to marginal cost and assure itself zero
profits at worst. The situation for firm 2 is symmetrical. Thus, we shall suppose that each
firm i restricts attention to prices pi ≥ c.
What is the Nash equilibrium in this market? It may be somewhat surprising, but in
the unique Nash equilibrium, both firms charge a price equal to marginal cost, and both
earn zero profit. Because profit functions here are discontinuous, we cannot argue the case
by differentiating and solving first-order conditions. Instead, we just use some common
sense.
Note that because the firm with the lowest price serves the entire market, each firm
has an incentive to undercut its rival. It is this effect that ultimately drives the equilibrium
price down to marginal cost. We now provide the formal argument.
First, note that if each firm chooses its price equal to c, then this is a Nash equilib-
rium. In this case, each firm serves half the market and earns zero profits because each unit
is sold at cost. Moreover, by increasing its price, a firm ceases to obtain any demand at all
because the other firm’s price is then strictly lower. Consequently, it is not possible to earn
more than zero profits. Therefore, each firm’s price choice is profit-maximising given the
other’s.
Next we argue that there are no other Nash equilibria. Because each firm i chooses
pi ≥ c, it suffices to show that there are no equilibria in which pi > c for some i. So let
(p1 , p2 ) be an equilibrium.
If p1 > c, then because p2 maximises firm 2’s profits given firm 1’s price choice,
we must have p2 ∈ (c, p1 ], because some such choice earns firm 2 strictly positive profits,
whereas all other choices earn firm 2 zero profits. Moreover, p2 = p1 because if firm 2 can
earn positive profits by choosing p2 = p1 and splitting the market, it can earn even higher
profits by choosing p2 just slightly below p1 and supplying the entire market at virtually
the same price. Therefore,

p1 > c  ⇒  p2 > c and p2 < p1.

But by switching the roles of firms 1 and 2, an analogous argument establishes that

p2 > c  ⇒  p1 > c and p1 < p2.
Consequently, if one firm’s price is above marginal cost, both prices must be above
marginal cost and each firm must be strictly undercutting the other, which is impossible.
In the Bertrand model, price is driven to marginal cost by competition among just
two firms. This is striking, and it contrasts starkly with what occurs in the Cournot model,
where the difference between price and marginal cost declines only as the number of firms
in the market increases.
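The undercutting logic that drives this result can be mimicked computationally. In the sketch below (demand parameters, marginal cost, and the price tick are all assumptions, and the response rule is a simple undercutting heuristic rather than an exact best response), each firm repeatedly shaves one tick off its rival’s price whenever that still covers cost:

```python
# Undercutting dynamics in a Bertrand duopoly on a discrete price grid.
# Demand parameters, marginal cost, and the tick size are assumptions of this sketch.
alpha, beta, c, tick = 100.0, 1.0, 10.0, 0.01

def profit(p_own, p_other):
    """The lower-priced firm serves the whole market; equal prices split it."""
    Q = max(alpha - beta * p_own, 0.0)
    if p_own < p_other:
        return (p_own - c) * Q
    if p_own == p_other:
        return 0.5 * (p_own - c) * Q
    return 0.0

def respond(p_other):
    """Heuristic response: shave one tick off the rival's price while that
    still exceeds marginal cost; otherwise price at marginal cost."""
    return p_other - tick if p_other - tick > c else c

p1, p2 = 60.0, 55.0            # arbitrary starting prices above cost
for _ in range(1_000_000):
    p1 = respond(p2)
    p2 = respond(p1)
    if p1 == c and p2 == c:
        break
print(p1, p2)                  # both prices settle at marginal cost
```

Whatever the starting prices above cost, the alternating undercutting drives both prices down to marginal cost, where each firm earns zero profit.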
and p = (p1 , . . . , p j , . . .). In addition, we assume there is always some price p̃ j > 0 at
which demand for j is zero, regardless of the prices of the other products.
Clearly, one firm’s profit depends on the prices of all variants, being the difference
between revenue and cost. In a short-run equilibrium with prices p̄, each firm’s price must
maximise its own profit given the other firms’ prices, which requires

(∂q^j(p̄)/∂p^j) [mr^j(q^j(p̄)) − mc^j(q^j(p̄))] = 0,   (4.20)
where we have made use of (4.5). Because ∂q j /∂p j < 0, this reduces to the familiar
requirement that price and output be chosen to equate marginal revenue and marginal
cost. As usual, the monopolistic competitor may have positive, negative, or zero short-run
profit.
In the long run, firms will exit the industry if their profits are negative. To analyse the
long run, we assume that each variant has arbitrarily close substitutes that can be produced
at the same cost. Under this assumption, positive long-run profits for any single firm will
induce the entry of arbitrarily many firms producing close substitutes. As usual, long-run
equilibrium requires, in addition, that each firm remaining in the market earn zero profit.
Thus, if p∗ denotes a vector of long-run equilibrium prices, each active firm j must satisfy
Figure 4.4. (a) Short-run and (b) long-run equilibrium in monopolistic competition.
(∂q^j(p∗)/∂p^j) [mr^j(q^j(p∗)) − mc^j(q^j(p∗))] = 0,   (4.21)

Π^j(q^j(p∗)) = 0.   (4.22)
Both short-run and long-run equilibrium for a representative active firm are illus-
trated in Fig. 4.4, which shows the tangency between demand and average cost in long-run
equilibrium implied by (4.21) and (4.22).
welfare analysis we have in mind, then, we need to know how the price of a good affects
a person’s welfare. To keep things simple, let us suppose the price of every other good
except good q remains fixed throughout our discussion. This is the essence of the partial
equilibrium approach.
So, if the price of good q is p, and the vector of all other prices is p, then instead
of writing the consumer’s indirect utility as v(p, p, y), we shall simply write it as v(p, y).
Similarly, we shall suppress the vector p of other prices in the consumer’s expenditure
function, and in both his Hicksian and Marshallian demand functions. In fact, it will be
convenient to introduce a composite commodity, m, as the amount of income spent on
all goods other than q. If x(p, p, y) denotes demand for the vector of all other goods, then
the demand for the composite commodity is m(p, p, y) ≡ p · x(p, p, y), which we denote
simply as m(p, y). In Exercise 4.16, you are asked to show that if the consumer’s utility
function over all goods, u(q, x), satisfies our standard assumptions, then the utility function
over the two goods q and m, ū(q, m) ≡ maxx u(q, x) subject to p · x ≤ m, also satisfies
those assumptions. Moreover, we can use ū to analyse the consumer’s problem as if there
were only two goods, q and m. That is, the consumer’s demands for q and m, q(p, y) and
m(p, y), respectively, solve
max_{(q, m)} ū(q, m)  subject to  pq + m ≤ y.

[Figure 4.5: The upper panel shows indifference curves of ū over (q, m) and budget constraints at prices p0 and p1 with income y0, the bundles A, B, and C, the base utility v0 ≡ v(p0, y0), and CV as a vertical income shift. The lower panel shows the Marshallian demand q(p, y0) and the Hicksian demand qh(p, v0), which coincide at p0.]
Letting CV denote the compensating variation for a change in price from p0 to p1, we must have

v(p1, y0 + CV) = v(p0, y0).   (4.24)
Because we also know that y0 = e(p0, v(p0, y0)), we can substitute into (4.24), rearrange,
and write

CV = e(p1, v0) − e(p0, v0),   (4.25)

where we have let v0 ≡ v(p0, y0) stand for the consumer’s base utility level facing base
prices and income.
Now we know that the Hicksian demand for good q is (by Shephard’s lemma) given
by the price partial of the expenditure function. From that and (4.25), we can write
CV = e(p1, v0) − e(p0, v0)
   = ∫_{p0}^{p1} (∂e(p, v0)/∂p) dp
   = ∫_{p0}^{p1} qh(p, v0) dp.   (4.26)
Note then that when p1 < p0 , CV is the negative of the area to the left of the Hicksian
demand curve for base utility level v0 between p1 and p0 , and if p1 > p0 , CV is positive
and simply equal to that area. This is taken care of automatically in (4.26) because one
must change the sign of the integral when the limits of integration are interchanged. In
Fig. 4.5, CV is therefore equal to the (negative of the) lightly shaded area between p0
and p1 . Study (4.26) and Fig. 4.5 carefully. You will see, as common sense suggests, that if
price rises (p1 > p0), a positive income adjustment will be necessary to restore the original
utility level (CV > 0), and if price declines (p1 < p0), a negative income adjustment will
restore the original utility level (CV < 0).
The compensating variation makes good sense as a dollar-denominated measure of
the welfare impact a price change will have. Unfortunately, however, we have just learned
that CV will always be the area to the left of some Hicksian demand curve, and Hicksian
demand curves are not quite as readily observable as Marshallian ones. Of course, with
enough data on the consumer’s Marshallian demand system at different prices and income
levels, one can recover via integrability methods the consumer’s Hicksian demand and
directly calculate CV. However, our economist only has access to the consumer’s demand
curve for this one good corresponding to one fixed level of income. And this is not
generally enough information to recover Hicksian demand.
Despite this, we can still take advantage of the relation between Hicksian and
Marshallian demands expressed by the Slutsky equation to obtain an estimate of CV.
Recall that Marshallian demand picks up the total effect of a price change, and the Hicksian
only picks up the substitution effect. The two will therefore generally diverge, and diverge
precisely because of the income effect of a price change. In the bottom portion of Fig. 4.5,
this is illustrated for the case where q is a normal good by the horizontal deviation between
the two curves everywhere but at p0 .
We would like to relate Hicks’ idea of compensating variation to the notion of con-
sumer surplus, because the latter is easily measured directly from Marshallian demand.
Recall that at the price–income pair (p0 , y0 ), consumer surplus, CS(p0 , y0 ), is simply the
area under the demand curve (given y0 ) and above the price, p0 . Consequently, the com-
bined shaded areas in Fig. 4.5 equal the gain in consumer surplus due to the price fall from
p0 to p1 . That is,
ΔCS ≡ CS(p1, y0) − CS(p0, y0) = ∫_{p1}^{p0} q(p, y0) dp.   (4.27)
As you can see, ΔCS will always be opposite in sign to CV, and it will diverge in
absolute value from CV whenever demand depends in any way on the consumer’s income,
due to the income effect of a price change. Because we want to know CV but can only
calculate ΔCS, a natural question immediately arises. How good an approximation of CV
does ΔCS provide?
The answer is that as long as the price reduction from p0 to p1 is not too large, our
economist can obtain a very good estimate indeed of each consumer’s willingness to pay
for the new water treatment facility. Based on this, an informed decision can be made as
to who is taxed and by how much.
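The closeness of the two measures can be illustrated numerically. Assume, for this sketch only, Cobb-Douglas utility u(q, m) = √(qm), for which the Marshallian demand is q(p, y) = y/(2p), the expenditure function is e(p, v) = 2v√p, and the Hicksian demand is qh(p, v) = v/√p, and consider a price rise:

```python
from math import sqrt

# CV from (4.26) versus the consumer surplus change from (4.27), under the
# assumed Cobb-Douglas utility u(q, m) = sqrt(q*m): then q(p, y) = y/(2p),
# e(p, v) = 2*v*sqrt(p), and qh(p, v) = v/sqrt(p). A price rise from p0 to p1.
y0, p0, p1 = 100.0, 1.0, 1.21
v0 = y0 / (2 * sqrt(p0))               # base utility v(p0, y0)

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule integration; reversed limits flip the sign, as in the text."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

cv = integrate(lambda p: v0 / sqrt(p), p0, p1)        # (4.26): Hicksian area
delta_cs = integrate(lambda p: y0 / (2 * p), p1, p0)  # (4.27): Marshallian area

print(round(cv, 3))        # 10.0: income needed to restore v0 after the rise
print(round(delta_cs, 3))  # about -9.531: opposite in sign to CV
```

Here |ΔCS| ≈ 9.53 slightly understates CV = 10; the gap of about 0.47 is exactly the income effect of the price change for this normal good.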
Before moving on, a word of warning: when only the market demand curve, as
opposed to individual demand curves, is known, the change in consumer surplus (again
for small price decreases, say) will provide a good approximation to the total amount of
income that consumers are willing to give up for the price decrease. However, it may well
be that some of them are willing to give up more income than others (heavy water users,
for example). Consequently, market demand analysis might well indicate that total will-
ingness to pay exceeds the total cost of the project, which would imply that there is some
way to distribute the cost of the project among consumers so that everyone is better off
after paying their part of the cost and enjoying the lower price. However, it would give no
hint as to how that total cost should be distributed among consumers.
The idea of Pareto efficiency is pervasive in economics and it is often used as one
means to evaluate the performance of an economic system. The basic idea is that if an
economic system is to be considered as functioning well, then given the distribution of
resources it determines, it should not be possible to redistribute them in a way that results
in a Pareto improvement. We shall pursue this idea more systematically in the next chapter.
For now, we limit ourselves to the following question: which, if any, of the three types of
market competition – perfect competition, monopoly, or Cournot oligopoly – function well
in the sense that they yield a Pareto-efficient outcome?
Note that the difference between the three forms of competition is simply the prices
and quantities they determine. For example, were a perfectly competitive industry taken
over by a monopolist, the price would rise from the perfectly competitive equilibrium
price to the monopolist’s profit-maximising price and the quantity of the good produced
and consumed would fall. Note, however, that in both cases, the price–quantity pair is
a point on the market demand curve. The same is true of the Cournot-oligopoly solution.
Consequently, we might just as well ask: which price–quantity pairs on the market demand
curve yield Pareto-efficient outcomes? We now direct our attention toward providing an
answer to this question.
To simplify the discussion, we shall suppose from now on that there is just one pro-
ducer and one consumer. (The arguments generalise.) Refer now to Fig. 4.6, which depicts
the consumer’s (and therefore the market) Marshallian demand q(p, y0 ), his Hicksian-
compensated demand qh (p, v0 ), where v0 = v(p0 , y0 ), and the firm’s marginal cost curve,
mc(q). Note then that if this firm behaved as a perfect competitor, the equilibrium price–
quantity pair would be determined by the intersection of the two curves, because a
competitive firm’s supply curve coincides with its marginal cost curve above the minimum
of its average variable costs. (We have assumed that average variable costs are minimised
at q = 0.)
[Figure 4.6: The Marshallian demand q(p, y0), the Hicksian demand qh(p, v0), and the marginal cost curve mc(q), with prices p0 > p1 and quantities q0 < q1; the areas A, B, C, and D between the prices and the curves are referred to in the argument that follows.]
Consider now the price–quantity pair (p0 , q0 ) on the consumer’s demand curve
above the competitive point in Fig. 4.6. We wish to argue that this market outcome is
not Pareto efficient. To do so, we need only demonstrate that we can redistribute resources
in a way that makes someone better off and no one worse off.
So, consider reducing the price of q from p0 to p1 . What would the consumer be
willing to pay for this reduction? As we now know, the answer is the absolute value of the
compensating variation, which, in this case, is the sum of areas A and B in the figure. Let
us then reduce the price to p1 and take A + B units of income away from the consumer.
Consequently, he is just as well off as he was before, and he now demands q1 units of the
good according to his Hicksian-compensated demand.
To fulfil the additional demand for q, let us insist that the firm produce just enough
additional output to meet it.
So, up to this point, we have lowered the price to p1 , increased production to q1 , and
collected A + B dollars from the consumer, and the consumer is just as well off as before
these changes were made. Of course, the price–quantity change will have an effect on the
profits earned by the firm. In particular, if c(q) denotes the cost of producing q units of
output, then the change in the firm’s profits will be
[p1q1 − c(q1)] − [p0q0 − c(q0)] = p1q1 − p0q0 − [c(q1) − c(q0)]
   = p1q1 − p0q0 − ∫_{q0}^{q1} mc(q) dq
   = [C + D − A] − D
   = C − A.
Consequently, if after making these changes, we give the firm A dollars out of the
A + B collected from the consumer, the firm will have come out strictly ahead by C dollars.
We can then give the consumer the B dollars we have left over so that in the end, both the
consumer and the firm are strictly better off as a result of the changes we have made.
Thus, beginning from the market outcome (p0 , q0 ), we have been able to make
both the consumer and the firm strictly better off simply by redistributing the available
resources. Consequently, the original situation was not Pareto efficient.
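The accounting in this argument can be checked with concrete numbers. Assume, for this sketch only, that income effects are absent, so Marshallian and Hicksian demand coincide at q(p) = 10 − p, and let mc(q) = q; the competitive point is then (p, q) = (5, 5), and we begin from (p0, q0) = (7, 3) on the demand curve:

```python
# Numerical check of the Pareto-improvement accounting. Assumed for this
# sketch: no income effects, so Marshallian and Hicksian demand coincide at
# q(p) = 10 - p, and mc(q) = q; the competitive point is then (p, q) = (5, 5).
def demand(p):
    return 10.0 - p

def mc(q):
    return q

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule numerical integration."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

p0, p1 = 7.0, 5.0
q0, q1 = demand(p0), demand(p1)            # 3 and 5 units

# What the consumer will pay for the price cut: areas A + B left of demand.
a_plus_b = integrate(demand, p1, p0)       # = 8
# Change in the firm's profit: p1*q1 - p0*q0 - integral of mc from q0 to q1.
profit_change = p1 * q1 - p0 * q0 - integrate(mc, q0, q1)   # = C - A = -4

area_a = (p0 - p1) * q0                    # rectangle A = 6
firm_gain = profit_change + area_a         # = C: firm's net gain after receiving A
consumer_gain = a_plus_b - area_a          # = B: what the consumer keeps

print(firm_gain, consumer_gain)            # both strictly positive
```

After collecting A + B = 8 from the consumer and handing A = 6 to the firm, the firm is ahead by C = 2 and the consumer by B = 2: a strict Pareto improvement, as the argument claims.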
A similar argument applies to price–quantity pairs on the consumer’s Marshallian
demand curve lying below the competitive point.1 Hence, the only price–quantity pair
that can possibly result in a Pareto-efficient outcome is the perfectly competitive one –
and indeed it does. We shall not give the argument here because it will follow from our
more general analysis in the next chapter. However, we encourage the reader to check
that the particular scheme used before to obtain a Pareto improvement does not work
when one begins at the competitive equilibrium. (No other scheme will produce a Pareto
improvement either.)
Thus, our conclusion is that the only price–quantity pair yielding a Pareto-efficient
outcome is the perfectly competitive one. In particular, neither the monopoly outcome nor
the Cournot-oligopoly outcome is Pareto efficient.
Note well that we cannot conclude from this analysis that forcing a monopoly to
behave differently than it would choose to must necessarily result in a Pareto improvement.
It may well lower the price and increase the quantity supplied, but unless the consumers
who are made better off by this change compensate the monopolist who is made worse off,
the move will not be Pareto improving.
[Figure 4.7: Consumer surplus (CS) and producer surplus (PS), the areas between the demand curve p(q), the marginal cost curve, and the market price.]

Total surplus, the sum of consumer and producer surplus, is maximised at the quantity where

p(q) = mc(q),
which occurs precisely at the perfectly competitive equilibrium quantity when demand is
downward-sloping and marginal costs rise, as we have depicted in Fig. 4.7.
In fact, it is this relation between price and marginal cost that is responsible for the
connection between our analysis in the previous section and the present one. Whenever
price and marginal cost differ, a Pareto improvement like the one employed in the previous
section can be implemented. And, as we have just seen, whenever price and marginal cost
differ, the total surplus can be increased.
Once again, a warning: although Pareto efficiency requires that the total surplus be
maximised, a Pareto improvement need not result simply because the total surplus has
increased. Unless those who gain compensate those who lose as a result of the change, the
change is not Pareto improving.
We have seen that when markets are imperfectly competitive, the market equilibrium
generally involves prices that exceed marginal cost. However, ‘price equals marginal cost’
is a necessary condition for a maximum of consumer and producer surplus. It should there-
fore come as no surprise that the equilibrium outcomes in most imperfectly competitive
markets are not Pareto efficient.
EXAMPLE 4.4 Let us consider the performance of the Cournot oligopoly in Section
4.2.1. There, market demand is p = a − bq for total market output q. Firms are identical,
with marginal cost c ≥ 0. When each firm produces the same output q/J, total surplus,
W ≡ cs + ps, as a function of total output, will be
W(q) = ∫_0^q (a − bξ) dξ − J ∫_0^{q/J} c dξ,
Maximising W(q) requires a − bq − c = 0, so the efficient level of total output is

q∗ = (a − c)/b,   (E.1)

at which total surplus reduces to

W(q∗) = (a − c)²/(2b).   (E.2)
In the Cournot-Nash equilibrium, we have seen that total market output will be q̄ =
J(a − c)/(J + 1)b. Clearly, q̄ < q∗ , so the Cournot oligopoly produces too little output
from a social point of view. Total surplus in the Cournot equilibrium will be
W(q̄) = [(a − c)²/(2b)] · (J² + 2J)/(J + 1)²,   (E.3)

so the deadweight loss from Cournot competition is

W(q∗) − W(q̄) = (a − c)²/(2b(J + 1)²) > 0.   (E.4)
By using (E.3), it is easy to show that total surplus increases as the number of firms
in the market becomes larger. Before, we noted that market price converges to marginal
cost as the number of firms in the oligopoly becomes large. Consequently, total surplus
rises toward its maximal level in (E.2), and the deadweight loss in (E.4) declines to zero,
as J → ∞.
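The expressions in (E.2)–(E.4) are straightforward to confirm numerically (the parameter values below are illustrative assumptions of this sketch):

```python
# Numerical check of (E.2)-(E.4): Cournot total surplus versus the maximum.
# Parameter values are illustrative assumptions of this sketch.
a, b, c = 10.0, 1.0, 1.0

def W(q):
    """Total surplus at total output q: a*q - (b/2)*q**2 from demand, minus cost c*q."""
    return a * q - 0.5 * b * q * q - c * q

q_star = (a - c) / b                           # efficient total output
for J in (1, 2, 10, 100):
    q_bar = J * (a - c) / ((J + 1) * b)        # Cournot-Nash total output
    dwl = W(q_star) - W(q_bar)
    print(J, round(W(q_bar), 4), round(dwl, 4))
# W(q_star) = (a - c)**2/(2b) = 40.5 here, and the deadweight loss
# (a - c)**2/(2b(J + 1)**2) shrinks toward zero as J grows.
```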
4.4 EXERCISES
4.1 Suppose that preferences are identical and homothetic. Show that market demand for any good must
be independent of the distribution of income. Also show that the elasticity of market demand with
respect to the level of market income must be equal to unity.
4.2 Suppose that preferences are homothetic but not identical. Will market demand necessarily be
independent of the distribution of income?
4.3 Show that if q is a normal good for every consumer, the market demand for q will be negatively
sloped with respect to its own price.
4.4 Suppose that x and y are substitutes for all but one consumer. Does it follow that the market demand
for x will be increasing in the price of y?
4.5 Show that the long-run equilibrium number of firms is indeterminate when all firms in the industry
share the same constant returns-to-scale technology and face the same factor prices.
4.6 A firm j in a competitive industry has total cost function c^j(q) = aq + bj q², where a > 0, q is firm
output, and bj is different for each firm.
(a) If bj > 0 for all firms, what governs the amount produced by each of them? Will they produce
equal amounts of output? Explain.
(b) What happens if bj < 0 for all firms?
4.7 Technology for producing q gives rise to the cost function c(q) = aq + bq². The market demand for
q is p = α − βq.
(a) If a > 0, if b < 0, and if there are J firms in the industry, what is the short-run equilibrium
market price and the output of a representative firm?
(b) If a > 0 and b < 0, what is the long-run equilibrium market price and number of firms? Explain.
(c) If a > 0 and b > 0, what is the long-run equilibrium market price and number of firms? Explain.
4.8 In the Cournot oligopoly of Section 4.2.1, suppose that J = 2. Let each duopolist have constant
average and marginal costs, as before, but suppose that 0 ≤ c1 < c2 . Show that firm 1 will have
greater profits and produce a greater share of market output than firm 2 in the Nash equilibrium.
4.9 In a Stackelberg duopoly, one firm is a ‘leader’ and one is a ‘follower’. Both firms know each
other’s costs and market demand. The follower takes the leader’s output as given and picks his own
output accordingly (i.e., the follower acts like a Cournot competitor). The leader takes the follower’s
reactions as given and picks his own output accordingly. Suppose that firms 1 and 2 face market
demand, p = 100 − (q1 + q2). Firm costs are c1 = 10q1 and c2 = (q2)².
(a) Calculate market price and each firm’s profit assuming that firm 1 is the leader and firm 2 the
follower.
(b) Do the same assuming that firm 2 is the leader and firm 1 is the follower.
(c) Given your answers in parts (a) and (b), who would firm 1 want to be the leader in the market?
Who would firm 2 want to be the leader?
(d) If each firm assumes what it wants to be the case in part (c), what are the equilibrium market price
and firm profits? How does this compare with the Cournot-Nash equilibrium in this market?
4.10 (Stackelberg Warfare) In the market described in Section 4.2.1, let J = 2.
(a) Show that if, say, firm 1 is leader and firm 2 is follower, the leader earns higher and the follower
earns lower profit than they do in the Cournot equilibrium. Conclude that each would want to be
the leader.
(b) If both firms decide to act as leader and each assumes the other will be a follower, can the
equilibrium be determined? What will happen in this market?
4.11 In the Cournot market of Section 4.2.1, suppose that each identical firm has cost function c(q) =
k + cq, where k > 0 is fixed cost.
(a) What will be the equilibrium price, market output, and firm profits with J firms in the market?
(b) With free entry and exit, what will be the long-run equilibrium number of firms in the market?
4.12 In the Bertrand duopoly of Section 4.2.2, market demand is Q = α − βp, and firms have no fixed
costs and identical marginal cost. Find a Bertrand equilibrium pair of prices, (p1 , p2 ), and quantities,
(q1 , q2 ), when the following hold.
p1 = 20 + (1/2)p2 − q1 and p2 = 20 + (1/2)p1 − q2,
respectively. Each firm has constant marginal costs of 20 and no fixed costs. Each firm is a
Cournot competitor in price, not quantity. Compute the Cournot equilibrium in this market, giving
equilibrium price and output for each good.
4.14 An industry consists of many identical firms each with cost function c(q) = q² + 1. When there are J
active firms, each firm faces an identical inverse market demand p = 10 − 15q − (J − 1)q̄ whenever
an identical output of q̄ is produced by each of the other (J − 1) active firms.
(a) With J active firms, and no possibility of entry or exit, what is the short-run equilibrium output
q∗ of a representative firm when firms act as Cournot competitors in choosing output?
(b) How many firms will be active in the long run?
4.15 When firms j = 1, . . . , J are active in a monopolistically competitive market, firm j faces the
following demand function:
q^j = (p^j)^{−2} ( Σ_{i=1, i≠j}^{J} (p^i)^{−1/2} )^{−2},  j = 1, . . . , J.
c(q) = cq + k,
where c > 0 and k > 0. Each firm chooses its price to maximise profits, given the prices chosen by
the others.
(a) Show that each firm’s demand is negatively sloped, with constant own-price elasticity, and that
all goods are substitutes for each other.
(b) Show that if all firms raise their prices proportionately, the demand for any given good declines.
(c) Find the long-run Nash equilibrium number of firms.
4.16 Suppose that a consumer’s utility function over all goods, u(q, x), is continuous, strictly increasing,
and strictly quasiconcave, and that the price p of the vector of goods, x, is fixed. Let m denote the
composite commodity p · x, so that m is the amount of income spent on x. Define the utility function
ū over the two goods q and m as follows: ū(q, m) ≡ max_x u(q, x) subject to p · x ≤ m.
(a) Show that ū(q, m) is strictly increasing and strictly quasiconcave. If you can, appeal to a theorem
that allows you to conclude that it is also continuous.
(b) Show that if q(p, p, y) and x(p, p, y) denote the consumer’s Marshallian demands for q and x,
then q(p, p, y) and m(p, p, y) ≡ p · x(p, p, y) solve

max_{(q, m)} ū(q, m)  subject to  pq + m ≤ y.
(∂q(p, y)/∂y) · (y/q(p, y)) ≡ η(y)

for all p and y in the relevant region, then for base price p0 and income y0, ΔCS and CV are related,
exactly, as follows:

−ΔCS = ∫_{y0}^{y0+CV} exp( − ∫_{y0}^{ζ} (η(ξ)/ξ) dξ ) dζ.
(a) Show that when income elasticity is constant but not equal to unity,

CV = y0 [ (1 − η)(−ΔCS)/y0 + 1 ]^{1/(1−η)} − y0.
(b) Use this to show that when demand is independent of income, −ΔCS = CV, so consumer
surplus can then be used to obtain an exact measure of the welfare impact of a price change.
(c) Derive the relation between CV and ΔCS when income elasticity is unity.
(d) Finally, we can use the result in part (a) to establish a convenient rule of thumb that can be used
to quickly gauge the approximate size of the deviation between the change in consumer surplus
and the compensating variation when income elasticity is constant. Show that when income
elasticity is constant and not equal to unity, we have (CV − |ΔCS|)/|ΔCS| ≈ η|ΔCS|/(2y0).
4.19 A consumer has preferences over the single good x and all other goods m represented by the utility
function, u(x, m) = ln(x) + m. Let the price of x be p, the price of m be unity, and let income be y.
(a) Derive the Marshallian demands for x and m.
(b) Derive the indirect utility function, v(p, y).
(c) Use the Slutsky equation to decompose the effect of an own-price change on the demand for x
into an income and substitution effect. Interpret your result briefly.
(d) Suppose that the price of x rises from p0 to p1 > p0 . Show that the consumer surplus area
between p0 and p1 gives an exact measure of the effect of the price change on consumer welfare.
(e) Carefully illustrate your findings with a set of two diagrams: one giving the indifference curves
and budget constraints on top, and the other giving the Marshallian and Hicksian demands below.
Be certain that your diagrams reflect all qualitative information on preferences and demands that
you have uncovered. Be sure to consider the two prices p0 and p1 , and identify the Hicksian and
Marshallian demands.
4.20 A consumer’s demand for the single good x is given by x(p, y) = y/p, where p is the good’s price,
and y is the consumer’s income. Let income be $7. Find the compensating variation for an increase
in the price of this good from $1 to $4.
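A quick numeric check of this exercise (a sketch, not the intended pencil-and-paper argument): the demand x(p, y) = y/p is generated, via Roy's identity, by the indirect utility v(p, y) = ln(y) − ln(p), and the compensating variation solves v(p¹, y + CV) = v(p⁰, y).

```python
import math

# Sketch: demand x(p, y) = y/p comes from indirect utility
# v(p, y) = ln(y) - ln(p) (one convenient representation).
# CV solves v(p1, y + CV) = v(p0, y), i.e. CV = y*(p1/p0 - 1).

def v(p, y):
    return math.log(y) - math.log(p)

def compensating_variation(p0, p1, y, lo=0.0, hi=1e6, tol=1e-10):
    """Bisect for the CV that restores the original utility at p1."""
    target = v(p0, y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v(p1, y + mid) < target:
            lo = mid          # not yet compensated: need more income
        else:
            hi = mid
    return 0.5 * (lo + hi)

cv = compensating_variation(p0=1.0, p1=4.0, y=7.0)
print(round(cv, 6))  # 21.0
```

The closed form CV = y(p¹/p⁰ − 1) = 7 × 3 = 21 confirms the bisection.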
4.21 Use a figure similar to Fig. 4.6 to argue that price–quantity pairs on the demand curve below the
competitive price–quantity pair are not Pareto efficient.
4.22 A monopolist faces linear demand p = α − βq and has cost C = cq + F, where all parameters are
positive, α > c, and (α − c)2 > 4βF.
(a) Solve for the monopolist’s output, price, and profits.
(b) Calculate the deadweight loss and show that it is positive.
(c) If the government requires this firm to set the price that maximises the sum of consumer and
producer surplus, and to serve all buyers at that price, what is the price the firm must charge?
Show that the firm’s profits are negative under this regulation, so that this form of regulation is
not sustainable in the long run.
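Parts (a) and (b) are easy to sanity-check numerically. The sketch below uses hypothetical parameters (α = 10, β = 1, c = 2, F = 4, chosen so that (α − c)² > 4βF) and compares a crude grid search with the closed forms q^m = (α − c)/2β, p^m = (α + c)/2, and DWL = (α − c)²/8β.

```python
# Illustrative parameters only; they satisfy (alpha - c)^2 > 4*beta*F.
alpha, beta, c, F = 10.0, 1.0, 2.0, 4.0

def profit(q):
    # revenue (alpha - beta*q)*q minus cost c*q + F
    return (alpha - beta * q) * q - (c * q + F)

# crude grid search over output levels
qs = [i / 10000 for i in range(1, 100000)]
q_m = max(qs, key=profit)
p_m = alpha - beta * q_m
q_comp = (alpha - c) / beta                    # competitive (p = MC) output
dwl = 0.5 * (p_m - c) * (q_comp - q_m)         # deadweight-loss triangle

print(round(q_m, 3), round(p_m, 3), round(profit(q_m), 3), round(dwl, 3))
# 4.0 6.0 12.0 8.0
```

The grid search reproduces q^m = 4, p^m = 6, π = 12 and DWL = 8 = (α − c)²/8β for these values.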
4.23 (Ramsey Rule) Building from the preceding exercise, suppose a monopolist faces negatively sloped
demand, p = p(q), and has costs C = cq + F. Now suppose that the government requires this firm
to set a price (p∗ ) that will maximise the sum of consumer and producer surplus, subject to the
constraint that firm profit be non-negative, so that the regulation is sustainable in the long run. Show
that under this form of regulation, the firm will charge a price greater than marginal cost, and that the
percentage deviation of price from marginal cost ((p∗ − c)/p∗ ) will be proportional to 1/ε∗ , where
ε∗ is the elasticity of firm demand at the chosen price and output. Interpret your result.
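For concreteness (a sketch, not a proof), take the linear specification of the preceding exercise with illustrative values α = 10, β = 1, c = 2, F = 4: maximising consumer plus producer surplus subject to non-negative profit pushes the price down to the lower break-even point, which still lies strictly above marginal cost.

```python
alpha, beta, c, F = 10.0, 1.0, 2.0, 4.0   # hypothetical parameters

def q(p):
    return max((alpha - p) / beta, 0.0)

def total_surplus(p):
    # consumer surplus triangle plus variable profit, net of fixed cost
    return 0.5 * (alpha - p) * q(p) + (p - c) * q(p) - F

def profit(p):
    return (p - c) * q(p) - F

ps = [c + i / 10000 for i in range(1, 80000)]          # prices above MC
feasible = [p for p in ps if profit(p) >= 0]           # non-negative profit
p_star = max(feasible, key=total_surplus)
print(round(p_star, 3), p_star > c)  # 2.536 True
```

Total surplus is decreasing in p above marginal cost, so the constrained optimum is the lowest break-even price, here 6 − 2√3 ≈ 2.536 > c = 2.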
4.24 Suppose that (p̄, q̄) are equilibrium market price and output in a perfectly competitive market with
only two firms. Show that when demand is downward-sloping and marginal costs rise, (p̄, q̄) satisfy
the second-order conditions for a maximum of consumer plus producer surplus.
4.25 (Welfare Bias in Product Selection) A monopolist must decide between two different designs for its
product. Each design will have a different market demand and different costs of production. If design
x1 is introduced, it will have market demand and costs of
$$x_1 = \begin{cases} \dfrac{2}{p_1} + 6\tfrac{7}{8} - p_1, & \text{if } 0 < p_1 \le 6\tfrac{7}{8},\\[6pt] \dfrac{2}{p_1}, & \text{if } p_1 > 6\tfrac{7}{8}, \end{cases} \qquad c_1(x_1) = 5\tfrac{1}{8} + x_1.$$
If design x2 is introduced, it will have the following market demand and costs:
$$x_2 = 7\tfrac{7}{8} - 1\tfrac{1}{8}\,p_2, \qquad c_2(x_2) = 4\tfrac{1}{8} + x_2.$$
Note that the only difference in costs between these two designs is a difference in fixed costs.
(a) Calculate the price the firm would charge and the profits it would make if it introduced each
design. Which design will it introduce?
(b) Carefully sketch the demand and marginal cost curves for both designs on the same set of
axes. Does the firm’s choice maximise consumer plus producer surplus? Is the outcome Pareto
efficient?
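Part (a) can be checked numerically (a sketch, using the demand specifications as reconstructed above; marginal cost is 1 for both designs, so only the fixed costs differ):

```python
def x1(p):  # design-1 demand
    return 2 / p + 6.875 - p if p <= 6.875 else 2 / p

def x2(p):  # design-2 demand
    return max(7.875 - 1.125 * p, 0.0)

def profit(demand, fixed, p):
    return (p - 1.0) * demand(p) - fixed   # marginal cost is 1 in both cases

ps = [1 + i / 10000 for i in range(1, 60000)]   # candidate prices in (1, 7)
p1_star = max(ps, key=lambda p: profit(x1, 5.125, p))
p2_star = max(ps, key=lambda p: profit(x2, 4.125, p))

print(round(p1_star, 3), round(profit(x1, 5.125, p1_star), 3))  # 4.0 5.0
print(round(p2_star, 3), round(profit(x2, 4.125, p2_star), 3))  # 4.0 6.0
```

Under this reconstruction both designs are priced at 4, but design x₂ earns 6 while x₁ earns 5, so the firm introduces x₂; part (b) asks whether that choice maximises surplus.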
4.26 A competitive industry is in long-run equilibrium. Market demand is linear, p = a − bQ, where
a > 0, b > 0, and Q is market output. Each firm in the industry has the same technology with cost
function, $c(q) = k^2 + q^2$.
(a) What is the long-run equilibrium price? (Assume what is necessary of the parameters to ensure
that this is positive and less than a.)
(b) Suppose that the government imposes a per-unit tax, t > 0, on every producing firm in the indus-
try. Describe what would happen in the long run to the number of firms in the industry. What is
the post-tax market equilibrium price? (Again, assume whatever is necessary to ensure that this
is positive and less than a.)
(c) Calculate the long-run effect of this tax on consumer surplus. Show that the loss in consumer
surplus from this tax exceeds the amount of tax revenue collected by the government in the
post-tax market equilibrium.
(d) Would a lump-sum tax, levied on producers and designed to raise the same amount of tax
revenue, be preferred by consumers? Justify your answer.
(e) State the conditions under which a lump-sum tax, levied on consumers and designed to raise the
same amount of revenue, would be preferred by consumers to either preceding form of tax.
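A numeric illustration of parts (b) and (c) (a sketch with hypothetical parameters a = 10, b = 1, k = 1, t = 0.5): in the long run each firm produces at minimum average cost, q = k, so the market price equals minimum AC, which rises from 2k to 2k + t, and the trapezoidal loss in consumer surplus exceeds the tax revenue.

```python
a, b, k, t = 10.0, 1.0, 1.0, 0.5   # hypothetical parameters

# Long-run price = minimum average cost.  With c(q) = k^2 + q^2 (plus t*q
# after the tax), AC = k^2/q + q (+ t) is minimised at q = k, so
# p0 = 2k and p1 = 2k + t.
p0, p1 = 2 * k, 2 * k + t
Q0, Q1 = (a - p0) / b, (a - p1) / b          # market quantities demanded

cs_loss = 0.5 * (Q0 + Q1) * (p1 - p0)        # trapezoid under linear demand
revenue = t * Q1
print(cs_loss, revenue, cs_loss > revenue)   # 3.875 3.75 True
```

The gap cs_loss − revenue = ½ t (Q₀ − Q₁) is exactly the triangle of forgone trades, so the excess burden is positive whenever the tax reduces output.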
4.27 A per-unit tax, t > 0, is levied on the output of a monopoly. The monopolist faces demand, $q = p^{-\varepsilon}$,
where $\varepsilon > 1$, and has constant average costs. Show that the monopolist will increase price by more
than the amount of the per-unit tax.
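A numeric sketch (illustrative values ε = 2, c = 1; the closed form is p(t) = (c + t)ε/(ε − 1), so dp/dt = ε/(ε − 1) > 1):

```python
eps, c = 2.0, 1.0   # illustrative elasticity and average (= marginal) cost

def profit(p, t):
    # the per-unit tax t raises effective marginal cost to c + t
    return (p - c - t) * p ** (-eps)

def best_price(t):
    ps = [0.5 + i / 10000 for i in range(1, 100000)]
    return max(ps, key=lambda p: profit(p, t))

p_before, p_after = best_price(0.0), best_price(0.5)
print(round(p_after - p_before, 3))  # 1.0: price rises by twice the 0.5 tax
```

With ε = 2 the pass-through factor ε/(ε − 1) is 2, so a tax of 0.5 raises the price by a full 1.0.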
4.28 A firm under uncertainty faces gambles of the form g = (p1 ◦ π1 , . . . , pn ◦ πn ), where the πi are
profits and the pi their probabilities of occurrence. The firm’s owner has a VNM utility function over
gambles in profit, and he is an expected utility maximiser. Prove that the firm’s owner will always
act to maximise expected profit if and only if he is risk neutral.
4.29 Consider a two-period monopoly facing the negatively sloped inverse demand function pt = p(qt ) in
each period t = 0, 1. The firm maximises the present discounted value of profits, $PDV = \sum_{t=0}^{1} (1 + r)^{-t}\pi_t$, where r > 0 is the market interest rate, and πt is period-t profit. In each of the following,
assume that costs each period are increasing in that period’s output and are strictly convex, and that
PDV is strictly concave.
(a) If costs are ct = c(qt ) for t = 0, 1, show that the firm will ‘short-run profit maximise’ in each
period by choosing output to equate marginal cost and marginal revenue in each period.
(b) Now suppose that the firm can ‘learn by doing’. Its period-zero costs are simply c0 =
c0 (q0 ). Its period-one costs, however, depend on output in period zero; c1 = c1 (q1 , q0 ), where
∂c1 /∂q0 < 0. Does the firm still ‘short-run profit maximise’ in each period? Why or why not?
Interpret your results.
CHAPTER 5
GENERAL EQUILIBRIUM
Many scholars trace the birth of economics to the publication of Adam Smith’s The Wealth
of Nations (1776). Behind the superficial chaos of countless interdependent market actions
by selfish agents, Smith saw a harmonising force serving society. This Invisible Hand
guides the market system to an equilibrium that Smith believed possessed certain socially
desirable characteristics.
One can ask many questions about competitive market systems. A fundamental one
arises immediately: is Smith’s vision of a smoothly functioning system composed of many
self-interested individuals buying and selling on impersonal markets – with no regard for
anything but their personal gain – a logically coherent vision at all? If so, is there one
particular state towards which such a system will tend, or are there many such states? Are
these fragile things that can be easily disrupted or are they robust?
These are questions of existence, uniqueness, and stability of general competitive
equilibrium. All are deep and important, but we will only address the first.
In many ways, existence is the most fundamental question and so merits our closest
attention. What is at issue is the logical coherence of the very notion of a competi-
tive market system. The question is usually framed, however, as one of the existence
of prices at which demand and supply are brought into balance in the market for every
good and service simultaneously. The market prices of everything we buy and sell are
principal determinants of what we can consume, and so, of the well-being we can
achieve. Thus, market prices determine to a large extent ‘who gets what’ in a market
economy.
In this chapter, we do not merely ask under what conditions a set of market-clearing
prices exists. We also ask how well a market system solves the basic economic prob-
lem of distribution. We will begin by exploring the distribution problem in very general
terms, then proceed to consider the existence of general competitive equilibrium itself.
Along the way, we will focus particular scrutiny on Smith’s claim that a competitive
market system promotes society’s welfare through no conscious collective intention of
its members.
[Figure: the Edgeworth box, showing an allocation (x¹, x²) and the endowment point e = (e¹, e²).]
Increasing amounts of x1 for consumer 1 are measured rightwards from 01 along the bot-
tom side, and increasing amounts of x1 for consumer 2 are measured leftwards from 02
along the top side. Similarly, x2 for consumer 1 is measured vertically up from 01 on the
left, and for consumer 2, vertically down on the right. The box is constructed so that its
width measures the total endowment of x1 and its height the total endowment of x2 .
Notice carefully that each point in the box has four coordinates – two indicating
some amount of each good for consumer 1 and two indicating some amount of each good
for consumer 2. Because the dimensions of the box are fixed by the total endowments, each
set of four coordinates represents some division of the total amount of each good between
the two consumers. For example, the point labelled e denotes the pair of initial endowments
e1 and e2 . Every other point in the box represents some other way the totals can be allocated
between the consumers, and every possible allocation of the totals between the consumers
is represented by some point in the box. The box therefore provides a complete picture of
every feasible distribution of existing commodities between consumers.
To complete the description of the two-person exchange economy, suppose each
consumer has preferences represented by a usual, convex indifference map. In Fig. 5.2,
consumer 1’s indifference map increases north-easterly, and consumer 2’s increases south-
westerly. One indifference curve for each consumer passes through every point in the box.
The line labelled CC is the subset of allocations where the consumers’ indifference curves
through the point are tangent to each other, and it is called the contract curve. At any
point off the contract curve, the consumers’ indifference curves through that point must
cut each other.
Given initial endowments at e, which allocations will be barter equilibria in this
exchange economy? Obviously, the first requirement is that the allocations be somewhere,
[Figure: the Edgeworth box with origins 0₁ and 0₂, the contract curve CC, its segment cc, and allocations A, B, and D.]
‘in the box’, because only those are feasible. But not every feasible allocation can be a
barter equilibrium. For example, suppose a redistribution from e to point A were proposed.
Consumer 2 would be better off, but consumer 1 would clearly be worse off. Because
this economy relies on voluntary exchange, and because consumers are self-interested, the
redistribution to A would be refused, or ‘blocked’, by consumer 1, and so could not arise
as an equilibrium given the initial endowment. By the same argument, all allocations to the
left of consumer 1’s indifference curve through e would be blocked by consumer 1, and all
allocations to the right of consumer 2’s indifference curve through e would be blocked by
consumer 2.
This leaves only allocations inside and on the boundary of the lens-shaped area delin-
eated by the two consumers’ indifference curves through e as potential barter equilibria.
At every point along the boundary, one consumer will be better off and the other no worse
off than they are at e. At every allocation inside the lens, however, both consumers will be
strictly better off than they are at e. To achieve these gains, the consumers must arrange
a trade. Consumer 1 must give up some x1 in exchange for some of consumer 2’s x2 , and
consumer 2 must give up some x2 in exchange for some of consumer 1’s x1 .
But are all allocations inside the lens barter equilibria? Suppose a redistribution to B
within that region were to occur. Because B is off the contract curve, the two indifference
curves passing through it must cut each other, forming another lens-shaped region con-
tained entirely within the original one. Consequently, both consumers once again can be
made strictly better off by arranging an appropriate trade away from B and inside the lens
it determines. Thus, B and every such point inside the lens through e but off the contract
curve can be ruled out as barter equilibria.
Now consider a point like D on segment cc of the contract curve. A move from e to
any such point will definitely make both parties better off. Moreover, once the consumers
trade to D, there are no feasible trades that result in further mutual gain. Thus, once D is
achieved, no further trades will take place: D is a barter equilibrium. Indeed, any point
along cc is a barter equilibrium. Should the consumers agree to trade and so find them-
selves at any allocation on cc, and should a redistribution to any other allocation in the
box then be proposed, that redistribution would be blocked by one or both of them. (This
includes, of course, any movement from one point on cc to another on cc.) Pick any point
on cc, consider several possible reallocations, and convince yourself of this. Once on cc,
we can be sure there will be no subsequent movement away.
Clearly, there are many barter equilibria toward which the system might evolve. We
are content with having identified all of the possibilities. Note that these equilibria all share
the property that once there, it is not possible to move elsewhere in the box without making
at least one of the consumers worse off. Thus, each point of equilibrium in exchange is
Pareto efficient in the sense described in Chapter 4.
Consider now the case of many consumers and many goods. Let
I = {1, . . . , I}
index the set of consumers, and suppose there are n goods. Each consumer $i \in \mathcal{I}$ has a
preference relation, $\succsim^i$, and is endowed with a non-negative vector of the n goods, $e^i = (e_1^i, \ldots, e_n^i)$. Altogether, the collection $\mathcal{E} = (\succsim^i, e^i)_{i\in\mathcal{I}}$ defines an exchange economy.
$$e \equiv (e^1, \ldots, e^I)$$
$$x \equiv (x^1, \ldots, x^I),$$
where $x^i \equiv (x_1^i, \ldots, x_n^i)$ denotes consumer i's bundle according to the allocation. The set
of feasible allocations in this economy is given by
$$F(e) \equiv \Big\{ x \;\Big|\; \sum_{i\in\mathcal{I}} x^i = \sum_{i\in\mathcal{I}} e^i \Big\}, \tag{5.1}$$
and it contains all allocations of goods across individuals that, in total, exhaust the available
amount of every good. The first requirement on x as a barter equilibrium is therefore that
x ∈ F(e).
Now in the two-consumer case, we noted that if both consumers could be made better
off by trading with one another, then we could not yet be at a barter equilibrium. Thus, at
a barter equilibrium, no Pareto improvements were possible. This also carries over to the
more general case. To formalise this, let us begin with the following.
equilibrium of our process of voluntary exchange. Thus, it remains to describe the set of
Pareto-efficient allocations that can be reached through voluntary exchange.
Recall from the two-consumer case that not all Pareto-efficient allocations were
equilibria there. That is, only those allocations on the contract curve and within the lens
created by the indifference curves through the endowment point were equilibria. The rea-
son for this was that the other Pareto-efficient allocations – those on the contract curve
but outside the lens – made at least one of the consumers worse off than they would be
by simply consuming their endowment. Thus, each such Pareto-efficient allocation was
‘blocked’ by one of the consumers.
Similarly, when there are more than two consumers, no equilibrium allocation can
make any consumer worse off than he would be consuming his endowment. That consumer
would simply refuse to make the necessary trade. But in fact there are now additional
reasons you might refuse to trade to some Pareto-efficient allocation. Indeed, although
you might prefer the bundle assigned to you in the proposed allocation over your own
endowment, you might be able to find another consumer to strike a trade with such that
you do even better as a result of that trade and he does no worse than he would have done
had you both gone along with the proposed allocation. Consequently, although you alone
are unable to block the proposal, you are able to block it together with someone else. Of
course, the potential for blocking is not limited to coalitions of size 2. Three or more of
you might be able to get together to block an allocation. With all of this in mind, consider
the following.
Together, the first and second items in the definition say that the consumers in S must
be able to take what they themselves have and divide it up differently among themselves
so that none is worse off and at least one is better off than with their assignment under x.
Thus, an allocation x is blocked whenever some group, no matter how large or small,
can do better than they do under x by simply ‘going it alone’. By contrast, we say that an
allocation is ‘unblocked’ if no coalition can block it. Our final requirement for equilibrium,
then, is that the allocation be unblocked.
Note that this takes care of the two-consumer case because all allocations outside
the lens are blocked by a coalition consisting of a single consumer (sometimes con-
sumer 1, sometimes consumer 2). In addition, note that in general, if x ∈ F(e) is unblocked,
then it must be Pareto efficient, because otherwise it would be blocked by the grand
1 Note that there is no need to insist that y ∈ F(e), because one can always make it so by replacing the bundles in
it going to consumers $j \notin S$ by $e^j$.
coalition S = I . This lets us summarise the requirements for equilibrium in exchange very
compactly.
Specifically, an allocation x ∈ F(e) is an equilibrium in the exchange economy with
endowments e if x is not blocked by any coalition of consumers. Take a moment to convince
yourself that this definition reduces to the one we developed earlier when there were only
two goods and two consumers.
The set of allocations we have identified as equilibria of the process of voluntary
exchange is known as the ‘core’, and we define this term for future reference.
Can we be assured that every exchange economy possesses at least one allocation in
the core? That is, must there exist at least one feasible and unblocked allocation? As we
shall later show, the answer is yes under a number of familiar conditions.
We have argued that under ideal circumstances, including the costless nature of both
the formation of coalitions and the acquisition of the information needed to arrange mutu-
ally beneficial trades, consumers are led, through the process of voluntary exchange, to
pursue the attainment of allocations in the core. From this point of view, points in the core
seem very far indeed from becoming a reality in a real-world economy. After all, most of us
have little or no direct contact with the vast majority of other consumers. Consequently, one
would be quite surprised were there not substantial gains from trade left unrealised, regard-
less of how the economy were organised – centrally planned, market-based, or otherwise.
In the next section, we investigate economies organised by competitive markets. Prepare
for a surprise.
markets, demands a bundle that is best for him, without the need to consider what other
consumers might demand, being fully confident that sufficient production has taken place.
Similarly, producers, also fully aware of the prevailing prices of all goods (both inputs and
outputs), choose amounts of production that maximise their profits, without the need to
consider how much other producers are producing, being fully confident that their output
will be purchased.
The naivete expressed in the decentralised aspect of the competitive model (i.e.,
that every agent acts in his own self-interest while ignoring the actions of others) should
be viewed as a strength. Because in equilibrium consumers’ demands will be satisfied,
and because producers’ outputs will be purchased, the actions of the other agents can be
ignored and the only information required by consumers and producers is the prevailing
prices. Consequently, the informational requirements of this model are minimal. This is in
stark contrast to the barter model of trade developed in the previous section in which each
consumer requires very detailed information about all other consumers’ preferences and
bundles.
Clearly, the optimality of ignoring others’ actions requires that at prevailing prices
consumer demands are met and producer supplies are sold. So, it is essential that prices are
able to clear all markets simultaneously. But is it not rather bold to presume that a suitable
vector of prices will ensure that the diverse tastes of consumers and the resulting totality
of their demands will be exactly matched by the supplies coming from the production side
of the market, with its many distinct firms, each being more or less adept at producing
one good or another? The existence of such a vector of prices is not obvious at all, but the
coherence of our competitive model requires such a price vector to exist.
To give you a feeling for the potential for trouble on this front, suppose that there are
just three goods and that at current prices the demand for good 1 is equal to its supply, so
this market is in equilibrium. However, suppose that there is excess demand for good 2 and
excess supply of good 3, so that neither of these markets clears at current prices. It would
be natural to suppose that one can achieve equilibrium in these markets by increasing the
price of good 2 and decreasing the price of good 3. Now, while this might help to reduce
the difference between demand and supply in these markets, these price changes may very
well affect the demand for good 1! After all if goods 1 and 2 are substitutes, then increases
in the price of good 2 can lead to increases in the demand for good 1. So, changing the
prices of goods 2 and 3 in an attempt to equilibrate those markets can upset the equilibrium
in the market for good 1.
The interdependence of markets renders the existence of an equilibrium price vector
a subtle issue indeed. But again, the existence of a vector of prices that simultaneously
clears all markets is essential for employing the model of the consumer and producer
developed in Chapters 1 and 3, where we assumed that demands were always met and
supplies always sold. Fortunately, even though it is not at all obvious, we can show (with a
good deal of effort) that under some economically meaningful conditions, there does exist
at least one vector of prices that simultaneously clears all markets. We now turn to this
critical question.
The constraint in (5.2) simply expresses the consumer’s usual budget constraint but explic-
itly identifies the source of a consumer’s income. Intuitively, one can imagine a consumer
selling his entire endowment at prevailing market prices, receiving income, p · ei , and then
facing the ordinary constraint that expenditures, p · xi , not exceed income. The solution
xi (p, p · ei ) to (5.2) is the consumer’s demanded bundle, which depends on market prices
and the consumer’s endowment income. We record here a familiar result that we will need
later.
Recall that existence of a solution follows because $p \gg 0$ implies that the budget set
is bounded, and uniqueness follows from the strict quasiconcavity of $u^i$. Continuity at p
follows from Theorem A2.21 (the theorem of the maximum), and this requires $p \gg 0$. We
emphasise here that $x^i(p, p \cdot e^i)$ is not continuous in p on all of $\mathbb{R}^n_+$ because demand may
well be infinite if one of the prices is zero. We will have to do a little work later to deal
with this unpleasant, yet unavoidable, difficulty.
We can interpret the consumer’s endowment ei as giving the quantity of each of the
n goods that he inelastically supplies on the various markets.
2 Recall that a function is strongly increasing if strictly raising one component in the domain vector and lowering
none strictly increases the value of the function. Note also that Cobb-Douglas utilities are neither strongly
increasing nor strictly quasiconcave on all of $\mathbb{R}^n_+$ and so are ruled out by Assumption 5.1.
follows because when ui is strongly increasing, each consumer’s budget constraint holds
with equality.
When the budget constraint in (5.2) holds with equality,
$$\sum_{k=1}^{n} p_k\big(x_k^i(p, p\cdot e^i) - e_k^i\big) = 0.$$
Summing over all consumers $i \in \mathcal{I}$ gives
$$\sum_{i\in\mathcal{I}}\sum_{k=1}^{n} p_k\big(x_k^i(p, p\cdot e^i) - e_k^i\big) = 0.$$
Because the order of summation is immaterial, we can reverse it and write this as
$$\sum_{k=1}^{n}\sum_{i\in\mathcal{I}} p_k\big(x_k^i(p, p\cdot e^i) - e_k^i\big) = 0,$$
or, factoring out the price of each good,
$$\sum_{k=1}^{n} p_k\Big(\sum_{i\in\mathcal{I}} x_k^i(p, p\cdot e^i) - \sum_{i\in\mathcal{I}} e_k^i\Big) = 0.$$
From Definition 5.4, the term in parentheses is the aggregate excess demand for good k,
so we have
$$\sum_{k=1}^{n} p_k z_k(p) = 0,$$
Walras’ law has some interesting implications. For example, consider a two-good
economy and suppose that prices are strictly positive. By Walras' law, we know that $p_1 z_1(p) + p_2 z_2(p) = 0$.
If there is excess demand in market 1, say, so that z1 (p) > 0, we know immediately that we
must have z2 (p) < 0, or excess supply in market 2. Similarly, if market 1 is in equilibrium
at p, so that z1 (p) = 0, Walras’ law ensures that market 2 is also in equilibrium with
z2 (p) = 0. Both of these ideas generalise to the case of n markets. Any excess demand in
the system of markets must be exactly matched by excess supply of equal value at the given
prices somewhere else in the system. Moreover, if at some set of prices n − 1 markets are
in equilibrium, Walras’ law ensures the nth market is also in equilibrium. This is often
quite useful to remember.
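Walras' law holds at every strictly positive price vector, equilibrium or not, and this is easy to verify numerically. The sketch below (illustrative only, using demands of the CES form that appears in Example 5.1 later in this chapter, with ρ = 0.5 and endowments e¹ = (1, 0), e² = (0, 1)) computes aggregate excess demand at arbitrary prices and checks that p · z(p) = 0.

```python
rho = 0.5
r = rho / (rho - 1.0)                      # r = -1 for rho = 0.5
endowments = [(1.0, 0.0), (0.0, 1.0)]

def demand(p, e):
    """CES demands x_j = p_j**(r-1) * y / (p1**r + p2**r), with y = p . e."""
    y = p[0] * e[0] + p[1] * e[1]
    denom = p[0] ** r + p[1] ** r
    return [p[j] ** (r - 1) * y / denom for j in (0, 1)]

def excess_demand(p):
    z = [0.0, 0.0]
    for e in endowments:
        x = demand(p, e)
        z[0] += x[0] - e[0]
        z[1] += x[1] - e[1]
    return z

p = (3.0, 0.7)                             # arbitrary non-equilibrium prices
z = excess_demand(p)
value = p[0] * z[0] + p[1] * z[1]
print(abs(value) < 1e-9)  # True: the value of excess demand is zero
```

Note that neither market clears at these prices (z₁ < 0, z₂ > 0); only the *value* of aggregate excess demand vanishes, exactly as the law states.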
Now consider a market system described by some excess demand function, z(p).
We know that excess demand in any particular market, zk (p), may depend on the prices
prevailing in every market, so that the system of markets is completely interdependent.
There is a partial equilibrium in the single market k when the quantity of commodity k
demanded is equal to the quantity of k supplied at prevailing prices, or when zk (p) = 0. If,
at some prices p, we had z(p) = 0, or demand equal to supply in every market, then we
would say that the system of markets is in general equilibrium. Prices that equate demand
and supply in every market are called Walrasian.3
3 Note that we restrict attention to positive prices. Strictly speaking, there is no reason to do so. However, under
our assumption that consumers’ utility functions are strongly increasing, aggregate excess demand can be zero
only if all prices are positive. See Exercise 5.3.
Before giving the proof, let us consider the three conditions in the theorem. The
first two are familiar and are guaranteed to hold under the hypotheses of Theorem 5.2.
Only the third, rather ominous-looking condition, is new. What it says is actually very
easy to understand, however. It says roughly that if the prices of some but not all goods
are arbitrarily close to zero, then the (excess) demand for at least one of those goods is
arbitrarily high. Put this way, the condition sounds rather plausible. Later, we will show
that under Assumption 5.1, condition 3 is satisfied.
Before getting into the proof of the theorem, we remark that it is here where the lack
of continuity of consumer demand, and hence aggregate excess demand, on the boundary
of the non-negative orthant of prices requires us to do some hard work. In particular, you
will note that in a number of places, we take extra care to stay away from that boundary.
Proof: For each good k, let $\bar z_k(p) = \min(z_k(p), 1)$ for all $p \gg 0$, and let $\bar z(p) = (\bar z_1(p), \ldots, \bar z_n(p))$. Thus, we are assured that $\bar z_k(p)$ is bounded above by 1.
Now, fix $\varepsilon \in (0, 1)$, and let
$$S_\varepsilon = \Big\{ p \;\Big|\; \sum_{k=1}^{n} p_k = 1 \ \text{and}\ p_k \ge \frac{\varepsilon}{1+2n}\ \forall k \Big\}.$$
For each $k = 1, \ldots, n$, define
$$f_k(p) = \frac{\varepsilon + p_k + \max(0, \bar z_k(p))}{n\varepsilon + 1 + \sum_{m=1}^{n}\max(0, \bar z_m(p))},$$
and let $f(p) = (f_1(p), \ldots, f_n(p))$. Consequently, $\sum_{k=1}^{n} f_k(p) = 1$ and $f_k(p) \ge \varepsilon/(n\varepsilon + 1 + n \cdot 1)$, because $\bar z_m(p) \le 1$ for each m. Hence, $f_k(p) \ge \varepsilon/(1+2n)$ because $\varepsilon < 1$. Therefore
$f : S_\varepsilon \to S_\varepsilon$.
Note now that each fk is continuous on Sε because, by condition 1 of the statement
of the theorem, zk (·), and therefore z̄k (·), is continuous on Sε , so that both the numerator
and denominator defining fk are continuous on Sε . Moreover, the denominator is bounded
away from zero because it always takes on a value of at least 1.
Therefore, f is a continuous function mapping the non-empty, compact, convex set
Sε into itself. We may then appeal to Brouwer’s fixed-point theorem (Theorem A1.11) to
conclude that there exists pε ∈ Sε such that f (pε ) = pε , or, equivalently, that fk (pε ) = pεk
for every k = 1, 2, . . . , n. But this means, using the definition of fk (pε ) and rearranging,
that for every $k = 1, 2, \ldots, n$,
$$p_k^\varepsilon\Big(n\varepsilon + \sum_{m=1}^{n}\max(0, \bar z_m(p^\varepsilon))\Big) = \varepsilon + \max(0, \bar z_k(p^\varepsilon)). \tag{P.1}$$
So, up to this point, we have shown that for every ε ∈ (0, 1) there is a price vector in Sε
satisfying (P.1).
Now allow ε to approach zero and consider the associated sequence of price vectors
{pε } satisfying (P.1). Note that the price sequence is bounded, because pε ∈ Sε implies
that the price in every market always lies between zero and one. Consequently, by Theorem
A1.8, some subsequence of {pε } must converge. To keep the notation simple, let us suppose
that we were clever enough to choose this convergent subsequence right from the start so
that $\{p^\varepsilon\}$ itself converges to $p^*$, say. Of course, $p^* \ge 0$ and $p^* \ne 0$ because its components
sum to 1. We argue that in fact, $p^* \gg 0$. This is where condition 3 enters the picture.
Let us argue by way of contradiction. So, suppose it is not the case that $p^* \gg 0$.
Then for some $\bar k$, we must have $p^*_{\bar k} = 0$. But condition 3 of the statement of the theorem
then implies that there must be some good $k'$ with $p^*_{k'} = 0$ such that $z_{k'}(p^\varepsilon)$ is unbounded
above as $\varepsilon$ tends to zero.
$$p_k^* \sum_{m=1}^{n} \max(0, \bar z_m(p^*)) = \max(0, \bar z_k(p^*)) \tag{P.2}$$
for all k = 1, 2, . . . , n. Multiplying both sides by zk (p∗ ) and summing over k yields
$$p^* \cdot z(p^*) \sum_{m=1}^{n}\max(0, \bar z_m(p^*)) = \sum_{k=1}^{n} z_k(p^*)\,\max(0, \bar z_k(p^*)).$$
Now, condition 2 in the statement of the theorem (Walras' law) says that $p^* \cdot z(p^*) = 0$,
so we may conclude that the left-hand side and therefore also the right-hand side of the
preceding equation is zero. But because the sign of $\bar z_k(p^*)$ is the same as that of $z_k(p^*)$,
the sum on the right-hand side can be zero only if $z_k(p^*) \le 0$ for all k. This, together with
$p^* \gg 0$ and Walras' law, implies that each $z_k(p^*) = 0$, as desired.
Proof: Conditions 1 and 2 follow from Theorem 5.2. Thus, it remains only to verify condition 3.
Consider a sequence of strictly positive price vectors, $\{p^m\}$, converging to $\bar p \ne 0$,
such that $\bar p_k = 0$ for some good k. Because $\sum_{i=1}^{I} e^i \gg 0$, we must have $\bar p \cdot \sum_{i=1}^{I} e^i > 0$.
Consequently, $\bar p \cdot \sum_{i=1}^{I} e^i = \sum_{i=1}^{I} \bar p \cdot e^i > 0$, so that there must be at least one consumer
i for whom $\bar p \cdot e^i > 0$.
Consider this consumer i’s demand, xi (pm , pm · ei ), along the sequence of prices.
Now, let us suppose, by way of contradiction, that this sequence of demand vectors is
bounded. Then, by Theorem A1.8, there must be a convergent subsequence. So we may
assume without any loss (by reindexing the subsequence, for example) that the original
sequence of demands converges to x∗ , say. That is, xi (pm , pm · ei ) → x∗ .
To simplify the notation, let xm ≡ xi (pm , pm · ei ) for every m. Now, because xm max-
imises ui subject to i’s budget constraint given the prices pm , and because ui is strongly
(and, therefore, strictly) increasing, the budget constraint must be satisfied with equality.
That is,
pm · xm = pm · ei
for every m.
Taking the limit as m → ∞ yields
p̄ · x∗ = p̄ · ei > 0, (P.1)
p̄ · x̂ = p̄ · ei > 0. (P.3)
So, because ui is continuous, (P.2) and (P.3) imply that there is a t ∈ (0, 1) such that
But because pm → p̄, xm → x∗ and ui is continuous, this implies that for m large
enough,
contradicting the fact that xm solves the consumer’s problem at prices pm . We conclude
therefore that consumer i’s sequence of demand vectors must be unbounded.
Now, because i's sequence of demand vectors, $\{x^m\}$, is unbounded yet non-negative,
there must be some good $k'$ such that $\{x_{k'}^m\}$ is unbounded above. But because i's income
converges to $\bar p \cdot e^i$, the sequence of i's incomes $\{p^m \cdot e^i\}$ is bounded. (See Exercise 5.8.)
Consequently, we must have $p_{k'}^m \to 0$, because this is the only way that the demand for
good $k'$ can be unbounded above and affordable. Consequently, $\bar p_{k'} = \lim_m p_{k'}^m = 0$.
We now can state an existence result in terms of the more primitive elements of the
model. The next theorem follows directly from Theorems 5.4 and 5.3.
EXAMPLE 5.1 Let us take a simple two-person economy and solve for a Walrasian
equilibrium. Let consumers 1 and 2 have identical CES utility functions,
ui(x1, x2) = x1^ρ + x2^ρ, i = 1, 2,
where 0 < ρ < 1. Let there be 1 unit of each good and suppose each consumer owns all
of one good, so initial endowments are e1 = (1, 0) and e2 = (0, 1). Because the aggregate
endowment of each good is strictly positive and the CES form of utility is strongly increas-
ing and strictly quasiconcave on Rn+ when 0 < ρ < 1, the requirements of Theorem 5.5
are satisfied, so we know a Walrasian equilibrium exists in this economy.
From (E.10) and (E.11) in Example 1.1, consumer i's demand for good j at prices
p will be xj^i(p, y^i) = pj^{r−1} y^i/(p1^r + p2^r), where r ≡ ρ/(ρ − 1), and y^i is the consumer's income.
Because only relative prices matter, and because we know from Theorem 5.5 that
there is an equilibrium in which all prices are strictly positive, we can choose a convenient
normalisation to simplify calculations. Let p̄ ≡ (1/p2 )p. Here, p̄1 ≡ p1 /p2 and p̄2 ≡ 1, so
p̄1 is just the relative price of the good x1 . Because each consumer’s demand at p is the
same as the demand at p̄, we can frame our problem as one of finding an equilibrium set
of relative prices, p̄.
Now consider the market for good 1. Assuming an interior solution, equilibrium requires a relative price p̄∗ at which total quantity demanded equals total quantity supplied, or where

p̄1^{r−1} p̄1/(p̄1^r + 1) + p̄1^{r−1}/(p̄1^r + 1) = 1.

Solving, we obtain p̄1∗ = 1. We conclude that any price vector p∗ where p1∗ = p2∗ equates
demand and supply in market 1. By Walras' law, those same prices must equate demand
and supply in market 2, so we are done.
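This equilibrium is easy to verify numerically. The sketch below (the function names and the trial prices are ours, not the text's) implements the CES demand formula from Example 1.1 with ρ = 1/2 and confirms that the market for good 1 clears exactly when p1 = p2:

```python
# A numerical sketch of Example 5.1: CES demands from Example 1.1,
# endowments e1 = (1, 0) and e2 = (0, 1), and a check that the market
# for good 1 clears exactly at the relative price p1/p2 = 1.

rho = 0.5                       # CES parameter, 0 < rho < 1
r = rho / (rho - 1.0)           # r = rho/(rho - 1); here r = -1

def demand(p, y):
    """Demand vector of a consumer with income y at prices p = (p1, p2)."""
    denom = p[0]**r + p[1]**r
    return [p[j]**(r - 1.0) * y / denom for j in (0, 1)]

e1, e2 = (1.0, 0.0), (0.0, 1.0)

def excess_demand_good1(p):
    y1 = p[0]*e1[0] + p[1]*e1[1]            # income = value of endowment
    y2 = p[0]*e2[0] + p[1]*e2[1]
    return demand(p, y1)[0] + demand(p, y2)[0] - (e1[0] + e2[0])

assert abs(excess_demand_good1((1.0, 1.0))) < 1e-12   # market 1 clears
assert excess_demand_good1((2.0, 1.0)) < 0            # too-high p1: excess supply
print("p* with p1* = p2* clears market 1")
```

By Walras' law, market 2 must then clear as well, which is why checking a single market suffices.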
5.2.2 EFFICIENCY
We can adapt the Edgeworth box description of a two-person economy to gain useful per-
spective on the nature of Walrasian equilibrium. Fig. 5.4 represents an economy where
preferences satisfy the requirements of Theorem 5.5. Initial endowments are (e11 , e12 ) and
(e21 , e22 ), and the box is constructed so these two points coincide at e, as before. At relative
prices p∗1 /p∗2 , consumer 1’s budget constraint is the straight line through e when viewed
from 1’s origin. Facing the same prices, consumer 2’s budget constraint will coincide with
that same straight line when viewed (upside down) from 2’s origin. Consumer 1’s most
preferred bundle within her budget set is (x11 , x21 ), giving the quantities of each good con-
sumer 1 demands facing prices p∗1 /p∗2 and having income equal to the market value of her
endowment, p∗1 e11 + p∗2 e12 . Similarly, consumer 2’s demanded bundle at these same prices
with income equal to the value of his endowment is (x12 , x22 ). Equilibrium in the market
for good 1 requires x11 + x12 = e11 + e21 , or that total quantity demanded equal total quantity
supplied. This, of course, is equivalent to the requirement x12 − e21 = e11 − x11 , or that con-
sumer 2’s net demand be equal to consumer 1’s net supply of good 1. A similar description
of equilibrium in the market for good 2 also can be given.
A little experimentation with different relative prices, and so different budget sets
for the two consumers, should convince you that these conditions for market equilibrium
will obtain only when the demanded bundles – viewed from the consumers’ respective
origins – coincide with the same point in the box, as in Fig. 5.4. Because by construc-
tion one indifference curve for each consumer passes through every point in the box, and
because equilibrium requires the demanded bundles coincide, it is clear that equilibrium will involve tangency between the two consumers' indifference curves through their demanded bundles, as illustrated in Fig. 5.4.

[Figure 5.4: Walrasian equilibrium in the Edgeworth box. The budget line through the endowment point e has slope −p1∗/p2∗, and the two consumers' demanded bundles coincide at a tangency of their indifference curves on the contract-curve segment cc.]
There are several interesting features of Walrasian equilibrium that become imme-
diately apparent with the perspective of the box. First, as we have noted, consumers’
supplies and demands depend only on relative prices. Doubling or tripling all prices
will not change the consumers’ budget sets, so will not change their utility-maximising
market behaviour. Second, Fig. 5.4 reinforces our understanding that market equilibrium
amounts to the simultaneous compatibility of the actions of independent, decentralised,
utility-maximising consumers.
Finally, Fig. 5.4 gives insight into the distributional implications of competitive mar-
ket equilibrium. We have noted that equilibrium there is characterised by a tangency of the
consumers’ indifference curves through their respective demanded bundles. These bundles,
in turn, give the final amount of each good owned and consumed by the consumer in the
market system equilibrium. Thus, having begun with some initial distribution of the goods
given by e, the maximising actions of self-interested consumers on impersonal markets have led to a redistribution of goods that is both 'inside the lens' formed by the indifference curves of each consumer through their respective endowments and 'on the contract curve'. In the preceding section, we identified allocations such as these as in the 'core' of
the economy with endowments e. Thus, despite the fact that in the competitive market we
have considered here, consumers do not require knowledge of other consumers’ prefer-
ences or endowments, the allocation resulting from Walrasian equilibrium prices is in the
core, at least for the Edgeworth box economy. As we now proceed to show, this remarkable
property holds in general. We begin by defining some notation.
DEFINITION 5.6 Walrasian Equilibrium Allocations (WEAs)

Let p∗ be a Walrasian equilibrium price vector for the economy with endowments e, and define

x(p∗) ≡ (x1(p∗, p∗ · e1), . . . , xI(p∗, p∗ · eI)),

where component i gives the n-vector of goods demanded and received by consumer i at
prices p∗. Then x(p∗) is called a Walrasian equilibrium allocation, or WEA.
Now consider an economy with initial endowments e and feasible allocations F(e)
defined in (5.1). We should note some basic properties of the WEA in such economies.
First, it should be obvious that any WEA will be feasible for this economy. Second, Fig. 5.4
makes clear that the bundle received by every consumer in a WEA is the most preferred
bundle in that consumer’s budget set at the Walrasian equilibrium prices. It therefore fol-
lows that any other allocation that is both feasible and preferred by some consumer to their
bundle in the WEA must be too expensive for that consumer. Indeed, this would follow
even if the price vector were not a Walrasian equilibrium. We record both of these facts as
lemmas and leave the proof of the first and part of the proof of the second as exercises.
LEMMA 5.1 Let p∗ be a Walrasian equilibrium for some economy with initial endowments e. Let x(p∗ )
be the associated WEA. Then x(p∗ ) ∈ F(e).
LEMMA 5.2 Suppose that ui is strictly increasing on Rn+ , that consumer i’s demand is well-defined at
p ≥ 0 and equal to x̂i , and that xi ∈ Rn+ .
i. If ui (xi ) > ui (x̂i ), then p · xi > p · x̂i .
ii. If ui (xi ) ≥ ui (x̂i ), then p · xi ≥ p · x̂i .
Proof: We leave the first for you to prove as an exercise. So let us suppose that (i) holds.
We therefore can employ it to prove (ii).
Suppose, by way of contradiction, that (ii) does not hold. Then ui(xi) ≥ ui(x̂i) and p · xi < p · x̂i. Consequently, beginning with xi, we may increase the amount of every good consumed by a small enough amount so that the resulting bundle, x̄i, remains strictly less
expensive than x̂i . But because ui is strictly increasing, we then have ui (x̄i ) > ui (xi ) ≥
ui (x̂i ), and p · x̄i < p · x̂i . But this contradicts (i) with xi replaced by x̄i .
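A small numerical illustration of part (i), using a Cobb-Douglas consumer of our own choosing (not an example from the text): every bundle on a grid that is strictly preferred to the demanded bundle must cost strictly more at the going prices.

```python
# A grid-based illustration of Lemma 5.2(i) with a Cobb-Douglas consumer
# (our own example): any bundle strictly preferred to the demanded bundle
# x_hat must cost strictly more at prices p.
import itertools

def u(x):
    return x[0]**0.5 * x[1]**0.5        # strictly increasing on the interior

p, y = (1.0, 1.0), 2.0                  # prices and income
x_hat = (y / 2 / p[0], y / 2 / p[1])    # Cobb-Douglas demand: half of income per good

grid = [i / 4 for i in range(17)]       # quantities 0.0, 0.25, ..., 4.0 per good
for x in itertools.product(grid, grid):
    if u(x) > u(x_hat):                      # strictly preferred to x_hat...
        assert p[0]*x[0] + p[1]*x[1] > y     # ...must be unaffordable
print("every strictly preferred grid bundle costs more than", y)
```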
It bears noting, in general, that we have no reason to expect that when WEAs exist,
they will be unique. Even in the two-person Edgeworth box economy, it is easy to con-
struct examples where preferences satisfy very ordinary properties yet multiple Walrasian
equilibrium allocations exist. Fig. 5.5 illustrates such a case. It seems prudent, therefore, to
keep such possibilities in mind and avoid slipping into the belief that Walrasian equilibria are 'usually' unique.

[Figure 5.5: an Edgeworth box economy with multiple Walrasian equilibrium allocations.]

As a matter of notation, then, let us give a name to the set of WEAs
in an economy.
DEFINITION 5.7 The Set of WEAs

For any economy with endowments e, let W(e) denote the set of Walrasian equilibrium allocations.
We now arrive at the crux of the matter. It is clear in both Figs. 5.4 and 5.5 that the
WEAs involve allocations of goods to consumers that lie on the segment cc of the contract
curve representing the core of those economies. It remains to show that WEAs have this
property in arbitrary economies. Recall that C(e) denotes the set of allocations in the core.
THEOREM 5.6 Core and Equilibria in Competitive Economies

Consider an exchange economy (ui, ei)i∈I. If each consumer's utility function, ui, is strictly increasing on Rn+, then every Walrasian equilibrium allocation is in the core. That is,

W(e) ⊂ C(e).
Proof: The theorem claims that if x(p∗) is a WEA for equilibrium prices p∗, then x(p∗) ∈ C(e). To prove it, suppose x(p∗) is a WEA, and assume x(p∗) ∉ C(e).

Because x(p∗) is a WEA, we know from Lemma 5.1 that x(p∗) ∈ F(e), so x(p∗) is feasible. However, because x(p∗) ∉ C(e), we can find a coalition S and another allocation y such that

∑i∈S yi = ∑i∈S ei (P.1)

and

ui(yi) ≥ ui(xi(p∗, p∗ · ei)) for all i ∈ S, (P.2)

ui(yi) > ui(xi(p∗, p∗ · ei)) for at least one i ∈ S. (P.3)

Now from (P.2), (P.3), and Lemma 5.2, we know that for each i ∈ S, we must have

p∗ · yi ≥ p∗ · xi(p∗, p∗ · ei) = p∗ · ei, (P.4)

with at least one inequality strict. Summing over all consumers in S, we obtain

∑i∈S p∗ · yi > ∑i∈S p∗ · ei,

contradicting (P.1). We conclude that x(p∗) ∈ C(e).
Note, in particular, that because all core allocations are Pareto efficient, so, too, must
be all Walrasian equilibrium allocations. Although we have proven more, this alone is
quite remarkable. Imagine being charged with allocating all the economy’s resources, so
that in the end, the allocation is Pareto efficient. To keep you from giving all the resources
to one person, let us also insist that in the end, every consumer must be at least as well
off as they would have been just consuming their endowment. Think about how you might
accomplish this. You might start by trying to gather information about the preferences of
all consumers in the economy. (What a task that would be!) Only then could you attempt to
redistribute goods in a manner that left no further gains from trade. As incredibly difficult
as this task is, the competitive market mechanism achieves it, and more. To emphasise the
fact that competitive outcomes are Pareto efficient, we state it as a theorem, called the First
Welfare Theorem.
THEOREM 5.7 First Welfare Theorem

If each consumer's utility function, ui, is strictly increasing on Rn+, then every Walrasian equilibrium allocation is Pareto efficient.
Proof: The proof follows immediately from Theorem 5.6 and the observation that all core
allocations are Pareto efficient.
Theorem 5.7 provides some specific support for Adam Smith’s contention that
society’s interests are served by an economic system where self-interested actions of
individuals are mediated by impersonal markets. If conditions are sufficient to ensure
that Walrasian equilibria exist, then regardless of the initial allocation of resources, the
allocation realised in market equilibrium will be Pareto efficient.
It is extremely important to appreciate the scope of this aspect of competitive market
systems. It is equally important to realise its limitations and to resist the temptation to read
more into what we have shown than is justified. Nothing we have argued so far should lead
us to believe that WEAs are necessarily ‘socially optimal’ if we include in our notion of
social optimality any consideration for matters of ‘equity’ or ‘justice’ in distribution. Most
would agree that an allocation that is not Pareto efficient is not even a candidate for the
socially best, because it would always be possible to redistribute goods and make someone
better off and no one worse off. At the same time, few could argue persuasively that every
Pareto-efficient distribution has an equal claim to being considered the best or ‘most just’
from a social point of view.
In a later chapter, we give fuller consideration to normative issues such as these. For
now, a simple example will serve to illustrate the distinction. Consider an economy with
total endowments given by the dimensions of the Edgeworth box in Fig. 5.6. Suppose by
some unknown means society has identified the distribution x̄ as the socially best. Suppose,
in addition, that initial endowments are given by the allocation e. Theorem 5.6 tells us that
an equilibrium allocation under a competitive market system will be some allocation in
C(e), such as x′, which in this case is quite distinct from x̄. Thus, while competitive market
systems can improve on an initial distribution that is not itself Pareto efficient, there is no assurance a competitive system, by itself, will lead to a final distribution that society as a whole views as best.

[Figure 5.6: an Edgeworth box economy with endowment e, a core allocation x′ ∈ C(e), the socially best allocation x̄, and an alternative endowment e∗ on the price line through x̄ with slope −p1∗/p2∗.]
Before we become unduly pessimistic, let us consider a slightly different question.
If by some means, we can determine the allocation we would like to see, can the power of
a decentralised market system be used to achieve it? From Fig. 5.6, it seems this should be
so. If initial endowments could be redistributed to e∗ , it is clear that x̄ is the allocation that
would be achieved in competitive equilibrium with those endowments and prices p∗ .
In fact, this is an example of a rather general principle. It can be shown that under
certain conditions, any Pareto-efficient allocation can be achieved by competitive markets
and some initial endowments. This result is called the Second Welfare Theorem.
THEOREM 5.8 Second Welfare Theorem

Consider an exchange economy (ui, ei)i∈I with aggregate endowment ∑i∈I ei ≫ 0, and with each ui satisfying Assumption 5.1. Suppose that x̄ is a Pareto-efficient allocation for this economy, and that endowments are redistributed so that each consumer i's endowment becomes x̄i. Then x̄ is a Walrasian equilibrium allocation of the resulting exchange economy (ui, x̄i)i∈I.
Consequently, we may apply Theorem 5.5 to conclude that the exchange economy
(ui , x̄i )i∈I possesses a Walrasian equilibrium allocation x̂. It only remains to show that
x̂ = x̄.
Now in the Walrasian equilibrium, each consumer's demand is utility maximising subject to her budget constraint. Consequently, because i demands x̂i and has endowment x̄i, we must have

ui(x̂i) ≥ ui(x̄i) for every i ∈ I. (P.1)

Moreover, because x̄ is feasible for the original economy, the new economy (ui, x̄i)i∈I has the same aggregate endowment, so x̂, being feasible for the new economy, is feasible for the original economy as well.
Thus, by (P.1), x̂ is feasible for the original economy and makes no consumer worse
off than the Pareto-efficient (for the original economy) allocation x̄. Therefore, x̂ cannot
make anyone strictly better off; otherwise, x̄ would not be Pareto efficient. Hence, every
inequality in (P.1) must be an equality.
To see now that x̂i = x̄i for every i, note that if for some consumer this were
not the case, then in the Walrasian equilibrium of the new economy, that consumer
could afford the average of the bundles x̂i and x̄i and strictly increase his utility (by
strict quasiconcavity), contradicting the fact that x̂i is utility-maximising in the Walrasian
equilibrium.
One can view the Second Welfare Theorem as an affirmative answer to the follow-
ing question: is a system that depends on decentralised, self-interested decision making
by a large number of consumers capable of sustaining the socially ‘best’ allocation of
resources, if we could just agree on what that was? Under the conditions stated before,
the Second Welfare Theorem says yes, as long as socially ‘best’ requires, at least, Pareto
efficiency.
Although we did not explicitly mention prices in the statement of the Second Welfare
Theorem, or in its proof, they are there in the background. Specifically, the theorem says
that there are Walrasian equilibrium prices, p̄, such that when the endowment allocation
is x̄, each consumer i will maximise ui (xi ) subject to p̄ · xi ≤ p̄ · x̄i by choosing xi = x̄i .
Because of this, the prices p̄ are sometimes said to support the allocation x̄.
We began discussing the Second Welfare Theorem by asking whether redistribution
to a point like e∗ in Fig. 5.6 could yield the allocation x̄ as a WEA. In the theorem, we
showed that the answer is yes if endowments were redistributed to x̄ itself. It should be
clear from Fig. 5.6, however, that x̄ in fact will be a WEA for market prices p̄ under a
redistribution of initial endowments to any point along the price line through x̄, including,
of course, to e∗ . This same principle applies generally, so we have an immediate corollary
to Theorem 5.8. The proof is left as an exercise.
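Although the corollary's proof is left as an exercise, the supporting-price idea itself is easy to see numerically. The sketch below revisits the CES preferences of Example 5.1, but with an aggregate endowment, efficient allocation, and supporting prices that are our own illustrative choices: endowing each consumer with her piece of a Pareto-efficient allocation x̄ makes x̄ a WEA at the supporting prices.

```python
# Illustrating the Second Welfare Theorem in a CES exchange economy
# (rho = 0.5 as in Example 5.1, but with aggregate endowment (1, 2);
# the allocation and prices below are our own illustrative choices).

rho = 0.5
r = rho / (rho - 1.0)            # r = -1

def demand(p, y):
    denom = p[0]**r + p[1]**r
    return [p[j]**(r - 1.0) * y / denom for j in (0, 1)]

# A Pareto-efficient allocation: both consumers consume the two goods in
# the same ratio as the aggregate endowment (equal MRS across consumers).
a = 0.3
xbar1 = (a, 2*a)                 # consumer 1's piece of xbar
xbar2 = (1 - a, 2*(1 - a))       # consumer 2's piece of xbar

# Supporting prices: p1/p2 = MRS = (x1/x2)**(rho - 1) = (1/2)**(-0.5).
p = (2**0.5, 1.0)

# With endowments redistributed to xbar, each consumer demands her endowment.
for xbar in (xbar1, xbar2):
    y = p[0]*xbar[0] + p[1]*xbar[1]          # income = value of endowment
    d = demand(p, y)
    assert all(abs(d[j] - xbar[j]) < 1e-9 for j in (0, 1))
print("xbar is supported as a WEA by the prices p")
```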
5.3.1 PRODUCERS
To describe the production sector, we suppose there is a fixed number J of firms that we
index by the set
J = {1, . . . , J}.
We now let yj ∈ Rn be a production plan for some firm, and observe the convention of writing ykj < 0 if commodity k is an input used in the production plan and ykj > 0 if it is an output produced from the production plan. If, for example, there are two commodities and yj = (−7, 3), then the production plan requires 7 units of commodity one as an input to produce 3 units of commodity two as an output.
To summarise the technological possibilities in production, we return to the most
general description of the firm’s technology, first encountered in Section 3.2, and sup-
pose each firm possesses a production possibility set, Y j , j ∈ J . We make the following
assumptions on production possibility sets.
ASSUMPTION 5.2 The Individual Firm
1. 0 ∈ Y j ⊆ Rn .
2. Y j is closed and bounded.
3. Y j is strongly convex. That is, for all distinct y1 , y2 ∈ Y j and all t ∈ (0, 1), there
exists ȳ ∈ Y j such that ȳ ≥ ty1 + (1 − t)y2 and equality does not hold.
The first of these guarantees that firm profits are bounded from below by zero, because the firm can always choose to do nothing. The closedness part of the second condition imposes continuity. It says that the limits of possible production
the second condition imposes continuity. It says that the limits of possible production
plans are themselves possible production plans. The boundedness part of this condition
is very restrictive and is made only to keep the analysis simple to follow. Do not be
tempted into thinking that it merely expresses the idea that resources are limited. For the
time being, regard it as a simplifying yet dispensable assumption. We shall discuss the
importance of removing this assumption a little later. The third assumption, strong con-
vexity, is new. Unlike all the others, which are fairly weak restrictions on the technology,
strong convexity is a more demanding requirement. In effect, strong convexity rules out
constant and increasing returns to scale in production and ensures that the firm’s profit-
maximising production plan is unique. Although Assumption 5.2 does not impose it, all
of our results to follow are consistent with the assumption of ‘no free production’ (i.e.,
Y j ∩ Rn+ = {0}).
Each firm faces fixed commodity prices p ≥ 0 and chooses a production plan to
maximise profit. Thus, each firm solves the problem
max_{yj ∈ Yj} p · yj (5.3)
Note how our sign convention ensures that inputs are accounted for in profits as costs and
outputs as revenues. Because the objective function is continuous and the constraint set
closed and bounded, a maximum of firm profit will exist. So, for all p ≥ 0, let

π j(p) ≡ max_{yj ∈ Yj} p · yj

denote firm j's profit function. By Theorem A2.21 (the theorem of the maximum), π j(p) is continuous on Rn+. As you are asked to show in Exercise 5.23, strong convexity ensures that the profit-maximising production plan, yj(p), will be unique whenever p ≫ 0. Finally, from Theorem A2.21 (the theorem of the maximum), yj(p) will be continuous on Rn++. Note that for p ≫ 0, yj(p) is a vector-valued function whose components are the firm's
output supply and input demand functions. However, we often simply refer to y j (p) as
firm j’s supply function. We record these properties for future reference.
Finally, note that maximum firm profits are homogeneous of degree 1 in the vector
of commodity prices. Each output supply and input demand function will be homogeneous
of degree zero in prices. (See Theorems 3.7 and 3.8.)
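These homogeneity properties are easy to confirm numerically. The sketch below uses a one-input, one-output technology of our own choosing (the Cobb-Douglas form that reappears in Example 5.2 below, ignoring the technical bound b), for which the first-order condition αp·h^{α−1} = w gives the profit-maximising plan in closed form:

```python
# A numerical check of homogeneity, using a one-input, one-output
# technology of our own choosing: Y = {(-h, y) : 0 <= y <= h**alpha}
# with alpha in (0, 1), so decreasing returns keep profits finite.

alpha = 0.5

def plan(w, p):
    """Profit-maximising (hours, output) from alpha*p*h**(alpha-1) = w."""
    h = (alpha * p / w) ** (1.0 / (1.0 - alpha))
    return h, h ** alpha

def profit(w, p):
    h, y = plan(w, p)
    return p * y - w * h           # output revenue minus input cost

w, p, t = 2.0, 3.0, 5.0
# Maximum profits are homogeneous of degree 1 in prices...
assert abs(profit(t*w, t*p) - t * profit(w, p)) < 1e-9
# ...while output supply and input demand are homogeneous of degree 0.
assert plan(t*w, t*p) == plan(w, p)
print("homogeneity checks pass")
```

Doubling or tripling both prices thus scales profit by the same factor while leaving the chosen plan unchanged, just as Theorems 3.7 and 3.8 assert.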
When we consider the production sector as a whole, aggregate production possibilities are summarised by the aggregate production possibility set, Y ≡ {y | y = ∑j∈J yj, where yj ∈ Yj for each j ∈ J}. The set Y will inherit all the properties of the individual production sets, and we take note of that formally.
of that formally.
THEOREM 5.10 Properties of Y

If each Yj satisfies Assumption 5.2, then the aggregate production possibility set, Y, also satisfies Assumption 5.2.
We shall leave the proof of this as an exercise. Conditions 1, 3, and the bounded-
ness of Y follow directly from those properties of the Y j . The closedness of Y does not
follow simply from the closedness of the individual Y j ’s. However, under our additional
assumption that the Y j ’s are bounded, Y can be shown to be closed.
Now consider the problem of maximising aggregate profits. Under Theorem 5.10, a maximum of p · y over the aggregate production set Y will exist and be unique when p ≫ 0. In addition, the aggregate profit-maximising production plan y(p) will be
a continuous function of p. Moreover, we note the close connection between aggre-
gate profit-maximising production plans and individual firm profit-maximising production
plans.
THEOREM 5.11 Aggregate Profit Maximisation

For any prices p ≥ 0, we have

p · ȳ ≥ p · y for all y ∈ Y

if and only if for some ȳj ∈ Yj, j ∈ J, we may write ȳ = ∑j∈J ȳj, and

p · ȳj ≥ p · yj for all yj ∈ Yj, j ∈ J.
In words, the theorem says that ȳ ∈ Y maximises aggregate profit if and only if it
can be decomposed into individual firm profit-maximising production plans. The proof is
straightforward.
Proof: Let ȳ ∈ Y maximise aggregate profits at price p. Suppose that ȳ ≡ ∑j∈J ȳj for ȳj ∈ Yj. If ȳk does not maximise profits for firm k, then there exists some other ỹk ∈ Yk
that gives firm k higher profits. But then the aggregate production vector ỹ ∈ Y composed
of ỹk and the sum of the ȳ j for j = k must give higher aggregate profits than the aggregate
vector ȳ, contradicting the assumption that ȳ maximises aggregate profits at price p.
Next, suppose the feasible production plans ȳ1, . . . , ȳJ maximise profits at price p for the individual firms in J. Then

p · ȳj ≥ p · yj for all yj ∈ Yj and every j ∈ J.

Summing over j ∈ J, and noting that every y ∈ Y can be written as y = ∑j∈J yj for some yj ∈ Yj, we conclude that ȳ ≡ ∑j∈J ȳj satisfies

p · ȳ ≥ p · y for all y ∈ Y,

so that ȳ maximises aggregate profits at price p.
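Theorem 5.11 can be checked by brute force on small, finite sets of production plans (toy numbers of our own; finite sets are not strongly convex, but the decomposition argument uses only the linearity of p · y):

```python
# A brute-force check of Theorem 5.11 on two small, finite sets of
# production plans (our own toy numbers): maximal aggregate profit
# equals the sum of the firms' individually maximal profits.
import itertools

p = (1.0, 2.0)                                  # commodity prices
dot = lambda p, y: sum(a * b for a, b in zip(p, y))

# Each plan lists (input, output); inputs carry negative signs.
Y1 = [(0.0, 0.0), (-1.0, 1.0), (-2.0, 1.8), (-3.0, 2.4)]
Y2 = [(0.0, 0.0), (-1.0, 0.9), (-2.0, 1.7)]

best1 = max(Y1, key=lambda y: dot(p, y))        # firm 1's best plan
best2 = max(Y2, key=lambda y: dot(p, y))        # firm 2's best plan

# The aggregate set Y = Y1 + Y2 contains all pairwise sums of plans.
Y = [tuple(a + b for a, b in zip(y1, y2))
     for y1, y2 in itertools.product(Y1, Y2)]
best_aggregate = max(dot(p, y) for y in Y)

# Maximal aggregate profit decomposes into the firms' maxima.
assert abs(best_aggregate - (dot(p, best1) + dot(p, best2))) < 1e-9
print("aggregate maximum decomposes into firm maxima")
```

The check works because p · (y1 + y2) = p · y1 + p · y2 and each firm's choice can be made independently, which is exactly the content of the theorem.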
5.3.2 CONSUMERS
Formally, the description of consumers is just as it has always been. However, we need to
modify some of the details to account for the distribution of firm profits because firms are
owned by consumers. As before, we let
I ≡ {1, . . . , I}
index the set of consumers and let ui denote i’s utility function over the consumption
set Rn+ .
Before continuing, note that our assumption that consumer bundles are non-negative
does not preclude the possibility that consumers supply goods and services to the market.
Indeed, labour services are easily included by endowing the consumer with a fixed number
of hours that are available for consumption. Those that are not consumed as ‘leisure’ are
then supplied as labour services. If the consumer’s only source of income is his endow-
ment, then just as before, whether a consumer is a net demander or supplier of a good
depends upon whether his (total) demand falls short of or exceeds his endowment of that
good.
Of course, we must here also take account of the fact that consumers receive income
from the profit earned by firms they own. In a private ownership economy, which we shall
consider here, consumers own shares in firms and firm profits are distributed to sharehold-
ers. Consumer i’s shares in firm j entitle him to some proportion 0 ≤ θ ij ≤ 1 of the profits
of firm j. Of course, these shares, summed over all consumers in the economy, must sum to 1. Thus, consumer i's budget constraint at prices p is

p · xi ≤ p · ei + ∑j∈J θij π j(p), (5.4)

where

∑i∈I θij = 1 for all j ∈ J.

By letting mi(p) denote the right-hand side of (5.4), the consumer's problem is

max_{xi ∈ Rn+} ui(xi) subject to p · xi ≤ mi(p). (5.5)
Now, under Assumption 5.2, each firm will earn non-negative profits because each can always choose the zero production vector. Consequently, mi(p) ≥ 0 because p ≥ 0 and ei ≥ 0. Therefore, under Assumptions 5.1 and 5.2, a solution to (5.5) will exist and be unique whenever p ≫ 0. Again, we denote it by xi(p, mi(p)), where mi(p) is just the
consumer’s income.
Recall from Chapter 1 that under the assumptions we made there (and also here),
xi (p, y) is continuous in (p, y) ∈ Rn++ × Rn+ . Consequently, as long as mi (p) is continuous
in p, xi (p, mi (p)) will be continuous in p. By appealing to Theorem 5.9, we see that mi (p)
is continuous on Rn+ under Assumption 5.2. Putting this all together we have the following
theorem.
THEOREM 5.12 Basic Properties of Consumer Demand

If each ui satisfies Assumption 5.1 and each Yj satisfies Assumption 5.2, then for every consumer i, mi(p) is continuous on Rn+, and the demand xi(p, mi(p)) is well-defined and continuous on Rn++.
5.3.3 EQUILIBRIUM
As in the case with no production, we can again define a real-valued aggregate excess
demand function for each commodity market and a vector-valued aggregate excess demand
function for the economy as a whole. Aggregate excess demand for commodity k is

zk(p) ≡ ∑i∈I xki(p, mi(p)) − ∑j∈J ykj(p) − ∑i∈I eki,

and the economy's aggregate excess demand vector is z(p) ≡ (z1(p), . . . , zn(p)).
As before (see Definition 5.5), a Walrasian equilibrium price vector p∗ ≫ 0 clears all markets. That is, z(p∗) = 0.

THEOREM 5.13 Existence of Walrasian Equilibrium with Production

If each ui satisfies Assumption 5.1, each Yj satisfies Assumption 5.2, and y + ∑i∈I ei ≫ 0 for some aggregate production vector y ∈ ∑j∈J Yj, then there exists at least one price vector p∗ ≫ 0, such that z(p∗) = 0.
Recall that when there was no production, we required the aggregate endowment vector to be strictly positive to guarantee existence. With production, that condition can be weakened to requiring that there is an aggregate production vector for this economy whose net result is a strictly positive amount of every good (i.e., y + ∑i∈I ei ≫ 0 for some aggregate production vector y).
Proof: We shall get the proof started, and leave the rest for you to complete as an exer-
cise. The idea is to show that under the assumptions above, the aggregate excess demand
function satisfies the conditions of Theorem 5.3. Because production sets are bounded and
consumption is non-negative, this reduces to showing that some consumer’s demand for
some good is unbounded as some, but not all, prices approach zero. (However, you should
check even this logic as you complete the proof for yourself.) Therefore, we really need
only mimic the proof of Theorem 5.4.
So, consider a sequence of strictly positive price vectors, {pm}, converging to p̄ ≠ 0, such that p̄k = 0 for some good k. We would like to show that for some, possibly other, good k′ with p̄k′ = 0, the sequence {zk′(pm)} of excess demands for good k′ is unbounded.
Recall that our first step in the proof of Theorem 5.4 was to identify a consumer
whose income was strictly positive at the limit price vector p̄. This is where we shall use
the new condition on net aggregate production.
Because y + ∑i∈I ei ≫ 0 for some aggregate production vector y, and because the non-zero price vector p̄ has no negative components, we must have p̄ · (y + ∑i∈I ei) > 0.
Consequently, recalling that both mi(p) and π j(p) are well-defined for all p ≥ 0,

∑i∈I mi(p̄) = ∑i∈I p̄ · ei + ∑i∈I ∑j∈J θij π j(p̄) = p̄ · ∑i∈I ei + ∑j∈J π j(p̄) ≥ p̄ · ∑i∈I ei + p̄ · y > 0,
where the first equality follows by the definition of mi (p̄), the second follows because total
non-endowment income is simply aggregate profits, and the weak inequality follows from
Theorem 5.11, which ensures that the sum of individual firm maximised profits must be at
least as large as maximised aggregate profits and hence aggregate profits from y. Therefore,
there must exist at least one consumer whose income at prices p̄, mi (p̄), is strictly positive.
The rest of the proof proceeds now as in the proof of Theorem 5.4, and we leave it for you
to complete as an exercise. (You will need to use the result noted in Theorem 5.12 that
mi (p) is continuous on Rn+ .)
EXAMPLE 5.2 In the classic Robinson Crusoe economy, all production and all consump-
tion is carried out by a single consumer. Robinson the consumer sells his labour time h (in
hours) to Robinson the producer, who in turn uses the consumer’s labour services for that
amount of time to produce coconuts, y, which he then sells to Robinson the consumer. All
profits from the production and sale of coconuts are distributed to Robinson the consumer.
With only one firm, the production possibility set for the firm and the economy
coincide. Let that set be
Y = {(−h, y) | 0 ≤ h ≤ b and 0 ≤ y ≤ h^α},

where α ∈ (0, 1) and b > 0.
So, for example, the production vector (−2, 2^α) is in the production set, which means that it is possible to produce 2^α coconuts by using 2 hours of Robinson's time.
The set Y is illustrated in Fig. 5.7(a), and it is easy to verify that it satisfies all the
requirements of Assumption 5.2. Note that parameter b serves to bound the production set.
Because this bound is present for purely technical purposes, do not give it much thought.
In a moment, we will choose it to be large enough so that it is irrelevant.
As usual, the consumption set for Robinson the consumer is just the non-negative
orthant, which in this two-good case is R2+ . Robinson’s utility function is
u(h, y) = h^{1−β} y^β,
where β ∈ (0, 1). Here, h denotes the number of hours consumed by Robinson (leisure, if
you will), and y denotes the number of coconuts consumed. We will suppose that Robinson
is endowed with T > 0 units of h (i.e., T hours), and no coconuts. That is, e = (T, 0).
We will now choose b large enough so that b > T. Consequently, in any Walrasian
equilibrium, the constraint for the firm that h ≤ b will not be binding because in equilib-
rium the number of hours demanded by the firm cannot exceed the total available number
of hours, T.
This economy satisfies all the hypotheses of Theorem 5.13 except that Robinson’s
utility function, being of the Cobb-Douglas form, is neither strongly increasing nor strictly
quasiconcave on all of Rn+ . However, as you are asked to show in Exercise 5.14, the result-
ing aggregate excess demand function nonetheless satisfies the conditions of Theorem 5.3.
Consequently, a Walrasian equilibrium in strictly positive prices is guaranteed to exist. We
now calculate one.
Let p > 0 denote the price of coconuts, y, and w > 0 denote the price per hour
of Robinson’s time, h. (Thus, it makes sense to think of w as a wage rate.) Consumer
Robinson’s budget set, before including income from profits, is depicted in Fig. 5.7(b),
and Fig. 5.7(c) shows Robinson’s budget set when he receives his (100 per cent) share of
the firm’s profits, equal to π̄ in the figure.
To determine Walrasian equilibrium prices (w∗ , p∗ ), we shall first determine the
firm’s supply function (which, in our terminology also includes the firm’s demand for
hours of labour), then determine the consumer’s demand function, and finally put them
together to find market-clearing prices. We begin with Robinson the firm. From this point,
we use the terms firm and consumer and trust that you will keep in mind that both are in
fact Robinson.
Because it never pays the firm to waste hours purchased, it will always choose (−h, y) ∈ Y so that y = h^α. Consequently, because we have chosen b large enough so that it will not be a binding constraint, the firm will choose h ≥ 0 to maximise

p h^α − wh.
When α < 1, h = 0 will not be profit-maximising (as we shall see); hence, the first-order conditions require setting the derivative with respect to h equal to zero, i.e., αp h^{α−1} − w = 0. Rewriting this, and recalling that y = h^α, gives the firm's demand for
Figure 5.7. Production possibility set, Y, pre-profit budget line, and post-profit budget line in
the Robinson Crusoe economy.
hours, denoted h^f, and its supply of output, denoted y^f, as functions of the prices w, p:⁴

h^f = (αp/w)^{1/(1−α)},

y^f = (αp/w)^{α/(1−α)}.
Note that profits are positive as long as prices are. (This shows that choosing h = 0 is not
profit-maximising just as we claimed earlier.)
⁴ In case you are keeping track of sign conventions, this means that (−h^f, y^f) ∈ Y.
We now turn to the consumer’s problem. Robinson’s income is the sum of his endow-
ment income, (w, p) · (T, 0) = wT, and his income from his 100 per cent ownership in
the firm, π(w, p), the firm’s profits. So the consumer’s budget constraint, which will be
satisfied with equality because his utility function is strictly increasing, is
py + wh = wT + π(w, p).
He chooses (h, y) ≥ (0, 0) to maximise utility subject to this constraint. By now, you
are familiar with the demand functions of a consumer with Cobb-Douglas utility. He will
spend the fraction 1 − β of his total income on h and the fraction β of it on y. So, letting h^c and y^c denote the consumer's demands, we have

h^c = (1 − β)(wT + π(w, p))/w,

y^c = β(wT + π(w, p))/p.
We can now put all of this together to search for a price vector (w, p) that will
clear both markets. There are two simplifications we can make, however. The first is that
because aggregate excess demand is homogeneous of degree zero, and we are guaranteed
a Walrasian equilibrium in strictly positive prices, we may set the Walrasian equilibrium
price of y, p∗ , equal to one without any loss. The second is that we need now only find a
price w∗ so that the market for h clears, because by Walras’ law, the market for y will then
clear as well.
It thus remains to find w∗ such that hc + h f = T, or using the equations above and
setting p∗ = 1,
1/(1−α)
(1 − β)(w∗ T + π(w∗ , 1)) α
+ = T,
w∗ w∗
or
    [(1 − β)(1 − α)/α](α/w*)^{1/(1−α)} + (α/w*)^{1/(1−α)} = βT,
where we have substituted for the firm’s profits to arrive at the second equality. It is
straightforward now to solve for w∗ to obtain the equilibrium wage
    w* = α[(1 − β(1 − α))/(αβT)]^{1−α} > 0.
We invite you to check that for this value of w∗ , and with p∗ = 1, both markets do indeed
clear.
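The invitation is easy to accept numerically. The following sketch computes w* from the formula just derived and verifies that both markets clear; the values of α, β, and T are our own illustration:

```python
# Sketch: compute the equilibrium wage from the text's formula and verify
# that both markets clear.  The values of alpha, beta, T are illustrative.
alpha, beta, T = 0.5, 0.5, 1.0
p_star = 1.0   # normalisation, justified by homogeneity of degree zero

w_star = alpha * ((1 - beta * (1 - alpha)) / (alpha * beta * T)) ** (1 - alpha)

# Firm's choices and profit at (w*, p* = 1)
h_f = (alpha * p_star / w_star) ** (1 / (1 - alpha))
y_f = (alpha * p_star / w_star) ** (alpha / (1 - alpha))
profit = p_star * y_f - w_star * h_f

# Consumer's Cobb-Douglas demands out of income w*T + pi(w*, 1)
income = w_star * T + profit
h_c = (1 - beta) * income / w_star
y_c = beta * income / p_star

assert abs(h_c + h_f - T) < 1e-9    # market for hours clears
assert abs(y_c - y_f) < 1e-9        # market for output clears (Walras' law)
```

Note that the output market clears automatically once the hours market does, exactly as Walras' law promises.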
[Figure 5.8. (a) The firm's profit-maximising choice; (b) the consumer's utility-maximising choice; (c) panels (a) and (b) superimposed, with the consumer's origin 0c placed so that the point T coincides with the firm's origin 0f, showing the Walrasian equilibrium allocation at point A.]
We can illustrate the equilibrium diagrammatically. Fig. 5.8(a) shows the firm’s
profit-maximising solution. The line given by π ∗ = py + wh is an iso-profit line for the
firm, because profits are constant and equal to π ∗ for every (h, y) on it. Note that when
(h, y) ∈ Y, h ≤ 0, so that py + wh is indeed the correct formula for profits in the figure.
Also note that this iso-profit line (and all others) has slope −w/p. Moreover, the iso-profit
line depicted yields the highest possible profits for the firm because higher profits would
require a production plan above the π ∗ iso-profit line, and none of those is in the production
set. Therefore, π ∗ = π(w∗ , 1).
Fig. 5.8(b) shows the consumer’s utility-maximising solution given the budget con-
straint py + wh = wT + π ∗ . Note the slope of the consumer’s budget constraint is −w/p,
which is the same as the slope of the firm’s iso-profit line.
Fig. 5.8(c) puts Figs. 5.8(a) and 5.8(b) together by superimposing the consumer’s
figure over the firm’s, placing the point marked T in the consumer’s figure onto the origin
in the firm’s figure. The origin for the consumer is marked as 0c and the origin for the firm
is 0f . Point A shows the Walrasian equilibrium allocation.
Fig. 5.8(c) allows us to conclude that this competitive equilibrium with production
is Pareto efficient. Consider the shaded region in the figure. With the origin at 0f , the
shaded region denotes the set of feasible production plans – those that can actually be implemented in this economy, taking into account the available resources. Any production
plan in the shaded region can be carried out because it calls for no more than T hours, and
this is the total number of hours with which the economy is endowed. On the other hand, a
production plan like point B is technologically possible because it is in the production set,
but it is infeasible because it requires more than T hours.
Switching our point of view, considering 0c as the origin, the shaded region indicates
the set of feasible consumption bundles for this economy. With this in mind, it is clear that
the Walrasian allocation at A is Pareto efficient. It maximises Robinson’s utility among all
feasible consumption bundles.
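This efficiency claim can also be checked by brute force: every feasible plan is parameterised by the hours the firm uses, and no plan on a fine grid should yield Robinson higher utility than the Walrasian plan. The sketch below uses our own Cobb-Douglas parameterisation u(h, y) = h^{1−β} y^β, which matches the expenditure shares in the text:

```python
# Sketch: confirm Pareto efficiency by brute force.  Every feasible plan
# is parameterised by the hours h_firm the firm uses: Robinson then enjoys
# T - h_firm hours of leisure and h_firm**alpha units of output.  The
# utility function and parameter values are our own illustration.
alpha, beta, T = 0.5, 0.5, 1.0

def utility_of_plan(h_firm):
    leisure, y = T - h_firm, h_firm ** alpha
    return leisure ** (1 - beta) * y ** beta

# Equilibrium labour demand (with the normalisation p* = 1)
w_star = alpha * ((1 - beta * (1 - alpha)) / (alpha * beta * T)) ** (1 - alpha)
h_f = (alpha / w_star) ** (1 / (1 - alpha))

# No feasible plan on a fine grid yields Robinson higher utility
grid = [i * T / 10000 for i in range(10001)]
assert all(utility_of_plan(h) <= utility_of_plan(h_f) + 1e-9 for h in grid)
```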
Soon, we shall show that, just as in the case of a pure exchange economy, this is a
rather general result even with production.
bounds on them (which is essentially what we have done) and then letting the artificial
bounds become arbitrarily large (which we will not do). Under suitable conditions, this
will yield a competitive equilibrium of the economy with unbounded production sets.
For the record, strict convexity of preferences and strong convexity of firm produc-
tion possibility sets assumed in Theorem 5.13 are more stringent than needed to prove
existence of equilibrium. If, instead, merely convexity of preferences and production pos-
sibility sets is assumed, existence can still be proved, though the mathematical techniques
required are outside the scope of this book. If production possibility sets are convex, we
allow the possibility of constant returns to scale for firms. Constant returns introduces the
possibility that firm output supply and input demand functions will be set-valued rela-
tionships and that they will not be continuous in the usual way. Similarly, mere convexity
of preferences raises the possibility of set-valued demand functions together with similar
continuity problems. All of these can be handled by adopting generalised functions (called
‘correspondences’), an appropriately generalised notion of continuity, and then applying
a generalised version of Brouwer’s fixed-point theorem due to Kakutani (1941). In fact,
we can even do without convexity of individual firm production possibility sets altogether,
as long as the aggregate production possibility set is convex. The reader interested in
exploring all of these matters should consult Debreu (1959). But see also Exercise 5.22.
5.3.4 WELFARE
Here we show how Theorems 5.7 and 5.8 can be extended to an economy with production.
As before, we focus on properties of the allocations consumers receive in a Walrasian
equilibrium. In a production economy, we expand our earlier definition of Walrasian
equilibrium allocations as follows.
Throughout the remainder of this section, we shall be concerned with the fixed econ-
omy (ui , ei , θ ij , Y j )i∈I ,j∈J . Thus, all definitions and theorems are stated with this economy
in mind.
An allocation, (x, y) = ((x^1, . . . , x^I), (y^1, . . . , y^J)), of bundles to consumers and production plans to firms is feasible if x^i ∈ R^n_+ for all i ∈ I, y^j ∈ Y^j for all j ∈ J, and

    Σ_{i∈I} x^i = Σ_{i∈I} e^i + Σ_{j∈J} y^j.
Proof: We suppose (x, y) is a WEA at prices p∗ , but is not Pareto efficient, and derive a
contradiction.
Because (x, y) is a WEA, it is feasible, so
    Σ_{i∈I} x^i = Σ_{j∈J} y^j + Σ_{i∈I} e^i.    (P.1)
Because (x, y) is not Pareto efficient, there exists some feasible allocation (x̂, ŷ) such that

    u^i(x̂^i) ≥ u^i(x^i)  for every i ∈ I,    (P.2)

with at least one strict inequality. By Lemma 5.2, this implies that
    p* · x̂^i ≥ p* · x^i,    i ∈ I,    (P.3)
with at least one strict inequality. Summing over consumers in (P.3) and rearranging gives
    Σ_{i∈I} p* · x̂^i > Σ_{i∈I} p* · x^i.    (P.4)
Now (P.4) together with (P.1) and the feasibility of (x̂, ŷ) tell us
    p* · (Σ_{j∈J} ŷ^j + Σ_{i∈I} e^i) > p* · (Σ_{j∈J} y^j + Σ_{i∈I} e^i),
so
    Σ_{j∈J} p* · ŷ^j > Σ_{j∈J} p* · y^j.
However, this means that p* · ŷ^j > p* · y^j for some firm j, where ŷ^j ∈ Y^j. This contradicts the fact that in the Walrasian equilibrium, y^j maximises firm j's profit at prices p*.
Next we show that competitive markets can support Pareto-efficient allocations after
appropriate income transfers.
Proof: For each j ∈ J , let Ȳ j ≡ Y j − {ŷ j }, and note that so defined, each Ȳ j satisfies
Assumption 5.2. Consider now the economy Ē = (ui , x̂i , θ ij , Ȳ j )i∈I ,j∈J obtained from
the original economy by replacing consumer i’s endowment, ei , with the endowment
x̂i , and replacing each production set, Y j , with the production set Ȳ j . It is straightfor-
ward to show using hypotheses (i) to (iii) that Ē satisfies all the assumptions of Theorem
5.13. Consequently, Ē possesses a Walrasian equilibrium, p̄ ≫ 0, and an associated WEA, (x̄, ȳ).
Now because 0 ∈ Ȳ j for every firm j, profits of every firm are non-negative in
equilibrium, so that each consumer can afford his endowment vector. Consequently,

    u^i(x̄^i) ≥ u^i(x̂^i),    i ∈ I.    (P.1)
Next we shall argue that for some aggregate production vector ỹ, (x̄, ỹ) is feasible
for the original economy. To see this, note that each ȳ j ∈ Ȳ j is of the form ȳ j = ỹ j − ŷ j
for some ỹ j ∈ Y j , by the definition of Ȳ j . Now, because (x̄, ȳ) is a WEA for Ē , it must be
feasible in that economy. Therefore,
    Σ_{i∈I} x̄^i = Σ_{i∈I} x̂^i + Σ_{j∈J} ȳ^j
               = Σ_{i∈I} x̂^i + Σ_{j∈J} (ỹ^j − ŷ^j)
               = Σ_{i∈I} x̂^i − Σ_{j∈J} ŷ^j + Σ_{j∈J} ỹ^j
               = Σ_{i∈I} e^i + Σ_{j∈J} ỹ^j,
where the last equality follows from the feasibility of (x̂, ŷ) in the original economy. Consequently, (x̄, ỹ) is feasible for the original economy, where ỹ = Σ_{j∈J} ỹ^j.
We may conclude that every inequality in (P.1) must be an equality, otherwise (x̂, ŷ)
would not be Pareto efficient. But the strict quasiconcavity of ui then implies that
    x̄^i = x̂^i,    i ∈ I,
because otherwise some consumer would strictly prefer the average of the two bundles to
x̄i , and the average is affordable at prices p̄ because both bundles themselves are afford-
able. This would contradict the fact that (x̄, ȳ) is a WEA for Ē at prices p̄. Thus, we may
conclude that
    x̂^i maximises u^i(x^i)  s.t.  p̄ · x^i ≤ p̄ · x̂^i + Σ_{j∈J} θ^{ij} p̄ · ȳ^j,    i ∈ I.    (P.2)
But because utility is strongly increasing, the budget constraint holds with equality
at xi = x̂i , which implies that each consumer i’s income from profits is zero. This means
that every firm must be earning zero profits, which in turn means that ȳ j = 0 for every
firm j.
We leave it as an exercise to show that because ȳ j = 0 maximises firm j’s profits at
prices p̄ when its production set is Ȳ j , then (by the definition of Ȳ j ) ŷ j maximises firm j’s
profits at prices p̄ when its production set is Y j (i.e., in the original economy).
So altogether, we have shown the following:
p̄. These transfers sum to zero by the feasibility of (x̂, ŷ), and when employed (in the
original economy), they reduce each consumer’s problem to that in (P.2). Consequently,
both (1) and (2) are satisfied.
5.4.1 TIME
If we wish to include time in our model, then we simply index goods not only by what
they are, e.g. apples, oranges, etc., but also by the date at which they are consumed (or
produced). So instead of keeping track only of xk , the amount of good k consumed by a
consumer, we also keep track of the date t at which it is consumed. Thus, we let xkt denote
the amount of good k consumed at date t. If there are two goods, k = 1, 2, and two dates
t = 1, 2, then a consumption bundle is a vector of four numbers (x11 , x12 , x21 , x22 ), where,
for example, x12 is the amount of good k = 1 consumed at date t = 2.
But if a consumption bundle is (x11 , x12 , x21 , x22 ), then in keeping with our conven-
tion up to now, we should really think of each of the four coordinates of the consumption
bundle as representing the quantities of distinct goods. That is, with two 'basic' goods,
apples and oranges, and two dates, today and tomorrow, we actually have four goods –
apples today, apples tomorrow, oranges today, and oranges tomorrow.
5.4.2 UNCERTAINTY
Uncertainty, too, can be captured using the same technique. For example, suppose there
is uncertainty about today’s weather and that this is important because the weather might
affect the desirability of certain products (e.g., umbrellas, sunscreen, vacations,. . .) and/or
the production possibilities for certain products (e.g., agriculture). To keep things simple,
let us suppose that there are just two possibilities for the state of the weather. In state
s = 1 it rains, and in state s = 2 it is sunny. Then, analogous to what we did with time,
we can index each good k with the state in which it is consumed (or produced) by letting
xks denote the amount of good k consumed in state s, and letting yks denote the amount of
good k produced in state s. This permits consumers to have quite distinct preferences over
umbrellas when it is sunny and umbrellas when it rains, and it also permits production
possibilities, for agricultural products for example, to be distinct in the two states. We
can also model the demand for insurance by allowing a consumer’s endowment vector to
depend upon the state, with low endowments being associated with one state (fire or flood,
for example) and high endowments with another.
There are I consumers. Each consumer i ∈ I has preferences over the set of consumption bundles in R^{NM}_+, and i's preferences are represented by a utility function u^i(·).
j ∈ J.5 Note that the endowment vector e^i specifies that at date t and in state s, consumer i's endowment of the N goods is (e^i_{1ts}, . . . , e^i_{Nts}).
In terms of our previous definitions, this is simply a private ownership economy
with n = NM goods. For example, x_{kts} = 2 denotes two units of good kts or, equivalently, two units of the basic good k at date t in state s. Thus, we are treating the same
basic good as distinct when consumed at distinct dates or in distinct states. After all, the
amount one is willing to pay for an automobile delivered today might well be higher than
the amount one is willing to pay for delivery of an otherwise identical automobile six
months from today. From this perspective, treating the same basic good at distinct dates
(or in distinct states) as distinct goods is entirely natural.
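The bookkeeping behind this reinterpretation is easy to mechanise: each contingent commodity is just a (basic good, date, state) triple. A small sketch, in which the good names and set sizes are our own illustration:

```python
# Sketch of the contingent-commodity bookkeeping: each (basic good, date,
# state) triple is treated as a distinct good, so with N basic goods and
# M = (dates x states) date-state pairs there are n = N*M goods in all.
# The good names and set sizes below are our own illustration.
from itertools import product

basic_goods = ["apples", "umbrellas"]     # N = 2
dates = [1, 2]                            # two dates
states = ["rain", "sun"]                  # two states

goods = list(product(basic_goods, dates, states))
assert len(goods) == 2 * (2 * 2)          # n = N*M = 8 distinct goods

# A consumption bundle is a vector indexed by these triples; e.g. the
# entry for ("umbrellas", 2, "rain") is umbrella consumption at date 2
# if it rains.
bundle = {g: 0.0 for g in goods}
bundle[("umbrellas", 2, "rain")] = 1.0
```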
Under the hypotheses of Theorem 5.13, there is a price vector p* ∈ R^{NM}_{++} constituting a Walrasian equilibrium for this private ownership economy. In particular, demand must
equal supply for each of the NM goods, that is for every basic good at every date and in
every state of the world. Let us now understand what this means starting with firms.
For each firm j ∈ J, let ŷ^j = (ŷ^j_{kts}) ∈ Y^j ⊆ R^{NM} denote its (unique) profit-maximising production plan given the price vector p*. Consequently, at date t in state s, firm j will produce ŷ^j_{kts} units of the basic good (output) k if ŷ^j_{kts} ≥ 0 and will demand
5 One could allow ownership shares to depend upon the date and the state, but we shall not do so.
−ŷ^j_{kts} units of the basic good (input) k if ŷ^j_{kts} < 0. Thus, ŷ^j is a profit-maximising contingent production plan, describing output supply and input demand for the N basic goods contingent upon each date and state. Let us now turn to consumers.
For each i ∈ I, let x̂^i = (x̂^i_{kts}) ∈ R^{NM}_+ denote consumer i's (unique) utility-maximising affordable consumption bundle given prices p* and income m^i(p*). Consequently, at date t in state s, consumer i will consume x̂^i_{kts} units of the basic good k. Market-clearing in each of the NM markets says that

    Σ_{i∈I} x̂^i_{kts} = Σ_{i∈I} e^i_{kts} + Σ_{j∈J} ŷ^j_{kts}  for every k, t, and s.    (5.6)
Consequently, at every date and in every state, demand equals supply for each of the basic
goods. On the other hand, each consumer i has only a single budget constraint linking his
expenditures on all goods as follows:
    Σ_{k,t,s} p*_{kts} x̂^i_{kts} = Σ_{k,t,s} p*_{kts} e^i_{kts} + Σ_{j∈J} θ^{ij} Σ_{k,t,s} p*_{kts} ŷ^j_{kts},  for every i ∈ I.    (5.7)
If x̂^i_{kts} − e^i_{kts} > 0, consumer i is entitled to receive from the market x̂^i_{kts} − e^i_{kts} units of basic good k at date t in state s. If x̂^i_{kts} − e^i_{kts} < 0, consumer i is required to supply to the market e^i_{kts} − x̂^i_{kts} units of basic good k at date t in state s.
Similarly, each firm's production plan ŷ^j = (ŷ^j_{kts}) can be reinterpreted as the vector of contracts requiring firm j to supply to the market ŷ^j_{kts} units of basic good k at date t in state s if ŷ^j_{kts} ≥ 0, and entitling firm j to receive from the market −ŷ^j_{kts} units of basic good k at date t in state s if ŷ^j_{kts} < 0.
Finally, note that if for each k, t, and s, the price of a contract per unit of basic
good k at date t in state s is p∗kts , then at date zero the market for contracts will clear
with consumers maximising utility and firms maximising profits. When each date t arrives
and any state s occurs, the contracts that are relevant for that date and state are executed.
The market-clearing condition (5.6) ensures that this is feasible. After the initial trading
of contracts in period zero, no further trade takes place. The only activity taking place as
time passes and states occur is the execution of contracts that were purchased and sold at
date zero.
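The date-zero-contracts story can be simulated in a few lines. The following sketch is a stylised one-good, two-state insurance example of our own devising (not taken from the text): it records the net contract positions signed at date zero and checks that, whichever state occurs, the executed trades balance.

```python
# Sketch: trading state-contingent contracts at date 0, then executing
# them.  A stylised one-good, two-state insurance example of our own;
# the two consumers' endowments differ across states.
endow = {"A": {"flood": 0.2, "dry": 1.0},
         "B": {"flood": 1.0, "dry": 0.2}}

# Suppose the symmetric consumers fully insure at date 0: each plans to
# consume 0.6 units of the good whichever state occurs.
consumption = {i: {"flood": 0.6, "dry": 0.6} for i in endow}

# Net contract positions signed at date 0 (positive = entitled to
# receive, negative = required to supply), one per state:
contracts = {i: {s: consumption[i][s] - endow[i][s] for s in endow[i]}
             for i in endow}

# When a state occurs, only that state's contracts are executed; market
# clearing state by state means the executed net trades sum to zero.
for s in ("flood", "dry"):
    assert abs(sum(contracts[i][s] for i in contracts)) < 1e-9
```

After date zero no further trade is needed: each agent simply delivers or receives what the relevant contracts specify.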
Let us now provide several important remarks on this interpretation of our model.
First, we have implicitly assumed that there is perfect monitoring in the sense that it is not
possible for a firm or consumer to claim that he can supply more units of a basic good in
state s at date t than he actually can supply. Thus, bankruptcy is assumed away. Second,
it is assumed that there is perfect information in the sense that all firms and consumers
are informed of the state when it occurs at each date. Otherwise, if only some agents
were informed of the state, they might have an incentive to lie about which state actu-
ally did occur. Third, it is assumed that all contracts are perfectly enforced. Clearly, each
of these assumptions is strong and rules out important economic settings. Nonetheless,
it is quite remarkable how much additional mileage we are able to get from a model
that appears entirely static and deterministic simply by reinterpreting its variables! The
exercises explore this model further, examining how it provides theories of insurance,
borrowing and lending, interest rates, and asset pricing.
sense, the distinction between core allocations and Walrasian equilibrium ones disap-
pears. In considering that possibility anew, Debreu and Scarf (1963) extended Edgeworth’s
framework and proved him to be correct. Loosely speaking, they showed that as an
economy becomes ‘larger’, its core ‘shrinks’ to include only those allocations that are
Walrasian!
All in all, their result is heartening to those who believe in the special qualities of
a market system, where the only information a consumer requires is the set of market
prices he faces. It suggests a tantalising comparison between the polar paradigms of central
planning and laissez-faire in very large economies. If the objective of the planning process
is to identify and then implement some distribution of goods that is in the core, and if there
are no other allocations in the core but those that would be picked out by a competitive
market system, why go to the bother (and expense) of planning at all? To find the core, a
central planner needs information on consumers’ preferences, and consumers have selfish
incentives to be less than completely honest in revealing that information to the planner.
The market does not need to know anything about consumers’ preferences at all, and in
fact depends on consumers’ selfishness. What is a vice in one case is a virtue of sorts in
the other.
There is, of course, a great deal of loose language in this discussion. On a broad
plane, the choice between planning and market systems would never hinge on efficiency
alone. In addition, we know that core allocations from arbitrary initial endowments need
not be equitable in any sense of the word. Planning may still be justified as a means of
achieving a desired redistribution of endowments. On a narrower plane, there are technical
issues unaddressed. What does it mean for an economy to be ‘large’, or to be ‘larger’, than
another? Moreover, because an ‘allocation’ involves a vector of goods for each consumer,
and because presumably a larger economy has a greater number of consumers, is not the
‘dimensionality’ of the core in large economies different from that in small economies?
If so, how can we speak of the core ‘shrinking’? We will answer each of these questions
before we finish.
Now imagine that each consumer suddenly acquires a twin. The twins are completely
identical, having the same preferences and the same endowments. The new economy, con-
sisting of all the original consumers and their twins, now has two consumers of each type
rather than one. This new economy is clearly larger than the original one because it con-
tains exactly twice as many consumers. We call this new economy the twofold replica of
the original one. If each original consumer was tripled, or quadrupled, we could similarly
construct threefold or fourfold replicas of the original economy, each in turn being larger
than the preceding one in a well-defined way. Now you get the idea of a replica economy.
It is one with a finite number of ‘types’ of consumers, an equal number of consumers
of each type, and all individuals of the same type are identical in that they have identi-
cal preferences and identical endowments. Formally, we have the following definition and
assumptions.
Thus, when comparing two replica economies, we can unambiguously say which of
them is larger. It will be the one having more of every type of consumer.
Let us now think about the core of the r-fold replica economy Er . Under the assump-
tions we have made, all of the hypotheses of Theorem 5.5 will be satisfied. Consequently,
a WEA will exist, and by Theorem 5.5, it will be in the core. So we have made enough
assumptions to ensure that the core of Er is non-empty.
To keep track of all of the consumers in each replica economy, we shall index each
of them by two superscripts, i and q, where i = 1, . . . , I runs through all the types, and
q = 1, . . . , r runs through all consumers of a particular type. For example, the index iq =
23 refers to the type 2 consumer labelled by the number 3, or simply the third consumer
of type 2. So, an allocation in Er takes the form

    x = (x^{11}, . . . , x^{1r}, x^{21}, . . . , x^{2r}, . . . , x^{I1}, . . . , x^{Ir}),

where x^{iq} denotes the bundle of the qth consumer of type i. The allocation is then feasible if
    Σ_{i∈I} Σ_{q=1}^{r} x^{iq} = r Σ_{i∈I} e^i,    (5.9)
This theorem with the delightfully democratic name identifies a crucial property of
core allocations in replica economies. It is therefore important that we not only believe
equal treatment of like types occurs in the core but that we also have a good feel for why
it is true. For that reason, we will give a leisurely ‘proof’ for the simplest, two-type, four-
person economy. Once you understand this case, you should be able to derive the formal
proof of the more general case for yourself, and that will be left as an exercise.
Proof: Let I = 2, and consider E2 , the replica economy with two types of consumers and
two consumers of each type, for a total of four consumers in the economy. Suppose that

    x = (x^{11}, x^{12}, x^{21}, x^{22})

is an allocation in the core of E2. First, we note that because x is in the core, it must be feasible, so

    x^{11} + x^{12} + x^{21} + x^{22} = 2e^1 + 2e^2.    (P.1)

Now suppose, by way of contradiction, that the bundles of the two type 1 consumers are distinct, and assume (without loss of generality) that

    x^{11} ≿_1 x^{12}.
Of course, the preference may be strict, or the two bundles may be ranked equally.
Figs. 5.9(a) and 5.9(b) illustrate both possibilities. Either way, we would like to show
that because x11 and x12 are distinct, x cannot be in the core of E2 . To do this, we will
show that x can be blocked.
Now, consider the two consumers of type 2. Their bundles according to x are x^{21} and x^{22}, and they each have preferences ≿_2. Let us assume (again without loss of generality) that

    x^{21} ≿_2 x^{22}.    (P.2)
So, consumer 2 of type 1 is the worst off type 1 consumer, and consumer 2 of type 2 is the
worst off type 2 consumer. Let us see if these worst off consumers of each type can get
together and block the allocation x.
[Figure 5.9. The two possibilities for the ranking of the distinct type 1 bundles x^{11} and x^{12}: (a) strict preference; (b) indifference. In each panel the average bundle x̄^{12} lies between them.]
    x̄^{12} = (x^{11} + x^{12})/2,
    x̄^{22} = (x^{21} + x^{22})/2.
The first bundle is the average of the bundles going to the type 1 consumers and the second
is the average of the bundles going to the type 2 consumers. See Fig. 5.9 for the placement
of x̄12 .
Now, suppose it were possible to give consumer 12 the bundle x̄^{12}. How would this compare to giving him the bundle he is getting under x, namely, x^{12}? Well, remember
that according to (P.2), consumer 12 was the worst off consumer of type 1. Consequently,
because bundles x^{11} and x^{12} are distinct, consumer 12 would strictly prefer x̄^{12} to x^{12} because his preferences, ≿_1, are strictly convex. That is,

    x̄^{12} ≻_1 x^{12}.

Similarly,

    x̄^{22} ≿_2 x^{22},

where the preference need not be strict because we may have x^{21} = x^{22}.
The pair of bundles (x̄12 , x̄22 ) therefore makes consumer 12 strictly better off and
consumer 22 no worse off than the allocation x. If this pair of bundles can be achieved
by consumers 12 and 22 alone, then they can block the allocation x, and the proof will be
complete.
To see that together they can achieve (x̄^{12}, x̄^{22}), note the following:

    x̄^{12} + x̄^{22} = (x^{11} + x^{12})/2 + (x^{21} + x^{22})/2
                   = (1/2)(x^{11} + x^{12} + x^{21} + x^{22})
                   = (1/2)(2e^1 + 2e^2)
                   = e^1 + e^2,
where the third equality follows from (P.1). Consequently, the two worst off consumers
of each type can together achieve a pair of bundles that makes one of them strictly better
off and the other no worse off. The coalition S = {12, 22} therefore can block x. But this
contradicts the fact that x is in the core.
We conclude then that x must give consumers of the same type the same bundle.
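The averaging argument at the heart of this proof is easy to replicate numerically. In the sketch below, the Cobb-Douglas utility and the two distinct type 1 bundles are our own illustration, not taken from the text:

```python
# Sketch of the averaging argument behind equal treatment.  The utility
# function and the two distinct type 1 bundles are our own illustration.
def u(x):
    # strictly quasiconcave, so preferences are strictly convex
    return (x[0] * x[1]) ** 0.5

x11, x12 = (4.0, 1.0), (1.0, 4.0)    # distinct bundles for the two type 1's
assert u(x12) <= u(x11)              # consumer 12 is (weakly) the worst off

# The average of the two bundles...
xbar12 = ((x11[0] + x12[0]) / 2, (x11[1] + x12[1]) / 2)

# ...is strictly preferred by the worst-off twin, exactly as in the proof.
assert u(xbar12) > u(x12)
```

Here the two bundles are ranked equally (both give utility 2), yet the average gives utility 2.5: strict convexity bites as long as the bundles are distinct.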
Now that we have made clear what it means for one economy to be larger than
another, and have demonstrated the equal treatment property in the core of a replica econ-
omy, we can clarify what we mean when we say the core ‘shrinks’ as the economy gets
larger by replication. First, recognise that when we replicate some basic economy, we
increase the number of consumers in the economy and so increase the number of bundles
in an allocation. There should be no confusion about that. However, when we restrict our
attention to core allocations in these economies, the equal-treatment property allows us to
completely describe any allocation in the core of Er by reference to a similar allocation in
the basic economy, E1 .
To see this, suppose that x is in the core of Er . Then by the equal treatment property,
x must be of the form
    x = (x^1, . . . , x^1, x^2, . . . , x^2, . . . , x^I, . . . , x^I),

with each bundle x^i repeated r times,
because all consumers of the same type must receive the same bundle. Consequently, core
allocations in Er are just r-fold copies of allocations in E1 – i.e., the above core allocation
is just the r-fold copy of the E1 allocation
(x1 , x2 , . . . , xI ). (5.10)
In fact, this allocation is feasible in E1 . To see this, note first that because x is a core
allocation in Er , it must be feasible in Er . Therefore, we have
    r Σ_{i∈I} x^i = r Σ_{i∈I} e^i,
which, dividing by r, shows that the allocation in (5.10) is feasible in the basic economy E1 .
Altogether then, we have shown that every core allocation of the r-fold replica
economy is simply an r-fold copy of some feasible allocation in the basic economy E1 .
Consequently, we can keep track of how the core changes as we replicate the economy
simply by keeping track of those allocations in E1 corresponding to the core of each r-fold
replica. With this in mind, define Cr as follows:
    Cr ≡ {x = (x^1, . . . , x^I) ∈ F(e) | (x^1, . . . , x^1, . . . , x^I, . . . , x^I) is in the core of Er},

where in the display each x^i is repeated r times.
We can now describe formally the idea that the core ‘shrinks’ as the economy is
replicated.
Proof: It suffices to show that for r > 1, Cr ⊆ Cr−1 . So, suppose that x = (x1 , . . . , xI ) ∈
Cr . This means that its r-fold copy cannot be blocked in the r-fold replica economy. We
must show that its (r − 1)-fold copy cannot be blocked in the (r − 1)-fold replica econ-
omy. But a moment’s thought will convince you of this once you realise that any coalition
that blocks the (r − 1)-fold copy in Er−1 could also block the r-fold copy in Er – after all,
all the members of that coalition are present in Er as well, and their endowments have not
changed.
So, by keeping track of the allocations in the basic economy whose r-fold copies
are in the core of the r-fold replica, Lemma 5.3 tells us that this set will get no larger
as r increases. To see how the core actually shrinks as the economy is replicated, we shall
look again at economies with just two types of consumers. Because we are only concerned
with core allocations in these economies, we can exploit the equal-treatment property and
illustrate our arguments in an Edgeworth box like Fig. 5.10. This time, we think of the
preferences and endowments in the box as those of a representative consumer of each type.
[Figure 5.10. An Edgeworth box for the two-type economy: the core of E1 is the squiggly line through x̃, and e is the endowment point.]
In the basic economy with one consumer of each type, the core of E1 is the squiggly
line between the two consumers’ respective indifference curves through their endowments
at e. The core of E1 contains some allocations that are WEA and some that are not. The
allocation marked x̃ is not a WEA because the price line through x̃ and e is not tangent
to the consumers’ indifference curves at x̃. Note that x̃ is on consumer 11’s indifference
curve through his endowment. If we now replicate this economy once, can the replication
of this allocation be in the core of the larger four-consumer economy?
The answer is no; and to see it, first notice that any point along the line joining e and
x̃ is preferred to both e and x̃ by both (there are now two) type 1’s because their preferences
are strictly convex. In particular, the midpoint x̄ has this property. Now consider the three-
consumer coalition, S = {11, 12, 21}, consisting of both type 1’s and one type 2 consumer
(either one will do). Let each type 1 consumer have a bundle corresponding to the type 1
bundle at x̄ and let the lone type 2 consumer have a type 2 bundle like that at x̃. We know
that each type 1 strictly prefers this to the type 1 bundle at x̃, and the type 2 consumer is
just as well off. Specifically, we know

    x̄^{11} ≻_1 x̃^{11}    and    x̄^{12} ≻_1 x̃^{12},

while consumer 21 receives exactly x̃^{21}.
Are bundles {x̄11 , x̄12 , x̃21 } feasible for S? From the definitions, and noting that
x̃11 = x̃12 , we have
    x̄^{11} + x̄^{12} + x̃^{21} = 2((1/2)e^1 + (1/2)x̃^{11}) + x̃^{21}
                             = e^1 + x̃^{11} + x̃^{21}.    (5.11)
Next recall that x̃ is in the core of E1 , so it must be feasible in the two-consumer economy.
This implies

    x̃^{11} + x̃^{21} = e^1 + e^2,    (5.12)

so that the total in (5.11) equals e^1 + (e^1 + e^2) = 2e^1 + e^2, exactly the coalition's total endowment, and the proposed allocation is indeed feasible for the coalition S of two type 1's and one
type 2. Because we have found a coalition and an allocation they can achieve that makes
two of them strictly better off and the other no worse off than their assignments under x̃,
that coalition blocks x̃ in the four-consumer economy, ruling it out of the core of E2 .
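The blocking coalition just described can be exhibited with concrete numbers. In the sketch below, the endowments, the common utility function, and the non-Walrasian core point x̃ are our own illustration:

```python
# Sketch of the E2 blocking argument with concrete numbers.  Endowments,
# the common utility, and the core point x_tilde are our own illustration.
def u(x):
    return (x[0] * x[1]) ** 0.5          # strictly convex preferences

e1, e2 = (2.0, 1.0), (1.0, 2.0)
x1t, x2t = (1.45, 1.45), (1.55, 1.55)    # x_tilde: on the contract curve,
assert u(x1t) >= u(e1) and u(x2t) >= u(e2)   # individually rational,
# but not Walrasian (the Walrasian allocation here gives type 1 (1.5, 1.5)).

# Coalition S = {11, 12, 21}: both type 1's take the midpoint of e1 and
# the type 1 bundle at x_tilde; the lone type 2 keeps her x_tilde bundle.
xbar1 = ((e1[0] + x1t[0]) / 2, (e1[1] + x1t[1]) / 2)
assert u(xbar1) > u(x1t)         # both type 1's are strictly better off

# Feasibility for S: 2*xbar1 + x2t equals S's endowment 2*e1 + e2,
# because x1t + x2t = e1 + e2 (feasibility of x_tilde in E1).
for k in range(2):
    assert abs(2 * xbar1[k] + x2t[k] - (2 * e1[k] + e2[k])) < 1e-9
```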
If we continue to replicate the economy, so that more consumers can form more
coalitions, can we ‘shrink’ the core even further? If so, are there any allocations that are
never ruled out and so belong to the core of every replica economy? The answer to both
questions is yes, as we now proceed to show in the general case.
We would like to demonstrate that the set of core allocations for Er converges to its
set of Walrasian equilibrium allocations as r increases. Through the equal treatment prop-
erty, we have been able to describe core allocations for Er as r-fold copies of allocations
in the basic economy. We now do the same for Er ’s set of Walrasian equilibria.
Proof: If x is a WEA for Er , then by Theorem 5.5, it is in the core of Er , so that by Theorem
5.16 it must satisfy the equal treatment property. Hence, it must be an r-fold copy of some
allocation in E1 . We leave it as an exercise for you to show that this allocation in E1 is a
WEA for E1 . In addition, we leave the converse as an exercise.
Lemma 5.4 says that as we replicate the economy, the set of Walrasian equilibria
remains ‘constant’ in the sense that it consists purely of copies of Walrasian equilibria of
the basic economy. Consequently, the set of Walrasian equilibria of the basic economy
keeps track, exactly, of the set of Walrasian equilibria of the r-fold replicas.
We can now compare the set of core allocations for Er with its set of Walrasian
equilibrium allocations by comparing the set Cr – whose members are allocations for E1 –
with the set of Walrasian equilibrium allocations for E1 .
Because C1 ⊃ C2 ⊃ . . . , the core is shrinking, as we have already seen. Moreover,
C1 ⊃ C2 ⊃ · · · ⊃ W1(e), the set of WEAs for E1. To see this, note that by Lemma 5.4,
the r-fold copy of a WEA for E1 is in the core of Er , which by the definition of Cr means
that the original WEA for E1 is in Cr .
Now, as we replicate the economy and consider Cr , in the limit only those allocations
satisfying x ∈ Cr for every r = 1, 2, . . . will remain. Thus, to say that the core shrinks to
the set of competitive equilibria is to say that if x ∈ Cr for every r, then x is a competitive
equilibrium allocation for E1 . This is precisely what Debreu and Scarf have shown.
Before presenting the general argument, we will sharpen our intuition by consid-
ering the two-type Edgeworth box case. So, consider Fig. 5.11. Let us suppose, by way
of contradiction, that some non-Walrasian equilibrium allocation, x̃, is in Cr for every r.
In particular, then, x̃ is in the core of the basic two consumer economy consisting of one
consumer of each type. In Fig. 5.11, this means that x̃ must be within the lens and on the
contract curve. That is, it must be on the squiggly line, and the consumers’ indifference
curves through x̃ must be tangent.
[Figure 5.11. The Edgeworth box with a non-Walrasian core allocation x̃: the chord from e to x̃ cuts type 1's indifference curve at A, and x̂ lies on that chord strictly above the curve.]
Now consider the line joining the endowment point, e, and x̃. This corresponds to a
budget line for both consumers and an associated pair of prices p1 , p2 for the two goods.
Because MRS^1_{12}(x̃^1) = MRS^2_{12}(x̃^2), either p1/p2 > MRS^1_{12}(x̃^1), or p2/p1 > MRS^2_{21}(x̃^2). Note that equality cannot hold; otherwise, these prices would constitute a Walrasian equilibrium, and x̃ would be a Walrasian equilibrium allocation. Fig. 5.11 depicts the first case. The
second is handled analogously by reversing the roles of types 1 and 2.
As shown, the line from e to x̃ therefore cuts the type 1’s indifference curve at point
A, and by strict convexity, lies entirely above it between A and x̃. Thus, there exists some
point like x̂ on the segment from A to x̃, which a type 1 consumer strictly prefers to his
bundle at x̃. Because x̂ lies on the chord from e to x̃, it can be expressed as a convex
combination of e and x̃. Thinking ahead a little, let us then write the type 1 bundle at x̂ as
follows:
x̂¹ ≡ (1/r)e¹ + ((r − 1)/r)x̃¹     (5.13)
for some r > 1. Notice first that this is indeed a convex combination of the sort described
because 1/r + (r − 1)/r = 1. For the record, let us note that

u¹(x̂¹) > u¹(x̃¹).     (5.14)
Suppose, as can always be arranged, that r is an integer, and consider Er , the econ-
omy with r consumers of each type. Because we are assuming x̃ ∈ Cr , this means that the
r-fold copy of x̃ is in the core of Er . But can this be so? Not if we can find a coalition and
an allocation that blocks it, and that is just what we will do.
This time, our coalition S consists of all r type 1 consumers and r − 1 of the type 2
consumers. If we give each type 1 the bundle x̂1 , then from (5.14), each would prefer it to
his assignment under x̃. If we give each type 2 in the coalition a bundle x̃2 identical to her
GENERAL EQUILIBRIUM 249
assignment under x̃, each type 2 of course would be indifferent. Thus, we would have

r·x̂¹ + (r − 1)x̃² = e¹ + (r − 1)x̃¹ + (r − 1)x̃² = e¹ + (r − 1)(x̃¹ + x̃²).     (5.15)
Now recall that x̃¹ and x̃² are, by assumption, in the core of the basic two-consumer
economy. They therefore must be feasible for the two-consumer economy, so we know
that x̃¹ + x̃² = e¹ + e². Consequently, r·x̂¹ + (r − 1)x̃² = e¹ + (r − 1)(e¹ + e²) = r·e¹ + (r − 1)e²,
confirming that the proposed allocation in (5.15) is indeed feasible for the coalition of r
type 1’s and (r − 1) type 2’s. Because that allocation is feasible and strictly preferred by
some members of S, and no worse for every member of S than the r-fold copy of x̃, S
blocks the r-fold copy of x̃ and so it is not in the core of Er . We conclude that if x ∈ Cr for
every r, then it must be a Walrasian equilibrium allocation in the basic economy.
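The two-type blocking argument can be checked numerically. The sketch below is not from the text: it assumes Cobb-Douglas utilities uⁱ(x₁, x₂) = x₁x₂ and endowments e¹ = (1, 0), e² = (0, 1), so the contract curve is the diagonal of the box and the unique WEA gives each consumer (1/2, 1/2). The diagonal allocation x̃ below is in the core of E₁ but is not the WEA, and already at r = 2 the coalition of both type 1's and one type 2 blocks its replica copy.

```python
# Numerical check of the blocking construction, under assumed (not from the
# text) utilities u(x1, x2) = x1 * x2 and endowments e1 = (1, 0), e2 = (0, 1).

def u(x):
    return x[0] * x[1]

e1, e2 = (1.0, 0.0), (0.0, 1.0)
xt1, xt2 = (0.3, 0.3), (0.7, 0.7)   # in the core of E1, but not the WEA

r = 2  # one replication: two consumers of each type
# The bundle of equation (5.13): xh1 = (1/r) e1 + ((r - 1)/r) xt1
xh1 = tuple(e1[k] / r + (r - 1) * xt1[k] / r for k in range(2))

# Each type 1 strictly prefers xh1 to xt1, as in (5.14):
type1_gains = u(xh1) > u(xt1)

# Coalition: r type 1's receive xh1, (r - 1) type 2's keep xt2.
# Feasibility, as in (5.15): coalition demand equals coalition endowment.
coalition_use = tuple(r * xh1[k] + (r - 1) * xt2[k] for k in range(2))
coalition_own = tuple(r * e1[k] + (r - 1) * e2[k] for k in range(2))
feasible = all(abs(coalition_use[k] - coalition_own[k]) < 1e-12 for k in range(2))

print(xh1, type1_gains, feasible)
```

Here the allocation is chosen so that type 1 does the blocking; with the mirror-image x̃ the same check goes through with the roles of the types reversed, as in the text.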
We now give the general argument under two additional hypotheses. The first is that
if x ∈ C₁, then x ≫ 0. The second is that for each i ∈ I, the utility function uⁱ representing
consumer i's preferences is differentiable on Rⁿ₊₊ with a strictly positive gradient vector there.

THEOREM 5.16   A Limit Theorem on the Core

If x ∈ Cr for every r = 1, 2, . . ., then x is a Walrasian equilibrium allocation for E₁.
Proof: Suppose that x̃ ∈ Cr for every r. We must show that x̃ is a WEA for E1 .
We shall first establish that

uⁱ(teⁱ + (1 − t)x̃ⁱ) ≤ uⁱ(x̃ⁱ), for every t ∈ [0, 1] and every i ∈ I.     (P.1)
To see that this inequality must hold, let us suppose that it does not and argue to a
contradiction. So, suppose that for some t̄ ∈ [0, 1], and some i ∈ I,

uⁱ(t̄eⁱ + (1 − t̄)x̃ⁱ) > uⁱ(x̃ⁱ).     (P.2)
But we can now use precisely the same argument that we gave in the discussion
preceding the proof to show that the r-fold copy of x̃ is then not in the core of Er . But this
contradicts the fact that x̃ ∈ Cr . We therefore conclude that (P.1) must hold.
Now, look closely at (P.1). Considering the left-hand side as a real-valued function
of t on [0, 1], it says that this function achieves a maximum at t = 0. Because this is on the
lower boundary of [0, 1] it implies that the derivative of the left-hand side is non-positive
when evaluated at t = 0. Taking the derivative and evaluating it at t = 0 then gives

∇uⁱ(x̃ⁱ) · (eⁱ − x̃ⁱ) ≤ 0, for every i ∈ I.     (P.3)

Because x̃ ∈ C₁, it is Pareto efficient, and by our additional hypotheses x̃ ≫ 0 and each gradient ∇uⁱ(x̃ⁱ) is strictly positive. The gradients must then be proportional across consumers (see Exercise 5.27): setting p̃ ≡ ∇u¹(x̃¹) ≫ 0, for each i ∈ I there is a λⁱ > 0 with ∇uⁱ(x̃ⁱ) = λⁱp̃. Dividing (P.3) by λⁱ therefore yields

p̃ · eⁱ ≤ p̃ · x̃ⁱ, for every i ∈ I.     (P.4)
Note that we would be finished if each inequality in (P.4) were an equality. For in this
case, x̃i would satisfy the first-order conditions for a maximum of the consumer’s utility-
maximisation problem subject to the budget constraint at prices p̃. Moreover, under the
hypotheses we have made, the first-order conditions are sufficient for a utility-maximising
solution as well (see Theorem 1.4). That is, x̃i would be a Walrasian equilibrium allocation
for E1 .
We now show that indeed each inequality in (P.4) must be an equality. Note that
because x̃ ∈ Cr , it must be feasible in E1 . Therefore,
∑_{i∈I} x̃ⁱ = ∑_{i∈I} eⁱ,
so that
∑_{i∈I} p̃ · x̃ⁱ = ∑_{i∈I} p̃ · eⁱ.
However, this equality would fail if for even one consumer i the inequality in (P.4) were
strict. Hence each inequality in (P.4) is an equality, and x̃ is a Walrasian equilibrium
allocation for E₁, completing the proof.
We have shown that for large enough economies, only WEAs will be in the core. This
astonishing result really does point towards some unique characteristics of large market
economies and suggests itself as a sort of ultimate ‘proof’ of Adam Smith’s intuitions
about the efficacy of competitive market systems. The result does bear some scrutiny,
however. First of all, it was obtained within the rather rigid context of replica economies
with equal numbers of each type of consumer. Second, we cannot lose sight of the fact
that the core itself is a very weak solution concept with arguable equity properties. To
the extent that a ‘good’ solution to the distribution problem from society’s point of view
includes considerations of equity, even the broadest interpretation of this result does not
provide support to arguments for pure laissez-faire. The ‘equity’ of any core allocation,
and so of any WEA, depends on what the initial endowments are.
The first of these objections can be, and has been, addressed. Abandoning the rigid
world of replica economies in favour of more flexible ‘continuum economies’, Aumann
(1964), Hildenbrand (1974), and others have proved even stronger results without the
assumption of equal numbers of each type. What then of the second objection cited? Well,
if we want to use the market system to achieve the ‘good society’, the Second Welfare
Theorem tells us that we can. All we need to do is decide where in the core we want to
be and then redistribute ‘endowments’ or ‘income’ and use the market to ‘support’ that
distribution. Ah, but there’s the rub. How do we decide where we want to be? How does
‘society’ decide which distribution in the core it ‘prefers’? This is the kind of question we
take up in the next chapter.
5.6 EXERCISES
5.1 In an Edgeworth box economy, do the following:
(a) Sketch a situation where preferences are neither convex nor strictly monotonic and there is no
Walrasian equilibrium.
(b) Sketch a situation where preferences are neither convex nor strictly monotonic yet a Walrasian
equilibrium exists nonetheless.
(c) Repeat parts (a) and (b), but this time assume preferences are not continuous.
5.2 Let some consumer have endowments e and face prices p. His indirect utility function is thus
v(p, p · e). Show that whenever the price of a good rises by a sufficiently small amount, the consumer
will be made worse off if initially he was a net demander of the good (i.e., his demand exceeded his
endowment) and made better off if he was initially a net supplier of the good. What can you say if
the price of the good rises by a sufficiently large amount?
5.3 Consider an exchange economy. Let p be a vector of prices in which the price of at least one good is
non-positive. Show that if consumers’ utility functions are strongly increasing, then aggregate excess
demand cannot be zero in every market.
5.4 Derive the excess demand function z(p) for the economy in Example 5.1. Verify that it satisfies
Walras’ law.
5.5 In Example 5.1, calculate the consumers’ Walrasian equilibrium allocations and illustrate in an
Edgeworth box. Sketch in the contract curve and identify the core.
5.6 Prove Lemma 5.1 and complete the proof of Lemma 5.2.
5.7 Consider an exchange economy with two goods. Suppose that its aggregate excess demand function
is z(p₁, p₂) = (−1, p₁/p₂) for all (p₁, p₂) ≫ (0, 0).
(a) Show that this function satisfies conditions 1 and 2 of Theorem 5.3, but not condition 3.
(b) Show that the conclusion of Theorem 5.3 fails here. That is, show that there is no (p₁∗, p₂∗) ≫ (0, 0) such that z(p₁∗, p₂∗) = (0, 0).
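A quick numerical companion to this exercise (the function is as stated; the test prices are arbitrary): Walras' law and homogeneity of degree zero both hold everywhere, yet the first coordinate of z is identically −1, so no price vector can clear both markets.

```python
# Exercise 5.7's aggregate excess demand: z(p1, p2) = (-1, p1/p2).

def z(p1, p2):
    return (-1.0, p1 / p2)

for (p1, p2) in [(1.0, 1.0), (3.0, 0.5), (0.1, 7.0)]:
    z1, z2 = z(p1, p2)
    assert abs(p1 * z1 + p2 * z2) < 1e-12          # Walras' law: p . z(p) = 0
    za, zb = z(p1, p2), z(10 * p1, 10 * p2)
    assert abs(za[1] - zb[1]) < 1e-12              # homogeneous of degree zero
    assert (z1, z2) != (0.0, 0.0)                  # markets never clear
print("conditions 1 and 2 hold; z is never (0, 0)")
```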
5.8 Let pm be a sequence of strictly positive prices converging to p̄, and let a consumer’s endowment
vector be e. Show that the sequence {pm · e} of the consumer’s income is bounded. Indeed, show
more generally that if a sequence of real numbers converges, then it must be bounded.
5.9 Prove the corollary to Theorem 5.8. Extend the argument to show that, under the same assumptions,
any Pareto-efficient allocation can be supported as a WEA for some Walrasian equilibrium p̄ and
some distribution of income, (R1 , . . . , RI ), where Ri is the income distributed to consumer i.
5.10 In a two-person, two-good exchange economy with strictly increasing utility functions, it is easy to
see that an allocation x̄ ∈ F(e) is Pareto efficient if and only if x̄i solves the problem
max_{xⁱ} uⁱ(xⁱ)  s.t.  uʲ(xʲ) ≥ uʲ(x̄ʲ)  and  xⁱ + xʲ ≤ e¹ + e²,

for i = 1, 2 and j ≠ i.
(a) Prove the claim.
(b) Generalise this equivalent definition of a Pareto-efficient allocation to the case of n goods and I
consumers. Then prove the general claim.
5.11 Consider a two-consumer, two-good exchange economy. Utility functions and endowments are
5.12 There are two goods and two consumers. Preferences and endowments are described by
respectively.
(a) Find a Walrasian equilibrium for this economy and its associated WEA.
(b) Do the same when 1’s endowment is e1 = (5, 0) and 2’s remains e2 = (0, 20).
5.13 An exchange economy has two consumers with expenditure functions:
e¹(p, u) = (3(1.5)²p₁²p₂ exp(u))^{1/3},
e²(p, u) = (3(1.5)²p₂²p₁ exp(u))^{1/3}.
If initial endowments are e1 = (10, 0) and e2 = (0, 10), find the Walrasian equilibrium.
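One computational route to this exercise can be sketched as follows (it assumes the expenditure functions read e¹(p, u) = (3(1.5)²p₁²p₂ exp(u))^{1/3}, and e² with the roles of p₁ and p₂ reversed). By Shephard's lemma, Hicksian demands are the price-derivatives of e(p, u); for this Cobb-Douglas form the budget shares are constant, so Marshallian demand for good j is share × income / pⱼ. The sketch recovers the shares by finite differences, then clears the market for good 1 by bisection on ρ = p₂/p₁; good 2 then clears by Walras' law.

```python
# Shephard's-lemma sketch of Exercise 5.13 (expenditure functions as assumed
# in the lead-in; endowments e1 = (10, 0), e2 = (0, 10)).
import math

def e1(p1, p2, u=0.0):
    return (3 * 1.5**2 * p1**2 * p2 * math.exp(u)) ** (1 / 3)

def e2(p1, p2, u=0.0):
    return (3 * 1.5**2 * p2**2 * p1 * math.exp(u)) ** (1 / 3)

def share_on_good1(exp_fn):
    """Budget share on good 1: p1 * (d e / d p1) / e, evaluated at p = (1, 1)."""
    h = 1e-6
    de = (exp_fn(1 + h, 1) - exp_fn(1 - h, 1)) / (2 * h)
    return de / exp_fn(1, 1)

s1 = share_on_good1(e1)   # consumer 1's share on good 1 (analytically 2/3)
s2 = share_on_good1(e2)   # consumer 2's share on good 1 (analytically 1/3)

def excess_good1(rho):
    # Incomes are 10*p1 and 10*p2; dividing demand for good 1 through by p1
    # leaves only the relative price rho = p2/p1.
    return s1 * 10 + s2 * 10 * rho - 10

lo, hi = 1e-3, 1e3
for _ in range(200):              # bisection; excess_good1 is increasing in rho
    mid = (lo + hi) / 2
    if excess_good1(mid) > 0:
        hi = mid
    else:
        lo = mid
rho = (lo + hi) / 2
print(rho)   # equilibrium relative price p2/p1
```

With shares 2/3 and 1/3 this gives ρ = 1, i.e. p̂₁ = p̂₂, with allocations x̂¹ = (20/3, 10/3) and x̂² = (10/3, 20/3).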
5.14 Suppose that each consumer i has a strictly positive endowment vector, eⁱ, and a Cobb-Douglas
utility function on Rⁿ₊ of the form uⁱ(x) = x₁^{αⁱ₁} x₂^{αⁱ₂} · · · xₙ^{αⁱₙ}, where αⁱₖ > 0 for all
consumers i and goods k.
(a) Sketch the Edgeworth box for this economy when aggregate endowments are (1, 1). Identify
the set of Pareto-efficient allocations.
(b) Sketch the Edgeworth box for this economy when aggregate endowments are (2, 1). Identify
the set of Pareto-efficient allocations.
5.17 Consider an exchange economy with two identical consumers. Their common utility function is
ui (x1 , x2 ) = x1α x21−α for 0 < α < 1. Society has 10 units of x1 and 10 units of x2 in all. Find endow-
ments e¹ and e², where e¹ ≠ e², and Walrasian equilibrium prices that will ‘support’ as a WEA the
equal-division allocation giving both consumers the bundle (5, 5).
5.18 In a two-good, two-consumer economy, utility functions are
u1 (x1 , x2 ) = x1 (x2 )2 ,
u2 (x1 , x2 ) = (x1 )2 x2 .
Find a Walrasian equilibrium and the associated WEA for this economy.
5.20 In an exchange economy with two consumers, total endowments are (e₁, e₂) ≡ (e₁¹ + e₁², e₂¹ + e₂²).
Consumer i requires sⱼⁱ units of good j to survive, but consumers differ in that (s₁¹, s₂¹) ≠ (s₁², s₂²).
Consumers are otherwise identical, with utility functions uⁱ = (x₁ⁱ − s₁ⁱ)^α + (x₂ⁱ − s₂ⁱ)^α for 0 <
α < 1 and i = 1, 2.
(a) Suppose now that there is a single hypothetical consumer with initial endowments (e1 , e2 )
and utility function u = (x₁ − s₁)^α + (x₂ − s₂)^α, where sⱼ ≡ sⱼ¹ + sⱼ² for j = 1, 2. Calculate
(∂u/∂x1 )/(∂u/∂x2 ) for this consumer and evaluate it at (x1 , x2 ) = (e1 , e2 ). Call what you’ve
obtained p∗ .
(b) Show that p∗ obtained in part (a) must be an equilibrium relative price for good x1 in the
exchange economy previously described.
5.21 Consider an exchange economy with two consumers. Consumer 1 has utility function
u1 (x1 , x2 ) = x2 and endowment e1 = (1, 1) and consumer 2 has utility function u2 (x1 , x2 ) = x1 + x2
and endowment e2 = (1, 0).
vε(x) = u(x₁^ε + (1 − ε)∑ᵢ₌₁ⁿ xᵢ^ε, . . . , xₙ^ε + (1 − ε)∑ᵢ₌₁ⁿ xᵢ^ε)
is continuous, strictly quasiconcave and strongly increasing. Note that the approximation to u(·)
becomes better and better as ε → 1 because vε (x) → u(x) as ε → 1.
(b) Show that if in an exchange economy with a positive endowment of each good, each consumer’s
utility function is continuous, quasiconcave and strictly increasing on Rn+ , there are approximat-
ing utility functions as in part (a) that define an exchange economy with the same endowments
and possessing a Walrasian equilibrium. If, in addition, each consumer’s endowment gives him
a positive amount of each good, show that any limit of such Walrasian equilibria, as the approxi-
mations become better and better (e.g., as ε → 1 in the approximations in part (a)) is a Walrasian
equilibrium of the original exchange economy.
(c) Show that such a limit of Walrasian equilibria as described in part (b) exists. You will then have
proven the following result.
If each consumer in an exchange economy is endowed with a positive amount of each good and
has a continuous, quasiconcave and strictly increasing utility function, a Walrasian equilibrium
exists.
(d) Which hypotheses of the Walrasian equilibrium existence result proved in part (b) fail to hold in
the exchange economy in Exercise 5.21?
5.23 Show that if a firm’s production set is strongly convex and the price vector is strictly positive, then
there is at most one profit-maximising production plan.
5.24 Provide a proof of Theorem 5.10.
5.25 Complete the proof of Theorem 5.13 by showing that z(p) in the economy with production satisfies
all the properties of Theorem 5.3.
5.26 Suppose that in a single-consumer economy, the consumer is endowed with none of the consumption
good, y, and 24 hours of time, h, so that e = (24, 0). Suppose as well that preferences are defined
over R²₊ and represented by u(h, y) = hy, and production possibilities are Y = {(−h, y) | 0 ≤ h ≤ b
and 0 ≤ y ≤ √h}, where b is some large positive number. Let pᵧ and pₕ be prices of the consumption
good and leisure, respectively.
(a) Find relative prices py /ph that clear the consumption and leisure markets simultaneously.
(b) Calculate the equilibrium consumption and production plans and sketch your results in R2+ .
(c) How many hours a day does the consumer work?
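This equilibrium can be found numerically under the normalisation pₕ = 1; the reduction of the firm's and consumer's problems to first-order conditions below is standard Cobb-Douglas algebra, not taken from the text.

```python
# Exercise 5.26 with ph = 1.  Given py, the firm maximises py*sqrt(L) - L over
# labour L, so L = (py/2)^2 and profit = py^2/4.  The consumer owns the firm,
# has income 24 + profit, and with u(h, y) = h*y spends half of income on
# leisure and half on the consumption good.  Bisect on py to clear labour.

def labour_gap(py):
    L_demand = (py / 2) ** 2          # firm's labour demand from its FOC
    profit = py ** 2 / 4
    m = 24 + profit                   # consumer's income (ph = 1)
    leisure = m / 2                   # half of income spent on leisure
    L_supply = 24 - leisure
    return L_demand - L_supply        # increasing in py

lo, hi = 0.0, 100.0
for _ in range(200):
    mid = (lo + hi) / 2
    if labour_gap(mid) > 0:
        hi = mid
    else:
        lo = mid
py = (lo + hi) / 2

hours_worked = 24 - (24 + py ** 2 / 4) / 2
print(py, hours_worked)   # py/ph = sqrt(32); the consumer works 8 hours a day
```

The goods market then clears automatically (Walras' law): output √8 = 2√2 equals the consumer's demand for y.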
5.27 Consider an exchange economy (ui , ei )i∈I in which each ui is continuous and quasiconcave on Rn+ .
Suppose that x̄ = (x̄¹, x̄², . . . , x̄ᴵ) ≫ 0 is Pareto efficient, that each uⁱ is continuously differentiable
in an open set containing x̄ⁱ, and that ∇uⁱ(x̄ⁱ) ≫ 0. Under these conditions, which differ somewhat
from those of Theorem 5.8, follow the steps below to derive another version of the Second Welfare
Theorem.
(a) Show that for any two consumers i and j, the gradient vectors ∇ui (x̄i ) and ∇u j (x̄ j ) must be
proportional. That is, there must exist some α > 0 (which may depend on i and j) such that
∇ui (x̄i ) = α∇u j (x̄ j ). Interpret this condition in the case of the Edgeworth box economy.
(b) Define p̄ = ∇u¹(x̄¹) ≫ 0. Show that for every consumer i, there exists λⁱ > 0 such that
∇uⁱ(x̄ⁱ) = λⁱp̄.
(c) Use Theorem 1.4 to argue that for every consumer i, x̄ⁱ solves

max_{xⁱ} uⁱ(xⁱ)  s.t.  p̄ · xⁱ ≤ p̄ · x̄ⁱ.
5.28 Suppose that all of the conditions in Exercise 5.27 hold, except the strict positivity of x̄ and the
consumers’ gradient vectors. Using an Edgeworth box, provide an example showing that in such a
case, it may not be possible to support x̄ as a Walrasian equilibrium allocation. Because Theorem
5.8 does not require x̄ to be strictly positive, which hypothesis of Theorem 5.8 does your example
violate?
5.29 Consider an exchange economy (ui , ei )i∈I in which each ui is continuous and quasiconcave on Rn+ .
Suppose that x̄ = (x̄¹, x̄², . . . , x̄ᴵ) ≫ 0 is Pareto efficient. Under these conditions, which differ from
those of both Theorem 5.8 and Exercise 5.27, follow the steps below to derive yet another version of
the Second Welfare Theorem.
(a) Let C = {y ∈ Rⁿ | y = ∑_{i∈I} xⁱ, some xⁱ ∈ Rⁿ₊ such that uⁱ(xⁱ) ≥ uⁱ(x̄ⁱ) for all i ∈ I, with at least
one inequality strict}, and let Z = {z ∈ Rⁿ | z ≤ ∑_{i∈I} eⁱ}. Show that C and Z are convex and that
C ∩ Z = ∅.
max_y p · y  s.t.  y ∈ Y − y⁰.

max_y p · y  s.t.  y ∈ Y.
5.31 Consider an economy with production in which there are many goods produced by the production
sector, but each firm produces only one of them. Suppose also that each firm’s output is given by a
differentiable production function and that each consumer’s utility function is differentiable as well.
Assume that this economy is in a Walrasian equilibrium with strictly positive prices and that all
consumer’s marginal utilities (of consumption goods) and all firm’s marginal products (of inputs)
are also strictly positive.
(a) Show that the MRS between any two consumption goods is the same for each consumer, and
that it is equal to the ratio of their prices.
(b) Show that the MRTS between any two inputs is the same for every firm and equal to the ratio of
their prices.
(c) What does this tell you about the information content of Walrasian equilibrium prices?
5.32 Consider a simple economy with two consumers, a single consumption good x, and two time periods.
Consumption of the good in period t is denoted xt for t = 1, 2. Intertemporal utility functions for the
two consumers are,
ui (x1 , x2 ) = x1 x2 , i = 1, 2,
and endowments are e1 = (19, 1) and e2 = (1, 9). To capture the idea that the good is perfectly
storable, we introduce a firm producing storage services. The firm can transform one unit of the
good in period 1 into one unit of the good in period 2. Hence, the production set Y is the set of all
vectors (y1 , y2 ) ∈ R2 such that y1 + y2 ≤ 0 and y1 ≤ 0. Consumer 1 is endowed with a 100 per cent
ownership share of the firm.
(a) Suppose the two consumers cannot trade with one another. That is, suppose that each consumer
is in his own Robinson Crusoe economy, in which consumer 1 has access to his storage firm. How
much does each consumer consume in each period? How well off is each consumer? How much
storage takes place?
(b) Now suppose the two consumers together with consumer 1’s storage firm constitute a compet-
itive production economy. What are the Walrasian equilibrium prices, p1 and p2 ? How much
storage takes place now?
(c) Interpret p1 as a spot price and p2 as a futures price.
(d) Repeat the exercise under the assumption that storage is costly, i.e., that Y is the set of vectors
(y1 , y2 ) ∈ R2 such that δy1 + y2 ≤ 0 and y1 ≤ 0, where δ ∈ [0, 1). Show that the existence of
spot and futures markets now makes both consumers strictly better off.
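For part (b), the candidate equilibrium p₁ = p₂ can be verified by direct computation. (The reasoning, sketched in the comments, is that active storage forces p₁ = p₂: with p₂ > p₁ storage profits would be unbounded, and with p₂ < p₁ storage shuts down and period-2 goods cannot clear.)

```python
# Check of Exercise 5.32(b): p1 = p2 = 1 clears both markets with storage.
# At p1 = p2 the storage firm earns zero profit at any activity level, so
# consumer 1's income is just the value of his endowment.

p1 = p2 = 1.0
m1 = 19 * p1 + 1 * p2                 # consumer 1's income (firm profit = 0)
m2 = 1 * p1 + 9 * p2                  # consumer 2's income
# u = x1*x2: half of income spent on each period's consumption.
x1 = (m1 / (2 * p1), m1 / (2 * p2))   # consumer 1: (10, 10)
x2 = (m2 / (2 * p1), m2 / (2 * p2))   # consumer 2: (5, 5)

storage = (19 + 1) - (x1[0] + x2[0])  # period-1 endowment not consumed
# Period 2 clears: endowment 1 + 9 plus stored units meets period-2 demand.
period2_clears = abs((1 + 9) + storage - (x1[1] + x2[1])) < 1e-12
print(x1, x2, storage, period2_clears)
```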
5.33 The contingent-commodity interpretation of our general equilibrium model permits us to consider
time (as in the previous exercise) as well as uncertainty and more (e.g. location). While the trading
of contracts nicely captures the idea of futures contracts and prices, one might wonder about the
role that spot markets play in our theory. This exercise will guide you through thinking about this.
The main result is that once the date zero contingent-commodity contracts market has cleared at
Walrasian prices, there is no remaining role for spot markets. Even if spot markets were to open up
for some or all goods in some or all periods and in some or all states of the world, no additional trade
would take place. All agents would simply exercise the contracts they already have in hand.
(a) Consider an exchange economy with I consumers, N goods, and T = 2 dates. There is no uncer-
tainty. We will focus on one consumer whose utility function is u(x1 , x2 ), where xt ∈ RN + is a
vector of period-t consumption of the N goods.
Suppose that p̂ = (p̂1 , p̂2 ) is a Walrasian equilibrium price vector in the contingent-
commodity sense described in Section 5.4, where p̂t ∈ RN ++ is the price vector for period-t
contracts on the N goods. Let x̂ = (x̂1 , x̂2 ) be the vector of contracts that our consumer purchases
prior to date 1 given the Walrasian equilibrium price-vector p̂ = (p̂1 , p̂2 ).
Suppose now that at each date t, spot-markets open for trade.
(i) Because all existing contracts are enforced, argue that our consumer’s available endowment
in period t is x̂t .
(ii) Show that if our consumer wishes to trade in some period t spot-market and if all goods
have period t spot-markets and the period t spot-prices are p̂ₜ, then our consumer's period t
budget constraint is

p̂ₜ · xₜ = p̂ₜ · x̂ₜ.
(iii) Conclude that our consumer can ultimately choose any (x₁, x₂) such that

p̂₁ · x₁ = p̂₁ · x̂₁ and p̂₂ · x₂ = p̂₂ · x̂₂.
(iv) Prove that the consumer can do no better than to choose x1 = x̂1 in period t = 1 and x2 = x̂2
in period t = 2 by showing that any bundle that is feasible through trading in spot-markets
is feasible in the contingent-commodity contract market. You should assume that in period
1 the consumer is forward-looking, knows the spot-prices he will face in period 2, and that
he wishes to behave so as to maximise his lifetime utility u(x1 , x2 ). Further, assume that if
he consumes x̄1 in period t = 1, his utility of consuming any bundle x2 in period t = 2 is
u(x̄1 , x2 ).
Because the consumer can do no better if there are fewer spot-markets open, parts (i)–
(iv) show that if there is a period t spot-market for good k and the period t spot-price of good
k is p̂kt , then our consumer has no incentive to trade. Since this is true for all consumers,
this shows that spot-markets clear at prices at which there is no trade.
(b) Repeat the exercise with uncertainty instead of time. Assume N goods and two states of the
world, s = 1, 2. What is the interpretation of the assumption (analogous to that made in part (iv)
of (a)) that if the consumer would have consumed bundle x̄1 had state s = 1 occurred, his utility
of consuming any bundle x2 in state s = 2 is u(x̄1 , x2 )?
The next question shows that spot-markets nevertheless have a role.
5.34 (Arrow Securities) Exercise 5.33 shows that when there are opportunities to trade a priori in any
commodity contingent on any date, state, etc., there is no remaining role for spot-markets. Here we
show that if not all commodities can be traded contingent on every date and state, then spot-markets
do have a role. We will in fact suppose that there is only one ‘commodity’ that can be traded a priori,
an Arrow security (named after the Nobel prize winning economist Kenneth Arrow). An Arrow
security for date t and state s entitles the bearer to one dollar at date t and in state s and nothing
otherwise.
We wish to guide you towards showing that if p̂ ≫ 0 is a Walrasian equilibrium price in the
contingent-commodity sense of Section 5.4 when there are N goods as well as time and uncertainty,
and x̂ ≥ 0 is the corresponding Walrasian allocation, then the same prices and allocation arise when
only Arrow securities can be traded a priori and all other goods must be traded on spot-markets. This
shows that as long as there is a contingent-commodity market for a unit of account (money), the full
contingent-commodity Walrasian equilibrium can be implemented with the aid of spot-markets. We
will specialise our attention to exchange economies. You are invited to conduct the same analysis
for production economies.
Consider then the following market structure and timing. At date zero, there is a market for
trade in Arrow securities contingent on any date and any state. The price of each Arrow security is
one dollar, and each date t and state s security entitles the bearer to one dollar at date t and in state s,
and nothing otherwise. Let aits denote consumer i’s quantity of date t and state s Arrow securities. No
consumer is endowed with any Arrow securities. Hence, consumer i’s budget constraint for Arrow
securities at date zero is,
∑_{t,s} aⁱₜₛ = 0.
At each date t ≥ 1, the date-t event sₜ is realised and all consumers are informed of the date-
t state of the world s = (s₁, . . . , sₜ). Each consumer i receives his endowment eⁱₜₛ ∈ Rᴺ₊ of the N
goods. Spot-markets open for each of the N goods. If the spot-price of good k is pkts , then consumer
i’s date-t state-s budget constraint is,
∑ₖ pₖₜₛ xⁱₖₜₛ = ∑ₖ pₖₜₛ eⁱₖₜₛ + aⁱₜₛ.
Each consumer i is assumed to know all current and future spot prices for every good in every
state (a strong assumption!). Consequently, at date zero consumer i can decide on the trades he will
actually make in each spot-market for each good at every future date and in every state. At date zero
consumer i therefore solves,
max_{(aⁱₜₛ), (xⁱₖₜₛ)} uⁱ((xⁱₖₜₛ))  s.t.  ∑_{t,s} aⁱₜₛ = 0  and  ∑ₖ pₖₜₛ xⁱₖₜₛ ≤ ∑ₖ pₖₜₛ eⁱₖₜₛ + aⁱₜₛ
for each date t and state s. (Note the inequality in the date-t state-s constraints. This ensures that
there is no bankruptcy.)
(a) Argue that the above formulation implicitly assumes that at any date t, current and future utility
in any state is given by uⁱ(·), where past consumption is fixed at actual levels and consumption in
states that did not occur is fixed at the levels that would have been chosen had they occurred.
(b) The consumer’s budget constraint in the contingent-commodity model of Section 5.4 specialised
to exchange economies is,
∑_{k,t,s} pₖₜₛ xⁱₖₜₛ = ∑_{k,t,s} pₖₜₛ eⁱₖₜₛ.
Show that (xⁱₖₜₛ) satisfies this budget constraint if and only if there is a vector of Arrow securities
(aⁱₜₛ) such that (xⁱₖₜₛ) and (aⁱₜₛ) together satisfy the Arrow security budget constraint and each of
the spot-market budget constraints.
(c) Conclude from (b) that any Walrasian equilibrium price and allocation of the contingent-
commodity model of Section 5.4 can be implemented in the spot-market model described here
and that there will typically be trade in the spot-markets. Show also the converse.
(d) Explain why the price of each Arrow security is one. For example, why should the price of a
security entitling the bearer to a dollar today be equal to the price of a security entitling the
bearer to a dollar tomorrow when it is quite possible that consumers prefer consumption today
to the same consumption tomorrow? (Hint: Think about what a dollar will buy.)
(e) Repeat the exercise when, instead of paying the bearer in a unit of account, one date-t state-s
Arrow security pays the bearer one unit of good 1 at date t in state s and nothing otherwise. What
prices must be set for Arrow securities now in order to obtain the result in part (c)? How does
this affect the consumer’s Arrow security and spot-market budget constraints?
5.35 (Asset Pricing) We can use our general equilibrium Walrasian model to think about asset pricing.
We do this in the simplest possible manner by considering a setting with N = 1 good, T = 1 period,
and finitely many states, s = 1, 2, . . . , S. Thus a consumption bundle x = (x1 , x2 , . . . , xS ) ∈ RS+
describes the quantity of the good consumed in each state. Once again, we restrict attention to an
exchange economy. There are I consumers and consumer i’s utility function is ui (x1 , x2 , . . . , xS ) and
his endowment vector is ei = (ei1 , . . . , eiS ). Note that one unit of commodity s yields one unit of the
good in state s. Hence, we can think of commodity s as an Arrow security for the good in state s.
Because all Arrow securities are tradeable here, the market is said to be complete.
Before thinking about asset pricing, let us consider this simply as an exchange economy
and suppose that p̂ ≫ 0 is a Walrasian equilibrium price vector and that x̂ = (x̂¹, x̂², . . . , x̂ᴵ) is the
associated Walrasian equilibrium allocation. Therefore, for each consumer i, x̂i = (x̂1i , x̂2i , . . . , x̂Si )
maximises ui (x1 , x2 , . . . , xS ) subject to
p̂1 x1 + . . . + p̂S xS = p̂1 ei1 + . . . + p̂S eiS ,
and markets clear. That is,
∑ᵢ x̂ⁱₛ = ∑ᵢ eⁱₛ, for every state s.
(b) Suppose that πₛ is the probability that state s occurs and that all consumers agree on this. Further,
suppose that each consumer's preferences are represented by a von Neumann-Morgenstern util-
ity function, vⁱ(x), assigning VNM utility to any quantity x ≥ 0 of the good, and that vⁱ′ > 0.
Further, assume that each consumer is strictly risk averse, i.e., that vⁱ″ < 0. Consumer i's utility
function is then

uⁱ(x₁, . . . , x_S) = ∑ₛ₌₁ˢ πₛvⁱ(xₛ).
(i) Suppose the total endowment of the good is constant across states, i.e., suppose that

∑ᵢ eⁱₛ = ∑ᵢ eⁱₛ′, for all states s, s′.
E(v¹′(x̃¹)α̃) / E(v¹′(x̃¹)) = · · · = E(vᴵ′(x̃ᴵ)α̃) / E(vᴵ′(x̃ᴵ)),
where E denotes mathematical expectation, x̃i is the random variable describing the
amount of the good consumed by consumer i in equilibrium (x̃i = x̂si in state s), and
α̃ is the random variable describing the amount of the good the asset yields (α̃ = αs in
state s). Conclude, at least roughly, that the real price of an asset is higher the more
negatively correlated are its returns with consumption – it is then more useful for diver-
sifying risk. In particular, conclude that an asset whose returns are independent of any
consumer’s marginal utility of consumption has a price equal to its expected value. Thus,
the price of an asset is not so much related to its variance but rather the extent to which
it is correlated with consumption.
5.36 (Arbitrage Pricing) We shift gears slightly in this question by considering an arbitrage argument that
delivers the same pricing of assets as derived in Exercise 5.35. Suppose once again that there is one
good and S states. Suppose also that there are N assets, α 1 , α 2 , . . . , α N , that can be traded, each
being a vector in RS+ . Let the price of asset k be qk . We shall normalise prices so that they are real
prices. That is, qk is the number of units of the good that must be given up to purchase one unit of
asset k. Suppose an investor purchases xk units of each asset k.
(a) Show that the (column) vector
Ax ∈ RS+
is the induced asset held by the investor subsequent to his purchase, where A is the S × N matrix
whose kth column is α k , and x = (x1 , . . . , xN ) is the vector of the investor’s asset purchases.
(b) Show that the vector

Ax − 1(q · x) ∈ Rˢ

describes the real net gain to the investor in every state, where 1 is the column vector of S 1's.
(c) Suppose that every coordinate of the real net gain vector
Ax − 1(q · x)
is strictly positive. Argue that the investor can earn arbitrarily large profits with an initial outlay
of a single unit of the good by repurchasing x (or an affordable fraction of it) again and again
using short sales to cover his expenses, and always guaranteeing against bankruptcy in any state.
(d) Conclude from (c) that for markets to clear, there can be no x ∈ RN such that every coordinate
of the real net gain vector is strictly positive. (Parts (c) and (d) constitute an ‘arbitrage-pricing’
argument. We next turn to its consequences.)
(e) Let C = {y ∈ Rˢ : y = Ax − 1(q · x) for some x ∈ Rᴺ}. Conclude from part (d) that

C ∩ Rˢ₊₊ = ∅,

and use the separating hyperplane theorem, Theorem A2.24, to conclude that there is a non-zero
vector, p̂ ∈ Rˢ, such that

p̂ · y ≤ p̂ · z, for every y ∈ C and every z ∈ Rˢ₊₊.

(f) Argue that the inequality cannot be strict for any x ∈ Rᴺ because the inequality would then fail
for −x. Conclude that

(p̂ᵀA − qᵀ)x = 0, for all x ∈ Rᴺ,

so that

qᵀ = p̂ᵀA, i.e., qₖ = p̂ · αᵏ for every asset k.
(g) Compare the result in part (f) with the pricing of the asset that arose from the general equilib-
rium model considered in part (a) of Exercise 5.35. In that exercise, we assumed that all Arrow
securities were tradeable, i.e., we assumed that the market was complete. Conclude from the
current exercise that if there are no opportunities for profitable arbitrage among the assets that
are available for trade, then even if markets are incomplete there are implicit prices, given by
p̂, for all Arrow securities. Moreover, the prices of all tradeable assets are derived from these
underlying Arrow security prices.
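The pricing rule qₖ = p̂ · αᵏ can be illustrated with a small numeric instance; the state prices and the two asset payoff vectors below are hypothetical, with S = 3 states and N = 2 assets.

```python
# Hypothetical instance of the arbitrage-pricing rule q_k = p_hat . alpha^k.
# Every tradeable asset's real price is the p_hat-weighted sum of its state
# payoffs, and the net-gain vector of any portfolio has zero value under p_hat.

p_hat = [0.5, 0.3, 0.2]                 # assumed implicit Arrow prices, sum 1
assets = [[1.0, 1.0, 1.0],              # alpha^1: a riskless bond
          [0.0, 1.0, 3.0]]              # alpha^2: pays only in states 2 and 3

q = [sum(ps * ak for ps, ak in zip(p_hat, alpha)) for alpha in assets]

def net_gain(x):
    """Ax - 1(q.x): state-by-state real gain of the portfolio x."""
    cost = sum(qk * xk for qk, xk in zip(q, x))
    return [sum(a[s] * xk for a, xk in zip(assets, x)) - cost for s in range(3)]

# No portfolio's net gain can be strictly positive in every state: its value
# under p_hat is identically zero, so some coordinate is always <= 0.
for x in [(1.0, 0.0), (0.0, 1.0), (2.0, -1.5)]:
    g = net_gain(x)
    value = sum(ps * gs for ps, gs in zip(p_hat, g))
    assert abs(value) < 1e-12
print(q)
```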
5.37 Complete the proof of Lemma 5.4.
(a) Show that if an allocation x is an r-fold copy of the allocation (x1 , x2 , . . . , xI ) in E1 , and x is a
WEA in Er , then (x1 , x2 , . . . , xI ) is a WEA in E1 .
(b) Show that if (x1 , x2 , . . . , xI ) is a WEA in E1 , then its r-fold copy is a WEA in Er .
5.38 Give a general proof of Theorem 5.16 that is valid for an arbitrary number I of consumer types and
an arbitrary number r of consumers of each type.
5.39 (Cornwall) In an economy with two types of consumer, each type has the respective utility function
and endowments:
(a) Draw an Edgeworth box for this economy when there is one consumer of each type.
(b) Characterise as precisely as possible the set of allocations that are in the core of this
two-consumer economy.
(c) Show that the allocation giving x11 = (4, 4) and x21 = (6, 6) is in the core.
(d) Now replicate this economy once so there are two consumers of each type, for a total of four
consumers in the economy. Show that the double copy of the previous allocation, giving x11 =
x12 = (4, 4) and x21 = x22 = (6, 6), is not in the core of the replicated economy.
5.40 In a pure exchange economy, consumer i envies consumer j if xʲ ≻ⁱ xⁱ. (Thus, i envies j if i likes j's
bundle better than his own.) An allocation x is therefore envy free if xⁱ ≿ⁱ xʲ for all i and j. We know
that envy-free allocations will always exist, because the equal-division allocation, x̄ⁱ = (1/I)e for
every i, must be envy free. An allocation is called fair if it is both envy free and Pareto efficient.
(a) In an Edgeworth box, demonstrate that envy-free allocations need not be fair.
(b) Under Assumption 5.1 on utilities, prove that every exchange economy having a strictly positive
aggregate endowment vector possesses at least one fair allocation.
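The envy test in this exercise is mechanical enough to check by computer. The Python sketch below uses hypothetical Cobb-Douglas utilities and an illustrative endowment (these are assumptions for illustration, not part of the exercise) to test whether an allocation is envy free.

```python
# Sketch of the envy test from Exercise 5.40. The Cobb-Douglas utilities
# and all numbers below are illustrative assumptions, not part of the text.

def u(a, x):
    """Cobb-Douglas utility x1**a * x2**(1 - a)."""
    return x[0] ** a * x[1] ** (1 - a)

def envy_free(alloc, params):
    """True if no consumer strictly prefers another consumer's bundle."""
    return all(u(ai, xi) >= u(ai, xj)
               for ai, xi in zip(params, alloc)
               for xj in alloc)

params = (0.3, 0.8)                      # hypothetical taste parameters
equal_split = [(5.0, 5.0), (5.0, 5.0)]   # x-bar = (1/I)e with e = (10, 10)

print(envy_free(equal_split, params))               # True
print(envy_free([(8.0, 8.0), (2.0, 2.0)], params))  # False: consumer 2 envies 1
```

As the text notes, the equal-division allocation is envy free regardless of the utility parameters, because every consumer compares identical bundles.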
5.41 There are two consumers with the following characteristics:
(a) Find the equation for the contract curve in this economy, and carefully sketch it in the
Edgeworth box.
(b) Find a fair allocation of goods to consumers in this economy.
(c) Now suppose that the economy is replicated three times. Find a fair allocation of goods to
consumers in this new economy.
264 CHAPTER 5
(a) What are the necessary conditions for a Pareto-efficient distribution of goods to consumers?
(b) Are the WEAs Pareto efficient in an economy like this? Why or why not?
5.44 In the text, we have called an allocation x̄ Pareto efficient if there exists no other feasible allocation
x such that xi ⪰i x̄i for all i and xj ≻j x̄j for at least one j. Sometimes, an allocation x̄ is called Pareto
efficient if there exists no other feasible allocation x such that xi ≻i x̄i for all i.
(a) Show that when preferences are continuous and strictly monotonic, the two definitions are
equivalent.
(b) Construct an example where the two definitions are not equivalent, and illustrate in an
Edgeworth box.
5.45 (Eisenberg’s Theorem) Ordinarily, a system of market demand functions need not satisfy the
properties of an individual consumer’s demand system, such as the Slutsky restrictions, negative
semidefiniteness of the substitution matrix, and so forth. Sometimes, however, it is useful to know
when the market demand system does behave as though it were generated from a single, hypo-
thetical consumer’s utility-maximisation problem. Eisenberg (1961) has shown that this will be the
case when consumers’ preferences can be represented by linear homogeneous utility functions (not
necessarily identical), and when the distribution of income is fixed and independent of prices.
In particular, let xi (p, yi ) solve maxxi ∈Rn+ ui (xi ) subject to p · xi = yi for i ∈ I . Let x(p, y∗ )
solve maxx∈Rn+ U(x) subject to p · x = y∗ . If (1) ui (xi ) is linear homogeneous for all i ∈ I ; (2) y∗
is aggregate income and income shares are fixed, so that yi = δi y∗ for 0 < δi < 1 and Σi∈I δi = 1;
and (3)

U(x) = max Πi∈I (ui (xi ))δi     s.t.     Σi∈I xi = x,

then x(p, y∗ ) = Σi∈I xi (p, yi ), so the system of market demand functions behaves as though
generated from a single utility-maximisation problem.
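Eisenberg's result can be verified numerically in the Cobb-Douglas case, where each ui is linear homogeneous and demand has the familiar closed form (a consumer with exponent a spends the income fraction a on good 1). The prices, income shares, and exponents below are illustrative assumptions; for Cobb-Douglas utilities the representative utility U is itself, ordinally, Cobb-Douglas with exponent Σi δi ai on good 1.

```python
# Numerical check of Eisenberg's result with Cobb-Douglas (hence linearly
# homogeneous) utilities. Prices, shares, and exponents are illustrative.

def cd_demand(a, y, p):
    """Demand of a consumer with u(x1, x2) = x1**a * x2**(1 - a):
    spend the income share a on good 1 and (1 - a) on good 2."""
    return (a * y / p[0], (1 - a) * y / p[1])

p = (2.0, 5.0)        # price vector
y_star = 100.0        # aggregate income
delta = (0.3, 0.7)    # fixed income shares, summing to one
a = (0.25, 0.6)       # Cobb-Douglas exponents of the two consumers

# Market demand: sum of individual demands at incomes y_i = delta_i * y*.
x_market = [0.0, 0.0]
for d_i, a_i in zip(delta, a):
    x_i = cd_demand(a_i, d_i * y_star, p)
    x_market[0] += x_i[0]
    x_market[1] += x_i[1]

# For Cobb-Douglas utilities, Eisenberg's U(x) is itself (ordinally)
# Cobb-Douglas with exponent A = sum_i delta_i * a_i on good 1.
A = sum(d_i * a_i for d_i, a_i in zip(delta, a))
x_rep = cd_demand(A, y_star, p)

print(x_market, list(x_rep))  # the two demand vectors coincide
```

The coincidence is exact here because Cobb-Douglas demands are linear in income, so summing demands at incomes δi y∗ is the same as one demand at income y∗ with the averaged exponent.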
With only few exceptions, we have so far tended to concentrate on questions of ‘positive
economics’. We have primarily been content to make assumptions about agents’ motiva-
tions and circumstances, and deduce from these the consequences of their individual and
collective actions. In essence, we have characterised and predicted behaviour, rather than
judged it or prescribed it in any way. In most of this chapter, we change our perspective
from positive to normative, and take a look at some important issues in welfare economics.
At the end of the chapter we return to positive economics and consider how individuals
motivated by self-interest make the problem of social choice doubly difficult.
To make things a bit more concrete for just a moment, let us consider the distribution
problem in a simple, two-good, two-person Edgeworth box economy, like the one depicted
in Fig. 6.1. There, each point in the box represents some way of dividing society’s fixed
endowment of goods between its two members, so we can view each point in the box as
one of the (mutually exclusive) alternate social states we could achieve. Each agent has his
or her own preferences over these alternatives, and clearly these preferences are often at
odds with one another. The social choice problem involved is easy to state. Which of the
possible alternative distributions is best for society?
Although easy to state, the question is hard to answer. Perhaps without too much
disagreement, points off the contract curve can be ruled out. Were one of these to be
recommended as the best, it would be easy to find some other point on the contract curve
that everyone prefers. Because it would be hard to argue with such unanimity of opinion,
it is probably safe to say that our search for the best alternative ought to be restricted to the
Pareto-efficient ones.
But which of these is best? Many will find it easy to say that wildly unequal alterna-
tives such as x̄ must also be ruled out, even though they are Pareto efficient. Yet in doing
so, appeal is being made to some additional ethical standard beyond the simple Pareto
principle because that principle is silent on the essential question involved: namely, how
may we trade off person 2’s well-being for that of person 1 in the interests of society as a
whole? In trying to make such trade-offs, does intensity of preference matter? If we think it
does, other questions enter the picture. Can intensity of preference be known? Can people
tell us how strongly they feel about different alternatives? Can different people’s intense
desires be compared so that a balancing of gains and losses can be achieved?
The questions are many and the problems are deep. To get very far at all, we will
need to have a systematic framework for thinking about them. Arrow (1951) has offered
such a framework, and we begin with a look at his path-breaking analysis of some of these
problems.
SOCIAL CHOICE AND WELFARE 269
We take it for granted that the ranking of alternatives from a social point of view
should depend on how individuals rank them. The problem considered by Arrow can be
simply put. How can we go from the often divergent, but individually consistent, personal
views of society’s members to a single and consistent social view?
This is not an easy problem at all. When we insist on transitivity as a criterion for
consistency in social choice, certain well-known difficulties can easily arise. For example,
Condorcet’s paradox illustrates that the familiar method of majority voting can fail to
satisfy the transitivity requirement on R. To see this, suppose N = 3, X = {x, y, z}, and the
three individuals’ strict rankings are xP1 yP1 z, yP2 zP2 x, and zP3 xP3 y.
In a choice between x and y, x would get two votes and y would get one, so the social
preference under majority rule would be xPy. In a choice between y and z, majority voting
gives yPz. Because xPy and yPz, transitivity of social preferences would require that xPz.
However, with these individual preferences, z gets two votes to one for x, so majority
voting here would give the social preference as zPx, thus violating transitivity. Note that in
this example, the mechanism of majority rule is ‘complete’ in that it is capable of giving
a best alternative in every possible pairwise comparison of alternatives in X. The failure
of transitivity, however, means that within this set of three alternatives, no single best
alternative can be determined by majority rule. Requiring completeness and transitivity of
the social preference relation implies that it must be capable of placing every element in X
within a hierarchy from best to worst. The kind of consistency required by transitivity has,
therefore, considerable structural implications.
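The cycle just described is easy to reproduce computationally. The sketch below encodes one profile consistent with the pairwise outcomes above (the canonical Condorcet profile) and checks each majority comparison.

```python
# One profile consistent with the text's pairwise outcomes (the canonical
# Condorcet profile): voter 1 ranks x > y > z, voter 2 ranks y > z > x,
# and voter 3 ranks z > x > y.

profile = [["x", "y", "z"],
           ["y", "z", "x"],
           ["z", "x", "y"]]

def majority_prefers(a, b, profile):
    """True if a strict majority ranks a above b."""
    votes = sum(1 for r in profile if r.index(a) < r.index(b))
    return votes > len(profile) / 2

# xPy, yPz, and yet zPx: majority rule cycles.
print(majority_prefers("x", "y", profile),
      majority_prefers("y", "z", profile),
      majority_prefers("z", "x", profile))  # True True True
```

Each pairwise contest is decided 2 votes to 1, so majority rule is complete here, yet the resulting social relation is intransitive.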
Yet consistency, alone, is not particularly interesting or compelling in matters of
social choice. One can be perfectly consistent and still violate every moral precept the
community might share. The more interesting question to ask might be put like this: how
can we go from consistent individual views to a social view that is consistent and that also
respects certain basic values on matters of social choice that are shared by members of the
community? Because disagreement among individuals on matters of ‘basic values’ is in
fact the very reason a problem of social choice arises in the first place, we will have to be
very careful indeed in specifying these if we want to keep from trivialising the problem at
the outset.
With such cautions in mind, however, we can imagine our problem as one of finding
a ‘rule’, or function, capable of aggregating and reconciling the different individual views
represented by the individual preference relations Ri into a single social preference relation
R satisfying certain ethical principles. Formally, then, we seek a social welfare function,
f , where
R = f (R1 , . . . , RN ).
Thus, f takes an N-tuple of individual preference relations on X and turns (maps) them
into a social preference relation on X.
For the remainder of this subsection we shall suppose that the set of social states, X,
is finite.
Arrow has proposed a set of four conditions that might be considered minimal
properties the social welfare function, f , should possess. They are as follows.
of everyone else in society. Thus, only the most extreme and absolute form of dictatorship
is specifically excluded. Not even a ‘virtual’ dictator, one who always gets his way on all
but one pair of social alternatives, would be ruled out by this condition alone.
Now take a moment to re-examine and reconsider each of these conditions in turn.
Play with them, and try to imagine the kind of situations that could arise in a problem of
social choice if one or more of them failed to hold. If, in the end, you agree that these are
mild and minimal requirements for a reasonable social welfare function, you will find the
following theorem astounding, and perhaps disturbing.
Proof: The strategy of the proof is to show that conditions U, WP, and IIA imply the exis-
tence of a dictator. Consequently, if U, WP, and IIA hold, then D must fail to hold, and so
no social welfare function can satisfy all four conditions.
The proof, following Geanakoplos (1996), proceeds in four steps. Note that axiom U,
unrestricted domain, is used in each step whenever we choose or alter the preference profile
under consideration. Unrestricted domain ensures that every such profile of preferences is
admissible.
Step 1: Consider any social state, c. Suppose each individual places state c at the
bottom of his ranking. By WP, the social ranking must place c at the bottom as well. See
Fig. 6.2.
Step 2: Imagine now moving c to the top of individual 1’s ranking, leaving the rank-
ing of all other states unchanged. Next, do the same with individual 2: move c to the top of
2’s ranking. Continue doing this one individual at a time, keeping in mind that each of these
changes in individual preferences might have an effect on the social ranking. Eventually,
c will be at the top of every individual’s ranking, and so it must then also be at the top of
the social ranking by WP. Consequently, there must be a first time during this process that
the social ranking of c increases. Let individual n be the first such that raising c to the top
of his ranking causes the social ranking of c to increase.
[Figure 6.2: each individual ranking R1 , R2 , . . . , RN lists x, y, . . . down to c at the bottom, and
the social ranking R does likewise.]
Figure 6.2. A consequence of WP and U in the proof of Arrow’s theorem.
[Figure 6.3: individuals 1, . . . , n now place c at the top of their rankings, individuals
n + 1, . . . , N still place c at the bottom, and the social ranking R places c at the top, above
x, y, . . . , w.]
Figure 6.3. Axioms WP, U, and IIA yield a pivotal individual.
We claim that, as shown in Fig. 6.3, when c moves to the top of individual n’s rank-
ing, the social ranking of c not only increases but c also moves to the top of the social
ranking.
To see this, assume by way of contradiction that the social ranking of c increases,
but not to the top; i.e., αRc and cRβ for some states α, β ≠ c.
Now, because c is either at the bottom or at the top of every individual’s ranking,
we can change each individual i’s preferences so that βPi α, while leaving the position of
c unchanged for that individual. But this produces our desired contradiction because, on
the one hand, βPi α for every individual implies by WP that β must be strictly preferred to
α according to the social ranking; i.e., βPα. But, on the other hand, because the rankings
of c relative to α and of c relative to β have not changed in any individual’s ranking, IIA
implies that the social rankings of c relative to α and of c relative to β must be unchanged;
i.e., as initially assumed, we must have αRc and cRβ. But transitivity then implies αRβ,
contradicting βPα. This establishes our claim that c must have moved to the top of the
social ranking as in Fig. 6.3.
Step 3: Consider now any two distinct social states a and b, each distinct from c. In
Fig. 6.3, change the profile of preferences as follows: change individual n’s ranking so that
aPn cPn b, and for every other individual rank a and b in any way so long as the position of c
is unchanged for that individual. Note that in the new profile of preferences the ranking of
a to c is the same for every individual as it was just before raising c to the top of individual
n’s ranking in Step 2. Therefore, by IIA, the social ranking of a and c must be the same as
it was at that moment. But this means that aPc because at that moment c was still at the
bottom of the social ranking.
Similarly, in the new profile of preferences, the ranking of c to b is the same for
every individual as it was just after raising c to the top of individual n’s ranking in Step 2.
Therefore by IIA, the social ranking of c and b must be the same as it was at that moment.
But this means that cPb because at that moment c had just risen to the top of the social
ranking.
So, because aPc and cPb, we may conclude by transitivity that aPb. Note then that no
matter how the others rank a and b, the social ranking agrees with individual n’s ranking.
By IIA, and because a and b were arbitrary, we may therefore conclude that for all social
states a and b distinct from c, the social ranking of a and b agrees with individual n’s ranking.
That is, individual n is a dictator on all pairs of social states not involving c. The final step
shows that individual n is in fact a dictator.
Step 4: Let a be distinct from c. We may repeat the above steps with a playing
the role of c to conclude that some individual is a dictator on all pairs not involving a.
However, recall that individual n’s ranking of c (bottom or top) in Fig. 6.3 affects the
social ranking of c (bottom or top). Hence, it must be individual n who is the dictator on
all pairs not involving a. Because a was an arbitrary state distinct from c, this, together with
our previous conclusion about individual n, implies that n is a dictator.
Although here we have cast Arrow’s theorem as an ‘impossibility’ result, the proof
just sketched suggests it can also be stated as a ‘possibility’ result. That is, we have shown
that any social welfare function satisfying the three conditions U, WP, and IIA must yield
a social preference relation that exactly coincides with one person’s preferences whenever
that person’s preferences are strict. As you are asked to explore in Exercise 6.3, this leaves
several ‘possibilities’ for the social welfare function, although all of them are dictatorial
according to condition D.
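The ‘possibility’ reading can be checked directly in a small computation: the simplest dictatorial rule, f (R1 , . . . , RN ) = R1 , satisfies WP and IIA (and, trivially, U) on profiles of strict rankings. The three-voter, three-alternative setting and the random search below are illustrative assumptions.

```python
# A sketch of the 'possibility' reading: the dictatorial social welfare
# function f(R1, ..., RN) = R1 satisfies WP and IIA (and trivially U) on
# strict rankings. The 3-voter, 3-alternative random search is illustrative.
import random
from itertools import permutations

random.seed(0)
X = ["a", "b", "c"]
perms = [list(p) for p in permutations(X)]

def dictator_swf(profile):
    """Social ranking = voter 1's ranking."""
    return list(profile[0])

def prefers(r, a, b):
    """True if ranking r places a above b."""
    return r.index(a) < r.index(b)

for _ in range(200):
    p1 = [random.choice(perms) for _ in range(3)]
    p2 = [random.choice(perms) for _ in range(3)]
    R1, R2 = dictator_swf(p1), dictator_swf(p2)
    for a, b in [("a", "b"), ("a", "c"), ("b", "c")]:
        # WP: a unanimous strict preference is respected socially.
        if all(prefers(r, a, b) for r in p1):
            assert prefers(R1, a, b)
        # IIA: if every voter ranks a vs. b the same in the two profiles,
        # the social ranking of a vs. b is the same in both.
        if all(prefers(r1, a, b) == prefers(r2, a, b)
               for r1, r2 in zip(p1, p2)):
            assert prefers(R1, a, b) == prefers(R2, a, b)
print("WP and IIA verified on 200 random profile pairs")
```

Of course, this rule fails condition D by construction; the content of Arrow’s theorem is that, given U, WP, and IIA, this failure is unavoidable.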
1 The diagrammatic idea of this proof is due to Blackorby, Donaldson, and Weymark (1984).
2 This assumption can be weakened substantially. For example, the argument we shall provide is valid so long as
X ⊆ RK contains a point and a sequence of distinct points converging to it.
3 If X were finite, every Ri would have a utility representation and every utility representation would be continu-
ous. Hence, in the finite case, assuming continuity does not restrict the domain of preferences at all. This is why
we assume an infinite X here, so that continuity has ‘bite’.
For each continuous u(·) = (u1 (·), . . . , uN (·)) we henceforth let fu denote the social
utility function f (u1 (·), . . . , uN (·)) and we let fu (x) = [f (u1 (·), . . . , uN (·))](x) denote the
utility assigned to x ∈ X.
To maintain the idea that the social preference relation is determined only by the indi-
vidual preference relations, Ri – an idea that is built into the previous section’s treatment
of Arrow’s Theorem – it must be the case that the ordering of the social states according
to fu = f (u1 (·), . . . , uN (·)) would be unchanged if any ui (·) were replaced with a utility
function representing the same preferences. Thus, because two utility functions represent
the same preferences if and only if one is a strictly increasing transformation of the other,
the social welfare function f must have the following property: if for each individual i,
ui : X → R is continuous and ψi : R → R is strictly increasing and continuous, then

fψ◦u (x) ≥ fψ◦u (y) if and only if fu (x) ≥ fu (y), for all x, y ∈ X,     (6.1)

where ψ ◦ u(·) = (ψ1 (u1 (·)), . . . , ψN (uN (·))). That is, f must be order-invariant to
strictly increasing continuous transformations of individual utility functions, where only
continuous transformations ψ i are considered to ensure that the transformed individual
utility functions remain continuous.
Condition U in this setup means that the domain of f is the entire set of profiles
of continuous individual utility functions. Condition IIA means precisely what it meant
before, but note in particular it implies that whether fu (x) is greater, less, or equal to fu (y)
can depend only on the vectors u(x) = (u1 (x), . . . , uN (x)) and u(y) = (u1 (y), . . . , uN (y))
and not on any other values taken on by the vector function u(·) = (u1 (·), . . . , uN (·)).4
The meanings of conditions WP and D remain as before.
Consider now imposing the following additional condition on f .
PI. Pareto Indifference Principle. If ui (x) = ui (y) for all i = 1, . . . , N, then fu (x) =
fu (y).
The Pareto Indifference Principle requires society to be indifferent between two
states if each individual is indifferent between them.
It can be shown (see Exercise 6.4 and also Sen (1970a)) that if f satisfies U, IIA,
WP, and PI, then there is a strictly increasing continuous function, W : RN → R, such
that for all social states x, y, and every profile of continuous individual utility functions
u(·) = (u1 (·), . . . , uN (·)),
fu (x) ≥ fu (y) if and only if W(u1 (x), . . . , uN (x)) ≥ W(u1 (y), . . . , uN (y)). (6.2)
Condition (6.2) says that the social welfare function f can be summarised by a
strictly increasing and continuous function W – that we will also call a social welfare
function – that simply orders the vectors of individual utility numbers corresponding to
4 As already noted, the social utility, fu (x), assigned to the alternative x might depend on each individual’s
entire utility function. IIA goes a long way towards requiring that fu (x) depend only on the vector of utilities
(u1 (x), . . . , uN (x)).
the alternatives. Consequently, we may restrict our attention to this simpler yet equivalent
form of a social welfare function. It is simpler because it states directly that the social
utility of an alternative depends only on the vector of individual utilities of that alternative.
Our objective now is to deduce the existence of a dictator from the fact that W
satisfies (6.2).
The property expressed in (6.1) that f is order-invariant to continuous strictly increas-
ing transformations of individual utility functions has important implications for the
welfare function W. For suppose (u1 , . . . , uN ) and (ũ1 , . . . , ũN ) are utility vectors asso-
ciated with two social states x and y. Combining (6.1) with (6.2) implies that W’s ordering
of RN must be invariant to any continuous strictly increasing transformation of individual
utility numbers. Therefore if W ranks x as socially better than y, i.e., if W(u1 , . . . , uN ) >
W(ũ1 , . . . , ũN ), then it must also rank x above y after any such transformation, i.e.,
W(ψ1 (u1 ), . . . , ψN (uN )) > W(ψ1 (ũ1 ), . . . , ψN (ũN )) for all continuous and strictly
increasing ψ1 , . . . , ψN .
[Figure 6.4: an arbitrary point ū divides utility space into four regions, I (north-east),
II (north-west), III (south-west), and IV (south-east), separated by dashed horizontal and
vertical lines through ū; ũ is a point in region II and ṽ its image under the transformation.]
Now consider an arbitrary point ũ in II. One of the following must hold:

W(ū) < W(ũ),     (6.3)
W(ū) = W(ũ),     (6.4)
W(ū) > W(ũ).     (6.5)
Suppose for the moment that W(ū) < W(ũ). Then because W’s ordering of RN is invari-
ant to continuous strictly increasing transformations of utilities, that same ranking must be
preserved when we apply any continuous strictly increasing transformations to the indi-
viduals’ utilities. Suppose we choose two strictly increasing functions, ψ 1 and ψ 2 , where
ψ 1 (ū1 ) = ū1 ,
ψ 2 (ū2 ) = ū2 .
Now apply these functions to the coordinates of the point ũ. Because ũ is in region II, we
know that ũ1 < ū1 and ũ2 > ū2 . Then because the ψi are strictly increasing, when applied
to ũ, we must have

ṽ1 ≡ ψ1 (ũ1 ) < ψ1 (ū1 ) = ū1 ,     (6.6)
ṽ2 ≡ ψ2 (ũ2 ) > ψ2 (ū2 ) = ū2 .     (6.7)
Equations (6.6) and (6.7), together, inform us that the point ṽ ≡ (ṽ1 , ṽ2 ) must be some-
where in region II, as well. Because we have complete flexibility in our choice of the
continuous strictly increasing ψ i , we can, by an appropriate choice, map ũ into any point
in region II.5 But then because the social ranking of the underlying social states must
be invariant to such transforms of individuals’ utility, every point in region II must be
ranked the same way relative to ū! If, as we supposed, W(ū) < W(ũ), then every point
in region II must be preferred to ū. Yet nowhere in the argument did we use the fact
that W(ū) < W(ũ). We could have begun by supposing any of (6.3), (6.4), or (6.5), and
reached the same general conclusion by the same argument. Thus, under the invariance
requirements on individual utility, every point in region II must be ranked in one of three
ways relative to ū: either ū is preferred, indifferent to, or worse than every point in region
II. We will write this as the requirement that exactly one of the following must hold:

W(ū) > W(II),     (6.8)
W(ū) = W(II),     (6.9)
W(ū) < W(II),     (6.10)

where, for example, W(ū) > W(II) means that W(ū) > W(ũ) for every point ũ in region II.
Note that (6.9) certainly cannot hold, for this would mean that all points in region
II, being indifferent (under W) to ū, are indifferent to one another. But this contradicts
the fact that W is strictly increasing, since region II contains pairs of points in which one
strictly exceeds the other in both coordinates. So region II must be ranked entirely above
or entirely below ū, and the same argument applies to region IV. Label the case in which
region II is ranked above ū and region IV below it as (6.11), and the case in which region
IV is ranked above ū and region II below it as (6.12).
5 For example, to obtain ψi (ūi ) = ūi and ψi (ũi ) = ui we can choose the continuous function

ψi (t) ≡ [(ūi − ui )/(ūi − ũi )] t + [(ui − ũi )/(ūi − ũi )] ūi ,

which is of the form ψi (t) = αi t + βi . Note that for any choice of (u1 , u2 ) in region II, α1 , α2 > 0.
Now, note that if adjacent regions are ranked the same way relative to ū, then the
dashed line separating the two regions must be ranked that same way relative to ū. For
example, suppose regions I and II are ranked above ū. Since by WP any point on the
dashed line above ū is ranked above points in region II that lie strictly below it, transitivity
implies this point on the dashed line must be ranked above ū.
Consequently, if (6.11) holds, then because region I is ranked above ū and region III
is ranked below, the social ranking must be as given in Fig. 6.5(a), where ‘+’ (‘−’) denotes
utility vectors u = (u1 , u2 ) with W(u) greater than (less than) W(ū). But the continuity of
W then implies that the indifference curve through ū is a horizontal straight line. On the
other hand, if instead (6.12) holds so that Fig. 6.5(b) is relevant, then the indifference curve
through ū would be a vertical straight line.
So, because ū was arbitrary, we may conclude that the indifference curve through
every utility vector is either a horizontal or a vertical straight line. However, because indif-
ference curves cannot cross one another, this means that either all indifference curves are
horizontal straight lines, in which case individual 2 would be a dictator, or all indifference
curves are vertical straight lines, in which case individual 1 is a dictator. In either case, we
have established the existence of a dictator and the proof is complete.
Throughout the remainder of this section we will assume that the set of social states
X is a non-singleton convex subset of Euclidean space and that all social choice func-
tions, f , under consideration satisfy strict welfarism (i.e., U, WP, IIA, and PI), where U
means that f maps continuous individual utility functions into a continuous social utility
function.6 Consequently (see (6.2) and Exercise 6.4) we may summarise f with a strictly
increasing continuous function W : RN → R with the property that for every continuous
u(·) = (u1 (·), . . . , uN (·)) and every pair of states x and y,
fu (x) ≥ fu (y) if and only if W(u1 (x), . . . , uN (x)) ≥ W(u1 (y), . . . , uN (y)),
where we remind the reader that fu (x) is the social utility assigned to x when the profile of
individual utility functions is u(·) = (u1 (·), . . . , uN (·)).
The extent to which utility is assumed to be measurable and interpersonally compa-
rable can best be viewed as a question of how much information society uses when making
social decisions. This is quite distinct from the kind of ethical restrictions a society might
wish those decisions to respect. There is, of course, some ethical content to the conditions
U, WP, IIA and PI embodied in strict welfarism. However, a society may be willing to go
further and build even more ethical values into its social welfare function. Each amounts
to imposing an extra requirement on the strictly increasing and continuous social welfare
function, W. Here, we consider only two.
DEFINITION 6.3 Two More Ethical Assumptions on the Social Welfare Function
A. Anonymity. Let ū be a utility N-vector, and let ũ be another vector obtained
from ū after some permutation of its elements. Then W(ū) = W(ũ).
HE. Hammond Equity. Let ū and ũ be two distinct utility N-vectors and suppose
that ūk = ũk for all k except i and j. If ūi < ũi < ũj < ūj , then W(ũ) ≥ W(ū).
Condition A simply says people should be treated symmetrically. Under A, the rank-
ing of social states should not depend on the identity of the individuals involved, only the
levels of welfare involved. Condition HE is slightly more controversial. It expresses the
idea that society has a preference towards decreasing the dispersion of utilities across indi-
viduals. (Note that there is less dispersion of utilities under ũ than under ū. Nevertheless,
can you think of why one might object to ranking ũ above ū?) In what follows, we use these
conditions to illustrate how some well-known social welfare functions can be characterised
axiomatically.
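A small computation makes the content of HE concrete. In the instance below (the utility numbers are illustrative), ũ is the less dispersed vector required by HE; the criterion W = min respects the condition, while the simple utilitarian sum can violate it.

```python
# Illustrating Hammond Equity (HE). With u_bar_i < u_til_i < u_til_j < u_bar_j,
# u_til is the less dispersed vector and HE requires W(u_til) >= W(u_bar).
# The numbers are illustrative.

u_bar = (0.0, 10.0)   # more dispersed
u_til = (4.0, 5.0)    # less dispersed: 0 < 4 < 5 < 10

w_min = min   # W = min[u1, u2]
w_sum = sum   # simple utilitarian sum

print(w_min(u_til) >= w_min(u_bar))  # True: the min respects HE here
print(w_sum(u_til) >= w_sum(u_bar))  # False: the sum ranks u_bar higher
```

The example also previews the objection raised above: moving from ū to ũ reduces dispersion but lowers total utility from 10 to 9.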
Proof: Suppose that W is continuous, strictly increasing and satisfies HE. We must show
that it can take the form W = min[u1 , . . . , uN ], i.e., that W(ū) ≥ W(ũ) if and only if
min[ū1 , . . . , ūN ] ≥ min[ũ1 , . . . , ũN ].
We prove this diagrammatically only for N = 2 by once again characterising the
map of social indifference curves. Consult Fig. 6.6 throughout the proof. To begin, choose
an arbitrary point a on the 45◦ line and consider the infinite ray extending from a to the
right. We shall first argue that every point on this ray is socially indifferent to a according
to W.
Consider an arbitrary point ū = (ū1 , ū2 ) on the ray. We wish to show that W(ū) =
W(a). Let region I denote the region to the left of ū, below the 45◦ line and above the ray, and
let region II denote the region to the left of ū, below the 45◦ line and below the ray. Thus
the ray is in neither region. Consider now an arbitrary point ũ = (ũ1 , ũ2 ) in region I. One
can easily see that to be in I, ũ must satisfy the inequalities ū2 < ũ2 < ũ1 < ū1 . (Think
[Figure 6.6: the point a on the 45◦ line and the point ū on the horizontal ray extending
rightwards from a; region I lies between this ray and the 45◦ line, region II below the ray,
and ũ is a point in region I.]
about this.) But then HE implies that W(ũ) ≥ W(ū). Since ũ was an arbitrary point in I,
the social utility of every point in I is at least W(ū), which we write as W(I) ≥ W(ū).8 As
for region II, we must have W(II) < W(ū) because every point in region II is south-west
of ū and W is strictly increasing. Thus, we have shown that

W(I) ≥ W(ū) > W(II).
Notice now that for every point on the line joining a and ū there are arbitrarily
nearby points in region I each of which we have shown to receive social utility at least
W(ū) and there are arbitrarily nearby points in region II each of which we have shown to
receive social utility less than W(ū). Hence, by the continuity of W, every point on the line
joining a and ū must receive social utility equal to W(ū). In particular, W(a) = W(ū), as
we wished to show. Because ū was an arbitrary point on the infinite ray starting at a and
extending rightwards, we conclude that every point on this ray is socially indifferent to a.
An analogous argument to that just given shows also that every point on the infi-
nite ray starting at a and extending upwards is also socially indifferent to a. Because W
is strictly increasing, no other points can be indifferent to a and therefore the union of
these two rays is the social indifference curve through a. Because a was an arbitrary point
on the 45◦ line, the social indifference map for W is therefore as shown in Fig. 6.7, with
indifference curves further from the origin receiving higher social utility because W is
strictly increasing. Thus W has the same indifference map as the function min[u1 , u2 ], as
desired.
Finally, we note that if W = min[u1 , . . . , uN ] then A and HE are easily shown to
be satisfied. Moreover, if ψ : R → R is strictly increasing, then W(ψ(u1 ), . . . , ψ(uN )) =
ψ(W(u1 , . . . , uN )) and therefore W(ψ(u1 ), . . . , ψ(uN )) ≥ W(ψ(ũ1 ), . . . , ψ(ũN )) if and
only if W(u1 , . . . , uN ) ≥ W(ũ1 , . . . , ũN ). Hence, W is utility-level invariant.
8 In fact, W(I) > W(ū) because N = 2 and W is strictly increasing, but we will not need the strict inequality.
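The utility-level invariance just established can be confirmed numerically: applying one and the same continuous, strictly increasing transform to every coordinate never reverses the min-ordering. The transforms and utility vectors below are arbitrary illustrations.

```python
# Utility-level invariance of W = min: a common strictly increasing
# transform applied to every coordinate preserves the social ordering.
# The transforms and utility vectors below are arbitrary illustrations.
import math

def rawls(u):
    return min(u)

pairs = [((1.0, 5.0), (2.0, 3.0)),
         ((0.5, 0.6), (4.0, 0.4)),
         ((3.0, 3.0), (2.0, 9.0))]

for psi in (math.exp, math.atan, lambda t: t ** 3):  # strictly increasing
    for u, v in pairs:
        before = rawls(u) >= rawls(v)
        after = rawls(tuple(map(psi, u))) >= rawls(tuple(map(psi, v)))
        assert before == after   # ordering unchanged
print("min-ordering preserved under common monotone transforms")
```

The invariance is immediate once one notes that min commutes with any increasing ψ: min(ψ(u1), ψ(u2)) = ψ(min(u1, u2)).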
[Figure 6.7: the social indifference map of W = min[u1 , u2 ]: right-angled indifference
curves with corners on the 45◦ line.]
[Figure 6.8: the points ũ, ū, and ũT on a straight line of slope −1, with ũT the reflection
of ũ across the 45◦ line.]
Apply these transforms to ũ and obtain (ψ1 (ũ1 ), ψ2 (ũ2 )) = ū, and apply them to ū to obtain
(ψ 1 (ū1 ), ψ 2 (ū2 )) = ũT . So, these transforms map ũ into ū and map ū into ũT . Thus, if
W(ū) > W(ũ), as we have assumed, then by the invariance requirement, we must likewise
have W(ũT ) > W(ū). But together these imply W(ũT ) > W(ũ), violating A, so W(ū) >
W(ũ) cannot hold. If, instead, we suppose W(ũ) > W(ū), then by using a similar argu-
ment, we get a similar contradiction. We therefore conclude that W(ū) = W(ũ). Condition
A then tells us W(ũT ) = W(ū) = W(ũ). Now recall that ũ was chosen arbitrarily in Ω, so
the same argument can be made for any point in that set, and so we have W(Ω) = W(ū).
Because W is strictly increasing, every point north-east of Ω must be strictly pre-
ferred to every point in Ω, and every point south-west must be strictly worse. Thus, Ω
is indeed a social indifference curve, and the social indifference map is a set of parallel
straight lines, each with a slope of −1, with social preference increasing north-easterly.
This, of course, implies the social welfare function can be chosen to be of the form
W = u1 + u2 , completing the proof.
If we drop the requirement of anonymity, the full range of generalised utilitarian
orderings is allowed. These are represented by linear social welfare functions of
the form W = Σi ai ui , where ai ≥ 0 for all i and aj > 0 for some j. Under generalised
utilitarian criteria, the welfare sum is again the important issue, but the welfare of different
individuals can be given different ‘weight’ in the social assessment.
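A short computation illustrates the role of the weights: with suitable weight vectors (the ones below are purely illustrative), the same pair of utility vectors can be ranked either way by a generalised utilitarian criterion.

```python
# Generalised utilitarian criteria W = sum_i a_i * u_i. The weight vectors
# below are purely illustrative; note how they can reverse the ranking
# of the same two utility vectors.

def w(weights, u):
    return sum(a * ui for a, ui in zip(weights, u))

u, v = (10.0, 2.0), (5.0, 8.0)
favour_1 = (0.9, 0.1)   # weight tilted towards individual 1
favour_2 = (0.1, 0.9)   # weight tilted towards individual 2

print(w(favour_1, u) > w(favour_1, v))  # True:  9.2 > 5.3
print(w(favour_2, u) > w(favour_2, v))  # False: 2.8 < 7.7
```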
Suppose instead that only the ordering of percentage changes in utility, both for and across
individuals, matters. Then the social welfare function need not be invariant to strictly
increasing transformations unless they are identical and linear (i.e., ψ(ui ) = bui , where
b > 0 is common to all individuals), because only these are guaranteed to maintain the
ordering of percentage changes in utility both for and across individuals. If the social welfare function f is
permitted to depend only on the ordering of percentage changes in utility for and across
individuals, then it must be invariant to arbitrary, but common, strictly increasing indi-
vidual transformations of utility of the form ψ(ui ) = bui , where b > 0 is common to all
individuals and we will then say that f is utility-percentage invariant.
Consequently, both the Rawlsian and utilitarian social welfare functions are permit-
ted here. Indeed, a whole class of social welfare functions are now admitted as possibilities.
When a continuous social welfare function satisfies strict welfarism, and is invariant to
identical positive linear transformations of utilities, social indifference curves must be
negatively sloped and radially parallel.
To see this, consider Fig. 6.9. First, choose an arbitrary point ū. Clearly, as in
the example sketched, the social indifference curve through ū must be negatively sloped
because, by strict welfarism, W is strictly increasing. Now choose any other point on the
ray OA through ū. This point must be of the form bū for some constant b > 0. Now choose
any other point ũ such that W(ū) = W(ũ). By the invariance requirement, we must also
have W(bū) = W(bũ), where ũ and bũ are on the ray OB, as indicated.
We want to show that the slope of the tangent to the social indifference curve at
ū is equal to the slope of the tangent at bū. First, note that the slope of the chord CC′
approximates the slope of the tangent at ū, and the slope of the chord DD′ approximates
the slope of the tangent at bū. Because the triangles OCC′ and ODD′ are similar, the slope
of CC′ is equal to the slope of DD′. Now imagine choosing our point ũ closer and closer
to ū along the social indifference curve through ū. As ũ approaches ū, correspondingly bũ
approaches bū along the social indifference curve through bū, and the chords CC′ and DD′
remain equal in slope. In the limit, the slope of CC′ converges to the slope of the tangent at
ū, and the slope of DD′ converges to the slope of the tangent at bū. Thus, the slope of the
[Figure 6.9: Rays OA (through ū and bū) and OB (through ũ and bũ) from the origin 0, with the chords and social indifference curves described in the text; horizontal axis u1 .]
SOCIAL CHOICE AND WELFARE 287
social indifference curve at ū must be equal to the slope of the curve at bū. Because ū and
b > 0 were arbitrarily chosen, the slope of every social indifference curve must be the same
at every point along a given ray, though, of course, slopes can differ across different rays.
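The radial-parallelism property can also be checked numerically for any homothetic example. A sketch; the Cobb-Douglas form below is an arbitrary illustrative choice, not one taken from the text:

```python
# Numerical check (sketch): for a homothetic social welfare function, the
# slope of the social indifference curve is the same at u and at b*u.
# The Cobb-Douglas form below is an arbitrary homothetic example.

def W(u1, u2):
    return (u1 ** 0.3) * (u2 ** 0.7)    # continuous, strictly increasing, homothetic

def ic_slope(u1, u2, h=1e-6):
    """Slope of the indifference curve, -W1/W2, by central differences."""
    W1 = (W(u1 + h, u2) - W(u1 - h, u2)) / (2 * h)
    W2 = (W(u1, u2 + h) - W(u1, u2 - h)) / (2 * h)
    return -W1 / W2

# Along the ray through (2, 3), the slope is identical at every point:
for b in (1.0, 2.0, 5.0):
    print(round(ic_slope(2.0 * b, 3.0 * b), 6))    # -0.642857 for every b
```

Slopes differ across rays (try the ray through (3, 2)), but along any one ray they coincide, exactly as the similar-triangles argument requires.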
A function’s level curves will be radially parallel in this way if and only if the func-
tion is homothetic. Thus, strict welfarism and utility-percentage invariance allow any con-
tinuous, strictly increasing, homothetic social welfare function. If condition A is added, the
function must be symmetric, and so its social indifference curves must be ‘mirror images’
around the 45◦ line. Sometimes a convexity assumption is also added. When the social wel-
fare function is quasiconcave the ‘socially at least as good as’ sets are convex, and the ethi-
cal implication is that inequality in the distribution of welfare, per se, is not socially valued.
Under strict quasiconcavity, there is a strict bias in favour of equality. (Do you see why?)
Because every homothetic function becomes a linear homogeneous function under
some positive monotonic transform, for simplicity let us think in terms of linear homoge-
neous forms alone. Finally, suppose in addition to WP, A, and convexity, we add the strong
separability requirement that the marginal rate of (social) substitution between any two
individuals is independent of the welfare of all other individuals. Then the social welfare
function must be a member of the CES family:
$$W = \left( \sum_{i=1}^{N} (u_i)^{\rho} \right)^{1/\rho}, \qquad (6.13)$$

where 0 ≠ ρ < 1, and σ = 1/(1 − ρ) is the (constant and equal) elasticity of social
substitution between any two individuals.
This is a very flexible social welfare function. Different values for ρ give different
degrees of ‘curvature’ to the social indifference curves, and therefore build in different
degrees to which equality is valued in the distribution of welfare. Indeed, the utilitarian
form – which implies complete social indifference to how welfare is distributed – can
be seen as a limiting case of (6.13) as ρ → 1 (σ → ∞). As ρ → −∞ (σ → 0), (6.13)
approaches the Rawlsian form, where the social bias in favour of equality is absolute. The
range of possibilities is illustrated in Fig. 6.10.
[Figure 6.10: Social indifference curves in the (u1 , u2 ) plane: (a) ρ → 1, (b) −∞ < ρ < 1, (c) ρ → −∞.]
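The two limiting cases can be verified numerically. A sketch, using a hypothetical utility profile:

```python
# Sketch: the CES social welfare function of (6.13) for extreme values of
# rho, checking its limiting behaviour numerically. The utility profile is
# hypothetical.

def ces_welfare(u, rho):
    """W = (sum_i u_i^rho)^(1/rho), defined for rho != 0, rho < 1."""
    return sum(ui ** rho for ui in u) ** (1.0 / rho)

u = [1.0, 4.0, 9.0]

# rho near 1: W approaches the utilitarian sum (here 14).
print(ces_welfare(u, 0.999))
# rho very negative: W approaches the Rawlsian minimum (here 1).
print(ces_welfare(u, -50.0))
```

Intermediate values of ρ trade off between the two: lowering ρ bends the social indifference curves toward the right angles of the maximin criterion.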
6.4 JUSTICE
Beyond the technical question of what must be assumed in the way of measurability and
comparability of utility to sensibly apply a given social welfare function, there is the basic
reality that the choice among such functions is effectively a choice between alternative sets
of ethical values. On this score, then, matters of opinion really are involved. They rightfully
belong in the very first stage of any analysis aimed at assessing the social significance of
economic policies or institutions, when the choice of social welfare function is made.
The literature in economics and the literature in philosophy – one and the same in
the days before Adam Smith – have combined again more recently to jointly consider the
moral character of the choice that must be made. Guidance has been sought by appeal
to axiomatic theories of justice that accept the social welfare approach to social decision
making. Two broad historical traditions on these questions can be distinguished. One is
the utilitarian tradition, associated with Hume, Smith, Bentham, and Mill. The other is the
‘contractarian’ tradition, associated with Locke, Rousseau, and Kant. More recently, these
two traditions have been refined and articulated through the work of Harsanyi (1953, 1955,
1975) and Rawls (1971), respectively.
Both Harsanyi and Rawls accept the notion that a ‘just’ criterion of social welfare
must be one that a rational person would choose if he were ‘fair-minded’. To help ensure
that the choice be fair-minded, each imagines an ‘original position’, behind what Rawls
calls a ‘veil of ignorance’, in which the individual contemplates this choice without know-
ing what his personal situation and circumstances in society actually will be. Thus, each
imagines the kind of choice to be made as a choice under uncertainty over who you will
end up having to be in the society you prescribe. The two differ, however, in what they see
as the appropriate decision rule to guide the choice in the original position.
Harsanyi’s approach is remarkably straightforward. First, he accepts the von
Neumann-Morgenstern axiomatic description of rationality under conditions of uncer-
tainty. Thus, a person’s preferences can be represented by a VNM utility function over
social states, ui (x), which is unique up to positive affine transforms. By the principle of
insufficient reason, he then suggests that a rational person in the original position must
assign an equal probability to the prospect of being in any other person’s shoes within the
society. If there are N people in society, there is therefore a probability 1/N that i will
end up in the circumstances of any other person j. Person i therefore must imagine those
circumstances and imagine what his preferences, uj (x), would be. Because a person might
end up with any of N possible ‘identities’, a ‘rational’ evaluation of social state x then
would be made according to its expected utility:
$$\sum_{i=1}^{N} (1/N)\, u_i(x). \qquad (6.14)$$
In a social choice between x and y, the one with the higher expected utility in (6.14) must
be preferred. But this is equivalent to saying that x is socially preferred to y if and only if
$$\sum_{i=1}^{N} u_i(x) > \sum_{i=1}^{N} u_i(y),$$
a purely utilitarian criterion.
Rawls rejects Harsanyi’s utilitarian rule for several reasons. Among them, he objects
to the assignment of any probability to the prospect of being any particular individual
because in the original position, there can be no empirical basis for assigning such prob-
abilities, whether equal or not. Thus, the very notion of choice guided by expected utility
is rejected by Rawls. Instead, he views the choice problem in the original position as one
under complete ignorance. Assuming people are risk averse, he argues that in total igno-
rance, a rational person would order social states according to how he or she would view
them were they to end up as society’s worst-off member. Thus, x will be preferred to y as
$$W = \sum_{i=1}^{N} v_i(x) \equiv -\sum_{i=1}^{N} u_i(x)^{-a}. \qquad (6.16)$$
Because the ordering of states given by (6.16) has only ordinal significance, it will be
exactly the same under the positive monotonic transform of W given by
$$W^{*} = (-W)^{-1/a} \equiv \left( \sum_{i=1}^{N} u_i(x)^{-a} \right)^{-1/a}. \qquad (6.17)$$
For ρ ≡ −a < 0, this is in the form of the CES function (6.13). We have already noted that as ρ → −∞
(a → ∞), this approaches the maximin criterion as a limiting case. Thus, Rawls’
maximin criterion – far from being incompatible with Harsanyi’s utilitarianism – instead
can be seen as a very special case of it, namely, the one that arises when individuals are
infinitely risk averse.
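Why the ρ → −∞ limit of the CES form yields the maximin criterion can be seen by a standard factoring argument (not spelled out in the text). Writing $\underline{u} \equiv \min_i u_i$,

```latex
\left( \sum_{i=1}^{N} u_i^{\rho} \right)^{1/\rho}
  = \underline{u} \left( \sum_{i=1}^{N}
      \left( \frac{u_i}{\underline{u}} \right)^{\rho} \right)^{1/\rho} .
```

Each ratio $u_i/\underline{u} \geq 1$, so as $\rho \to -\infty$ every term with $u_i > \underline{u}$ vanishes while terms equal to the minimum contribute 1. The bracket therefore tends to $k$, the number of minimisers, and since $k^{1/\rho} \to 1$, the whole expression converges to $\underline{u} = \min_i u_i$.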
On reflection, this makes a good deal of sense. Maximin decision rules are appealing
in strategic situations where the interests of some rational and fully informed opponent
are diametrically opposed to your own. In the kind of thought experiment required in
the original position, there is little obvious justification for adopting such a decision rule,
unless, of course, you are extremely (irrationally?) pessimistic.
Once again, your choice of social welfare function is a choice of distributional values
and, therefore, a choice of ethical system. The choice is yours.
9 Another possibility is to attempt to infer an individual’s preferences from his observed choice behaviour. But
this too is problematic since an individual can alter his choice behaviour to profitably portray to society false
preferences.
10 Not all treatments of this topic include the full range condition in the definition of a social choice function,
choosing instead to add the range condition separately. The present treatment is more convenient for our purposes.
Fix for the moment the preference profile, R−i , of all individuals but i and consider
two possible preferences, Ri and R̃i , for individual i. Let c(Ri , R−i ) = x and c(R̃i , R−i ) =
y. Altogether then, we have a situation in which, when the others report the profile R−i ,
individual i, by choosing to report either Ri or R̃i can choose to make the social state
either x or y. When would individual i have an incentive to lie about his preferences? Well,
suppose his true preferences happen to be Ri and that given these preferences he strictly
prefers y to x. If he reports honestly, the social state will be x. But if he lies and instead
reports R̃i , the social state will be y, a choice he strictly prefers. Hence, in this case, he has
an incentive to misreport his preferences.
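The misreporting logic just described can be made concrete. A sketch in Python; the social choice function used here, a Borda count with alphabetical tie-breaking, is a hypothetical stand-in for c(·), chosen only to exhibit a profitable lie:

```python
# Sketch: a profitable misreport under a hypothetical social choice
# function -- Borda count with alphabetical tie-breaking. Individual 2
# gains by reporting a false ranking.

def borda_choice(profile, alts=("x", "y", "z")):
    """Each ranking lists alternatives best-first; an alternative's score
    is the number of alternatives ranked below it, summed over individuals."""
    score = {a: 0 for a in alts}
    for ranking in profile:
        for pos, a in enumerate(ranking):
            score[a] += len(alts) - 1 - pos
    best = max(score.values())
    return min(a for a in alts if score[a] == best)    # alphabetical tie-break

r1       = ("x", "y", "z")    # individual 1's (fixed) report
r2_true  = ("y", "x", "z")    # individual 2's true preferences
r2_false = ("y", "z", "x")    # individual 2's strategic misreport

print(borda_choice([r1, r2_true]))     # "x": honesty gives 2 his second choice
print(borda_choice([r1, r2_false]))    # "y": lying gives 2 his favourite
```

Holding R−i fixed at individual 1's report, individual 2 can steer the outcome from x to y, which he strictly prefers, exactly the situation described above.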
What property would a social choice function have to have so that under no circum-
stance would any individual have an incentive to misreport his preferences? It must have
the following property called strategy-proofness.
12 Muller and Satterthwaite (1977) show that strategy-proofness is equivalent to what they call strong-positive
association, which is equivalent to monotonicity when individual preferences do not display indifference.
The second part of the proof, like our first proof of Arrow’s theorem, will use a series
of well-chosen preference profiles to uncover a dictator. Given the results from Part 1, we
can and will freely use the fact that c(·) is both monotonic and Pareto efficient. Also, in
each of the particular figures employed in this proof, all individual rankings are strict.
That is, no individual is indifferent between any two social states. We emphasise that this
is not an additional assumption – we are not ruling out indifference. It just so happens that
we are able to provide a proof of the desired result by considering a particular subset of
preferences that do not exhibit indifference.
Step 1. Consider any two distinct social states x, y ∈ X and a profile of strict rankings
in which x is ranked highest and y lowest for every individual i = 1, . . . , N. Pareto effi-
ciency implies that the social choice at this profile is x. Consider now changing individual
1’s ranking by strictly raising y in it one position at a time. By monotonicity, the social
choice remains equal to x so long as y is below x in 1’s ranking. But when y finally does
rise above x, monotonicity implies that the social choice either changes to y or remains
equal to x (see Exercise 6.18(a)). If the latter occurs, then begin the same process with
individual 2, then 3, etc. until for some individual n, the social choice does change from
x to y when y rises above x in n’s ranking. (There must be such an individual n because
y will eventually be at the top of every individual’s ranking and by Pareto efficiency the
social choice will then be y.) Figs. 6.11 and 6.12 depict the situations just before and just
after individual n’s ranking of y is raised above x.
Step 2. This is perhaps the trickiest step in the proof, so follow closely. Consider
Figs. 6.13 and 6.14 below. Fig. 6.13 is derived from Fig. 6.11 (and Fig. 6.14 from Fig. 6.12)
by moving x to the bottom of individual i’s ranking for i < n and moving it to the second
last position in i’s ranking for i > n. We wish to argue that these changes do not affect the
social choices, i.e., that the social choices are as indicated in the figures.
First, note that the social choice in Fig. 6.14 must, by monotonicity, be y because
the social choice in Fig. 6.12 is y and no individual’s ranking of y versus any other social
state changes in the move from Fig. 6.12 to Fig. 6.14 (see Exercise 6.18(b)). Next, note
that the profiles in Figs. 6.13 and 6.14 differ only in individual n’s ranking of x and y.
So, because the social choice in Fig. 6.14 is y, the social choice in Fig. 6.13 must, by
monotonicity, be either x or y (we used this same logic in Step 1 – see Exercise 6.18(a)). But
if the social choice in Fig. 6.13 is y, then by monotonicity (see Exercise 6.18(b)), the social
choice in Fig. 6.11 must be y, a contradiction. Hence, the social choice in Fig. 6.13 is x.
Step 3. Because there are at least three social states, we may consider a social state
z ∈ X distinct from x and y. Since the (otherwise arbitrary) profile of strict rankings in
Fig. 6.15 can be obtained from the Fig. 6.13 profile without changing the ranking of x
versus any other social state in any individual’s ranking, the social choice in Fig. 6.15
must, by monotonicity, be x (see Exercise 6.18(b)).
Step 4. Consider the profile of rankings in Fig. 6.16 derived from the Fig. 6.15 profile
by interchanging the ranking of x and y for individuals i > n. Because this is the only
difference between the profiles in Figs. 6.15 and 6.16, and because the social choice in
Fig. 6.15 is x, the social choice in Fig. 6.16 must, by monotonicity, be either x or y (see
Exercise 6.18(a)). But the social choice in Fig. 6.16 cannot be y because z is ranked above
y in every individual’s Fig. 6.16 ranking, and monotonicity would then imply that the
social choice would remain y even if z were raised to the top of every individual’s ranking,
contradicting Pareto efficiency. Hence the social choice in Fig. 6.16 is x.
Step 5. Note that an arbitrary profile of strict rankings with x at the top of individual
n’s ranking can be obtained from the profile in Fig. 6.16 without reducing the ranking
of x versus any other social state in any individual’s ranking. Hence, monotonicity (see
Exercise 6.18(b)) implies that the social choice must be x whenever individual rankings are
strict and x is at the top of individual n’s ranking. You are asked to show in Exercise 6.19
that this implies that even when individual rankings are not strict and indifferences are
present, the social choice must be at least as good as x for individual n whenever x is at
least as good as every other social state for individual n. So, we may say that individual
n is a dictator for the social state x. Because x was arbitrary, we have shown that for each
social state x ∈ X, there is a dictator for x. But there cannot be distinct dictators for distinct
social states (see Exercise 6.20). Hence there is a single dictator for all social states and
therefore the social choice function is dictatorial.
The message you should take away from the Gibbard-Satterthwaite theorem is that,
in a rich enough setting, it is impossible to design a non-dictatorial system in which social
choices are made based upon self-reported preferences without introducing the possibil-
ity that individuals can gain by lying. Fortunately, this does not mean that all is lost. In
Chapter 9 we will impose an important and useful domain restriction, known as quasi-
linearity, on individual preferences. This will allow us to escape the conclusion of the
Gibbard-Satterthwaite theorem and to provide an introduction to aspects of the theory of
mechanism design. Thus, the Gibbard-Satterthwaite theorem provides a critically impor-
tant lesson about the limits of designing systems of social choice based on self-reported
information and points us in the direction of what we will find to be rather fertile ground.
But before we can develop this further, we must become familiar with the essential and
powerful tools of game theory, the topic of our next chapter.
6.6 EXERCISES
6.1 Arrow (1951) shows that when the number of alternatives in X is restricted to just two, the method of
majority voting does yield a social welfare relation that satisfies the conditions of Assumption 6.1.
Verify, by example or more general argument, that this is indeed the case.
6.2 Show that the weak Pareto condition WP in Arrow’s theorem can be replaced with the even weaker
Pareto condition VWP (very weak Pareto) without affecting the conclusion of Arrow’s theorem,
where VWP is as follows.
VWP. ‘If xPi y for all i, then xPy’.
6.3 (a) Show that the social welfare function that coincides with individual i’s preferences satisfies U,
WP, and IIA. Call such a social welfare function an individual i dictatorship.
(b) Suppose that society ranks any two social states x and y according to individual 1’s preferences
unless he is indifferent in which case x and y are ranked according to 2’s preferences unless he
is indifferent, etc. Call the resulting social welfare function a lexicographic dictatorship. Show
that a lexicographic dictatorship satisfies U, WP and IIA and that it is distinct from an individual
i dictatorship.
(c) Describe a social welfare function distinct from an individual i dictatorship and a lexicographic
dictatorship that satisfies U, WP and IIA.
6.4 Suppose that X is a non-singleton convex subset of RK and that f is a social welfare function satisfy-
ing U in the sense that it maps every profile of continuous utility functions u(·) = (u1 (·), . . . , uN (·))
on X into a continuous social utility function fu : X → R. Suppose also that f satisfies IIA, WP,
and PI.
Throughout this question you may assume that for any finite number of social states in X and
any utility numbers you wish to assign to them, there is a continuous utility function defined on all
of X assigning to those states the desired utility numbers. (You might wish to try and prove this. The
hints section provides a solution.)
(a) Using U, IIA, and PI, show that if u(x) = v(x′ ) and u(y) = v(y′ ), then fu (x) ≥ fu (y) if and only
if fv (x′ ) ≥ fv (y′ ).
Define the binary relation ⪰ on RN as follows: (a1 , . . . , aN ) ⪰ (b1 , . . . , bN ) if fu (x) ≥ fu (y) for
some vector of continuous utility functions u(·) = (u1 (·), . . . , uN (·)) and some pair of social states
x and y satisfying ui (x) = ai and ui (y) = bi for all i.
(b) Show that ⪰ is complete.
(c) Use the fact that f satisfies WP to show that ⪰ is strictly monotonic.
(d) Use the result from part (a) to show that ⪰ is transitive. It is here where at least three social
states are needed. (Of course, being non-singleton and convex, X is infinite so that there are
many more states than necessary for this step.)
(e) It is possible to prove, using in particular the fact that X is non-singleton and convex, that ⪰ is
continuous. But the proof is technically demanding. Instead, simply assume that ⪰ is continuous
and use Theorems 1.1 and 1.3 to prove that there is a continuous and strictly increasing function
W : RN → R that represents ⪰. (You will need to provide a small argument to adjust for the
fact that the domain of W is RN while the domain of the utility functions in Chapter 1 is RN+ .)
(f) Show that for every profile of continuous utility functions u(·) = (u1 (·), . . . , uN (·)) on X and all
pairs of social states x and y,
fu (x) ≥ fu (y) if and only if W(u1 (x), . . . , uN (x)) ≥ W(u1 (y), . . . , uN (y)).
6.5 (a) Suppose N = 2. As in Fig. 6.5, fix a utility vector (ū1 , ū2 ) in the plane and sketch the sets
of utility vectors that are socially preferred, socially worse and socially indifferent to (ū1 , ū2 )
under a lexicographic dictatorship where individual 1’s preferences come first and 2’s second.
Compare with Fig. 6.5. Pay special attention to the indifference sets.
(b) Conclude from Exercise 6.3 that our first proof of Arrow’s theorem does not rule out the possi-
bility of a lexicographic dictatorship and conclude from part (a) of this exercise that our second
diagrammatic proof does rule out lexicographic dictatorship. What accounts for the stronger
result in the diagrammatic proof?
6.6 In the diagrammatic proof of Arrow’s theorem, the claim was made that in Fig. 6.4, we could show
either W(ū) < W(IV) or W(ū) > W(IV). Provide the argument.
6.7 Provide the argument left out of the proof of Theorem 6.2 that the ray starting at a and extending
upward is part of a social indifference curve.
6.8 This exercise considers Theorem 6.2 for the general case of N ≥ 2. So, let W : RN → R be
continuous, strictly increasing and satisfy HE.
(a) Suppose that min[u1 , . . . , uN ] = α. Show that W(u1 + ε, . . . , uN + ε) > W(α, α, . . . , α) for
every ε > 0 because W is strictly increasing. Conclude by the continuity of W that
W(u1 , . . . , uN ) ≥ W(α, α, . . . , α).
(b) Suppose that uj = min[u1 , . . . , uN ] = α and that ui > α. Using HE, show that
W(α + ε, α + ε, u−ij ) ≥ W(uj , ui , u−ij ) for all ε > 0 sufficiently small, where u−ij ∈ RN−2 is the
vector (u1 , . . . , uN ) without coordinates i and j.
(c) Using the continuity of W, conclude from (b) that if min[u1 , . . . , uN ] = α, then for every indi-
vidual i, W(α, u−i ) ≥ W(u1 , . . . , uN ), where u−i ∈ RN−1 is the vector (u1 , . . . , uN ) without
coordinate i.
(d) By successively applying the result from (c) one individual after another, show that if
min[u1 , . . . , uN ] = α, then W(α, α, . . . , α) ≥ W(u1 , . . . , uN ).
(e) Using (a) and (d) and the fact that W is strictly increasing, show first that
W(u1 , . . . , uN ) = W(ũ1 , . . . , ũN ) if and only if min(u1 , . . . , uN ) = min(ũ1 , . . . , ũN ) and then
that W(u1 , . . . , uN ) ≥ W(ũ1 , . . . , ũN ) if and only if min(u1 , . . . , uN ) ≥ min(ũ1 , . . . , ũN ).
6.9 There are three individuals in society, {1, 2, 3}, three social states, {x, y, z}, and the domain of pref-
erences is unrestricted. Suppose that the social preference relation, R, is given by pairwise majority
voting (where voters break any indifferences by voting for x first then y then z) if this results in a
transitive social order. If this does not result in a transitive social order the social order is xPyPz. Let
f denote the social welfare function that this defines.
(a) Consider the following profiles, where Pi is individual i’s strict preference relation:
(b) Use your findings in part (a) to give an alternative proof of the First Welfare Theorem 5.7.
6.12 The Borda rule is commonly used for making collective choices. Let there be N individuals and
suppose X contains a finite number of alternatives. Individual i assigns a Borda count, Bi (x), to
every alternative x, where Bi (x) is the number of alternatives in X to which x is preferred by agent i.
Alternatives are then ranked according to their total Borda count as follows:
$$x\,R\,y \iff \sum_{i=1}^{N} B_i(x) \geq \sum_{i=1}^{N} B_i(y).$$
(a) Show that the Borda rule satisfies U, WP, and D in Assumption 6.1.
(b) Show that it does not satisfy IIA.
6.13 Individual i is said to be decisive in the social choice between x and y if xPi y implies xPy, regardless
of others’ preferences. Sen (1970b) interprets ‘liberal values’ to imply that there are certain social
choices over which each individual should be decisive. For example, in the social choice between
individual i’s reading or not reading a certain book, the preference of individual i should determine
the social preference. Thus, we can view liberalism as a condition on the social welfare relation
requiring that every individual be decisive over at least one pair of alternatives. Sen weakens this
requirement further, defining a condition he calls minimal liberalism as follows:
L∗ : there are at least two people k and j and two pairs of distinct alternatives (x, y) and (z, w)
such that k and j are decisive over (x, y) and (z, w), respectively.
Prove that there exists no social welfare relation that satisfies (merely) the conditions U, WP, and L∗ .
6.14 Atkinson (1970) proposes an index of equality in the distribution of income based on the notion
of ‘equally distributed equivalent income’, denoted ye . For any strictly increasing, symmetric, and
quasiconcave social welfare function over income vectors, W(y1 , . . . , yN ), income ye is defined as
that amount of income which, if distributed to each individual, would produce the same level of
social welfare as the given distribution. Thus, letting e ≡ (1, . . . , 1) and y ≡ (y1 , . . . , yN ), we have
W(ye e) ≡ W(y).
Letting μ be the mean of the income distribution y, an index of equality in the distribution of income
then can be defined as follows:
$$I(y) \equiv \frac{y_e}{\mu}.$$
(b) Show that the index I(y) is always ‘normatively significant’ in the sense that for any two income
distributions, y1 , y2 with the same mean, I(y1 ) is greater than, equal to, or less than I(y2 ) if and
only if W(y1 ) is greater than, equal to, or less than W(y2 ), respectively.
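For a concrete feel for Atkinson's construction, here is a sketch computed for one admissible welfare function, the CES form over incomes, which is strictly increasing, symmetric, and (for ρ < 1) quasiconcave; the incomes and the value of ρ are hypothetical:

```python
# Sketch: Atkinson's equality index I(y) = ye / mu, computed for
# W(y) = (sum_i y_i^rho)^(1/rho), one admissible symmetric, quasiconcave
# welfare function. Incomes and rho are hypothetical.

def equally_distributed_equivalent(y, rho):
    """ye solving W(ye, ..., ye) = W(y) for the CES W above.
    For this W, ye is the power mean of order rho of the incomes."""
    n = len(y)
    return (sum(yi ** rho for yi in y) / n) ** (1.0 / rho)

def atkinson_index(y, rho):
    mu = sum(y) / len(y)
    return equally_distributed_equivalent(y, rho) / mu

print(atkinson_index([10.0, 10.0], rho=0.5))    # perfectly equal: index 1
print(atkinson_index([19.0, 1.0], rho=0.5))     # same mean, unequal: below 1
```

Equalising the distribution while holding the mean fixed raises ye toward μ, so the index rises toward 1.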
6.15 Blackorby and Donaldson (1978) built upon the work of Atkinson described in the preceding exer-
cise. Let W(y) be any strictly increasing, symmetric, and quasiconcave social welfare function
defined over income distributions. The authors define a ‘homogeneous implicit representation of
W’ as follows:
where w ∈ R is any ‘reference level’ of the underlying social welfare function. They then define
their index of equality in the distribution of income as follows:
$$E(w, y) \equiv \frac{F(w, y)}{F(w, \mu e)},$$
denoting ‘perfect equality’, regardless of the distribution of income. What do you conclude from
this?
(e) Derive the index E(w, y) when the social welfare function is the CES form
$$W(y) = \left( \sum_{i=1}^{N} (y_i)^{\rho} \right)^{1/\rho}, \qquad 0 \neq \rho < 1.$$
6.16 Let x ≡ (x1 , . . . , xN ) be an allocation of goods to agents, and let the economy’s feasible set of
allocations be T. Suppose x∗ maximises the utilitarian social welfare function, W = Σi ui (xi ),
subject to x ∈ T.
(a) Let ψ i for i = 1, . . . , N be an arbitrary set of increasing functions of one variable. Does x∗
maximise Σi ψ i (ui (xi )) over x ∈ T? Why or why not?
(b) If in part (a), ψ i = ψ for all i, what would your answer be?
(c) If ψ i ≡ ai + bi ui (xi ) for arbitrary ai and bi > 0, what would your answer be?
(d) If ψ i ≡ ai + bui (xi ) for arbitrary ai and b > 0, what would your answer be?
(e) How do you account for any similarities and differences in your answers to parts (a) through (d)?
6.17 From the preceding exercise, let x∗ maximise the Rawlsian social welfare function, W =
min[u1 (x1 ), . . . , uN (xN )] over x ∈ T.
(a) If ψ i for i = 1, . . . , N is an arbitrary set of increasing functions of one variable, must x∗
maximise the function, min [ψ 1 (u1 (x1 )), . . . , ψ N (uN (xN ))], over x ∈ T? Why or why not?
(b) If in part (a), ψ i = ψ for all i, what would your answer be?
(c) How do you account for your answers to parts (a) and (b)?
(d) How do you account for any differences or similarities in your answers to this exercise and the
preceding one?
6.18 Suppose that c(·) is a monotonic social choice function and that c(R) = x, where R1 , . . . , RN are
each strict rankings of the social states in X.
(a) Suppose that for some individual i, Ri ranks y just below x, and let R̃i be identical to Ri except that
y is ranked just above x – i.e., the ranking of x and y is reversed. Prove that either c(R̃i , R−i ) = x
or c(R̃i , R−i ) = y.
(b) Suppose that R̃1 , . . . , R̃N are strict rankings such that for every individual i, the ranking of x
versus any other social state is the same under R̃i as it is under Ri . Prove that c(R̃) = x.
6.19 Let c(·) be a monotonic social choice function and suppose that the social choice must be x whenever
all individual rankings are strict and x is at the top of individual n’s ranking. Show the social choice
must be at least as good as x for individual n when the individual rankings are not necessarily strict
and x is at least as good for individual n as any other social state.
6.20 Let x and y be distinct social states. Suppose that the social choice is at least as good as x for
individual i whenever x is at least as good as every other social state for i. Suppose also that the
social choice is at least as good as y for individual j whenever y is at least as good as every other
social state for j. Prove that i = j.
6.21 Call a social choice function strongly monotonic if c(R) = x implies c(R̃) = x whenever for every
individual i and every y ∈ X, xRi y ⇒ xR̃i y.
Suppose there are two individuals, 1 and 2, and three social states, x, y, and z. Define the
social choice function c(·) to choose individual 1’s top-ranked social state unless it is not unique, in
which case the social choice is individual 2’s top-ranked social state among those that are top-ranked
for individual 1, unless this too is not unique, in which case, among those that are top-ranked for
both individuals, choose x if it is among them, otherwise choose y.
(a) Prove that c(·) is strategy-proof.
(b) Show by example that c(·) is not strongly monotonic. (Hence, strategy-proofness does not imply
strong monotonicity, even though it implies monotonicity.)
6.22 Show that if c(·) is a monotonic social choice function and the finite set of social states is X, then for
every x ∈ X there is a profile, R, of strict rankings such that c(R) = x. (Recall that, by definition,
every x in X is chosen by c(·) at some preference profile.)
6.23 Show that when there are just two alternatives and an odd number of individuals, the majority rule
social choice function (i.e., that which chooses the outcome that is the top ranked choice for the
majority of individuals) is Pareto efficient, strategy-proof and non-dictatorial.
PART III
STRATEGIC
BEHAVIOUR
CHAPTER 7
GAME THEORY
When a consumer goes shopping for a new car, how will he bargain with the salesperson?
If two countries negotiate a trade deal, what will be the outcome? What strategies will be
followed by a number of oil companies each bidding on an offshore oil tract in a sealed-bid
auction?
In situations such as these, the actions any one agent may take will have conse-
quences for others. Because of this, agents have reason to act strategically. Game theory
is the systematic study of how rational agents behave in strategic situations, or in games,
where each agent must first know the decision of the other agents before knowing which
decision is best for himself. This circularity is the hallmark of the theory of games, and
deciding how rational agents behave in such settings will be the focus of this chapter.
The chapter begins with a close look at strategic form games and proceeds to con-
sider extensive form games in some detail. The former are games in which the agents
make a single, simultaneous choice, whereas the latter are games in which players may
make choices in sequence.
Along the way, we will encounter a variety of methods for determining the out-
come of a game. You will see that each method we encounter gives rise to a particular
solution concept. The solution concepts we will study include those based on dominance
arguments, Nash equilibrium, Bayesian-Nash equilibrium, backward induction, subgame
perfection, and sequential equilibrium. Each of these solution concepts is more sophisti-
cated than its predecessors, and knowing when to apply one solution rather than another is
an important part of being a good applied economist.
of the two firms. Each firm understands well that its optimal action depends on the action
taken by the other firm.
To further illustrate the significance of strategic decision making consider the classic
duel between a batter and a pitcher in baseball. To keep things simple, let us assume that
the pitcher has only two possible pitches – a fastball and a curve. Also, suppose it is well
known that this pitcher has the best fastball in the league, but his curve is only average.
Based on this, it might seem best for the pitcher to always throw his fastball. However,
such a non-strategic decision on the pitcher’s part fails to take into account the batter’s
decision. For if the batter expects the pitcher to throw a fastball, then, being prepared for
it, he will hit it. Consequently, it would be wise for the pitcher to take into account the
batter’s decision about the pitcher’s pitch before deciding which pitch to throw.
To push the analysis a little further, let us assign some utility numbers to the various
outcomes. For simplicity, we suppose that the situation is an all or nothing one for both
players. Think of it as being the bottom of the ninth inning, with a full count, bases loaded,
two outs, and the pitcher’s team ahead by one run. Assume also that the batter either hits
a home run (and wins the game) or strikes out (and loses the game). Consequently, there
is exactly one pitch remaining in the game. Finally, suppose each player derives utility 1
from a win and utility −1 from a loss. We may then represent this situation by the matrix
diagram in Fig. 7.1.
In this diagram, the pitcher (P) chooses the row, F (fastball) or C (curve), and the
batter (B) chooses the column. The batter hits a home run when he prepares for the pitch
that the pitcher has chosen, and strikes out otherwise. The entries in the matrix denote
the players’ payoffs as a result of their decisions, with the pitcher’s payoff being the first
number of each entry and the batter’s the second. Thus, the entry (1, −1) in the first row
and second column indicates that if the pitcher throws a fastball and the batter prepares for
a curve, the pitcher’s payoff is 1 and the batter’s is −1. The other entries are read in the
same way.
Although we have so far concentrated on the pitcher’s decision, the batter is obvi-
ously in a completely symmetric position. Just as the pitcher must decide on which pitch
to throw, the batter must decide on which pitch to prepare for. What can be said about
their behaviour in such a setting? Even though you might be able to provide the answer for
yourself already, we will not analyse this game fully just yet.
However, we can immediately draw a rather important conclusion based solely on
the ideas that each player seeks to maximise his payoff, and that each reasons strategically.
                     Batter
                  F           C
Pitcher    F   −1, 1        1, −1
           C    1, −1      −1, 1

Figure 7.1. The batter–pitcher game.
Here, each player must behave in a manner that is ‘unpredictable’. Why? Because if the
pitcher’s behaviour were predictable in that, say, he always throws his fastball, then the
batter, by choosing F, would be guaranteed to hit a home run and win the game. But this
would mean that the batter’s behaviour is predictable as well; he always prepares for a
fastball. Consequently, because the pitcher behaves strategically, he will optimally choose
to throw his curve, thereby striking the batter out and winning the game. But this con-
tradicts our original supposition that the pitcher always throws his fastball! We conclude
that the pitcher cannot be correctly predicted to always throw a fastball. Similarly, it must
be incorrect to predict that the pitcher always throws a curve. Thus, whatever behaviour
does eventually arise out of this scenario, it must involve a certain lack of predictability
regarding the pitch to be thrown. And for precisely the same reasons, it must also involve
a lack of predictability regarding the batter’s choice of which pitch to prepare for.
Thus, when rational individuals make decisions strategically, each taking into
account the decision the other makes, they sometimes behave in an ‘unpredictable’ man-
ner. Any good poker player understands this well – it is an essential aspect of successful
bluffing. Note, however, that there is no such advantage in non-strategic settings – when
you are alone, there is no one to ‘fool’. This is but one example of how outcomes among
strategic decision makers may differ quite significantly from those among non-strategic
decision makers. Now that we have a taste for strategic decision making, we are ready to
develop a little theory.
Note that this definition is general enough to cover our batter–pitcher duel. The
strategic form game describing that situation, when the pitcher is designated player 1,
is given by
S1 = S2 = {F, C},
u1 (F, F) = u1 (C, C) = −1,
u1 (F, C) = u1 (C, F) = 1, and
u2 (s1 , s2 ) = −u1 (s1 , s2 ) for all (s1 , s2 ) ∈ S1 × S2 .
Note that two-player strategic form games with finite strategy sets can always be
represented in matrix form, with the rows indexing the strategies of player 1, the columns
indexing the strategies of player 2, and the entries denoting their payoffs.
Figure 7.2.

         L          R
U      3, 0       0, −4
D      2, 4      −1, 8

Figure 7.3.

         L          M          R
U      3, 0       0, −5      0, −4
C      1, −1      3, 3      −2, 4
D      2, 4       4, 1      −1, 8
In the game of Fig. 7.2, we noted that U was strictly dominant for player 1. We
were therefore able to eliminate D from consideration. Once done, we were then able to
conclude that player 2 would choose L, or what amounts to the same thing, we were able to
eliminate R. Note that although R is not strictly dominated in the original game, it is strictly
dominated (by L) in the reduced game in which 1’s strategy D is eliminated. This left the
unique solution (U, L). In the game of Fig. 7.3, we first eliminated C for 1 and M for 2
(each being strictly dominated); then (following the Fig. 7.2 analysis) eliminated D for 1;
then eliminated R for 2. This again left the unique strategy pair (U, L). Again, note that D is
not strictly dominated in the original game, yet it is strictly dominated in the reduced game
in which C has been eliminated. Similarly, R becomes strictly dominated only after both C
and D have been eliminated. We now formalise this procedure of iteratively eliminating
strictly dominated strategies.
Let Si0 = Si for each player i, and for n ≥ 1, let Sin denote those strategies of player
i surviving after the nth round of elimination. That is, si ∈ Sin if si ∈ Sin−1 is not strictly
dominated in Sn−1 .
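The procedure can be illustrated with a small computational sketch, applied to the game of Fig. 7.3. For simplicity, the sketch checks domination by pure strategies only; the general definition also permits domination by mixed strategies, which this code does not test.

```python
# Iterated elimination of strictly dominated strategies in the game of
# Fig. 7.3.  Only pure-strategy domination is checked here.
U1 = {('U', 'L'): 3, ('U', 'M'): 0, ('U', 'R'): 0,
      ('C', 'L'): 1, ('C', 'M'): 3, ('C', 'R'): -2,
      ('D', 'L'): 2, ('D', 'M'): 4, ('D', 'R'): -1}
U2 = {('U', 'L'): 0, ('U', 'M'): -5, ('U', 'R'): -4,
      ('C', 'L'): -1, ('C', 'M'): 3, ('C', 'R'): 4,
      ('D', 'L'): 4, ('D', 'M'): 1, ('D', 'R'): 8}

def strictly_dominated(s, own, other, payoff, is_row):
    # u(a, b): payoff to the player being checked when he plays a and
    # his opponent plays b.
    u = (lambda a, b: payoff[(a, b)]) if is_row else (lambda a, b: payoff[(b, a)])
    return any(all(u(s2, t) > u(s, t) for t in other)
               for s2 in own if s2 != s)

S1, S2 = {'U', 'C', 'D'}, {'L', 'M', 'R'}
while True:
    # One round: both players eliminate simultaneously, as in the text.
    S1n = {s for s in S1 if not strictly_dominated(s, S1, S2, U1, True)}
    S2n = {s for s in S2 if not strictly_dominated(s, S2, S1, U2, False)}
    if (S1n, S2n) == (S1, S2):
        break
    S1, S2 = S1n, S2n

print(S1, S2)   # {'U'} {'L'}
```

As in the discussion above, the first round removes C and M, the second removes D and R, and only (U, L) survives.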
         L         R
U      1, 1      0, 0
D      0, 0      0, 0
With this in mind, let Wi0 = Si for each player i, and for n ≥ 1, let Win denote those
strategies of player i surviving after the nth round of elimination of weakly dominated
strategies. That is, si ∈ Win if si ∈ Win−1 is not weakly dominated in W n−1 = W1n−1 × · · ·
× WNn−1 .
1 Depending on the number of players, other numbers may be weakly dominated as well. This is explored in the
exercises.
such a situation, there is no tendency or necessity for anyone’s behaviour to change. These
regularities in behaviour form the basis for making predictions.
With a view towards making predictions, we wish to describe potential regularities in
behaviour that might arise in a strategic setting. At the same time, we wish to incorporate
the idea that the players are ‘rational’, both in the sense that they act in their own self-
interest and that they are fully aware of the regularities in the behaviour of others. In the
strategic setting, just as in the demand–supply setting, regularities in behaviour that can be
‘rationally’ sustained will be called equilibria. In Chapter 4, we have already encountered
the notion of a Nash equilibrium in the strategic context of Cournot duopoly. This concept
generalises to arbitrary strategic form games. Indeed, Nash equilibrium, introduced in
Nash (1951), is the single most important equilibrium concept in all of game theory.
Informally, a joint strategy ŝ ∈ S constitutes a Nash equilibrium as long as each
individual, while fully aware of the others’ behaviour, has no incentive to change his own.
Thus, a Nash equilibrium describes behaviour that can be rationally sustained. Formally,
the concept is defined as follows.
          F          C
F      −1, 1       1, −1
C       1, −1     −1, 1
the batter. When (F, F) is played, the batter receives a payoff of 1. By switching to C, the
joint strategy becomes (F, C) (remember, we must hold the pitcher’s strategy fixed at F),
and the batter receives −1. Consequently, the batter cannot improve his payoff by switch-
ing. What about the pitcher? At (F, F), the pitcher receives a payoff of −1. By switching to
C, the joint strategy becomes (C, F) and the pitcher receives 1, an improvement. Thus, the
pitcher can improve his payoff by unilaterally switching his strategy, and so (F, F) is not a
pure strategy Nash equilibrium. A similar argument applies to the other three possibilities.
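That argument can be confirmed by brute force. The following sketch (with payoffs encoded as above, the pitcher choosing the row) checks all four pure strategy profiles of the batter–pitcher game for unilateral profitable deviations:

```python
from itertools import product

# Brute-force search for pure strategy Nash equilibria in the
# batter-pitcher game: as argued above, there are none.
u1 = {('F', 'F'): -1, ('F', 'C'): 1, ('C', 'F'): 1, ('C', 'C'): -1}
u2 = {s: -v for s, v in u1.items()}   # zero-sum: the batter's payoffs

def is_pure_nash(s1, s2):
    # Neither player can gain by a unilateral deviation.
    return (all(u1[(s1, s2)] >= u1[(d, s2)] for d in 'FC') and
            all(u2[(s1, s2)] >= u2[(s1, d)] for d in 'FC'))

equilibria = [(s1, s2) for s1, s2 in product('FC', repeat=2)
              if is_pure_nash(s1, s2)]
print(equilibria)   # []
```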
Of course, this was to be expected in the light of our heuristic analysis of the batter–
pitcher duel at the beginning of this chapter. There we concluded that both the batter and
the pitcher must behave in an unpredictable manner. But embodied in the definition of a
pure strategy Nash equilibrium is that each player knows precisely which strategy each
of the other players will choose. That is, in a pure strategy Nash equilibrium, everyone’s
choices are perfectly predictable. The batter–pitcher duel continues to escape analysis. But
we are fast closing in on it.
expected utility by randomising between F and C with equal probabilities. In short, the
players’ randomised choices form an equilibrium: each is aware of the (randomised) man-
ner in which the other makes his choice, and neither can improve his expected payoff by
unilaterally changing the manner in which his choice is made.
To apply these ideas to general strategic form games, we first formally introduce the
notion of a mixed strategy.
Thus, a mixed strategy is the means by which players randomise their choices. One
way to think of a mixed strategy is simply as a roulette wheel with the names of various
pure strategies printed on sections of the wheel. Different roulette wheels might have larger
sections assigned to one pure strategy or another, yielding different probabilities that those
strategies will be chosen. The set of mixed strategies is then the set of all such roulette
wheels.
Each player i is now allowed to choose from the set of mixed strategies Mi rather than
Si . Note that this gives each player i strictly more choices than before, because every pure
strategy s̄i ∈ Si is represented in Mi by the (degenerate) probability distribution assigning
probability one to s̄i .
Let M = M1 × · · · × MN denote the set of joint mixed strategies. From now on, we shall
drop the word ‘mixed’ and simply call m ∈ M a joint strategy and mi ∈ Mi a strategy for
player i.
If ui is a von Neumann-Morgenstern utility function on S, and the strategy m ∈ M is
played, then player i’s expected utility is
ui(m) ≡ Σs∈S m1(s1) · · · mN(sN) ui(s).
This formula follows from the fact that the players choose their strategies independently.
Consequently, the probability that the pure strategy s = (s1 , . . . , sN ) ∈ S is chosen
is the product of the probabilities that each separate component is chosen, namely
m1 (s1 ) · · · mN (sN ). We now give the central equilibrium concept for strategic form games.
Thus, in a Nash equilibrium, each player may be randomising his choices, and no
player can improve his expected payoff by unilaterally randomising any differently.
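As a check on the expected utility formula above, the following sketch computes the pitcher's expected payoff in the batter–pitcher game when both players mix fifty-fifty:

```python
from itertools import product

# Expected utility ui(m) = sum over s of m1(s1)···mN(sN) ui(s), for the
# batter-pitcher game with both players randomising 50-50.
u1 = {('F', 'F'): -1, ('F', 'C'): 1, ('C', 'F'): 1, ('C', 'C'): -1}
m1 = {'F': 0.5, 'C': 0.5}   # pitcher's mixed strategy
m2 = {'F': 0.5, 'C': 0.5}   # batter's mixed strategy

eu1 = sum(m1[s1] * m2[s2] * u1[(s1, s2)]
          for s1, s2 in product('FC', repeat=2))
print(eu1)   # 0.0
```

Each of the four pure strategy profiles receives probability 1/4, and the payoffs −1, 1, 1, −1 average to zero.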
It might appear that checking for a Nash equilibrium requires checking, for every
player i, each strategy in the infinite set Mi against m̂i . The following result simplifies this
task by taking advantage of the linearity of ui in mi .
According to the theorem, statements (b) and (c) offer alternative methods for check-
ing for a Nash equilibrium. Statement (b) is most useful for computing Nash equilibria. It
says that a player must be indifferent between all pure strategies given positive weight by
his mixed strategy and that each of these must be no worse than any of his pure strategies
given zero weight. Statement (c) says that it is enough to check for each player that no pure
strategy yields a higher expected payoff than his mixed strategy in order that the vector of
mixed strategies forms a Nash equilibrium.
Proof: We begin by showing that statement (a) implies (b). Suppose first that m̂ is a Nash
equilibrium. Consequently, ui (m̂) ≥ ui (mi , m̂−i ) for all mi ∈ Mi . In particular, for every
si ∈ Si , we may choose mi to be the strategy giving probability one to si , so that ui (m̂) ≥
ui (si , m̂−i ) holds in fact for every si ∈ Si . It remains to show that ui (m̂) = ui (si , m̂−i )
for every si ∈ Si given positive weight by m̂i . Now, if any of these numbers differed
from ui (m̂), then at least one would be strictly larger because ui (m̂) is a strict convex
combination of them. But this would contradict the inequality just established.
Because it is obvious that statement (b) implies (c), it remains only to establish that
(c) implies (a). So, suppose that ui (m̂) ≥ ui (si , m̂−i ) for every si ∈ Si and every player i.
Fix a player i and mi ∈ Mi . Because the number ui (mi , m̂−i ) is a convex combination of
the numbers {ui (si , m̂−i )}si ∈Si , we have ui (m̂) ≥ ui (mi , m̂−i ). Because both the player and
the chosen strategy were arbitrary, m̂ is a Nash equilibrium of G.
EXAMPLE 7.1 Let us consider an example to see these ideas at work. You and a colleague
are asked to put together a report that must be ready in an hour. You agree to split the work
into halves. To your mutual dismay, you each discover that the word processor you use is
not compatible with the one the other uses. To put the report together in a presentable fash-
ion, one of you must switch to the other’s word processor. Of course, because it is costly
to become familiar with a new word processor, each of you would rather that the other
switched. On the other hand, each of you prefers to switch to the other’s word processor
rather than fail to coordinate at all. Finally, suppose there is no time for the two of you to
          WP         MW
WP      2, 1       0, 0
MW      0, 0       1, 2

Figure 7.6. The word-processor coordination game.
waste discussing the coordination issue. Each must decide which word processor to use in
the privacy of his own office.
This situation is represented by the game of Fig. 7.6. Player 1’s word processor is
WP, and player 2’s is MW. They each derive a payoff of zero by failing to coordinate, a
payoff of 2 by coordinating on their own word processor, and a payoff of 1 by coordinating
on the other’s word processor. This game possesses two pure strategy Nash equilibria,
namely, (WP, WP) and (MW, MW).
Are there any Nash equilibria in mixed strategies? If so, then it is easy to see from
Fig. 7.6 that both players must choose each of their pure strategies with strictly positive
probability. Let then p > 0 denote the probability that player 1 chooses his colleague’s
word processor, MW, and let q > 0 denote the probability that player 2 chooses his col-
league’s word processor WP. By part (b) of Theorem 7.1, each player must be indifferent
between each of his pure strategies. For player 1, this means that

2q + 0(1 − q) = 0q + 1(1 − q),

and, for player 2, that

1(1 − p) + 0p = 0(1 − p) + 2p.

Solving these yields p = q = 1/3. Thus, the (mixed) strategy in which each player chooses
his colleague’s word processor with probability 1/3 and his own with probability 2/3 is a
third Nash equilibrium of this game. There are no others.
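The indifference conditions of Theorem 7.1(b) can be verified directly at these probabilities. A minimal sketch, with p the probability that player 1 chooses MW and q the probability that player 2 chooses WP:

```python
from fractions import Fraction

# Verifying the mixed equilibrium of Example 7.1 via Theorem 7.1(b).
p = Fraction(1, 3)   # prob. player 1 chooses MW
q = Fraction(1, 3)   # prob. player 2 chooses WP

# Player 1's expected payoff from each pure strategy against q:
u1_WP = 2 * q + 0 * (1 - q)
u1_MW = 0 * q + 1 * (1 - q)
# Player 2's expected payoff from each pure strategy against p:
u2_WP = 1 * (1 - p) + 0 * p
u2_MW = 0 * (1 - p) + 2 * p

# Each player is indifferent between his pure strategies:
assert u1_WP == u1_MW == Fraction(2, 3)
assert u2_WP == u2_MW == Fraction(2, 3)
```

Note that each player's equilibrium mix is pinned down by the requirement that the *other* player be indifferent, a feature typical of mixed equilibria.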
strategy that player 1 will choose. So, for example, in our game of Fig. 7.6, player 1’s
equilibrium strategy placing probability 1/3 on MW and 2/3 on WP can be interpreted to
reflect player 2’s uncertainty regarding the pure strategy that player 1 will choose. Player 2
believes that player 1 will choose MW with probability 1/3 and WP with probability 2/3.
Similarly, player 2’s equilibrium mixed strategy here need not reflect the idea that player
2 deliberately randomises between WP and MW, rather it can be interpreted as player 1’s
beliefs about the probability that player 2 will choose one pure strategy or the other.
Thus, we now have two possible interpretations of mixed strategies at our disposal.
On the one hand, they may constitute actual physical devices (roulette wheels) that players
use to deliberately randomise their pure strategy choices. On the other hand, a player’s
mixed strategy may merely represent the beliefs that the others hold about the pure strat-
egy that he might choose. In this latter interpretation, no player is explicitly randomising
his choice of pure strategy. Whether we choose to employ one interpretation or the other
depends largely on the context. Typically, the roulette wheel interpretation makes sense
in games like the batter–pitcher duel in which the interests of the players are opposing,
whereas the beliefs-based interpretation is better suited for games like the one of Fig. 7.6,
in which the players’ interests, to some extent, coincide.
Does every game possess at least one Nash equilibrium? Recall that in the case of
pure strategy Nash equilibrium, the answer is no (the batter–pitcher duel). However, once
mixed strategies are introduced, the answer is yes quite generally.
Step 2: Because the numerator defining fij is continuous in m, and the denominator
is both continuous in m and bounded away from zero (indeed, it is never less than one), fij
is a continuous function of m for every i and j. Consequently, f is a continuous function
mapping the non-empty, compact, and convex set M into itself. We therefore may apply
Brouwer’s fixed-point theorem (Theorem A1.11) to conclude that f has a fixed point, m̂.
Step 3: Because f (m̂) = m̂, we have fij (m̂) = m̂ij for all players i and pure strate-
gies j. Consequently, by the definition of fij ,
m̂ij = [m̂ij + max(0, ui(j, m̂−i) − ui(m̂))] / [1 + Σj′ max(0, ui(j′, m̂−i) − ui(m̂))],

where the sum runs over j′ = 1, . . . , n,
or
m̂ij Σj′ max(0, ui(j′, m̂−i) − ui(m̂)) = max(0, ui(j, m̂−i) − ui(m̂)).
Multiplying both sides of this equation by ui (j, m̂−i ) − ui (m̂) and summing over j
gives:
[Σj m̂ij (ui(j, m̂−i) − ui(m̂))] [Σj′ max(0, ui(j′, m̂−i) − ui(m̂))]
      = Σj (ui(j, m̂−i) − ui(m̂)) max(0, ui(j, m̂−i) − ui(m̂)),         (P.1)

where both sums over j and j′ run from 1 to n.
Now, a close look at the left-hand side reveals that it is zero, because
Σj m̂ij (ui(j, m̂−i) − ui(m̂)) = Σj m̂ij ui(j, m̂−i) − ui(m̂)
                             = ui(m̂) − ui(m̂)
                             = 0,
where the first equality follows because the m̂ij's sum to one over j. Consequently, (P.1)
may be rewritten as

0 = Σj (ui(j, m̂−i) − ui(m̂)) max(0, ui(j, m̂−i) − ui(m̂)).
But the sum on the right-hand side can be zero only if ui (j, m̂−i ) − ui (m̂) ≤ 0 for every
j. (If ui (j, m̂−i ) − ui (m̂) > 0 for some j, then the jth term in the sum is strictly positive.
Because no term in the sum is negative, this would render the entire sum strictly positive.)
Hence, by part (c) of Theorem 7.1, m̂ is a Nash equilibrium.
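The map f at the heart of this proof is easy to compute in small games. The following sketch, specialised to two players (all names are ours), confirms on the batter–pitcher game that the 50-50 mixed equilibrium is a fixed point of f while a non-equilibrium profile is not:

```python
# The map f from the proof of Theorem 7.2, specialised to two players.
# A profile is a Nash equilibrium exactly when it is a fixed point of f.
strats = ('F', 'C')
u1 = {('F', 'F'): -1, ('F', 'C'): 1, ('C', 'F'): 1, ('C', 'C'): -1}
u2 = {s: -v for s, v in u1.items()}

def eu(payoff, m_row, m_col):
    return sum(m_row[a] * m_col[b] * payoff[(a, b)]
               for a in strats for b in strats)

def pure(j):
    return {s: 1.0 if s == j else 0.0 for s in strats}

def f(m1, m2):
    base1, base2 = eu(u1, m1, m2), eu(u2, m1, m2)
    # Gains from deviating to each pure strategy (the max(0, ·) terms):
    g1 = {j: max(0.0, eu(u1, pure(j), m2) - base1) for j in strats}
    g2 = {j: max(0.0, eu(u2, m1, pure(j)) - base2) for j in strats}
    new1 = {j: (m1[j] + g1[j]) / (1 + sum(g1.values())) for j in strats}
    new2 = {j: (m2[j] + g2[j]) / (1 + sum(g2.values())) for j in strats}
    return new1, new2

m_hat = ({'F': 0.5, 'C': 0.5}, {'F': 0.5, 'C': 0.5})
assert f(*m_hat) == m_hat   # the mixed equilibrium is a fixed point

m0 = ({'F': 0.9, 'C': 0.1}, {'F': 0.5, 'C': 0.5})
assert f(*m0) != m0         # a non-equilibrium profile moves under f
```

Note that f is used in the proof only as a fixed-point device; iterating it is not, in general, a reliable algorithm for finding equilibria.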
Theorem 7.2 is quite remarkable. It says that no matter how many players are
involved, as long as each possesses finitely many pure strategies there will be at least
one Nash equilibrium. From a practical point of view, this means that the search for a
Nash equilibrium will not be futile. More importantly, however, the theorem establishes
that the notion of a Nash equilibrium is coherent in a deep way. If Nash equilibria rarely
existed, this would indicate a fundamental inconsistency within the definition. That Nash
equilibria always exist in finite games is one measure of the soundness of the idea.
strategies. Therefore, ui (s, t) is player i’s von Neumann-Morgenstern utility when the joint
pure strategy is s and the joint type-vector is t. Allowing player i’s payoff to depend on
another player’s type allows us to analyse situations where information possessed by one
player affects the payoff of another. For example, in the auctioning of offshore oil tracts,
a bidder’s payoff as well as his optimal bid will depend upon the likelihood that the tract
contains oil, something about which other bidders may have information.
Finally, we introduce the extra ingredient that allows us to use the solutions we
have developed in previous sections. The extra ingredient is a specification, for each
player i and each of his types ti , of the beliefs he holds about the types that the others
might be. Formally, for each player i and each type ti ∈ Ti , let pi (t−i |ti ) denote the prob-
ability player i assigns to the event that the others’ types are t−i ∈ T−i when his type
is ti. Being a probability, we require each pi(t−i|ti) to be in [0, 1], and we also require
Σt−i∈T−i pi(t−i|ti) = 1.
It is often useful to specify the players’ beliefs so that they are in some sense consis-
tent with one another. For example, one may wish to insist that two players would agree
about which types of a third player have positive probability. A standard way to achieve
this sort of consistency and more is to suppose that the players’ beliefs are generated from
a single probability distribution p over the joint type space T. Specifically, suppose that for
each t ∈ T, p(t) > 0 and Σt∈T p(t) = 1. If we think of the players' joint type-vector t ∈ T
as being chosen by Nature according to p, then according to Bayes’ rule (see also section
7.3.7.), player i’s beliefs about the others’ types when his type is ti can be computed from
p as follows:
pi(t−i|ti) = p(ti, t−i) / Σt′−i∈T−i p(ti, t′−i).
If all the pi can be computed from p according to this formula, we say that p is a
common prior.
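The Bayes' rule computation above is mechanical. A small sketch, using a hypothetical two-player example (the prior, type labels, and function names here are ours, not from the text):

```python
from fractions import Fraction

# Deriving type-conditional beliefs from a common prior p over
# T = T1 x T2, with hypothetical types T1 = {a, b} and T2 = {x, y}.
p = {('a', 'x'): Fraction(1, 2), ('a', 'y'): Fraction(1, 4),
     ('b', 'x'): Fraction(1, 8), ('b', 'y'): Fraction(1, 8)}

def belief_1(t1):
    """Player 1's beliefs p1(t2 | t1) about player 2's type."""
    # Denominator of the Bayes' rule formula: the marginal of t1.
    marginal = sum(prob for (s1, _), prob in p.items() if s1 == t1)
    return {t2: p[(t1, t2)] / marginal for t2 in ('x', 'y')}

b = belief_1('a')   # type a assigns probability 2/3 to x and 1/3 to y
```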
The assumption that there is a common prior can be understood in at least two ways.
The first is that p is simply an objective empirical distribution over the players’ types, one
that has been borne out through many past observations. The second is that the common
prior assumption reflects the idea that differences in beliefs arise only from differences
in information. Consequently, before the players are aware of their own types – and are
therefore in an informationally symmetric position – each player’s beliefs about the vector
of player types must be identical, and equal to p.
Our ability to analyse a situation with incomplete information will not require the
common prior assumption. We therefore shall not insist that the players’ beliefs, the pi ,
be generated from a common prior. Thus, we permit situations in which, for example,
some type of player 1 assigns probability zero to a type of player 3 that is always assigned
positive probability by player 2 regardless of his type. (Exercise 7.20 asks you to show that
this situation is impossible with a common prior.)
Before we describe how to analyse a situation with incomplete information, we place
all of these elements together.
EXAMPLE 7.2 Two firms are engaged in Bertrand price competition as in Chapter 4,
except that one of them is uncertain about the other’s constant marginal cost. Firm 1’s
marginal cost of production is known, and firm 2’s is either high or low, with each pos-
sibility being equally likely. There are no fixed costs. Thus, firm 1 has but one type, and
firm 2 has two types – high cost and low cost. The two firms each have the same strategy
set, namely the set of non-negative prices. Firm 2’s payoff depends on his type, but firm
1’s payoff is independent of firm 2’s type; it depends only on the chosen prices.
To derive from this game of incomplete information a strategic form game, imagine
that there are actually three firms rather than two, namely, firm 1, firm 2 with high cost, and
firm 2 with low cost. Imagine also that each of the three firms must simultaneously choose
a price and that firm 1 believes that each of the firm 2’s is equally likely to be his only
competitor. Some thought will convince you that this way of looking at things beautifully
captures all the relevant strategic features of the original situation. In particular, firm 1 must
choose its price without knowing whether its competitor has high or low costs. Moreover,
firm 1 understands that the competitor’s price may differ according to its costs.
2 We assume here that the type sets T1, . . . , TN are mutually disjoint. This is without loss of generality since the
type sets, being finite, can always be defined to be subsets of integers, and we can always choose these integers
so that ti < tj whenever ti ∈ Ti, tj ∈ Tj, and i < j. Hence, there is no ambiguity in identifying a player in G∗ by his type alone.
Let si(ti) ∈ Si denote the pure strategy chosen by player ti ∈ Ti. Given a joint
pure strategy s∗ = (s1(t1), . . . , sN(tN))t1∈T1,...,tN∈TN ∈ S∗, the payoff to player ti is defined
to be

vti(s∗) = Σt−i∈T−i pi(t−i|ti) ui(s1(t1), . . . , sN(tN), t1, . . . , tN).
Having defined a finite set of players, their finite pure strategy sets, and their payoffs
for any joint pure strategy, we have completed the definition of the strategic form game G∗.3
captures the idea that player i is uncertain of the other players’ types – i.e., he uses pi (t−i |ti )
to assess their probability – and also captures the idea that the other players’ behaviour may
depend upon their types – i.e., for each j, the choice sj (tj ) ∈ Sj depends upon tj .
By associating with each game of incomplete information G the well-chosen strate-
gic form game, G∗ , we have reduced the study of games of incomplete information to the
study of games with complete information, that is, to the study of strategic form games.
Consequently, we may apply any of the solutions that we have developed to G∗ . It is par-
ticularly useful to consider the set of Nash equilibria of G∗ and so we give this a separate
definition.
With the tools we have developed up to now, it is straightforward to deal with the
question of existence of Bayesian-Nash equilibrium.
Proof: By Definition 7.12, it suffices to show that the associated strategic form game pos-
sesses a Nash equilibrium. Because the strategic form game associated with a finite game
of incomplete information is itself finite, we may apply Theorem 7.2 to conclude that the
associated strategic form game possesses a Nash equilibrium.
EXAMPLE 7.3 To see these ideas at work, let us consider in more detail the two firms
discussed in Example 7.2. Suppose that firm 1’s marginal cost of production is zero. Also,
suppose firm 1 believes that firm 2's marginal cost is either 1 or 4, and that each of these
'types' of firm 2 occurs with probability 1/2. If the lowest price charged is p, then market
demand is 8 − p. To keep things simple, suppose that each firm can choose only one of
three prices, 1, 4, or 6. The payoffs to the firms are described in Fig. 7.7. Firm 1’s payoff
is always the first number in any pair, and firm 2's payoff when his costs are low (high) is
given by the second number in the entries of the matrix on the left (right).
In keeping with the Bertrand-competition nature of the problem, we have instituted
the following convention in determining payoffs when the firms choose the same price. If
both firms’ costs are strictly less than the common price, then the market is split evenly
between them. Otherwise, firm 1 captures the entire market at the common price. The latter
uneven split reflects the idea that if the common price is above only firm 1’s cost, firm 1
could capture the entire market by lowering his price slightly (which, if we let him, he
could do and still more than cover his costs), whereas firm 2 would not lower his price
(even if we let him) because this would result in losses.
We have now described the game of incomplete information. The associated strategic
form game is one in which there are three players: firm 1, firm 2l (low cost), and firm 2h
(high cost). Each has the same pure strategy set, namely, the set of prices {1, 4, 6}. Let
p1 , pl , ph denote the price chosen by firms 1, 2l, and 2h, respectively.
Fig. 7.8 depicts this strategic form game. As there are three players, firm 1’s choice
of price determines the matrix, and firms 2l and 2h’s prices determine the row and col-
umn, respectively, of the chosen matrix. For example, according to Fig. 7.8, if firm 1
Figure 7.7. Firm 2's costs are low (left matrix) or high (right matrix).

          pl = 6   pl = 4   pl = 1              ph = 6   ph = 4   ph = 1
p1 = 6     6, 5    0, 12     0, 0     p1 = 6     6, 2     0, 0   0, −21
p1 = 4    16, 0     8, 6     0, 0     p1 = 4    16, 0    16, 0   0, −21
p1 = 1     7, 0     7, 0     7, 0     p1 = 1     7, 0     7, 0     7, 0
Figure 7.8 (the matrix when firm 1 chooses p1 = 1).

           ph = 6     ph = 4     ph = 1
pl = 6    7, 0, 0    7, 0, 0    7, 0, 0
pl = 4    7, 0, 0    7, 0, 0    7, 0, 0
pl = 1    7, 0, 0    7, 0, 0    7, 0, 0
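The payoffs of G∗ can be assembled mechanically from the matrices of Fig. 7.7, and its pure strategy Bayesian-Nash equilibria found by brute force. A sketch (the variable names are ours; firm 1's payoff differs across firm 2's types only through the tie-breaking rule at a common price of 4):

```python
from itertools import product
from fractions import Fraction

half = Fraction(1, 2)
prices = (1, 4, 6)

# Firm 1's payoffs from Fig. 7.7, keyed by (own price, firm 2's price):
u1_low = {(6, 6): 6, (6, 4): 0, (6, 1): 0,
          (4, 6): 16, (4, 4): 8, (4, 1): 0,
          (1, 6): 7, (1, 4): 7, (1, 1): 7}
u1_high = {**u1_low, (4, 4): 16}   # firm 1 wins the tie at 4 vs. type 2h
# Firm 2's payoffs, by type:
u2_low = {(6, 6): 5, (6, 4): 12, (6, 1): 0,
          (4, 6): 0, (4, 4): 6, (4, 1): 0,
          (1, 6): 0, (1, 4): 0, (1, 1): 0}
u2_high = {(6, 6): 2, (6, 4): 0, (6, 1): -21,
           (4, 6): 0, (4, 4): 0, (4, 1): -21,
           (1, 6): 0, (1, 4): 0, (1, 1): 0}

def v1(p1, pl, ph):
    # Firm 1's expected payoff over firm 2's equally likely types.
    return half * u1_low[(p1, pl)] + half * u1_high[(p1, ph)]

def is_bne(p1, pl, ph):
    return (all(v1(p1, pl, ph) >= v1(d, pl, ph) for d in prices) and
            all(u2_low[(p1, pl)] >= u2_low[(p1, d)] for d in prices) and
            all(u2_high[(p1, ph)] >= u2_high[(p1, d)] for d in prices))

pure_bne = [s for s in product(prices, repeat=3) if is_bne(*s)]
assert v1(1, 6, 4) == 7          # matches the p1 = 1 matrix above
assert (4, 4, 6) in pure_bne     # one equilibrium: p1 = pl = 4, ph = 6
```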
4 This assumes that it is impossible for the used-car salesperson to prove that the car has been repaired. In prac-
tice, this is not so far from the truth. Are higher prices a signal that the car was repaired? If so, how might an
unscrupulous seller behave? For now, we wish only to observe that in this rather commonplace economic setting,
the players move in sequence, yet the second mover (the buyer) is only partially informed of the choices made
by the first mover.
previous moves of the others when it is his turn to play; and (iv) the payoffs to the players.
Formally, these elements are contained in the following definition.5
A node, or history, is then simply a complete description of the actions that have
been taken so far in the game.
We shall use the terms history and node interchangeably. For future reference, let
A(x) ≡ {a ∈ A | (x, a) ∈ X}
denote the set of actions available to the player whose turn it is to move after the
history x ∈ X\{x0 }.
4. A set of actions, A(x0 ) ⊆ A, and a probability distribution, π, on A(x0 ) to describe
the role of chance in the game. Chance always moves first, and just once, by
randomly selecting an action from A(x0 ) using the probability distribution π.
Thus, (a1 , a2 , . . . , ak ) ∈ X\{x0 } implies that ai ∈ A(x0 ) for i = 1 and only i = 1.6
5. A set of end nodes, E ≡ {x ∈ X | (x, a) ∉ X for all a ∈ A}. Each end node
describes one particular complete play of the game from beginning to end.
5 The convention to employ sequences of actions to define histories is taken from Osborne and Rubinstein (1994).
A classic treatment can be found in von Neumann and Morgenstern (1944).
6 Allowing chance but one move at the start of the game might appear to be restrictive. It is not. Consider, for
example, the board game Monopoly. Suppose that in a typical 2-hour game, the dice are rolled no more than once
every 5 seconds. Thus, a conservative upper bound on the number of rolls of the dice is 2000. We could then
equally well play Monopoly by having a referee roll dice and secretly choose 2000 numbers between 1 and 12 at
the start of the game and then simply reveal these numbers one at a time as needed. In this way, it is without loss
of generality that chance can be assumed to move exactly once at the beginning of the game.
Admittedly, this definition appears pretty complex, but read it over two or three
times. You will soon begin to appreciate how remarkably compact it is, especially when
you realise that virtually every parlour game ever played – not to mention a plethora of
applications in the social sciences – is covered by it! Nevertheless, a few examples will
help to crystallise these ideas.
EXAMPLE 7.4 Let us begin with the game of take-away described earlier. There are two
players, so N = {1, 2}. A player can remove up to three coins on a turn, so let r1 , r2 , and r3
denote the removal of one, two, or three coins, respectively. To formally model the fact that
chance plays no role in this game, let A(x0 ) ≡ {ā} (i.e., chance has but one move). Thus,
the set of actions is A = {ā, r1 , r2 , r3 }. A typical element of X\{x0 } then looks something
like x̄ = (ā, r1 , r2 , r1 , r3 , r3 ). This would indicate that up to this point in the game, the
numbers of coins removed alternately by the players were 1, 2, 1, 3, and 3, respectively.
Consequently, there are 11 coins remaining and it is player 2’s turn to move (because player
7 A partition of a set is a collection of disjoint non-empty subsets whose union is the original set. Thus, an element
of a partition is itself a set.
1 removes the first coin). Thus, ι(x̄) = 2. In addition, because each player is fully informed
of all past moves, I (x) = {x} for every x ∈ X. Two examples of end nodes in take-
away are e1 = (ā, r1 , r2 , r1 , r3 , r3 , r3 , r3 , r3 , r2 ), and e2 = (ā, r3 , r3 , r3 , r3 , r3 , r3 , r2 , r1 ),
because each indicates that all 21 coins have been removed. The first indicates a win for
player 2 (because player 1 removed the last two coins), and the second indicates a win for
player 1. Thus, if a payoff of 1 is assigned to the winner, and −1 to the loser, we have
u1 (e1 ) = u2 (e2 ) = −1, and u1 (e2 ) = u2 (e1 ) = 1.
EXAMPLE 7.5 To take a second example, consider the buyer and seller of the used car.
To keep things simple, assume that the seller, when choosing a price, has only two choices:
high and low. Again there are two players, so N = {S, B}, where S denotes seller, and B,
buyer. The set of actions that might arise is A = {repair, don’t repair, price high, price low,
accept, reject}. Because chance plays no role here, rather than give it a single action, we
simply eliminate chance from the analysis. A node in this game is, for example, x =(repair,
price high). At this node x, it is the buyer’s turn to move, so that ι(x) = B. Because at this
node, the buyer is informed of the price chosen by the seller, but not of the seller’s repair
decision, I (x) = {(repair, price high), (don’t repair, price high)}. That is, when node x is
reached, the buyer is informed only that one of the two histories in I (x) has occurred; he
is not informed of which one, however.
[Figure: the game tree for take-away, rooted at chance's single move ā, with actions r1, r2, r3 at each decision node (nodes x and an end node e labelled) and payoffs of 1 to the winner and −1 to the loser at each end node.]
[Figure: the game tree for the buyer–seller game, rooted at x0. The seller S first chooses Repair or Don't repair, then Price high or Price low; the buyer B, observing only the price, chooses Accept or Reject.]
information set is simply left alone.8 So, for example, the initial node, x0 , and the node
(x0 , repair) are each the sole elements of two distinct information sets. Each information
set is labelled with the single player whose turn it is to move whenever a node within that
information set is reached. In this game, only the buyer has information sets that are not
singletons.
Extensive form games in which every information set is a singleton, as in take-away,
are called games of perfect information. All other games, like the buyer–seller game, are
called games with imperfect information.
because he would be forced to remove the last coin. Thus, one coin (remaining on the
table) is a losing position. What about two coins? This is a winning position because the
player whose turn it is can remove one coin, thereby leaving one coin remaining, which
we already know to be a losing position for the other player. Similarly, both three and four
coins are winning positions because removing two and three coins, respectively, leaves the
opponent in the losing position of one coin. What about five coins? This must be a losing
position because removing one, two, or three coins places one’s opponent in the winning
positions four, three, or two, respectively. Continuing in this manner, from positions near-
est the end of the game to positions nearest the beginning, shows that positions 1, 5, 9, 13,
17, 21 are losing positions, and all others are winning positions.
Consequently, if two experts play take-away with 21 coins, the second player can
always guarantee a win, regardless of how the first one plays. To see this, consider the
following strategy for the second player that is suggested by our analysis of winning and
losing positions: ‘Whenever possible, always remove just enough coins so that the result-
ing position is one of the losing ones, namely, 1, 5, 9, 13, 17, 21; otherwise, remove one
coin’. We leave it to the reader to verify that if the second player has done so on each of his
previous turns, he can always render the position a losing one for his opponent. Because
his opponent begins in a losing position, this completes the argument.
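The position-by-position argument above is easy to mechanise. The following minimal Python sketch (the function and encoding are ours, purely for illustration) recovers the losing positions:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def winning(coins):
    """True if the position with `coins` remaining is winning for the mover."""
    if coins == 1:
        return False  # forced to remove the last coin, and so lose
    # Winning iff some removal of 1, 2, or 3 coins (never the last one)
    # leaves the opponent in a losing position.
    return any(not winning(coins - k) for k in (1, 2, 3) if coins - k >= 1)

losing_positions = [n for n in range(1, 22) if not winning(n)]
print(losing_positions)  # [1, 5, 9, 13, 17, 21]
```

The recursion works from small positions to large ones, exactly as the text's analysis works from the end of the game to its beginning.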
Note well the technique employed to analyse this game. Rather than start at the
beginning of the game with all 21 coins on the table, we began the analysis at the end of
the game – with one coin remaining, then two, and so on. This technique lies at the heart of
numerous solution concepts for extensive form games. It is called backward induction.
We shall return to it a little later. But before getting too far ahead of ourselves, we pause
to formalise the idea of an extensive form game strategy.
[Figure 7.11: player 2's information sets I(x) and I(y), each with the two choices l and r; the depicted pure strategy selects l at I(x) and r at I(y).]
example, Fig. 7.11 depicts the following pure strategy for player 2: choose l if I (x) is
reached, and choose r if I (y) is reached.
It is important to note that a pure strategy for a player is indeed a complete descrip-
tion of how to play the game as that player. For example, suppose you are playing the
black pieces in chess. A very small part of your pure strategy requires you to specify what
move to make after white’s first move. It is not enough to specify how you would react
to a single opening move of white – say, P–K4 – even if you are virtually certain that this
will be white’s first move. Specifying a pure strategy requires you to say how you would
react to every possible opening move of white. Indeed, you must specify how you would
react to every possible (legal) sequence of moves ending with a move by white. Only then
will you have specified a single pure strategy for black in the game of chess. The exercises
ask you to formulate pure strategies for the games we have considered so far. You will see
there that this alone can be a challenge.
and so on. We may continue this process until we inevitably (because the game is finite)
reach an end node, e, say, yielding payoff ui (e) for each player i ∈ N. Consequently, given
any joint pure strategy s ∈ ×i∈NSi , Nature’s probability distribution π on A(x0 ) determines
player i’s expected utility, which we will denote by ui (s).
Note that the tuple (Si, ui)i∈N is then a strategic form game. It is called the strategic
form of Γ, and we will refer back to it a little later.9 For the moment, it suffices to note that
we can therefore apply all of our strategic form game solution concepts to finite extensive
form games. For example, a dominant strategy in the extensive form game is simply a
strategy that is dominant in the strategic form of Γ; a Nash equilibrium for the extensive
form game is simply a joint strategy that is a Nash equilibrium of the strategic form of
Γ, and so on.
9 Note that we have transformed an arbitrary finite extensive form game (which may well reflect a very complex,
dynamic strategic situation) into a strategic form game. Thus, our earlier impression that strategic form games
were only useful for modelling situations in which there are no explicit dynamics was rather naive. Indeed, based
on our ability to construct the strategic form of any extensive form game, one might argue just the opposite; that
from a theoretical point of view, it suffices to consider only strategic form games, because all extensive form
games can be reduced to them! Whether or not the strategic form of an extensive form game is sufficient for
carrying out an analysis of it is a current topic of research among game theorists. We will not develop this theme
further here.
10 Note the similarity here with our investigation of the solution to take-away. Here as there, one cannot assess
the soundness of moves early in the game without first analysing how play will proceed later in the game.
[Figure: the entry game and its reduction (Fig. 7.13). The entrant chooses Stay out, yielding payoffs (0, 2), or Enter; after entry the incumbent chooses Fight, yielding (−1, −1), or Acquiesce, yielding (1, 1). In the reduced game the incumbent's decision node is replaced by the payoff vector (1, 1).]
So let us simply assume that the entrant has entered. What is best for the incumbent
at this point in the game? Obviously, it is best for him to acquiesce because by doing so he
receives a payoff of 1 rather than −1. Consequently, from the entrant’s point of view, the
game reduces to that given in Fig. 7.13, where we have simply replaced the incumbent’s
decision node and what follows with the payoff vector that will result once his decision
node is reached. Clearly, the entrant will choose to enter because this yields a payoff of
1 rather than 0. Thus, once again we have arrived at a pair of strategies for the players
by solving the game backwards. The strategies are the entrant enters and the incumbent
acquiesces on entry.
Let us try this backward induction technique to solve the slightly more complex
game of perfect information depicted in Fig. 7.14. We begin by analysing decision nodes
preceding only end nodes. There are two such penultimate nodes and they are labelled x
and y. Both belong to player 1. At x, player 1 does best to choose R′, and at y he does
best to choose L″. Consequently, the game of Fig. 7.14 can be reduced to that of Fig. 7.15,
where the decision nodes x and y, and what follows, have been replaced by the payoffs
that are now seen to be inevitable once x and y are reached. We now repeat this process
on the reduced game. Here both w and z are penultimate decision nodes (this time belonging
to player 2). If w is reached, player 2 does best by choosing r, and if z is reached,
player 2 does best by choosing l′. Using these results to reduce the game yet again results
in Fig. 7.16, where it is clear that player 1 will choose R. We conclude from this analysis
that player 1 will choose the strategy (R, R′, L″) and player 2 the strategy (r, l′).11
The outcome of employing these two strategies is that each player will receive a payoff
of zero.
11 The notation (R, R′, L″) means that player 1 will choose R on his first move, R′ if decision node x is reached,
and L″ if decision node y is reached. Player 2's strategy (r, l′) has a similar meaning.
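The backward induction algorithm just applied can be sketched as a short recursion. The tree encoding and node names below are ours, chosen for illustration; the payoffs are those read from Fig. 7.14:

```python
def backward_induction(tree):
    """Solve a finite perfect-information game by backward induction.

    A tree is either a payoff tuple (an end node) or a triple
    (node_name, mover, {action: subtree}). Returns the induced payoff
    vector and a dict mapping each decision node to its chosen action.
    """
    if isinstance(tree[0], (int, float)):  # end node: a payoff vector
        return tree, {}
    name, mover, moves = tree
    chosen, best_payoff, best_action = {}, None, None
    for action, subtree in moves.items():
        payoff, sub_chosen = backward_induction(subtree)
        chosen.update(sub_chosen)
        if best_payoff is None or payoff[mover] > best_payoff[mover]:
            best_payoff, best_action = payoff, action
    chosen[name] = best_action
    return best_payoff, chosen

# The game of Fig. 7.14, with players indexed 0 and 1 and payoffs (u1, u2).
fig_7_14 = ("root", 0, {
    "L": ("w", 1, {"l": ("x", 0, {"L'": (0, 4), "R'": (1, 0)}),
                   "r": (-1, 2)}),
    "R": ("z", 1, {"l'": (0, 0),
                   "r'": ("y", 0, {"L''": (4, -1), "R''": (3, 3)})}),
})
payoff, chosen = backward_induction(fig_7_14)
print(payoff, chosen["root"])  # (0, 0) R
```

Running it reproduces the strategies derived in the text: player 1 plays (R, R′, L″) and player 2 plays (r, l′), with resulting payoffs of zero for both.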
[Figures 7.14–7.16: a game of perfect information and its successive reductions. In Fig. 7.14, player 1 chooses L or R; player 2 then chooses l or r at node w and l′ or r′ at node z; player 1 then chooses L′ or R′ at node x and L″ or R″ at node y. End-node payoffs (u1, u2): (0, 4) and (1, 0) at x; (−1, 2) after r; (0, 0) after l′; (4, −1) and (3, 3) at y. Figs. 7.15 and 7.16 replace solved nodes by their payoffs.]
It may seem a little odd that the solution to this game yields each player a payoff
of zero when it is possible for each to derive a payoff of 3 by playing ‘right’ whenever
possible. However, it would surely be a mistake for player 2 to play r′ if node z is reached,
because player 1 would then rationally choose L″ at y, not R″, since the former gives player 1
a higher payoff. Thus, player 2, correctly anticipating this, does best to choose l′, for this
yields player 2 a payoff of zero, which surpasses the alternative of −1.12
12 One might argue that the players ought to enter into a binding agreement to ensure the payoff vector (3, 3).
However, by definition, the extensive form game includes all the possible actions that are available to the players.
Consequently, if it is possible for the players to enter into binding agreements, this ought to be included in the
extensive form game to begin with. Because in the game depicted these are not present, they are simply not
available.
The preceding procedure can be used to obtain strategies in every game of perfect
information. Such strategies are called backward induction strategies. To prepare for the
definition, let us say that y strictly follows x if y = (x, a1 , . . . , ak ) for some a1 , . . . , ak ∈ A
and that y immediately follows x if k = 1. We say that y weakly follows x if y = x or y
strictly follows x.
Proof: Because a Nash equilibrium of Γ is simply a Nash equilibrium of its strategic form
(Si, ui)i∈N, it suffices to show that ui(s) ≥ ui(s′i, s−i) for every player i and every s′i ∈ Si.
So, suppose that this is not the case. Then ui(s′i, s−i) > ui(s) for some i and s′i ∈ Si.
Consequently, there must be an action, a1, taken by Nature, such that the end nodes e and
e′ induced respectively by s and s′ = (s′i, s−i) given that action, satisfy ui(e′) > ui(e).
Therefore, the set of decision nodes x, where, were the game to begin there, player
i could do better by using a strategy different from si, is non-empty because x = a1 is a
member of this set. Let x̄ be a member of this set having no strict followers in the set.14
Thus, when the game begins at x̄ and the other players employ s−i thereafter, player
i’s payoff is strictly higher if he employs some strategy s′i rather than si. Furthermore,
because x̄ has no strict followers among the set from which it was chosen, (1) x̄ belongs to
player i, and (2) the actions dictated by si at nodes belonging to i strictly following x̄ cannot
be improved upon.
13 The finiteness of the game ensures that this process terminates.
14 Such a node x̄ exists (although it need not be unique) because the set of nodes from which it is chosen is finite
and non-empty (see the exercises).
We may conclude, therefore, that when the game begins at x̄ and the others employ
s−i thereafter, i’s payoff if he takes the action at x̄ specified by s′i, but subsequently employs
si, exceeds that when i employs si beginning at x̄ as well as subsequently. But the latter
payoff is i’s backward induction payoff when the backward induction algorithm reaches
node x̄, and therefore must be the largest payoff that i can obtain from the actions available
at x̄ given that s (the backward induction strategies) will be employed thereafter. This
contradiction completes the proof.
Thus, every backward induction joint strategy tuple constitutes a Nash equilibrium.
Because the backward induction algorithm always terminates in finite games with perfect
information, we have actually established the following.
[Figure 7.17: player 1 chooses OUT, yielding payoffs (2, 2), or IN; after IN, player 1 chooses L or R, and player 2, without observing this choice, chooses l or r at a single information set; the four end-node payoffs are (1, 3), (0, 0), (0, 0), and (3, 1).]
this isolates player 2’s information set, i.e., the point in the game reached after player 1
has chosen IN and then L or R.
Note that when it is player 2’s turn to play, taking either action l or r will result in
the end of the game. Now, according to the backward induction algorithm, the next step
is to choose an optimal action for player 2 there. But now we are in trouble because it is
not at all clear which action is optimal for player 2. This is because player 2’s best action
depends on the action taken by player 1. If player 1 chose L, then 2’s best action is l,
whereas if player 1 chose R, then 2 should instead choose r. There is no immediate way
out of this difficulty because, by definition of the information set, player 2 does not know
which action player 1 has taken.
Recall the reason for solving the game backwards in the first place. We do so because
to determine optimal play early in the game, we first have to understand how play will
proceed later in the game. But in the example at hand, the reverse is also true. To determine
optimal play later in the game (i.e., at player 2’s information set), we must first understand
how play proceeds earlier in the game (i.e., did player 1 choose L or R?). Thus, in this
game (and in games of imperfect information quite generally), we must, at least to some
extent, simultaneously determine optimal play at points both earlier and later in the game.
Let us continue with our analysis of the game of Fig. 7.17. Although we would like
to first understand how play will proceed at the ‘last’ information set, let us give up on this
for the preceding reasons, and do the next best thing. Consider moving one step backward
in the tree to player 1’s second decision node. Can we determine how play will proceed
from that point of the game onwards? If so, then we can replace that ‘portion’ of the game,
or subgame, with the resulting payoff vector, just as we did in the backward induction
algorithm. But how are we to determine how play will proceed in the subgame beginning
at player 1’s second information set?
The idea, first developed in Selten (1965, 1975), is to consider the subgame as a
game in its own right. (See Fig. 7.18.) Consider now applying the Nash equilibrium solu-
tion concept to the game of Fig. 7.18. There are two pure strategy Nash equilibria of this
[Figures 7.18 and 7.19: (a) the subgame beginning at player 1’s second decision node, with payoffs (1, 3), (0, 0), (0, 0), and (3, 1); (b) the reduced game in which player 1 chooses OUT, yielding (2, 2), or IN, yielding the subgame’s equilibrium payoff (1, 3).]
game: (L, l), and (R, r).16 Let us suppose that when this subgame is reached in the course of
playing the original game, one of these Nash equilibria will be played. For concreteness,
suppose it is (L, l). Consequently, the resulting payoff vector will be (1, 3) if the sub-
game is reached. We now can proceed analogously to the backward induction algorithm
by replacing the entire subgame by the resulting payoff vector (1, 3). (See Fig. 7.19.) Once
done, it is clear that player 1 will choose OUT at his first decision node, because given the
behaviour in the subgame, player 1 is better off choosing OUT, yielding a payoff of 2, than
choosing IN and ultimately yielding a payoff of 1.
Altogether, the strategies previously derived are as follows. For player 1: OUT at his
first decision node and L at his second; for player 2: l at his information set.
A couple of similarities with the perfect information case are worth noting. First,
these strategies reflect the look-ahead capability of player 1 in the sense that his play at his
first decision node is optimal based on the Nash equilibrium play later in the game. Thus,
not only is player 1 looking ahead, but he understands that future play will be ‘rational’
16 There is also a mixed-strategy Nash equilibrium, but the discussion will be simplified if we ignore this for the
time being.
in the sense that it constitutes a Nash equilibrium in the subgame. Second, these strategies
form a Nash equilibrium of the original game.
The strategies we have just derived are called subgame perfect equilibrium strate-
gies. As you may recall, there were two pure strategy Nash equilibria in the subgame, and
we arbitrarily chose one of them. Had we chosen the other, the resulting strategies would
have been quite different. Nonetheless, these resulting strategies, too, are subgame perfect
according to the following definition. You are asked to explore this in an exercise.
To give a formal definition of subgame perfect equilibrium strategies, we must first
introduce some terminology.
Thus, if a node x defines a subgame, then every player on every turn knows whether x
has been reached. Fig. 7.20(a) shows a node x defining a subgame, and Fig. 7.20(b) shows
a node x that does not. In the game depicted in Fig. 7.20(a), every node within player 1’s
non-singleton information set follows x. In contrast, nodes y and z are both members of
player 3’s information set in Fig. 7.20(b), yet only y follows x.
The subgame defined by a node such as x in Fig. 7.20(a) is denoted by Γx. Γx consists
of all nodes following x, and it inherits its information structure and payoffs from the
original game Γ. Fig. 7.21 depicts the subgame Γx derived from the game in Fig. 7.20(a).
Given a joint pure strategy s for Γ, note that s naturally induces a joint pure strategy
in every subgame Γx of Γ. That is, for every information set I in Γx, the induced pure
strategy takes the same action at I that is specified by s at I.
Figure 7.20. (a) Node x defines a subgame; (b) node x does not define a subgame.
[Figure 7.21: the subgame Γx derived from the game in Fig. 7.20(a).]
Figure 7.22. (a) A Nash, but not subgame perfect, equilibrium; (b) player 2’s best
response in the subgame.
Note that because for any extensive form game Γ, the game itself is a subgame, a
pure strategy subgame perfect equilibrium of Γ is also a pure strategy Nash equilibrium
of Γ. Consequently, the subgame perfect equilibrium concept is a refinement of the Nash
equilibrium concept. Indeed, this refinement is strict, as the example shown in Fig. 7.22
demonstrates.
The pure strategy depicted by the arrows in Fig. 7.22(a) is a Nash equilibrium
because neither player can improve his payoff by switching strategies given the strategy
of the other player. However, it is not subgame perfect. To see this, note that the strategies
induced in the subgame beginning at player 2’s node do not constitute a Nash equilib-
rium of the subgame. This is shown in Fig. 7.22(b) where the subgame has been isolated
and the double arrow indicates a deviation that strictly improves player 2’s payoff in the
subgame.17
The next theorem shows that subgame perfection, which is applicable to all exten-
sive form games, is a generalisation of backward induction, which applies only to perfect
information games.
Proof: We first argue that every backward induction strategy is subgame perfect. So let s
denote a backward induction strategy. Because in a game with perfect information every
node defines a subgame (see the exercises), we must argue that s induces a Nash equilib-
rium in the subgame defined by x for all x. But for each x, Γx, the subgame defined by x, is
of course a perfect information game, and the strategy induced by s is clearly a backward
induction strategy for the subgame. (To see this, think about how the backward induction
strategy s is constructed, and then think about how backward induction strategies for the
subgame would be constructed.) Consequently, we may apply Theorem 7.4 and conclude
that the strategies induced by s form a Nash equilibrium of Γx.
Next we argue that every pure strategy subgame perfect equilibrium is a backward
induction strategy. Let s be subgame perfect. It suffices to verify that s can be derived
through the backward induction algorithm. Consider then any penultimate decision node.
This node defines a one-player subgame, and because s is subgame perfect, it must assign
a payoff-maximising choice for the player whose turn it is to move there (otherwise, it
would not be a Nash equilibrium of the one-player subgame). Consequently, the action
specified by s there is consistent with the backward induction algorithm. Consider now any
decision node x having only penultimate decision nodes following it. This node defines a
subgame in which at all nodes following it, the strategy s specifies a backward induction
action. Because s induces a Nash equilibrium in this subgame, it must specify a payoff-
maximising choice for player ι(x) at node x given that the choices to follow are backward
induction choices (i.e., the choices induced by s). Consequently, the action specified at any
such x is also consistent with the backward induction algorithm. Working our way back
through the game tree in this manner establishes the result.
Just as pure strategy Nash equilibria may fail to exist in some strategic form games,
pure strategy subgame perfect equilibria need not always exist. Consider, for example, the
game depicted in Fig. 7.23. Because the only subgame is the game itself, the set of pure
strategy subgame perfect equilibria coincides with the set of pure strategy Nash equilibria
17 Note that although player 2’s payoff can be increased in the subgame, it cannot be increased in the original
game. This is because the subgame in question is not reached by the original strategies. Indeed, Nash equilibrium
strategies of the original game induce Nash equilibria in all subgames that are reached by the original strategies.
Thus, it is precisely subgame perfection’s treatment of unreached subgames that accounts for its distinction from
Nash equilibrium. See the exercises.
[Figure 7.23: player 1 chooses between two actions and player 2, without observing this choice, chooses l or r; the end-node payoffs are (1, −1), (−1, 1), (−1, 1), and (1, −1).]
of this game. However, it is easy to verify that among the four possible joint pure strategies
none constitutes a Nash equilibrium.
To guarantee the existence of at least one subgame perfect equilibrium, we must
allow players the opportunity to randomise. The next section considers randomisation in
extensive form games.
[Figure 7.24: (a)–(c) the pure strategies LL, RL, and RR for player 1, each choosing between L and R at his first information set and again at his second; (d) the equivalent behavioural strategy, placing probability 1/2 on each of L and R at the first information set and probabilities 2/3 and 1/3 on L and R at the second.]
behavioural) employed by the other players. Similarly, for each behavioural strategy, there
is an equivalent mixed strategy.18
EXAMPLE 7.6 Figs. 7.24(a) to 7.24(c) depict three pure strategies for player 1 in the
extensive form game there, namely, LL, RL, and RR. Consider the mixed strategy placing
probability 1/2, 1/3, and 1/6, respectively, on these pure strategies. What then is this
mixed strategy’s equivalent behavioural strategy? To find out, we simply calculate the
induced probability that each action is taken conditional on the information set at which it
is available having been reached. For example, because player 1’s first information set is
necessarily reached, the induced probability that the action L is chosen there is 1/2, as is
the probability that R is chosen there [see Fig. 7.24(d)]. For player 1’s second information
set, note that it is reached only by the pure strategies RL and RR. Consequently, conditional
on one of these pure strategies having been chosen, the probability that L is chosen at player
1’s second information set is 2/3, and the probability that R is chosen is 1/3. Putting this
together, Fig. 7.24(d) depicts the equivalent behavioural strategy.
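The conditional-probability calculation of Example 7.6 can be checked mechanically. A small Python sketch (the encoding is ours, for illustration only) reproduces the equivalent behavioural strategy:

```python
from fractions import Fraction as F

# Player 1's mixed strategy from Example 7.6: probability on each pure
# strategy, written as (action at first info set, action at second).
mixed = {("L", "L"): F(1, 2), ("R", "L"): F(1, 3), ("R", "R"): F(1, 6)}

# The first information set is reached for sure: take the marginal probability.
first = {a: sum(p for s, p in mixed.items() if s[0] == a) for a in ("L", "R")}

# The second information set is reached only after R; condition on that event.
reach = sum(p for s, p in mixed.items() if s[0] == "R")
second = {a: sum(p for s, p in mixed.items() if s == ("R", a)) / reach
          for a in ("L", "R")}

print(first)   # {'L': Fraction(1, 2), 'R': Fraction(1, 2)}
print(second)  # {'L': Fraction(2, 3), 'R': Fraction(1, 3)}
```

The output matches the text: probabilities 1/2, 1/2 at the first information set, and 2/3, 1/3 at the second.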
As mentioned, all games that will concern us in this text have the property that mixed
and behavioural strategies are equivalent. This property is shared by all games with perfect
recall.
Perfect recall says that each player always remembers what he knew in the past about
the history of play. In particular, Definition 7.18 implies that any two histories (i.e., y and
w) that a player’s information set does not allow him to distinguish between can differ only
in the actions taken by other players. So, in particular, no player ever forgets an action that
he has taken in the past.
Fig. 7.25 depicts an extensive form game without perfect recall. Note that there is
no behavioural strategy that is equivalent to the mixed strategy placing probability 1/2 on
18 Certain mixed (behavioural) strategies may admit multiple equivalent behavioural (mixed) strategies. See Kuhn
(1953) for a complete analysis of the equivalence of mixed and behavioural strategies.
each of the pure strategies Ll and Rr, because any such behavioural strategy must place
positive probability on both choices L and R at player 1’s first information set, and it must
also place positive probability on both choices l and r at 1’s second information set. But it
will then also place positive probability on the end nodes (L, r) and (R, l), to which the
original mixed strategy assigns probability zero.
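This infeasibility is easy to confirm numerically. The sketch below (our encoding; a grid search rather than a proof) looks for a behavioural strategy matching the mixed strategy placing probability 1/2 on each of Ll and Rr, and finds none:

```python
# In Fig. 7.25 a behavioural strategy for player 1 is a pair of independent
# probabilities: bL on L at the first information set and bl on l at the
# second. Matching the mixed strategy (1/2 on Ll, 1/2 on Rr) would require
# bL*bl = 1/2 and (1-bL)*(1-bl) = 1/2 simultaneously.
feasible = any(
    abs(bL * bl - 0.5) < 1e-6 and abs((1 - bL) * (1 - bl) - 0.5) < 1e-6
    for bL in (i / 1000 for i in range(1001))
    for bl in (j / 1000 for j in range(1001))
)
print(feasible)  # False
```

Indeed, the two constraints force bL + bl = 1 and hence bL·bl ≤ 1/4 < 1/2, so no behavioural strategy is equivalent to this mixed strategy.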
Because of the equivalence of mixed and behavioural strategies in games with per-
fect recall, we have the luxury of using whichever is most convenient. Consequently, we
shall restrict our attention to the sets of behavioural strategies for each player.
Proof: The proof employs a technique reminiscent of the backward induction algorithm.
We shall construct the desired behavioural strategy in stages working from the end of the
game back to the beginning.
Choose a subgame that contains no subgame but itself. This is always possible
because the game is finite. By Theorem 7.2, this subgame has a Nash equilibrium in mixed
strategies. Because the original game has perfect recall, the subgame does as well, and so
the mixed strategy (in the subgame) has an equivalent behavioural strategy counterpart. Of
course, being equivalent, this behavioural strategy also constitutes a Nash equilibrium in
the subgame.
Now replace the subgame with the payoff vector determined by the equilibrium strat-
egy in it. We have thus reduced the size of the game and have determined that part of the
overall behavioural strategy within the subgame. We may now repeat the process for the
reduced game, and so on, until we have completely determined a joint behavioural strat-
egy for the original game. Observe that this algorithm must terminate because the game is
finite.
That the behavioural strategy so determined constitutes a subgame perfect equilib-
rium follows in a manner that parallels the first half of the proof of Theorem 7.5. You are
asked to fill in the details in an exercise.
It is important to note that the assumption of perfect recall cannot be dispensed with.
In a game without it, a subgame perfect equilibrium may not exist. This is considered in
one of the exercises.
The process described in the proof is illustrated in Fig. 7.26(a) to 7.26(c). Note how
subgame perfection echoes the theme of backward induction, namely, that optimal play
early in the game is determined by that later in the game. In the next section, we develop a
further refinement of Nash equilibrium in order to more fruitfully apply this central idea.
Figure 7.26. (a) Finding subgame perfect equilibria. The subgame defined by player 2’s singleton
information set contains no subgame but itself. The arrows depict a Nash equilibrium in this
subgame. Replacing this subgame by the equilibrium payoff vector yields the reduced game in (b).
(b) The reduced game. This game has only one subgame, namely, itself. It is not hard to verify that
it possesses a unique Nash equilibrium: player 1 chooses M and R with probability 1/2 each, and
player 2 chooses l and r with probability 1/2 each. (c) A subgame perfect equilibrium. (Can you
find another?)
[Figure 7.27: player 1 chooses L, M, or R. L ends the game with payoffs (0, 5). M leads to node x and R to node y; at the information set {x, y}, player 2 chooses l, m, or r. From l, m, r, player 1’s payoffs are 4, −1, 0 at x and 0, −1, 4 at y; player 2’s are 0, 1, 4 at x and 4, 1, 0 at y.]
p(x)[(1/2)(0) + (1/2)(4)] + p(y)[(1/2)(4) + (1/2)(0)] = 2(p(x) + p(y)) = 2,
which is strictly larger than the expected utility of 1 obtained by choosing m. Thus, regard-
less of the beliefs that player 2 might hold, at least one of l or r produces a strictly higher
expected utility for player 2 than does m.
Consequently, contrary to the given subgame perfect equilibrium, player 2 will not
play m if reached. Hence, this subgame perfect equilibrium is not a sensible one.
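The claim that some mixture of l and r beats m under any beliefs can be verified directly. In the sketch below (our encoding, with player 2’s payoffs as read from the figure), m loses to the better of l and r at every belief p(x):

```python
from fractions import Fraction as F

# Player 2's payoffs at the nodes x (after M) and y (after R) of Fig. 7.27:
# l, m, r yield 0, 1, 4 at x and 4, 1, 0 at y.
u2 = {"l": {"x": 0, "y": 4}, "m": {"x": 1, "y": 1}, "r": {"x": 4, "y": 0}}

def eu(action, p_x):
    """Player 2's expected utility of `action` under belief p(x) = p_x."""
    return p_x * u2[action]["x"] + (1 - p_x) * u2[action]["y"]

# Whatever belief player 2 holds, either l or r does strictly better than m:
# max(4*(1-p), 4*p) >= 2 > 1 for every p in [0, 1].
assert all(max(eu("l", F(k, 100)), eu("r", F(k, 100))) > eu("m", F(k, 100))
           for k in range(101))
```

The grid check mirrors the algebra: eu(l) = 4p(y) and eu(r) = 4p(x), and their maximum is at least 2, which exceeds the payoff of 1 from m.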
The reason that this subgame perfect equilibrium fails to be sensible is that subgame
perfection does not discipline the behaviour of player 2 at his unreached information set.
It fails to discipline 2’s behaviour there because the unreached information set is not a
singleton and therefore does not define a subgame.
However, as we have seen, by introducing beliefs for player 2 over the nodes within
his information set once it has been reached, we can sensibly discipline his behaviour
there. This can have a profound impact on the set of equilibrium outcomes. You are invited
to show in an exercise that the only subgame perfect equilibrium in which m is given
probability zero by player 2 (as we have argued it ought to be) has player 1 choosing L
with probability zero.
We now formally introduce beliefs for the players over the nodes within their infor-
mation sets for the purposes of refining the set of subgame perfect equilibria in the spirit
of backward induction.
been reached. Thus, we must have Σx∈I(y) p(x) = 1 for every decision node y. The function
p(·) is called a system of beliefs because it embodies the beliefs of all players at each
of their information sets regarding the history of play up to that point in the game.
In a game tree diagram, we will represent the system of beliefs, p, by placing the
probability assigned to each node within each information set beside the respective node
and in square brackets.
Because a player’s beliefs about the history of play will typically have an important
influence on his current behaviour, it is vital that these beliefs are formed in a sensible
manner.
The question of interest to us is this: for a given behavioural strategy b, which sys-
tems of beliefs, p, are sensible? It is convenient to give the name assessment to a system
of beliefs/behavioural strategy pair (p, b). Given such an ordered pair, (p, b), the beliefs p
are interpreted as those that are held by the players given that the behavioural strategy b is
being played. To rephrase our question then, which assessments are sensible?
For example, consider the game of Fig. 7.28. In it is depicted player 1’s behavioural
strategy as well as player 2’s beliefs (left unspecified as α, β, and γ ). Now, given player
1’s strategy, player 2 can calculate the probability that each of his nodes has been reached
given that one of them has by simply employing Bayes’ rule. Thus, the only sensible
beliefs for player 2, given 1’s strategy, are α = 1/3, β = 1/9, and γ = 5/9.
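The Bayes’ rule computation for Fig. 7.28 can be written out directly (the node names are ours; the reach probabilities are those induced by player 1’s strategy, as in the figure caption’s frequency argument):

```python
from fractions import Fraction as F

# Probabilities with which player 1's strategy in Fig. 7.28 reaches each
# node of player 2's information set: 3/15, 1/15, and 5/15.
reach = {"left": F(3, 15), "middle": F(1, 15), "right": F(5, 15)}

total = sum(reach.values())  # probability the information set is reached: 9/15
beliefs = {node: p / total for node, p in reach.items()}
print(beliefs["left"], beliefs["middle"], beliefs["right"])  # 1/3 1/9 5/9
```

Dividing each reach probability by the total is exactly Bayes’ rule, and it reproduces α = 1/3, β = 1/9, and γ = 5/9.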
Thus, for an assessment (p, b) to be sensible, the system of beliefs p ought to be
derived from the given joint behavioural strategy b using Bayes’ rule whenever possible.
That is, letting P(x | b) denote the probability that node x is reached given the behavioural
[Figure 7.28 diagram: player 1 moves first with five actions; two of them (probabilities 4/15 and 2/15) do not reach player 2's information set, while the remaining three (probabilities 3/15, 1/15, and 5/15) lead to the left, middle, and right nodes of player 2's information set, whose beliefs are labelled [α], [β], and [γ].]
Figure 7.28. Using Bayes’ rule. To see why Bayes’ rule makes
sense, imagine this game being played 1500 times with the strategy
depicted for player 1. Out of the 1500 plays, on average, the two
leftmost choices of player 1 would occur 400 + 200 = 600 times,
and the other choices would occur 300 + 100 + 500 = 900 times.
Therefore, 2’s information set would be reached 900 times. Out of
these 900, the leftmost node is reached 300 times, the middle node
is reached 100 times, and the rightmost node is reached 500 times.
Thus, from a frequency point of view, given that 2’s information
set has been reached, the probability of the leftmost node is
α = 300/900 = 1/3; the middle node is β = 100/900 = 1/9; and
the rightmost node is γ = 500/900 = 5/9.
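The caption's frequency argument is easy to verify numerically. The following Python sketch (the node labels are ours) applies Bayes' rule to the reach probabilities taken from Fig. 7.28:

```python
from fractions import Fraction as F

# Player 1's probabilities of reaching each node of player 2's
# information set (read from Fig. 7.28): left, middle, right.
reach = {"left": F(3, 15), "middle": F(1, 15), "right": F(5, 15)}

total = sum(reach.values())  # probability that 2's set is reached (9/15)
beliefs = {node: p / total for node, p in reach.items()}  # Bayes' rule

assert beliefs == {"left": F(1, 3), "middle": F(1, 9), "right": F(5, 9)}
```

The same computation applies at any information set reached with positive probability under the given strategy.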
GAME THEORY 351
[Figure 7.29 diagram: player 1 plays left with probability 1 and right with probability 0; following right, player 2 plays left with probability 1/3 and right with probability 2/3; player 3's information set carries beliefs [α] and [1 − α].]
strategy b, Bayes’ rule states that for every information set I, and every x ∈ I,
p(x) = P(x | b) / ∑_{y∈I} P(y | b)
whenever the denominator is positive – that is, whenever the information set is reached
with positive probability according to b.19
We state this as our first principle.
Bayes’ Rule: Beliefs must be derived from strategies using Bayes’ rule when possible.
The phrase ‘when possible’ means at all information sets reached with positive probability
according to the given joint strategy. Consequently, it is not always possible to employ
Bayes’ rule. For example, in the game of Fig. 7.29, given the behavioural strategies of
players 1 and 2, player 3’s information set is not reached (i.e., it is reached with probabil-
ity zero). Thus, we cannot formally apply Bayes’ rule in this circumstance to obtain 3’s
beliefs. Nonetheless, given player 2’s strategy there does appear to be a unique sensible
belief for player 3, namely α = 1/3.
The reason that this is the only sensible belief is that player 2’s behavioural strategy,
strictly interpreted, means that he will play left with probability 1/3 if player 1 plays
right, even though player 1 is supposed to play left with probability one. Thus, player 2’s
mixed action already takes into account that player 1 must deviate from his strategy for
2’s strategy to come into play. Consequently, when player 3 is reached, his only sensible
belief is to place probability 1/3 on player 2 having played left.
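One way to make this argument concrete is to perturb player 1's strategy so that he plays right with a small probability ε and then apply Bayes' rule; the tremble cancels from numerator and denominator, leaving α = 1/3 for every ε > 0. A minimal sketch (the tremble parameterisation is ours, not the text's):

```python
from fractions import Fraction as F

def belief_alpha(eps):
    """Player 3's belief on his left node when player 1 'trembles' to
    right with probability eps and player 2 mixes 1/3 on left (Fig. 7.29)."""
    p_left_node = eps * F(1, 3)    # 1 plays right, then 2 plays left
    p_right_node = eps * F(2, 3)   # 1 plays right, then 2 plays right
    return p_left_node / (p_left_node + p_right_node)

# The tremble cancels: alpha = 1/3 for every eps > 0, hence also in the limit.
for eps in (F(1, 10), F(1, 100), F(1, 10**6)):
    assert belief_alpha(eps) == F(1, 3)
```

This is precisely the limiting logic that the consistency condition formalises below.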
Are there still further restrictions we might consider imposing on the beliefs that
accompany a given behavioural strategy? Well, in a word, yes. Consider Figs. 7.30 and
7.31, both of which specify a behavioural strategy for players 1 and 2. In each game,
any choice of α and β between zero and one will suffice to render the resulting assess-
ment compatible with Bayes’ rule. Moreover, the type of argument used in the example of
Fig. 7.29 is simply unavailable. Nonetheless, there is good reason to insist that in each case
19 To keep the notation simple, we have not emphasised the fact that P(x | b) also depends on chance’s
distribution π.
[Figures 7.30 and 7.31 diagrams: each depicts a behavioural strategy for players 1 and 2, with beliefs [α], [1 − α] and [β], [1 − β] at the information sets in question. In Fig. 7.30 the α-beliefs belong to player 2 and the β-beliefs to player 3; in Fig. 7.31 both belief pairs belong to player 3.]
α = β. Indeed this equality follows from two additional principles that we intentionally describe only informally. They are as follows.
Independence: Beliefs must reflect that players choose their strategies independently.
Common Beliefs: Players with identical information must hold identical beliefs.
To see how these two principles lead to α = β, consider Fig. 7.30. When player 2’s
information set is reached, α is the probability that player 2 places on player 1 having
chosen L. Now, although this is not represented in the diagram, the principle of common
beliefs implies that player 3 also places probability α on player 1 having chosen L at
this point in the game (i.e., when given exactly the same information as player 2). But
by independence of the players’ strategies, finding out the strategy choice of player 2
provides player 3 with no information whatever regarding the strategy chosen by player
1.20 Consequently, player 3’s beliefs about player 1 must remain unchanged (i.e., equal to
α on L by player 1) even after finding out that player 2 has chosen L. But this means that
β = α.
Similar reasoning can be applied to Fig. 7.31. Finding out whether or not player 1
played Left or Right should not (by independence) affect 3’s beliefs about the probability
that player 2 chose Left versus Middle, that is, α = β. Note that the common beliefs prin-
ciple is not needed in this case because the two information sets in question are owned by
the same player.
Altogether, the three principles – Bayes’ rule, independence, and common beliefs –
suffice to yield all of the restrictions on beliefs we have considered so far in all previous
examples. (You are asked in an exercise to show that independence yields the restriction
α = 1/3 in the game of Fig. 7.29.) Of course, this claim is only an informal one because the
independence and common beliefs principles are stated only informally. What we really
need is a formal definition of what it means for an assessment to be ‘sensible’, and this is
what we now provide. After stating the definition, we will talk about how it relates to the
three principles: Bayes’ rule, independence, and common beliefs.
To prepare for the definition, we need a little terminology. A behavioural strategy in
a finite extensive form game is called completely mixed if it assigns strictly positive prob-
ability to every action at every information set. You are asked to show as an exercise that
under a completely mixed strategy every information set is reached with strictly positive
probability. Consequently, for such strategies, Bayes’ rule alone uniquely determines the
players’ beliefs.
20 Note that the independence principle applies even if player 2’s single information set in Fig. 7.30 is split into
two singleton information sets. In this case, player 2’s decision of l or r may well depend on player 1’s choice of
L or R. Consequently, finding out whether player 2 chose l or r does provide player 3 with information regarding
player 1’s strategy choice. However, this does not violate the independence principle because in the new game,
player 2’s strategy set is {ll, lr, rl, rr}, not {l, r}, and according to the independence principle finding out which
strategy player 2 has chosen must not provide player 3 with any information regarding the strategy choice of
player 1.
(i) Players are able to assign relative probabilities, possibly infinite, to any pair
of joint pure strategies.
(ii) The players’ relative probabilities satisfy standard probability laws (e.g.,
Bayes’ rule).
(iii) The players’ relative probabilities coincide with the relative probabilities of
an outside observer (common beliefs).
(iv) The outside observer’s relative probabilities for the present strategic situ-
ation would not change after observing the outcome of any finite number
of identical strategic situations (a form of independence related to ‘infinite
experience’).
In our opinion, the equivalence of consistency with these four principles indicates that
consistency is an idealised restriction on beliefs. Of course, not all practical settings will
conform to these ideals and one must therefore be careful not to apply consistency inap-
propriately. However, if one’s goal is to understand strategic behaviour among idealised
‘rational’ players, then in light of the above equivalence, consistency is entirely reasonable.
Sequential Rationality
Now that we have explored the relationship between beliefs and strategies, we can return
to the task of developing a sensible notion of backward induction for general extensive
form games.
For games with perfect information, the backward induction solution amounts to
insisting that all players make choices that are payoff-maximising whenever it is their
turn to move. Subgame perfection attempts to extend this idea beyond perfect information
games. However, as the example of Fig. 7.27 illustrates, subgame perfection is not quite
strong enough to rule out behaviour that is suboptimal at every information set.
Now that we have endowed each player with beliefs about the history of play when-
ever it is that player’s turn to move, it is straightforward to require that the choices made
at each information set of every player be optimal there. Once this is done, we will have
appropriately extended the backward induction idea to general extensive form games. We
now formally pursue this line of thought.
Fix a finite extensive form game. Consider an assessment, (p, b), and an information
set, I, belonging to player i. To check that player i’s behavioural strategy, bi , is optimal for
i once his information set I is reached, we must be able to calculate the payoff to i of any
other strategy he might employ once I is reached.
Let us first calculate i’s payoff according to the assessment (p, b) given that his infor-
mation set I has been reached. For each node x in I, we can use b to calculate player i’s
payoff beginning from x. To do this, simply treat x as if it defined a subgame. For each
such x, let ui (b | x) denote this payoff number. Thus, ui (b | x) is the payoff to i if node x
in I has been reached. Of course, player i does not know which node within I has been
reached. But the system of beliefs, p, describes the probabilities that i assigns to each node
in I. Consequently, player i’s payoff according to (p, b), given that I has been reached, is
simply the expected value of the numbers ui (b | x) according to the system of beliefs p,
namely,
∑_{x∈I} p(x)ui (b | x).
We denote this payoff by vi (p, b | I). See Figs. 7.32 and 7.33 for an example in which
vi (p, b | I) is calculated.
Now that we know how to calculate i’s payoff from an arbitrary assessment condi-
tional on one of his information sets having been reached, it is straightforward to compare
this payoff to what he can obtain by changing his strategy at that point in the game, and
this is the basis for the central definition of this section.
[Figure 7.32 diagram: an extensive form game in which player 1's information set I contains three nodes, x, y, and z, held with beliefs p(x) = 1/2, p(y) = 1/3, and p(z) = 1/6; the behavioural strategy and the payoffs in the subtrees rooted at x, y, and z are reproduced in the panels of Fig. 7.33.]
Figure 7.32. Payoffs conditional on an information set. See Fig. 7.33 for the calculation of 1's payoff conditional on I having been reached.
[Figure 7.33 panels (a), (b), and (c): the subtrees of Fig. 7.32 rooted at x, y, and z, respectively, with the behavioural strategy probabilities along their branches.]
Figure 7.33. Calculating payoffs at an information set. Treating separately each node, x, y, and z within 1's information set labelled I in Fig. 7.32, we see from (a) that u1 (b | x) = (1/3)(6) + (2/3)(3) = 4, from (b) that u1 (b | y) = (1/3)[(3/4)(8) + (1/4)(12)] + (2/3)[0] = 3, and from (c) that u1 (b | z) = (1/3)(6) + (2/3)[(3/4)(4) + (1/4)(12)] = 6.
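The payoff vi (p, b | I) itself is then the belief-weighted average of the node payoffs. A short Python sketch, assuming the beliefs p(x) = 1/2, p(y) = 1/3, p(z) = 1/6 and node payoffs as read from our reconstruction of Figs. 7.32 and 7.33:

```python
from fractions import Fraction as F

def conditional_payoff(beliefs, node_payoffs):
    """v_i(p, b | I): the p-weighted average of u_i(b | x) over x in I."""
    return sum(beliefs[x] * node_payoffs[x] for x in beliefs)

# Node payoffs u1(b | x), u1(b | y), u1(b | z) as in Fig. 7.33:
u = {"x": F(1, 3) * 6 + F(2, 3) * 3,                             # = 4
     "y": F(1, 3) * (F(3, 4) * 8 + F(1, 4) * 12) + F(2, 3) * 0,  # = 3
     "z": 6}
p = {"x": F(1, 2), "y": F(1, 3), "z": F(1, 6)}  # beliefs at I

assert conditional_payoff(p, u) == 4  # v1(p, b | I)
```

Under these numbers, v1 (p, b | I) = (1/2)(4) + (1/3)(3) + (1/6)(6) = 4.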
We also call a joint behavioural strategy, b, sequentially rational if for some system of
beliefs, p, the assessment (p, b) is sequentially rational as above.
[Figure 7.34 diagram: matching pennies in extensive form. Player 1 chooses H or T and, without observing this choice, player 2 does the same. As depicted, player 1 plays H with probability 1, player 2 plays H with probability 1 at each of his nodes, and player 2's beliefs place probability 0 on the node where 1 chose H and probability 1 on the node where 1 chose T. Payoffs to players 1 and 2 are (1, −1) when the choices match and (−1, 1) when they differ.]
22 In effect, the players are making their choices simultaneously. Thus, this extensive form game is equivalent to
the strategic form game in which the players’ choices are in fact made simultaneously (i.e., the strategic form
game of section 7.2, which we called the batter–pitcher duel; it is more commonly known in the game theory
literature as matching pennies). In this sense, any strategic form game can be modelled as an extensive form game
in which each of the players moves once in some fixed (but arbitrary) order and in which no player is informed
of the choice made by any previous player.
The unique Nash equilibrium of this game, and hence the unique subgame perfect
equilibrium, is for both players to randomise by choosing Heads and Tails with probability
1/2 each. However, consider the assessment depicted in the figure in which both players
choose Heads with probability 1, and player 2’s beliefs place probability 1 on player 1
having chosen Tails. This assessment, although not a Nash equilibrium, is sequentially
rational because player 1 is obtaining his highest possible payoff, and according to 2’s
beliefs, when his information set is reached, he, too, obtains his highest possible payoff.
This is because according to 2’s beliefs, player 1 has chosen Tails with probability one.
Consequently, by choosing Heads, player 2’s payoff is maximised – again, according to
his beliefs.
Thus, sequentially rational assessments need not even be Nash equilibria. Clearly
the difficulty with this example is that player 2’s beliefs are not derived from the strategies
via Bayes’ rule.
Putting the notion of sequential rationality together with the three principles
connecting beliefs with strategies discussed in the previous subsection – Bayes’ rule, inde-
pendence, and common beliefs – leads to the following important equilibrium concept
introduced in Kreps and Wilson (1982).
Because (as you are asked to show in an exercise) consistent assessments do indeed
satisfy Bayes’ rule, the unique sequential equilibrium of the matching pennies game of
Fig. 7.34 has each player choosing Heads with probability 1/2.
It is instructive to apply the sequential equilibrium concept to a less transparent
example.
EXAMPLE 7.7 Consider a variant of matching pennies, which we will call ‘sophisticated
matching pennies’. There are three players, each in possession of a penny. The objectives
of the players are as follows: player 3 wishes to match the choice of player 1, and player
1 wishes for just the opposite. Player 2’s role is to ‘help’ player 3 try to match player 1’s
choice. Thus, you can think of players 2 and 3 as being team members (although making
independent choices) playing against player 1. There are four dollars at stake.
How exactly is player 2 allowed to help player 3 guess player 1’s choice of Heads or
Tails? The answer is of course embodied in the precise rules of the game, which we have
not yet spelled out. They are as follows: player 1 begins by secretly placing his coin either
Heads up or Tails up in his palm. Player 2 then does the same. Players 1 and 2 then reveal
their coins to a referee (being careful not to reveal either coin to player 3). The referee
then informs player 3 of whether or not the coins of players 1 and 2 match. Player 3 must
then decide whether to choose Heads or Tails. If 3’s choice matches 1’s, then player 1 pays
players 2 and 3 two dollars each. Otherwise, players 2 and 3 each pay player 1 two dollars.
To make the game a little more interesting, we will also give players 1 and 2 the choice to
[Figure 7.35 diagram: sophisticated matching pennies. Player 1 chooses H, T, or Quit; quitting ends the game with payoffs (−2, 1, 1). Player 2, at an information set with beliefs [α1] and [α2], chooses H, T, or Quit; his quitting yields payoffs (2, −1, −1). Player 3's 'beta' information set, with beliefs [β1] and [β2], is reached when the two coins differ, and his 'gamma' information set, with beliefs [γ1] and [γ2], when they match. Player 3 then chooses H or T; terminal payoffs are (−4, 2, 2) when 3's choice matches 1's and (4, −2, −2) otherwise.]
quit on their turns. Quitting costs two dollars. So if player 1 quits, he must pay players 2
and 3 one dollar each, and if player 2 quits, players 2 and 3 each pay player 1 one dollar.
The entire game is depicted in Fig. 7.35. You should check that the figure is compatible
with the description just given.
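The payoff rules just described can be summarised in a small function (a sketch; the action labels are ours):

```python
def payoffs(a1, a2, a3=None):
    """Payoff vector (u1, u2, u3) in sophisticated matching pennies;
    a1, a2 in {'H', 'T', 'Quit'}, a3 in {'H', 'T'} (unused after a quit)."""
    if a1 == "Quit":
        return (-2, 1, 1)    # 1 pays players 2 and 3 one dollar each
    if a2 == "Quit":
        return (2, -1, -1)   # players 2 and 3 each pay player 1 one dollar
    if a3 == a1:
        return (-4, 2, 2)    # 3 matches 1: 1 pays 2 and 3 two dollars each
    return (4, -2, -2)       # mismatch: 2 and 3 each pay player 1

assert payoffs("H", "T", "H") == (-4, 2, 2)   # 3 matched 1's Heads
assert payoffs("Quit", "H") == (-2, 1, 1)
```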
This game possesses multiple sequential equilibria. It is instructive to demonstrate
how one of these is calculated. You are asked to find the others in an exercise.
In the figure, we have indicated player 2’s beliefs by αi , and player 3’s by βi and γi .
For expositional ease, we shall refer to player 3’s information set with beliefs indicated by
βi as 3’s ‘beta’ information set, and the other as 3’s ‘gamma’ information set.
Let x and y denote the probabilities that players 1 and 2, respectively, place on
Heads, and let x̄ and ȳ denote the probabilities they place on Tails. Let zβ and zγ denote
the probabilities that player 3 places on Heads at his information sets beta and gamma,
respectively.
Thus, the vector (α1 , β1 , γ1 ; x, x̄, y, ȳ, zβ , zγ ) is an assessment for the game. We shall
now search for a sequential equilibrium in which each of x, x̄, y, ȳ, zβ , zγ is strictly between
zero and one and in which players 1 and 2 never quit. Of course, there is no guarantee that
such a sequential equilibrium exists. But if there is one, our search will discover it.
Let us then assume that each of x, x̄, y, ȳ, zβ , zγ is strictly between zero and one
and that x + x̄ = y + ȳ = 1. Consequently, each information set is reached with positive
probability, and so for the assessment to be consistent, it suffices that the beliefs be derived
using Bayes’ rule (see the exercises). Thus, for consistency, we must have
α1 = x,    β1 = xȳ/(xȳ + yx̄),    and    γ1 = xy/(xy + x̄ȳ).        (E.1)
v3 (H | I3β ) = [xȳ/(xȳ + yx̄)](2) + [yx̄/(xȳ + yx̄)](−2),
v3 (T | I3β ) = [xȳ/(xȳ + yx̄)](−2) + [yx̄/(xȳ + yx̄)](2),        (E.3)
v3 (H | I3γ ) = [xy/(xy + x̄ȳ)](2) + [x̄ȳ/(xy + x̄ȳ)](−2),
v3 (T | I3γ ) = [xy/(xy + x̄ȳ)](−2) + [x̄ȳ/(xy + x̄ȳ)](2),
where, for example, (E.2) gives player 1’s payoff of playing Heads at his information set
denoted I1 , and (E.3) gives player 3’s payoff of playing Tails at his beta information set
denoted I3β .
Now by the comment above, x, x̄, y, ȳ, zβ , and zγ must yield the following indifferences:
v1 (H | I1 ) = v1 (T | I1 ),
v2 (H | I2 ) = v2 (T | I2 ),
v3 (H | I3β ) = v3 (T | I3β ), and
v3 (H | I3γ ) = v3 (T | I3γ ).
obtaining x = x̄ = y = ȳ = 1/2. Given this, the first two indifferences imply that zβ =
zγ = 1/2 as well.
Because player 3 has exactly two choices at each information set and he is indiffer-
ent between them, his behaviour is payoff-maximising at each of his information sets. It
remains only to check that players 1 and 2 are maximising their payoffs. Thus, we must
check that neither does better by quitting. That this is in fact the case follows because by
quitting, players 1 and 2 obtain a negative payoff, whereas choosing Heads or Tails yields
a payoff of 0.
Thus, the assessment (α1 , β1 , γ1 ; x, x̄, y, ȳ, zβ , zγ ) in which every entry is 1/2 is a
sequential equilibrium.
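It is straightforward to verify this equilibrium numerically: with x = x̄ = y = ȳ = 1/2, Bayes' rule (E.1) delivers beliefs of 1/2 everywhere, and the conditional payoffs (E.3) confirm player 3's indifference. A sketch (variable names ending in 'b' stand for the barred probabilities):

```python
from fractions import Fraction as F

# Candidate equilibrium: x = x̄ = y = ȳ = 1/2 (xb, yb denote x̄, ȳ).
x = xb = y = yb = F(1, 2)

# Consistent beliefs via Bayes' rule, equation (E.1):
alpha1 = x
beta1 = x * yb / (x * yb + y * xb)
gamma1 = x * y / (x * y + xb * yb)
assert alpha1 == beta1 == gamma1 == F(1, 2)

# Player 3's conditional payoffs at his beta information set, (E.3):
v3_H_beta = beta1 * 2 + (1 - beta1) * (-2)
v3_T_beta = beta1 * (-2) + (1 - beta1) * 2
assert v3_H_beta == v3_T_beta == 0  # indifferent, so mixing is optimal
```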
Note that in the sequential equilibrium calculated here, each player receives a payoff
of zero. Thus, player 3 is actually getting no significant help from player 2, because without
player 2, the game would be a standard matching pennies game between players 1 and 3.
In the exercises, you are asked to find all other sequential equilibria. You will discover that
players 2 and 3 fare better in other equilibria.
There is more we can learn from this example. Indeed, it is instructive to consider
the assessment (α1 , β1 , γ1 ; x, x̄, y, ȳ, zβ , zγ ) = (1, 0, 0; 1, 0, 0, 0, 0, 0), in which player 1
chooses Heads with probability 1, player 2 quits with probability 1, and player 3 chooses
Tails with probability 1.
This assessment seems rather silly because even though player 1 is sure to choose
Heads, and player 3 would like to match it, player 3 chooses Tails, regardless of the choice
of player 2. Despite this, the assessment is sequentially rational, and satisfies Bayes’ rule!
To see sequential rationality, note that player 1 is certainly maximising at his information
set since player 2 quits. Also player 2 is maximising at his information set, because given
his beliefs (which place probability 1 on player 1 having chosen Heads) and the strategy of
player 3 (to choose Tails no matter what), player 3 is certain not to match player 1. Thus, it
is best for 2 to quit. Finally, given that player 3 believes at each of his information sets that
(if reached) player 1 has chosen Tails, it is indeed best for player 3 to also choose Tails. To
verify Bayes’ rule, simply note that the only non-singleton information set reached by the
joint behavioural strategy is player 2’s, and his beliefs are indeed those induced by Bayes’
rule from the strategy.
Although this assessment satisfies Bayes’ rule and is sequentially rational, it is not
a sequential equilibrium. Indeed, it is not consistent. Intuitively, one senses that there
is something wrong with player 3’s beliefs. Before showing how the assessment for-
mally violates the consistency condition embodied in Definition 7.20, it is helpful to
first think about it intuitively. To do so, recall that consistency embodies three princi-
ples: Bayes’ rule, independence, and common beliefs. Although the given assessment does
satisfy Bayes’ rule, it does not satisfy independence. Indeed, we shall argue that indepen-
dence implies that one of β2 or γ2 must be zero. (Yet both are equal to 1 in the given
assessment.)
Let b1 (b2 ) denote the left (right) node in player 3’s beta information set, and let
g1 (g2 ) denote the left (right) node in 3’s gamma information set. Given the strategies, but
before they are carried out, consider the following question pondered by player 3. ‘What
is the likelihood of node g1 relative to node b2 ?’ We wish to argue that player 3’s answer
is: ‘Node g1 is infinitely more likely than node b2 ’.23
The reason is as follows. From Fig. 7.35, note that the question can be rephrased as:
‘Given that player 2 chooses Heads, what is the likelihood that player 1 chooses Heads
relative to Tails?’ But by independence, player 3 gains no information about 1’s strategy
choice by finding out the strategy choice of player 2. Consequently, the above question
must have the same answer as the question: ‘Given that player 2 chooses Quit, what is the
likelihood that player 1 chooses Heads relative to Tails?’ But the answer to the latter ques-
tion must be that Heads by 1 is infinitely more likely than Tails given 2’s choice to Quit,
because this is precisely what the proposed strategies indicate. Hence, by independence,
the answer to the original question must be the same – that g1 is infinitely more likely
than b2 .
An analogous argument shows that the answer to the question ‘What is the likelihood
of node b1 relative to node g2 ?’ must be that the former is infinitely more likely than the
latter. (Provide the argument.)
Finally, consider player 3’s question: ‘What is the likelihood of node g1 relative to
node b1 ?’ Although we cannot be certain of 3’s answer, there are only two possibilities.
Either g1 is more likely (not necessarily infinitely more) than b1 or it is not. If it is, then
because b1 is infinitely more likely than g2 , it must be the case that g1 is infinitely more
likely than g2 . But this is equivalent to saying that γ1 = 1 and γ2 = 0. Thus, in this case,
γ2 = 0.
On the other hand, if b1 is at least as likely as g1 , then because g1 is infinitely more
likely than b2 , it must be the case that b1 is infinitely more likely than b2 . But this is
equivalent to saying that β1 = 1 and β2 = 0.
Consequently, independence implies that either γ2 = 0, or β2 = 0. We conclude that
the given assessment does not satisfy independence.
This intuitive account does not constitute a formal demonstration that the assessment
fails to be consistent. It is meant only to provide you with a little more insight into the
nature of the difficulty with it. We will now formally show that the assessment fails to be
consistent – and therefore that it is not a sequential equilibrium – by proving the following
result.
Claim: In any consistent assessment for this game, (α1 )² β2 γ2 = (α2 )² β1 γ1 .
Before we give a proof of the claim, note that when α1 = 1 (as in the assessment we
are analysing), the equation says that one of β2 or γ2 must be zero, precisely as we argued
23 To say that one event is infinitely more likely than another simply means that conditional on one of the two
having occurred, the one is assigned probability one, and the other is assigned probability zero. So, for example,
given the players’ strategies, and before the game begins, we would say that the choice of Heads by player 1 is
infinitely more likely than the choice of Tails because the former has probability one and the latter probability zero.
above using independence. Consequently, proving the claim does indeed demonstrate that
the given sequentially rational assessment is not consistent and therefore not a sequential
equilibrium.
Proof of the Claim: If the assessment (α1 , β1 , γ1 ; x, x̄, y, ȳ, zβ , zγ ) is consistent, then
according to Definition 7.20, there is a completely mixed sequence of behavioural strate-
gies xn , x̄n ,yn , ȳn ,znβ , znγ converging to x, x̄, y, ȳ, zβ , zγ , respectively, whose associated
sequences of Bayes’ rule induced beliefs α1n , β1n , γ1n converge to α1 , β1 , γ1 , respectively.
Now, because all behavioural strategy probabilities are strictly positive along the
sequence, we have the identity
(xn /x̄n )² (x̄n yn /xn ȳn )(x̄n ȳn /xn yn ) = 1, for all n.
Moreover, because the beliefs along the sequence are derived from the strategies using Bayes' rule, we have
α1n /α2n = xn /x̄n ,
β2n /β1n = x̄n yn /(xn ȳn ), and
γ2n /γ1n = x̄n ȳn /(xn yn )
for all n. Consequently, we may substitute these expressions into the identity and rearrange
to obtain
(α1n )² β2n γ2n = (α2n )² β1n γ1n for every n.
The desired result now follows by taking the limit of both sides as n tends to infinity.
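The proof's identity can also be checked computationally: along any completely mixed sequence, the Bayes-induced beliefs satisfy (α1)² β2 γ2 = (α2)² β1 γ1 exactly, while the assessment with (α1, β1, γ1) = (1, 0, 0) violates it. A sketch (one particular tremble sequence of our choosing; exact rational arithmetic avoids rounding):

```python
from fractions import Fraction as F

def bayes_beliefs(x, xb, y, yb):
    """Beliefs induced by Bayes' rule from strictly positive
    probabilities x, x̄ (player 1 on H, T) and y, ȳ (player 2 on H, T)."""
    alpha1 = x / (x + xb)            # player 2's information set
    beta1 = x * yb / (x * yb + y * xb)   # 3's beta set (coins differ)
    gamma1 = x * y / (x * y + xb * yb)   # 3's gamma set (coins match)
    return alpha1, beta1, gamma1

# Along a completely mixed sequence, the claim's identity holds exactly:
for eps in (F(1, 10), F(1, 1000)):
    a1, b1, g1 = bayes_beliefs(1 - 2 * eps, eps, eps, eps)
    a2, b2, g2 = 1 - a1, 1 - b1, 1 - g1
    assert a1**2 * b2 * g2 == a2**2 * b1 * g1

# The sequentially rational but inconsistent assessment has
# (alpha1, beta1, gamma1) = (1, 0, 0), which violates the identity:
a1, b1, g1 = F(1), F(0), F(0)
a2, b2, g2 = 1 - a1, 1 - b1, 1 - g1
assert a1**2 * b2 * g2 != a2**2 * b1 * g1  # 1 != 0
```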
We end this section with the following theorem, which, on the one hand, indicates
the overall coherence of the sequential equilibrium notion, and on the other shows that
sequential equilibrium is indeed an extension of backward induction to general extensive
form games.
In the next chapter, we shall make good use of the game theoretic ideas we have
developed here to understand the important economic consequences of informational
asymmetries.
7.4 EXERCISES
7.1 Formulate the strategic form games associated with both Cournot and Bertrand duopoly.
7.2 For iterative elimination of strictly dominated strategies, show that the sets are nested and that the
procedure terminates in finitely many rounds if the game is finite. Can you provide a tight upper
bound on the number of iterations that might be required?
7.3 Our procedures for iteratively eliminating (weakly or strictly) dominated strategies eliminate all
possible strategies each round. One might consider eliminating only some of those strategies that
are dominated in each round. In this sense, one can alter the order in which dominated strategies are
eliminated.
(a) Use the following game to show that the order in which weakly dominated strategies are
eliminated can affect the outcomes that remain.
L M R
U 2, 1 1, 1 0, 0
C 1, 2 3, 1 2, 1
D 2, −2 1, −1 −1, −1
(b) Prove that in a finite game, the order of elimination does not matter when one is eliminating
strictly dominated strategies.
7.4 We have seen that one pure strategy can strictly dominate another pure strategy. Mixed strategies can
also strictly dominate pure strategies, and they can strictly dominate other mixed strategies, too. To
illustrate, consider the following two-player game.
L M R
U 3, 0 0, −3 0, −4
D 2, 4 4, 5 −1, 8
(a) Convince yourself that neither of player 2’s pure strategies L or R strictly dominates his pure
strategy M.
(b) Show that the pure strategy M is strictly dominated by the mixed strategy in which player 2
chooses L and R each with probability 1/2.
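Part (b) can be confirmed by direct computation: against each of player 1's pure strategies, the 50–50 mixture of L and R yields player 2 strictly more than M does. A sketch (the dictionary encoding of player 2's payoffs is ours):

```python
from fractions import Fraction as F

# Player 2's payoffs in the Exercise 7.4 game, indexed by (1's row, 2's column).
u2 = {("U", "L"): 0, ("U", "M"): -3, ("U", "R"): -4,
      ("D", "L"): 4, ("D", "M"): 5, ("D", "R"): 8}

mix = {"L": F(1, 2), "R": F(1, 2)}  # the candidate dominating mixed strategy
for row in ("U", "D"):
    mixed_payoff = sum(p * u2[(row, col)] for col, p in mix.items())
    assert mixed_payoff > u2[(row, "M")]  # strictly better against every row
```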
7.5 Consider the ‘guess-the-average’ game discussed at the end of section 7.2.1.
(a) Show that no pure strategy strictly dominates any other.
(b) Find a mixed strategy that strictly dominates 100.
(c) Show that 99 is not strictly dominated.
(d) Show that iterative elimination of strictly dominated strategies yields the unique choice of 1 for
each of the N players, and that this requires 99 rounds of elimination.
(e) Show that when there are N = 3 players, and one applies the procedure of iterative weak
dominance, then Wi1 = {1, 2, . . . , 14}, Wi2 = {1, 2}, and Wi3 = {1} for every player i.
7.6 Show that any strategy surviving iterative weak dominance also survives iterative strict dominance.
7.7 A two-person game is called zero-sum if the players’ payoffs always sum to zero. Let u(x, y) denote
player 1’s payoff in a two-person, zero-sum game when player 1 chooses x ∈ X, and player 2 chooses
y ∈ Y; consequently, player 2’s payoff is −u(x, y). Both X and Y are finite sets of pure strategies.
The following questions all refer to this two-person, zero-sum game.
(a) Prove the minimax theorem. That is, prove that there exists a pair of mixed strategies m∗1 , m∗2
such that
L R
U 1, 1 0, 0
D 0, 0 0, 0
Also show that there are two Nash equilibria, but only one in which neither player plays a weakly
dominated strategy.
(b)
L R
U 1, 1 0, 1
D 1, 0 −1, −1
Also show that there are infinitely many Nash equilibria, only one of which has neither player
playing a weakly dominated strategy.
(c)
L l m M
U 1, 1 1, 2 0, 0 0, 0
Also show that there is a unique strategy determined by iteratively eliminating weakly dominated
strategies.
7.11 Two hunters are on a stag hunt. They split up in the forest and each has two strategies: hunt for a
stag (S), or give up the stag hunt and instead hunt for rabbit (R). If they both hunt for a stag, they
will succeed and each earn a payoff of 9. If one hunts for stag and the other gives up and hunts for
rabbit, the stag hunter receives 0 and the rabbit hunter 8. If both hunt for rabbit then each receives 7.
Compute all Nash equilibria for this game, called ‘The Stag Hunt’, depicted below. Which of these
equilibria do you think is most likely to be played? Why?
S R
S 9, 9 0, 8
R 8, 0 7, 7
7.12 Call two games with the same strategy sets but different payoffs strategically equivalent if for each
player i and any mixed strategies of the others, player i’s ranking of his pure strategies in one game
coincides with his ranking in the other. Consider again the Stag Hunt game, but suppose that player
1’s payoff when the other player hunts stag is reduced by α ≥ 0 so that the game becomes,
S R
S 9 − α, 9 0, 8
R 8 − α, 0 7, 7
(a) Show that this game is strategically equivalent to the Stag Hunt game.
(b) Using only the operation of subtracting a constant from a player’s payoff while holding fixed
the other player’s strategy, show that the Stag Hunt game is strategically equivalent to the pure
coordination game,
S R
S 1, 1 0, 0 .
R 0, 0 7, 7
Which equilibrium do you think is most likely to be played in the pure coordination game?
Why? Compare your answers to those you gave in Exercise 7.11. (If your answers are different, ask
yourself why, in light of the strategic equivalence of the two games.)
7.13 Consider the penalty kick in soccer. There are two players, the goalie and the striker. The striker has
two strategies: kick to the goalie’s right (R) or to the goalie’s left (L). The goalie has two strategies:
move left (L) or move right (R). Let α be the probability that the kick is stopped when both choose
L and let β be the probability that the kick is stopped when both choose R. Assume that 0 < α <
β < 1. Consequently, the striker is more skilled at kicking to the goalie’s left. The payoff matrix is
as follows.
                Kicker
              L            R
Goalie   L    α, 1 − α     0, 1
         R    0, 1         β, 1 − β
(a) Before analysing this game, informally answer the following questions.
(i) Would you expect a striker who is more skilled at kicking to the goalie's left than to his right to score more often when he kicks to the goalie's left?
(ii) If a striker’s ability to score when kicking to the goalie’s left rises (i.e. α decreases) how will
this affect the percentage of times the striker scores when he chooses to kick to the goalie’s
left? Will it affect his scoring percentage when he kicks right?
(b) Find the unique Nash equilibrium.
(c) Answer again the questions in part (a). Based upon this, would it be wise to judge a striker’s
relative scoring ability in kicking left versus right by comparing the fraction of times he scores
when he kicks right versus the fraction of times he scores when he kicks left?
(d) Show that knowing the fraction of times a goal was scored when both players chose L and the
fraction of times a goal was scored when both players chose R would permit you to correctly
deduce the player’s scoring ability when kicking left and right.
(e) Could you correctly deduce the player’s scoring ability when kicking left and right if you only
had access to the striker’s choice? If not, what can be deduced?
(f) Could you correctly deduce the player’s scoring ability when kicking left versus right if you only
had access to the goalie’s choice? If not, what can be deduced?
7.14 Three firms use water from a lake for production purposes. Each firm has two pure strategies: purify
sewage (P), or dump raw sewage (D). If zero or one firm chooses D, the water remains pure, but if
two or three firms choose D, the water becomes polluted and each firm suffers a loss of 3. The cost
of purification, P, is 1. Compute all Nash equilibria of this game.
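For Exercise 7.14, brute-force enumeration finds the pure strategy Nash equilibria (the exercise also asks about mixed equilibria, which this sketch does not cover). The payoff convention below is an assumption consistent with the text: purification costs 1, and pollution costs each firm 3.

```python
from itertools import product

def payoff(profile, i):
    # profile is a tuple such as ('D', 'P', 'D'); purification costs 1,
    # and every firm loses 3 whenever two or more firms dump.
    dumpers = profile.count('D')
    u = 0
    if profile[i] == 'P':
        u -= 1
    if dumpers >= 2:
        u -= 3
    return u

def pure_nash(n=3):
    """Return all pure strategy profiles from which no firm gains by deviating."""
    equilibria = []
    for profile in product('PD', repeat=n):
        if all(payoff(profile, i) >=
               payoff(profile[:i] + (s,) + profile[i + 1:], i)
               for i in range(n) for s in 'PD'):
            equilibria.append(profile)
    return equilibria

print(pure_nash())
```

The enumeration turns up the profiles in which exactly one firm dumps, together with the profile in which all three dump; all-purify is not an equilibrium, since a lone dumper leaves the water pure and saves the purification cost.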
7.15 Show that every finite game possesses a Nash equilibrium in which no player places positive
probability on a weakly dominated pure strategy.
(a) Improve on this result by showing that every finite game possesses a Nash equilibrium m such
that for each player i, mi is not weakly dominated.
(b) Show that the result of part (a) requires finiteness by considering the Bertrand duopoly game
introduced in Chapter 4.
7.16 Show that in a finite strategic form game, the set of strategies surviving iterative weak dominance is
non-empty.
7.17 Consider the strategic form game depicted below. Each of two countries must simultaneously decide
on a course of action. Country 1 must decide whether to keep its weapons or to destroy them. Country
2 must decide whether to spy on country 1 or not. It would be an international scandal for country 1
if country 2 could prove that country 1 was keeping its weapons. The payoff matrix is as follows.
               Spy          Don’t Spy
 Keep         −1, 1           1, −1
 Destroy       0, 2            0, 2
7.18 Reconsider the two countries from the previous exercise, but now suppose that country 1 can be
one of two types, ‘aggressive’ or ‘non-aggressive’. Country 1 knows its own type. Country 2 does
not know country 1’s type, but believes that country 1 is aggressive with probability ε > 0. The
aggressive type places great importance on keeping its weapons. If it does so and country 2 spies
on the aggressive type this leads to war, which the aggressive type wins and justifies because of the
spying, but which is very costly for country 2. When country 1 is non-aggressive, the payoffs are as
before (i.e., as in the previous exercise). The payoff matrices associated with each of the two possible
types of country 1 are given below.
[The two payoff matrices are not reproducible here: the left matrix applies when country 1 is ‘aggressive’ (probability ε), the right when it is ‘non-aggressive’ (probability 1 − ε).]
(a) What action must the aggressive type of country 1 take in any Bayesian-Nash equilibrium?
(b) Assuming that ε < 1/5, find the unique Bayes-Nash equilibrium. (Can you prove that it is
unique?)
7.19 A community is composed of two types of individuals, good types and bad types. A fraction ε > 0
are bad, while the remaining fraction, 1 − ε > 0 are good. Bad types are wanted by the police, while
good types are not. Individuals can decide what colour car to drive, red or blue. Red cars are faster
than blue cars. All individuals prefer fast cars to slow cars. Each day the police decide whether to
stop only red cars or to stop only blue cars, or to stop no cars at all. They cannot stop all cars.
Individuals must decide what colour car to drive. Individuals do not like being stopped, and police
do not like stopping good individuals. A bad individual always tries to get away if stopped by the
police and is more likely to get away if driving a red car. The payoff matrices associated with this
daily situation are as follows.
7.23 In this exercise we allow each player in a Bayesian game to have infinitely many types and we allow
a player’s beliefs about other players’ types to be given by a probability density function. Payoff
formulas and the definition of a Bayes-Nash equilibrium are precisely analogous to the finite type
case with summations over types being replaced by integrals.
Consider a first-price, sealed-bid auction in which bidders simultaneously submit bids with
the object going to the highest bidder at a price equal to his bid. Suppose that there are two bidders
and that their values for the object are chosen independently from a uniform distribution over [0, 1].
Think of a player’s type as being the value that player places on the object. A player’s payoff is v − b
when he wins the object by bidding b and his value is v; his payoff is 0 if he does not win the object.
(a) Formulate this as a Bayesian game and find its associated strategic form game. Note that the
associated strategic form game has infinitely many players.
(b) Let bi(v) denote the bid made by player i of type v. Show that there is a Bayesian-Nash
equilibrium in which bi(v) = α + βv for all i and v. Determine the values of α and β.
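A numerical best-response check for part (b): the candidate coefficients α = 0 and β = 1/2 (so each bidder bids half his value) are my own solved values, which the grid search below supports rather than proves.

```python
# If the opponent of type v_j bids v_j/2 with v_j ~ U[0,1], then a bid b <= 1/2
# wins with probability Pr(v_j/2 < b) = 2b, so type v earns (v - b)*2b, which
# is maximised at b = v/2.  A grid search confirms the first-order condition.
def expected_payoff(v, b):
    return (v - b) * min(2 * b, 1.0)

def best_bid(v, n=100_001):
    grid = [i / (n - 1) for i in range(n)]
    return max(grid, key=lambda b: expected_payoff(v, b))

for v in (0.2, 0.5, 0.9):
    assert abs(best_bid(v) - v / 2) < 1e-3
print("bidding half one's value is a best response to itself")
```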
7.24 Modify the first-price, sealed-bid auction in the preceding exercise so that the loser also pays his bid
(but does not win the object). This modified auction is called an all-pay auction.
(a) Show that there is a Bayesian-Nash equilibrium in which bi(v) = γ + δv + φv² for all i and v.
(b) How do the players’ bids compare to those in the first-price auction? What is the intuition behind
this difference in bids?
(c) Show that, ex ante, the first-price auction and the all-pay auction generate the same expected
revenue for the seller.
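For Exercise 7.24, one candidate solution has γ = δ = 0 and φ = 1/2, so that b(v) = v²/2 (these coefficients are my own working, not derived in the text). Monte Carlo simulation then illustrates parts (b) and (c): all-pay bids lie below first-price bids, yet both auctions raise the same expected revenue.

```python
import random

# Candidate all-pay equilibrium: b(v) = v^2/2.  Against it, a bid b, paid
# win-or-lose, wins with probability Pr(v_j^2/2 < b) = sqrt(2b), giving
# expected payoff v*sqrt(2b) - b, which is maximised at b = v^2/2, so the
# strategy is a best response to itself.
random.seed(0)
n = 200_000
draws = [(random.random(), random.random()) for _ in range(n)]

rev_first = sum(max(v1, v2) / 2 for v1, v2 in draws) / n        # winner pays v/2
rev_allpay = sum((v1**2 + v2**2) / 2 for v1, v2 in draws) / n   # both pay v^2/2

# Part (b): all-pay bids are lower (v^2/2 <= v/2 on [0,1]) because the bid is
# forfeited even upon losing.  Part (c): expected revenues coincide at 1/3.
assert all(v**2 / 2 <= v / 2 for v in (0.1, 0.5, 0.9))
assert abs(rev_first - 1/3) < 0.01 and abs(rev_allpay - 1/3) < 0.01
print(round(rev_first, 3), round(rev_allpay, 3))
```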
7.25 Fully describe two distinct pure strategies for each player in both the buyer–seller game and the
simplified game of take-away. Calculate the payoffs associated with all four pairs of strategies.
7.26 List all pure strategies for both players in the extensive form games of Figs. 7.12 and 7.14. In
addition, depict their associated strategic forms in a matrix diagram.
7.27 In Fig. 7.36, an insurance company (C), must consider whether to offer a cheap or costly automobile
insurance policy to a driver (D). The company cannot observe whether the driver drives safely or
recklessly, but can observe whether the driver has had an accident or not. The probability of an
accident depends upon whether the driver drives safely or recklessly. If the driver drives safely he
has an accident with probability 1/5. If he drives recklessly he has an accident with probability 4/5.
If the driver drives safely he will not purchase the costly policy. This situation is modelled as an
extensive form game in the figure below. The accident probabilities are modelled as randomisation
by Nature and are given in square brackets in the figure. The driver’s payoff is the top number in
each payoff vector. For example, in the payoff vector (2, −1) the driver’s payoff is 2.
(a) Is there a Nash equilibrium in which the driver drives safely with probability one?
(b) Find the Nash equilibrium which maximises the probability that the driver drives safely.
7.28 Derive backward induction strategies for the games shown in Fig. 7.37 (p. 372).
Figure 7.36. [Extensive form: the driver chooses Reckless or Safe; Nature then chooses Accident or No Accident, with probabilities [4/5] and [1/5] after Reckless and [1/5] and [4/5] after Safe; the company C, observing only the accident outcome, chooses the policy. The payoff vectors are not reliably reproducible here.]
(c) Give an example of a finite game of perfect information in which the backward induction
strategies are not unique, but the payoff vector is.
7.29 The following game, taken from Reny (1992), is called ‘Take-it-or-leave-it’. A referee is equipped
with N dollars. He places one dollar on the table. Player 1 can either take the dollar or leave it. If
he takes it, the game ends. If he leaves it, the referee places a second dollar on the table. Player
two is now given the option of taking the two dollars or leaving them. If he takes them, the game
ends. Otherwise the referee places a third dollar on the table and it is again player 1’s turn to take or
leave the three dollars. The game continues in this manner with the players alternately being given
the choice to take all the money the referee has so far placed on the table and where the referee
adds a dollar to the total whenever a player leaves the money. If the last player to move chooses to
leave the N dollars the game ends with neither player receiving any money. Assume that N is public
information.
(a) Without thinking too hard, how would you play this game if you were in the position of player
1? Would it make a difference if N were very large (like a million) or quite small (like 5)?
(b) Calculate the backward induction strategies. Do these make sense to you?
(c) Prove that the backward induction strategies form a Nash equilibrium.
(d) Prove that the outcome that results from the backward induction strategies is the unique outcome
in any Nash equilibrium. Is there a unique Nash equilibrium?
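The backward induction of part (b) is mechanical enough to automate; the sketch below labels each pot size with the mover's action and payoff.

```python
def backward_induction(N):
    """Backward-induction actions for Take-it-or-leave-it with N dollars.

    action[k] is the choice of the player moving when k dollars are on the
    table; value[k] is that mover's backward-induction payoff.
    """
    action, value = {}, {}
    for k in range(N, 0, -1):
        # Leaving yields 0 if the opponent takes at k+1; otherwise the same
        # player moves again at k+2 and gets value[k+2].  (Leaving the last
        # dollar at k = N ends the game with payoff 0 for both players.)
        if k == N or action[k + 1] == 'take':
            leave_payoff = 0
        else:
            leave_payoff = value[k + 2]
        if k > leave_payoff:
            action[k], value[k] = 'take', k
        else:
            action[k], value[k] = 'leave', leave_payoff
    return action

acts = backward_induction(1000)
# Every mover takes immediately, so player 1 ends the game with the very first
# dollar -- the same outcome whether the referee holds 5 dollars or a million.
assert all(a == 'take' for a in acts.values())
```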
7.30 Consider the extensive form without payoffs in Fig. 7.38. Suppose that the game either ends in a
win for one player and a loss for the other, or a tie. That is, there are only three possible payoff
Figure 7.37. [Three extensive form games, in panels (a), (b), and (c); the trees and payoff vectors are not reproducible here.]
Figure 7.38. [An extensive form without payoffs, containing decision nodes for players 1 and 2; the tree is not reproducible here.]
GAME THEORY 373
vectors: (0, 0), (1, −1), and (−1, 1). Construct four different games by assigning these payoffs in
some fashion to the endpoints.
(a) Show that in each case, one of the players can ensure a win, or both can ensure a draw.
(b) Can you generalise this finding to some well-known parlour games (noughts and crosses,
draughts, chess)?
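The logic behind part (a) is Zermelo's: a finite two-player win/lose/tie game of perfect information can be solved by minimax. The tree below is an invented example, not Fig. 7.38.

```python
# Minimax for a win/lose/tie game: player 1 maximises and player 2 minimises
# a terminal payoff in {-1, 0, 1} (player 2's payoff is the negative).
def value(node, player1_to_move):
    if isinstance(node, int):                      # terminal payoff to player 1
        return node
    children = [value(child, not player1_to_move) for child in node]
    return max(children) if player1_to_move else min(children)

# Illustrative tree (nested lists; leaves are player-1 payoffs); player 1
# moves at the root.  This is an invented tree, not the one in the figure.
tree = [[1, -1], [0, [1, 0]]]
v = value(tree, True)
# Here v = 0: neither player can force a win, but each can guarantee a tie.
assert v == 0
```

Whatever payoffs are assigned, the minimax value is one of −1, 0, or 1, and each player can guarantee his side of it; this is exactly what determinacy in noughts and crosses, draughts, and chess amounts to.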
7.31 Let Y denote a finite subset of nodes of some extensive form game. Prove that Y contains a node
having no strict follower in Y.
7.32 Provide an example of a finite game of imperfect information and perfect recall in which there is
no ‘last’ information set. That is, for every information set, there is a node, x, within it such that
(x, a) ∈ X is not an end node for some action a.
7.33 Find all subgame perfect equilibria in the game of Fig. 7.17.
7.34 Prove that for every extensive form game, the game itself is a subgame.
7.35 Show that if s is a pure strategy Nash equilibrium of an extensive form game, then s induces a Nash
equilibrium in every subgame that is reached by s.
7.36 Argue that in every game of perfect information, every node defines a subgame.
7.37 Answer the following questions.
(a) Prove that every finite extensive form game with perfect information possesses at least one pure
strategy subgame perfect equilibrium.
(b) Provide an example of a finite extensive form game having no pure strategy subgame perfect
equilibrium.
7.38 Complete the proof of Theorem 7.6 on the existence of subgame perfect equilibrium.
7.39 Find all subgame perfect equilibria of the game in Fig. 7.26(a).
7.40 Answer the following questions for the game shown in Fig. 7.39.
(a) Calculate a Nash equilibrium for the game.
(b) Show that this game has no Nash equilibrium in behavioural strategies.
Figure 7.39. [Extensive form game; the terminal payoff vectors, left to right, are (4, −1), (0, 1), (0, 1), (1, 0), (1, 0), (0, 1), (0, 1), (4, −1); the tree itself is not reproducible here.]
(c) Conclude that this game does not possess a subgame perfect equilibrium.
(d) What is the source of the failure of existence?
7.41 Argue that for finite extensive form games, if a behavioural strategy, b, is completely mixed, then
(a) every information set is reached with positive probability.
(b) the assessment (p, b) is consistent if and only if p is derived from b using Bayes’ rule.
7.42 Answer the following questions.
(a) Argue that the principle of independence implies that given the behavioural strategy depicted in
the game of Fig. 7.29, the value of α must be 1/3.
(b) Verify that given the behavioural strategies depicted in Figs. 7.30 and 7.31, consistency implies
that in both cases the beliefs must satisfy α = β.
7.43 Prove that if an assessment is consistent, then it satisfies Bayes’ rule, and even Bayes’ rule in every
subgame. (The original assessment induces an assessment in each subgame. When each subgame is
treated as a game in its own right, and the induced assessment on that subgame satisfies Bayes’ rule,
the original assessment is said to satisfy Bayes’ rule in every subgame.)
7.44 Consider the game of Fig. 7.27. Let b denote the behavioural strategy in which player 1 plays L and
player 2 plays m. Prove that for every system of beliefs, p, the assessment (p, b) is not sequentially
rational. Find all sequentially rational assessments.
7.45 Find all sequential equilibria for the game of Fig. 7.35.
7.46 (a) Argue that the class of Bayesian games with a common prior is a subset of the class of extensive
form games by showing that every Bayesian game with a common prior has strategy sets and
payoff functions that are equivalent to those in the extensive form game in which Nature first
chooses the players’ types according to the prior, after which each player is simultaneously
informed only of his own type, after which each player simultaneously takes an action.
(b) Consider a two-player Bayesian game with two types for each player and where all four vec-
tors of types are equally likely. Draw the extensive form game (without specifying payoffs) as
described in part (a).
(c) Prove that a Bayes-Nash equilibrium of a Bayesian game induces a sequential equilibrium of the
extensive form game described in part (a) and vice versa.
7.47 Consider the extensive form game in Fig. 7.40. An entrant must decide whether to enter a market
in each of two periods. The market is already occupied by an incumbent, who may either be ‘weak’
or ‘strong’. A weak incumbent is unable to cut its prices low enough to drive the entrant out if
it attempts to enter. A strong incumbent can do so however. The probability that Nature chooses
the strong incumbent is 1/4. The incumbent’s payoff is the top number in each payoff vector. For
example, in the payoff vector (8, 0), the incumbent receives the payoff 8 and the entrant receives the
payoff 0.
(a) Find a Nash equilibrium that is not subgame perfect.
(b) Find a joint behavioural strategy, b, that forms a subgame perfect equilibrium but that cannot be
part of any sequentially rational assessment.
(c) Find an assessment, (p, b), that is sequentially rational and satisfies Bayes’ rule.
Figure 7.40. [Extensive form: Nature chooses the strong incumbent with probability 1/4 and the weak incumbent with probability 3/4; in each of the two periods the entrant chooses Out or In, and following In the incumbent chooses Fight or Compete; the payoff vectors are not reliably reproducible here.]
Figure 7.41. [Extensive form: players 1, 2, and 3 choose down (d) or across (a), and player 1 can later choose left (l) or right (r); the payoff vectors are not reliably reproducible here.]
7.48 Consider the extensive form game in Fig. 7.41. Each of players 1, 2, and 3 can play down (d) or
across (a), and player 1 can also play left (l) or right (r).
(a) Identify all subgames.
(b) Find a pure strategy subgame perfect equilibrium, b, such that (p, b) is not sequentially rational
for any system of beliefs p.
(c) Find an assessment, (p, b), that is sequentially rational and satisfies Bayes’ rule in every
subgame.
Figure 7.42. [Extensive form game with moves out, L, R, in, l, and r; the tree and payoff vectors are not reliably reproducible here.]
In the neoclassical theory of consumer and firm behaviour, consumers have perfect infor-
mation about important features of the commodities they buy, such as their quality
and durability. Firms have perfect information about the productivity of the inputs they
demand. Because of this, it was possible to develop separately the theories of consumer
demand and producer supply, and thereafter simply put them together by insisting on
market-clearing prices.
One might hope that extending consumer and producer theory to include imper-
fect information would be as simple as incorporating decision making under uncertainty
into those neoclassical models of consumer and producer behaviour. One might then
derive theories of demand and supply under imperfect information, and simply put the
two together once again to construct a theory of market equilibrium. Unfortunately, this
approach would only make sense if the sources of the uncertainty on both sides of the
market were exogenous and so not under the control of any agent involved.
Of course, the quality and durability of a commodity, for example, are not exogenous
features. They are characteristics that are ultimately chosen by the producer. If consumers
cannot directly observe product quality before making a purchase, then it may well be in
the interest of the producer to produce only low-quality items. Of course, knowing this,
consumers will be able to infer that product quality must be low and they will act accord-
ingly. Thus, we cannot develop an adequate equilibrium theory of value under imperfect
information without taking explicit account of the relevant strategic opportunities avail-
able to the agents involved. Notably, these strategic opportunities are significantly related
to the distribution of information across economic agents.
A situation in which different agents possess different information is said to be one
of asymmetric information. As we shall see, the strategic opportunities that arise in the
presence of asymmetric information typically lead to inefficient market outcomes, a form
of market failure. Under asymmetric information, the First Welfare Theorem no longer
holds generally.
Thus, the main theme to be explored in this chapter is the important effect of
asymmetric information on the efficiency properties of market outcomes. In the interest
of simplicity and clarity, we will develop this theme within the context of one specific
market: the market for insurance. By working through the details in our models of the
380 CHAPTER 8
insurance market, you will gain insight into how theorists would model other markets with
similar informational asymmetries. By the end, we hope to have stimulated you to look for
analogies and applications in your own field of special interest.
Symmetric Information
Consider the case in which each consumer’s accident probability can be identified by
the insurance companies. Thus, there is no asymmetry of information here. What is the
competitive (Walrasian) outcome in this benchmark setting in which all information is
public?
To understand the competitive outcome here, it is important to recognise that the
price of any particular commodity may well depend on the ‘state of the world’. For exam-
ple, an umbrella in the state ‘rain’ is a different commodity than an umbrella in the state
‘sunny’. Consequently, these distinct commodities could command distinct prices.
The same holds true in this setting where a state specifies which subset of consumers
have accidents. Because the state in which consumer i has an accident differs from that in
which consumer j does, the commodity (policy) paying L dollars to consumer i when he
has an accident differs from that paying L dollars to j when he does. Consequently, policies
benefiting distinct consumers are in fact distinct commodities and may then command
distinct prices.
So, let pi denote the price of the policy paying L dollars to consumer i should he have
an accident. For simplicity, let us refer to this as the ith policy. We wish then to determine,
for each i = 1, 2, . . . , m, the competitive equilibrium price p∗i of policy i.
Let us first consider the supply of policy i. If pi is less than πi L, then selling such
a policy will result in expected losses. Hence, the supply of policy i will be zero in this
case. On the other hand, if pi is greater than πi L, positive expected profits can be earned, so
the supply of such policies will be infinite. Finally, if pi = πi L, then insurance companies
break even on each policy i sold and hence are willing to supply any number of such
policies.
On the demand side, if pi is less than πi L, then consumer i, being risk averse, will
demand at least one policy i. This follows from our analysis in Chapter 2 where we showed
that risk-averse consumers strictly prefer to fully insure than not to insure at all whenever
actuarially fair insurance is available (i.e., whenever pi = πi L). The same analysis shows
that if pi exceeds πi L, consumer i will purchase at most one policy i. (Recall that fractional
policies cannot be purchased.)
By putting demand and supply together, the only possibility for equilibrium is when
pi = πi L. In this case, each consumer i demands exactly one policy i and it is supplied
by exactly one insurance company (any one will do). All other insurance companies are
content to supply zero units of policy i because at price pi = πi L all would earn zero
expected profits.
We conclude that when information is freely available to all, there is a unique
competitive equilibrium. In it, p∗i = πi L for every policy i = 1, 2, . . . , m. Note that in
this competitive equilibrium, all insurance companies earn zero expected profits, and all
consumers are fully insured.
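The demand-side claim — that a risk-averse consumer fully insures at the actuarially fair price pi = πiL — is easy to illustrate numerically. The functional form u = log and the wealth figures below are my own choices, not the text's.

```python
import math

# At the fair premium p = pi*L, full insurance makes wealth certain at w - p,
# and by Jensen's inequality a strictly concave u prefers this to bearing the
# risk; the insurer breaks even on each policy sold.
w, L = 100.0, 60.0
u = math.log

for pi in (0.1, 0.3, 0.5):
    p = pi * L                                    # actuarially fair premium
    insured = u(w - p)                            # certain wealth if insured
    uninsured = pi * u(w - L) + (1 - pi) * u(w)   # lottery if uninsured
    assert insured > uninsured                    # full insurance preferred
    assert p - pi * L == 0.0                      # insurer earns zero profit
print("full insurance preferred at every fair premium")
```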
We wish to argue that the competitive outcome is Pareto efficient – no consumer
or insurance company can be made better off without making some other consumer or
insurance company worse off. By constructing an appropriate pure exchange economy, one
can come to this conclusion by appealing to the First Welfare Theorem. You are invited to
do so in Exercise 8.1. We shall give a direct argument here.
In this setting, an allocation is an assignment of wealth to consumers and insurance
companies in each state. An allocation is feasible if in every state, the total wealth assigned
is equal to the total consumer wealth.
We now argue that no feasible allocation Pareto dominates the competitive alloca-
tion. Suppose, by way of contradiction, that some feasible allocation does Pareto dominate
the competitive one. Without loss of generality, we may assume that the competitive
allocation is dominated by a feasible allocation in which each consumer’s wealth is the
same whether or not he has an accident. (See Exercise 8.6.) Consequently, the dominat-
ing outcome guarantees each consumer i wealth w̄i . For this allocation to dominate the
competitive one, it must be the case that w̄i ≥ w − πi L for each i.
Now, because each consumer’s wealth is certain, we may assume without loss that
according to the dominating allocation, there is no transfer of wealth between any two
consumers in any state. (Again, see Exercise 8.6.) Therefore, each consumer’s wealth is
directly transferred only to (or from) insurance companies in every state.
Consider then a particular consumer, i, and the insurance companies who are
providing i with insurance in the dominating allocation. In aggregate, their expected profits
from consumer i are
(1 − πi)(w − w̄i) + πi(w − L − w̄i) = w − πiL − w̄i, (8.1)
because w̄i − w (resp., w̄i + L − w) is the supplement to consumer i’s wealth in states in
which he does not have (resp., has) an accident, and the feasibility of the allocation implies
that this additional wealth must be offset by a change in the aggregate wealth of insurance
companies.
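The algebra in (8.1) can be spot-checked at random parameter values:

```python
import random

# Spot-check of identity (8.1): for any w, wbar, L, and pi,
# (1 - pi)*(w - wbar) + pi*(w - L - wbar) equals w - pi*L - wbar.
random.seed(1)
for _ in range(1000):
    w = random.uniform(1, 100)
    wbar = random.uniform(1, 100)
    L = random.uniform(1, 100)
    pi = random.random()
    lhs = (1 - pi) * (w - wbar) + pi * (w - L - wbar)
    rhs = w - pi * L - wbar
    assert abs(lhs - rhs) < 1e-9
print("identity (8.1) verified at 1000 random points")
```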
But we have already determined that the right-hand side of (8.1) is non-positive. So,
letting EP_i^j denote company j's expected profits from consumer i, we have shown that in
the dominating allocation,
w − πiL − w̄i = ∑_j EP_i^j ≤ 0 for every consumer i. (8.2)
Moreover, because no insurance company is worse off in the dominating allocation, each
company's total expected profits must be non-negative; that is,
∑_i EP_i^j ≥ 0 for every company j. (8.3)
Summing (8.2) over i and (8.3) over j shows that each of the two inequalities must be
equalities for every i and j. Consequently, each consumer's constant wealth and each firm's
expected profits in the dominating allocation are identical to their competitive allocation
counterparts. But this contradicts the definition of a dominating allocation and completes
the argument that the competitive allocation is efficient.
the argument that the competitive allocation is efficient.
2 If there are finitely many consumers and therefore finitely many accident probabilities, this means simply that
both π and π̄ are given positive probability by F. More generally, it means that all non-degenerate intervals of
the form [π, a) and (b, π̄] are given positive probability by F.
INFORMATION ECONOMICS 383
For each π ∈ [π, π̄], F(π) denotes the fraction of consumers having accident probability less than
or equal to π. Equivalently, F(π ) denotes the probability that any particular consumer has
accident probability π or lower. Insurance companies are otherwise exactly as before. In
particular, they each sell only full insurance.
The impact of asymmetric information is quite dramatic. Indeed, even though poli-
cies sold to different consumers can potentially command distinct prices, in equilibrium
they will not. The reason is quite straightforward. To see it, suppose to the contrary that
the equilibrium price paid by consumer i exceeds that paid by consumer j. Because both
consumers are actually purchasing a policy, the expected profits on each sale must be non-
negative – otherwise the insurance company supplying the money-losing policy would not
be profit-maximising. Consequently, because consumers i and j are identical to insurance
companies from an accident probability point of view, the policy sold to consumer i must
earn strictly positive expected profits. But then each insurance company would wish to
supply an infinite amount of such a policy, which cannot be the case in equilibrium. This
contradiction establishes the result: There is a single equilibrium price of the full insurance
policy for all consumers.
Then let p denote this single price of the full insurance policy. We wish now to
determine its equilibrium value, p∗ .
Because positive expected profits result in infinite supply and negative expected prof-
its result in zero supply, a natural guess would be to set p∗ = E(π )L, where E(π ) =
π̄
π πdF(π ) is the expected accident probability. Such a price is intended to render
insurance companies’ expected profits equal to zero. But does it?
To see that it might not, note that this price might be so high that only those
consumers with relatively high accident probabilities will choose to purchase insurance.
Consequently, companies would be underestimating the expected accident probability by
using the unconditional expectation, E(π ), rather than the expectation of the accident
probability conditional on those consumers actually willing to purchase the policy. By
underestimating this way, profits would be strictly negative on average. Thus to find p∗ we
must take this into account.
For any accident probability π, the consumer buys a policy for price p only if the
expected utility from doing so exceeds the expected utility from remaining uninsured: that
is, only if3
u(w − p) ≥ πu(w − L) + (1 − π)u(w).
Rearranging, and defining the function h(p), we find that the policy will be purchased
only if
π ≥ [u(w) − u(w − p)]/[u(w) − u(w − L)] ≡ h(p).
3 For simplicity, we assume that a consumer who is indifferent between buying the policy or not does in fact
buy it.
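The role of h(p) can be illustrated numerically. In the sketch below the functional forms are my own choices (u = √, w = 100, L = 75, accident probabilities uniform on [0, 1]); under them, the buyer pool at any price is riskier than average and no price in (0, L) earns non-negative expected profits.

```python
import math

# h(p) is the accident probability of the marginal buyer: types pi >= h(p)
# purchase full insurance at price p, everyone else stays uninsured.
w, L = 100.0, 75.0
u = math.sqrt

def h(p):
    return (u(w) - u(w - p)) / (u(w) - u(w - L))

def expected_profit(p):
    # With pi ~ U[0,1], buyers are the types in [h(p), 1], whose mean accident
    # probability is (h(p) + 1)/2 -- higher than the population mean of 1/2.
    cutoff = min(h(p), 1.0)
    if cutoff >= 1.0:
        return 0.0                       # price so high that nobody insures
    return (1.0 - cutoff) * (p - (cutoff + 1.0) / 2.0 * L)

p_naive = 0.5 * L                        # price based on E(pi) = 1/2
assert (h(p_naive) + 1) / 2 > 0.5        # buyer pool riskier than average...
assert expected_profit(p_naive) < 0      # ...so this price loses money

# Raising the price only worsens the pool: in this example profits stay
# negative at every price with positive trade, so the market unravels.
assert all(expected_profit(i * L / 100) < 0 for i in range(1, 100))
print("no price in (0, L) yields non-negative expected profits")
```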
profits are zero. Here, the asymmetry in information causes a significant market failure
in the insurance market. Effectively, no trades take place and therefore opportunities for
Pareto improvements go unrealised.
To understand why prices are unable to produce an efficient equilibrium here, con-
sider a price at which expected profits are negative for insurance companies. Then, other
things being equal, you might think that raising the price will tend to increase expected
profits. But in insurance markets, other things will not remain equal. In general, whenever
the price of insurance is increased, the expected utility a consumer receives from buying
insurance falls, whereas the expected utility from not insuring remains the same. For some
consumers, it will no longer be worthwhile to buy insurance, so they will quit doing so.
But who continues to buy as the price increases? Only those for whom the expected loss
from not doing so is greatest, and these are precisely the consumers with the highest acci-
dent probabilities. As a result, whenever the price of insurance rises, the pool of customers
who continue to buy insurance becomes riskier on average.
This is an example of adverse selection, and it tends here to have a negative influ-
ence on expected profits. If, as in our example, the negative impact of adverse selection
on expected profits outweighs the positive impact of higher insurance prices, there can
fail to be any efficient equilibrium at all, and mutually beneficial potential trades between
insurance companies and relatively low-risk consumers can fail to take place.
The lesson is clear. In the presence of asymmetric information and adverse selection,
the competitive outcome need not be efficient. Indeed, it can be dramatically inefficient.
One of the advantages of free markets is their ability to ‘evolve’. Thus, one might
well imagine that the insurance market would somehow adjust to cope with adverse selec-
tion. In fact, real insurance markets do perform a good deal better than the one we just
analysed. The next section is devoted to explaining how this is accomplished.
8.1.2 SIGNALLING
Consider yourself a low-risk consumer stuck in the inefficient equilibrium we have just
described. The equilibrium price of insurance is so high that you have chosen not to pur-
chase any. If only there were some way you could convince one of the insurance companies
that you are a low risk. They would then be willing to sell you a policy for a price you
would be willing to pay.
In fact, there often will be ways consumers can credibly communicate how risky they
are to insurance companies, and we call this kind of behaviour signalling. In real insurance
markets, consumers can and do distinguish themselves from one another – and they do it by
purchasing different types of policies. Although we ruled this out in our previous analysis
by assuming only one type of policy, we can now adapt our analysis to allow it.
To keep things simple, we will suppose there are only two possible accident proba-
bilities, π and π̄, where 0 < π < π̄ < 1. We assume that the fraction of consumers having
accident probability π is α ∈ (0, 1). Consumers with accident probability π are called
low-risk consumers, and those with accident probability π̄ are called high-risk consumers.
To model the idea that consumers can attempt to distinguish themselves from others
by choosing different policies, we shall take a game theoretic approach.
Consider then the following extensive form game, which we will refer to as the
insurance signalling game, involving two consumers (low-risk and high-risk) and a single
insurance company:
• Nature moves first and determines which consumer will make a proposal to the
insurance company. The low-risk consumer is chosen with probability α, and the
high-risk consumer is chosen with probability 1 − α.
• The chosen consumer moves second. He chooses a policy (B, p), consisting of
a benefit B ≥ 0 the insurance company pays him if he has an accident, and a
premium 0 ≤ p≤w he pays to the insurance company whether or not he has an
accident.5
• The insurance company moves last, not knowing which consumer was chosen
by Nature, but knowing the chosen consumer’s proposed policy. The insurance
company either agrees to accept the terms of the consumer’s policy or rejects
them.
The extensive form of this game is shown in Fig. 8.1. When interpreting the game,
think of the insurance company as being one of many competing companies, and think
of the chosen consumer as a randomly selected member from the set of all consumers, of
whom a fraction α are low-risk types and a fraction 1 − α are high-risk types.
Figure 8.1. [Extensive form of the insurance signalling game: Nature chooses the low-risk consumer with probability α or the high-risk consumer with probability 1 − α; the chosen consumer proposes a policy (B, p); the insurance company, not knowing which consumer was chosen, accepts (A) or rejects (R) the proposal.]
5 Note the slight change in our use of the term policy. It now refers to a benefit–premium pair, (B, p), rather than
simply the benefit. Restricting p to be no higher than w ensures that the consumer does not go bankrupt.
Conditions (1) and (3) ensure that the assessment is sequentially rational, whereas
condition (2) ensures that the insurance company’s beliefs satisfy Bayes’ rule. Because we
are restricting attention to pure strategies, Bayes’ rule reduces to something rather simple.
If the different risk types choose different policies in equilibrium, then on observing the
low- (high-) risk consumer’s policy, the insurance company infers that it faces the low-
(high-) risk consumer. This is condition 2.(b). If, however, the two risk types choose the
same policy in equilibrium, then on observing this policy, the insurance company’s beliefs
remain unchanged and equal to its prior belief. This is condition 2.(c).
The basic question is this: can the low-risk consumer distinguish himself from the
high-risk one here, and as a result achieve a more efficient outcome? It is not obvious
that the answer is yes. For note that there is no direct connection between a consumer’s
risk type and the policy he proposes. That is, the act of purchasing less insurance does
not decrease the probability that an accident will occur. In this sense, the signals used by
consumers – the policies they propose – are unproductive.
However, despite this, the low-risk consumer can still attempt to signal that he is low-
risk by demonstrating his willingness to accept a decrease in the benefit for a smaller com-
pensating premium reduction than would the high-risk consumer. Of course, for this kind
of (unproductive) signalling to be effective, the risk types must display different marginal
rates of substitution between benefit levels, B, and premiums, p. As we shall shortly
demonstrate, this crucial difference in marginal rates of substitution is indeed present.
To this end, let

ul (B, p) ≡ π u(w − p − L + B) + (1 − π )u(w − p), and
uh (B, p) ≡ π̄ u(w − p − L + B) + (1 − π̄ )u(w − p)

denote the expected utility of the policy (B, p) for the low- and high-risk consumer, respectively, where w denotes the consumer's initial wealth, L the size of the loss, and u(·) his strictly increasing, strictly concave von Neumann-Morgenstern utility function of wealth. Also, let MRSi (B, p) denote consumer i's marginal rate of substitution between benefit and premium, i.e., the slope of his indifference curve through (B, p).
The following facts are easily established.
FACTS:
(a) ul (B, p) and uh (B, p) are continuous, differentiable, strictly concave in (B, p),
strictly increasing in B, and strictly decreasing in p,
(b) MRSl (B, p) is greater than, equal to, or less than π as B is less than, equal to, or greater than L. Similarly, MRSh (B, p) is greater than, equal to, or less than π̄ as B is less than, equal to, or greater than L.
(c) MRSl (B, p) < MRSh (B, p) for all (B, p).
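To fix ideas, these facts can be checked numerically. The sketch below is illustrative only: it assumes the hypothetical values w = 100, L = 50, π = 0.2, π̄ = 0.6 and the utility function u(x) = √x (none of which appear in the text), and uses the fact that setting dui = 0 gives MRSi (B, p) = πi u′(w − p − L + B) / [πi u′(w − p − L + B) + (1 − πi )u′(w − p)].

```python
import math

# Hypothetical parameters (not from the text): wealth, loss, accident probabilities.
w, L = 100.0, 50.0
pi_low, pi_high = 0.2, 0.6           # pi < pi_bar

def du(x):                           # marginal utility of u(x) = sqrt(x)
    return 0.5 / math.sqrt(x)

def mrs(pi, B, p):
    """Slope dp/dB of the type-pi indifference curve through (B, p)."""
    a = du(w - p - L + B)            # marginal utility in the accident state
    n = du(w - p)                    # marginal utility in the no-accident state
    return pi * a / (pi * a + (1 - pi) * n)

# Fact (b): at full insurance (B = L) the MRS equals the accident probability.
assert abs(mrs(pi_low, L, 10.0) - pi_low) < 1e-12
assert abs(mrs(pi_high, L, 10.0) - pi_high) < 1e-12

# Fact (c), single crossing: MRS_l < MRS_h at every policy on a grid.
for B in range(0, 50, 5):
    for p in range(0, 40, 5):
        assert mrs(pi_low, B, p) < mrs(pi_high, B, p)
print("facts (b) and (c) verified on the grid")
```

The grid check is of course no proof, but it makes the geometry of Fig. 8.2 concrete: at every policy the high-risk consumer's indifference curve is the steeper one.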
The last of these is often referred to as the single-crossing property. As its name
suggests, it implies that indifference curves for the two consumer types intersect at most
[Fig. 8.2. Indifference curves of the low- and high-risk consumers in the (B, p)-plane; utility increases toward higher B and lower p.]
once. Equally important, it shows that the different risk types display different marginal
rates of substitution when faced with the same policy.
Fig. 8.2 illustrates facts (a) and (c). In accordance with fact (c), the steep indifference
curves belong to the high-risk consumer and the flatter ones to the low-risk consumer.
The difference in their marginal rates of substitution indicates that beginning from a given policy (B′, p′), the low-risk consumer is willing to accept a decrease in the benefit to B″ for a smaller compensating premium reduction than would the high-risk consumer. Here,
reducing the benefit is less costly to the low-risk consumer because he is less likely to
have an accident.
The insurance company maximises expected profits. Now, in case it knows that the
consumer is low-risk, it will accept any policy (B, p) satisfying p > πB, because such a
policy yields positive profits. Similarly, it will reject the policy if p < πB. It is indifferent
between accepting and rejecting the policy if p = πB. If the insurance company knows the
consumer is high-risk, then it accepts the policy (B, p) if p > π̄B and rejects it if p < π̄B.
Fig. 8.3 illustrates the two zero-profit lines for the insurance company. The line p =
πB contains those policies (B, p) yielding zero expected profits for the insurance company
when the consumer is known to be low-risk. The line p = π̄B contains those policies
yielding zero expected profits when the consumer is known to be high-risk. These two
lines will play an important role in our analysis. Note that the low-risk zero profit line has
slope π, and the high-risk zero profit line has slope π̄.
Now is a good time to think back to the competitive equilibrium for the case in
which the insurance company can identify the risk types. There we showed that in the
unique competitive equilibrium the price of full insurance, where B = L, is equal to πL
for the low-risk consumer, and π̄L for the high-risk consumer. This outcome is depicted
in Fig. 8.4. The insurance company earns zero profits on each consumer, each consumer
purchases full insurance, and, by fact (b) above, each consumer’s indifference curve is
tangent to the insurance company’s respective zero-profit line.
Returning to the game at hand, we begin characterising its sequential equilibria by
providing lower bounds on each of the consumers’ expected utilities, conditional on having
been chosen by Nature. Note that the most pessimistic belief the insurance company might
390 CHAPTER 8
have is that it faces the high-risk consumer. Consequently, both consumer-types’ utilities
ought to be bounded below by the maximum utility they could obtain when the insurance
company believes them to be the high-risk consumer. This is the content of the next lemma.
LEMMA 8.1 Let (ψl , ψh , σ (·), β(·)) be a sequential equilibrium, and let u∗l and u∗h denote the equilib-
rium utility of the low- and high-risk consumer, respectively, given that he has been chosen
by Nature. Then

u∗l ≥ ũl , and (1)
u∗h ≥ uch , (2)

where ũl ≡ max(B,p) ul (B, p) s.t. p = π̄B ≤ w, and uch ≡ uh (L, π̄L) denotes the high-risk consumer's utility in the competitive equilibrium with full information.
Proof: Consider a policy (B, p) lying above the high-risk zero-profit line, so that p > π̄B.
We wish to argue that in equilibrium, the insurance company must accept this policy.
To see this, note that by accepting it, the company's expected profits given its beliefs β(B, p) are

β(B, p)(p − πB) + (1 − β(B, p))(p − π̄B) ≥ p − π̄B > 0.
Consequently, accepting is strictly better than rejecting the policy because rejecting results
in zero profits. We conclude that all policies (B, p) above the high-risk zero-profit line are
accepted by the insurance company.
Thus, for any policy satisfying π̄B < p ≤ w, the low-risk consumer, by proposing it,
can guarantee utility ul (B, p), and the high-risk consumer can guarantee utility uh (B, p).
Therefore, because each risk type maximises expected utility in equilibrium, the following inequalities must hold for all policies satisfying π̄B < p ≤ w:

u∗l ≥ ul (B, p), (P.1)
u∗h ≥ uh (B, p). (P.2)

Continuity of ul and uh implies that (P.1) and (P.2) must in fact hold for all policies satisfying the weak inequality π̄B ≤ p ≤ w. Thus, (P.1) and (P.2) may be rewritten as

u∗l ≥ max(B,p) ul (B, p) s.t. π̄B ≤ p ≤ w, (P.3)
u∗h ≥ max(B,p) uh (B, p) s.t. π̄B ≤ p ≤ w. (P.4)
But (P.3) is equivalent to (1) because utility is decreasing in p, and (P.4) is equivalent
to (2) because, among all no better than fair insurance policies, the full insurance one
uniquely maximises the high-risk consumer’s utility.
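As a numerical illustration of these lower bounds (again with the hypothetical values w = 100, L = 50, π = 0.2, π̄ = 0.6 and u(x) = √x, which are not from the text), ũl can be computed by a one-dimensional search along the high-risk zero-profit line, and uch directly:

```python
import math

# Hypothetical parameters and utility function (illustration only).
w, L = 100.0, 50.0
pi_low, pi_bar = 0.2, 0.6

def u(x):
    return math.sqrt(x)

def eu(pi, B, p):
    """Expected utility of the policy (B, p) for accident probability pi."""
    return pi * u(w - p - L + B) + (1 - pi) * u(w - p)

# u_l tilde: the low-risk type's best utility along the line p = pi_bar * B.
# With these numbers MRS_l(0, 0) < pi_bar, so the search returns the
# no-insurance point B = 0 (the case discussed after Lemma 8.1).
u_l_tilde = max(eu(pi_low, B, pi_bar * B) for B in (L * i / 1000 for i in range(1001)))

# u_h^c: the high-risk type's utility at full insurance priced at pi_bar * L.
u_h_c = eu(pi_bar, L, pi_bar * L)

print(round(u_l_tilde, 4), round(u_h_c, 4))
```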
Fig. 8.5 illustrates Lemma 8.1. A consequence of the lemma that is evident from
the figure is that the high-risk consumer must purchase insurance in equilibrium. This is
because without insurance his utility would be uh (0, 0) which, by strict risk aversion, is
strictly less than uch , a lower bound on his equilibrium utility.
The same cannot be said for the low-risk consumer even though it appears so
from Fig. 8.5. We have drawn Fig. 8.5 for the case in which MRSl (0, 0) > π̄, so that
ul (0, 0) < ũl . However, in the equally plausible case in which MRSl (0, 0) < π̄ we have
ul (0, 0) ≥ ũl . In this latter case, the low-risk consumer may choose not to purchase insur-
ance in equilibrium (by making a proposal that is rejected) without violating the conclusion
of Lemma 8.1.
The preceding lemma applies to every sequential equilibrium. We now separate the
set of equilibria into two kinds: separating and pooling.
An equilibrium is a separating equilibrium if the different types of consumers pro-
pose different policies. In this way, the consumers separate themselves from one another
and can be identified by the insurance company by virtue of the chosen policy. In contrast,
an equilibrium is a pooling equilibrium if both consumer types propose the same policy.
Consequently, the consumer types cannot be identified by observing the policy they
propose. In summary, we have the following definition.
DEFINITION 8.2 Separating and Pooling Signalling Equilibria
A pure strategy sequential equilibrium (ψl , ψh , σ (·), β(·)) of the insurance signalling game is separating if ψl ≠ ψh , and pooling otherwise.

With only two possible types of consumers, a pure strategy sequential equilibrium is either separating or pooling. Thus, it is enough for us to characterise the sets of separating and pooling equilibria. We begin with the former.
Separating Equilibria
In a separating equilibrium, the two risk types will propose different policies if chosen by
Nature, and on the basis of this the insurance company will be able to identify them. Of
course, each risk type therefore could feign the identity of the other simply by behaving
as the other would according to the equilibrium.7 The key conceptual point to grasp, then,
is that in a separating equilibrium, it must not be in the interest of either type to mimic the
behaviour of the other. Based on this idea, we can characterise the policies proposed and accepted in a separating pure strategy sequential equilibrium as follows.

THEOREM 8.1 Separating Equilibrium Characterisation
The policies ψl = (Bl , pl ) and ψh are proposed by the low- and high-risk consumer, respectively, and accepted by the insurance company in some separating pure strategy sequential equilibrium if and only if:
1. ψl ≠ ψh = (L, π̄L).
2. pl ≥ πBl .
3. ul (ψl ) ≥ ũl .
4. uh (ψh ) ≥ uh (ψl ).
7 There are other ways to feign the identity of the other type. For example, the low-risk type might choose
a proposal that neither type is supposed to choose in equilibrium, but one that would nonetheless induce the
insurance company to believe that it faced the high-risk consumer.
Proof: Suppose first that ψl = (Bl , pl ) and ψh = (L, π̄L) satisfy (1) to (4). We must con-
struct a strategy σ (·) and beliefs β(·) for the insurance company so that the assessment
(ψl , ψh , σ (·), β(·)) is a sequential equilibrium. It then will be clearly separating. The
following specifications will suffice:

β(B, p) = 1, if (B, p) = ψl ,
β(B, p) = 0, if (B, p) ≠ ψl .

σ (B, p) = A, if (B, p) = ψl , or p ≥ π̄B,
σ (B, p) = R, otherwise.
According to the beliefs β(·), any policy proposed other than ψl induces the insur-
ance company to believe that it faces the high-risk consumer with probability one. On the
other hand, when the policy ψl is proposed, the insurance company is sure that it faces the
low-risk consumer. Consequently, the insurance company’s beliefs satisfy Bayes’ rule.
In addition, given these beliefs, the insurance company’s strategy maximises its
expected profits because, according to that strategy, the company accepts a policy if and
only if it results in non-negative expected profits.
For example, the proposal ψl = (Bl , pl ) is accepted because, once proposed, it
induces the insurance company to believe with probability one that it faces the low-
risk consumer. Consequently, the insurance company’s expected profits from accepting
the policy are pl − πBl , which, according to (2), is non-negative. Similarly, the proposal
ψh = (L, π̄L) is accepted because it induces the insurance company to believe with proba-
bility one that it faces the high-risk consumer. In that case, expected profits from accepting
the policy are π̄L − π̄L = 0.
All other policy proposals (B, p) induce the insurance company to believe with prob-
ability one that it faces the high-risk consumer. Its expected profits from accepting such
policies are then p − π̄B. Thus, these policies are also accepted precisely when they yield
non-negative expected profits given the insurance company’s beliefs.
We have shown that given any policy (B, p), the insurance company's strategy maximises its expected profits given its beliefs. It remains to show that given the insurance
company’s strategy, both consumers are choosing policies that maximise their utility.
To complete this part of the proof, we show that no policy proposal yields the low-
risk consumer more utility than ψl nor the high-risk consumer more than ψh . Note that
because the insurance company accepts the policy (0, 0), and this policy is equivalent to
a rejection by the insurance company (regardless of which policy was rejected), both con-
sumers can maximise their utility by making a proposal that is accepted by the insurance
company. We therefore may restrict our attention to the set of such policies, which we denote by A; i.e.,

A ≡ {(B, p) | p ≥ π̄B and p ≤ w} ∪ {ψl }.

It then suffices to show that

ul (ψl ) ≥ ul (B, p) for all (B, p) ∈ A, and (P.1)
uh (ψh ) ≥ uh (B, p) for all (B, p) ∈ A. (P.2)

But (P.1) follows from (3), and (P.2) follows from (1), (3), (4), and because (L, π̄L) is best for the high-risk consumer among all no better than fair policies.
We now consider the converse. So, suppose that (ψl , ψh , σ (·), β(·)) is a separating
equilibrium in which the equilibrium policies are accepted by the insurance company. We
must show that (1) to (4) hold. We take each in turn.
1. The definition of a separating equilibrium requires ψl ≠ ψh . To see that ψh ≡ (Bh , ph ) = (L, π̄L), recall that Lemma 8.1 implies uh (ψh ) = uh (Bh , ph ) ≥ uh (L, π̄ L).
Now because the insurance company accepts this proposal, it must earn non-negative prof-
its. Hence, we must have ph ≥ π̄Bh because in a separating equilibrium, the insurance
company’s beliefs must place probability one on the high-risk consumer subsequent to the
high-risk consumer’s equilibrium proposal ψh . But as we have argued before, these two
inequalities imply that ψh = (L, π̄L) (see, for example, Fig. 8.4).
2. Subsequent to the low-risk consumer’s equilibrium proposal, (Bl , pl ), the insur-
ance company places probability one on the low-risk consumer by Bayes’ rule. Accepting
the proposal then would yield the insurance company expected profits pl − πBl . Because
the insurance company accepts this proposal by hypothesis, this quantity must be
non-negative.
3. This follows from (1) of Lemma 8.1.
4. According to the insurance company’s strategy, it accepts policy ψl . Because the
high-risk consumer’s equilibrium utility is uh (ψh ), we must have uh (ψh ) ≥ uh (ψl ).
Fig. 8.6 illustrates the policies that can arise in a separating equilibrium according
to Theorem 8.1. The high-risk consumer obtains the policy ψhc ≡ (L, π̄L) and the low-risk
consumer obtains the policy ψl = (Bl , pl ), which must lie somewhere in the shaded region.
Note the essential features of the set of low-risk policies. Each is above the low-
risk zero-profit line to induce acceptance by the insurance company, above the high-risk
consumer’s indifference curve through his equilibrium policy to ensure that he has no
incentive to mimic the low-risk consumer, and below the indifference curve giving utility
ũl to the low-risk consumer to ensure that he has no incentive to deviate and be identified
as a high-risk consumer.
Theorem 8.1 restricts attention to those equilibria in which both consumers propose
acceptable policies. Owing to Lemma 8.1, this is a restriction only on the low-risk con-
sumer’s policy proposal. When MRSl (0, 0) ≤ π̄, there are separating equilibria in which
the low-risk consumer’s proposal is rejected in equilibrium. However, you are asked to
show in an exercise that each of these is payoff equivalent to some separating equilibrium
in which the low-risk consumer’s policy proposal is accepted. Finally, one can show that
the shaded region depicted in Fig. 8.6 is always non-empty, even when MRSl (0, 0) ≤ π̄.
This requires using the fact that MRSl (0, 0) > π. Consequently, a pure strategy separating
equilibrium always exists.
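The boundary of the shaded region that matters most below, the low-risk policy ψ̄l , is easy to compute numerically. The following sketch (same hypothetical parameters as in the earlier illustrations; not part of the text) searches the low-risk zero-profit line p = πB for the policy that maximises the low-risk type's utility subject to the no-mimicry constraint uh (B, πB) ≤ uch :

```python
import math

# Hypothetical parameters (illustration only).
w, L = 100.0, 50.0
pi_low, pi_bar = 0.2, 0.6

def u(x):
    return math.sqrt(x)

def eu(pi, B, p):
    return pi * u(w - p - L + B) + (1 - pi) * u(w - p)

u_h_c = eu(pi_bar, L, pi_bar * L)    # high-risk full-information utility

# Grid search along the low-risk zero-profit line p = pi_low*B for the policy
# maximising low-risk utility subject to the high-risk type weakly preferring
# his own policy (L, pi_bar*L) -- the no-mimicry constraint.
best_B, best_u = 0.0, eu(pi_low, 0.0, 0.0)
for i in range(1, 1001):
    B = L * i / 1000
    p = pi_low * B
    if eu(pi_bar, B, p) <= u_h_c + 1e-12:   # high-risk has no incentive to mimic
        ul = eu(pi_low, B, p)
        if ul > best_u:
            best_B, best_u = B, ul

print("psi_bar_l approx:", (round(best_B, 2), round(pi_low * best_B, 2)))
```

With these made-up numbers the constraint binds at a small benefit level, so the low-risk consumer obtains only partial coverage in the best separating equilibrium, in line with the discussion of Fig. 8.7.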
Now that we have characterised the policies that can arise in a separating equilib-
rium, we can assess the impact of allowing policy proposals to act as signals about risk.
Note that because separating equilibria always exist, allowing policy proposals to act as
signals about risk is always effective in the sense that it does indeed make it possible for
the low-risk type to distinguish himself from the high-risk type.
On the other hand, there need not be much improvement in terms of efficiency. For
example, when MRSl (0, 0) ≤ π̄, there is a separating equilibrium in which the low-risk
consumer receives the (null) policy (0, 0), and the high-risk consumer receives the policy
(L, π̄ L). That is, only the high-risk consumer is insured. Moreover, this remains an equi-
librium outcome regardless of the probability that the consumer is high-risk!8 Thus, the
presence of a bad apple – even with very low probability – can still spoil the outcome just
as in the competitive equilibrium under asymmetric information wherein signalling was
not possible.
Despite the existence of equilibria that are as inefficient as in the model without
signalling, when signalling is present, there are always equilibria in which the low-risk
consumer receives some insurance coverage. The one of these that is best for the low-risk
consumer and worst for the insurance company provides the low-risk consumer with the
policy labelled ψ̄l in Fig. 8.7.
Because the high-risk consumer obtains the same policy ψhc in every separating
equilibrium, and so receives the same utility, the equilibrium outcome (ψ̄l , ψhc ) is Pareto
efficient among separating equilibria and it yields zero profits for the insurance company.
This outcome is present in Fig. 8.7 regardless of the probability that the consumer is low-
risk. Thus, even when the only competitive equilibrium under asymmetric information
gives no insurance to the low-risk consumer (which occurs when α is sufficiently small),
the low-risk consumer can obtain insurance, and market efficiency can be improved when
signalling is possible.
We now turn our attention to the second category of equilibria.
8 Or, according to our second interpretation, regardless of the proportion of high-risk consumers in the population.
[Fig. 8.7. Separating equilibria. Shown are the high-risk zero-profit line p = π̄B, the low-risk zero-profit line p = πB, a low-risk iso-profit line p = πB + a, and the low-risk consumer's best separating policy ψ̄l .]
Pooling Equilibria
Recall that an equilibrium is a pooling one if the two types of consumers propose the
same policy. By doing so, the insurance company cannot distinguish between them.
Consequently, the low-risk consumer will be treated somewhat more like the high-risk
consumer and vice versa. It is fair to say that in such equilibria, the high-risk consumer is
mimicking the low-risk one.
To characterise the set of pooling equilibria, let us first consider the behaviour of the
insurance company. If both consumers propose the same policy in equilibrium, then the
insurance company learns nothing about the consumer’s accident probability on hearing
the proposal. Consequently, if the proposal is (B, p), then accepting it would yield the insurance company expected profits equal to

α(p − πB) + (1 − α)(p − π̄B) = p − π̂B, where π̂ ≡ απ + (1 − α)π̄ .
Then the policy will be accepted if p > π̂B, rejected if p < π̂B, and the insurance company
will be indifferent between accepting and rejecting if p = π̂B.
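As a small numerical check (with the hypothetical values α = 0.7, π = 0.2, π̄ = 0.6, which are not from the text), the acceptance rule can be coded directly:

```python
# Hypothetical illustration of the company's pooling acceptance rule.
alpha = 0.7                  # probability the consumer is low-risk
pi_low, pi_bar = 0.2, 0.6
pi_hat = alpha * pi_low + (1 - alpha) * pi_bar   # pooled accident probability

def respond(B, p, tol=1e-12):
    """Accept iff expected profits p - pi_hat*B are non-negative (the
    indifference case p = pi_hat*B is resolved in favour of acceptance here)."""
    return "A" if p >= pi_hat * B - tol else "R"

print(round(pi_hat, 2))       # -> 0.32
print(respond(50, 20))        # p = 20 > pi_hat*B = 16 -> "A"
print(respond(50, 10))        # p = 10 < 16            -> "R"
```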
[Fig. 8.8. The pooling zero-profit line p = π̂B, lying between the low-risk line p = πB and the high-risk line p = π̄B.]
Owing to this, the set of policies (B, p) satisfying p = π̂B will play an important
part in the analysis of pooling equilibria. Fig. 8.8 depicts the set of such policies. They lie
on a ray through the origin called the pooling zero-profit line.
Now suppose that (B′, p′) is the pooling equilibrium proposal. According to Lemma 8.1, we must have

ul (B′, p′) ≥ ũl and uh (B′, p′) ≥ uch . (8.5)

Moreover, as the discussion following the lemma points out, this policy must be accepted by the insurance company. Therefore, it must lie on or above the pooling zero-profit line, so we must have

p′ ≥ π̂B′. (8.6)
The policies satisfying the preceding three inequalities are depicted by the shaded
region in Fig. 8.9. We now demonstrate that these are precisely the policies that can arise
as pooling equilibrium outcomes.
THEOREM 8.2 Pooling Equilibrium Characterisation
The policy ψ′ = (B′, p′) is the outcome of some pooling pure strategy sequential equilibrium if and only if it satisfies (8.5) and (8.6).

Proof: The discussion preceding the statement of the theorem shows that (B′, p′) must satisfy (8.5) and (8.6) in order that ψ′ be the outcome of some pooling equilibrium. It suffices therefore to prove the converse.
Suppose that ψ′ = (B′, p′) satisfies (8.5) and (8.6). We must define beliefs β(·) and a strategy σ (·) for the insurance company so that (ψ′, ψ′, σ (·), β(·)) constitutes a sequential equilibrium.
The following specifications will suffice:

β(B, p) = α, if (B, p) = ψ′,
β(B, p) = 0, if (B, p) ≠ ψ′.

σ (B, p) = A, if (B, p) = ψ′, or p ≥ π̄B,
σ (B, p) = R, otherwise.

[Fig. 8.9. The shaded region of policies satisfying (8.5) and (8.6): the pooling equilibrium outcomes.]
Thus, just as in the proof of Theorem 8.1, the insurance company considers any deviation from the equilibrium proposal to have come from the high-risk type. Consequently, it is profit-maximising to accept a proposal (B, p) ≠ ψ′ only if p ≥ π̄B, as σ (·) specifies.
On the other hand, when the equilibrium policy, ψ′, is proposed, Bayes' rule requires the insurance company's beliefs to be unchanged because this proposal is made by both risk types. Because β(ψ′) = α, the beliefs do indeed satisfy Bayes' rule. And given these beliefs, it is profit-maximising to accept the policy ψ′, because by (8.6), it yields non-negative expected profits.
Thus, the insurance company's beliefs satisfy Bayes' rule, and given these beliefs, it is maximising expected profits subsequent to each policy proposal of the consumer. It remains to show that the two consumer types are maximising their utility given the insurance company's strategy.
By proposing ψ′, the consumer (high- or low-risk) obtains the policy ψ′. By deviating to (B, p) ≠ ψ′, the consumer obtains the policy (0, 0) if the insurance company rejects the proposal (i.e., if p < π̄B), and obtains the policy (B, p) if it is accepted (i.e., if p ≥ π̄B). Thus, proposing ψ′ is optimal for risk type i = l, h if

ui (ψ′) ≥ ui (0, 0), and
ui (ψ′) ≥ ui (B, p) for all (B, p) satisfying π̄B ≤ p ≤ w.

But these inequalities follow from (8.5) (see Fig. 8.9). Therefore, (ψ′, ψ′, σ (·), β(·)) is a sequential equilibrium.
As Fig. 8.9 shows, there are potentially many pooling equilibria. It is instructive to
consider how the set of pooling equilibria is affected by changes in the probability, α, that
the consumer is low-risk.
As α falls, the shaded area in Fig. 8.9 shrinks because the slope of the pooling
zero-profit line increases, while everything else in the figure remains fixed. Eventually, the
shaded area disappears altogether. Thus, if the probability that the consumer is high-risk is
sufficiently high, there are no pooling equilibria.
As α increases, the shaded region in Fig. 8.9 expands because the slope of the pool-
ing zero-profit line decreases. Fig. 8.10 shows that when α is large enough, there are
pooling equilibria that make both consumer types better off than they would be in every
separating equilibrium – even the low-risk consumer. This is not so surprising for the high-
risk consumer. The reason this is possible for the low-risk consumer is that it is costly for
him to separate himself from the high-risk consumer.
Effective separation requires the low-risk consumer to choose a policy that the high-
risk consumer does not prefer to ψhc . This restricts the low-risk consumer’s choice and
certainly reduces his utility below that which he could obtain in the absence of the high-
risk consumer. When α is sufficiently high, and the equilibrium is a pooling one, it is very
much like the high-risk consumer is not present. The cost to the low-risk consumer of
pooling is then simply a slightly inflated marginal cost per unit of benefit (i.e., π̂), over
and above that which he would pay if his risk type were known (i.e., π). This cost vanishes as α tends to one. On the other hand, the cost of separating himself from the high-risk consumer is bounded away from zero.

[Fig. 8.10. When α is large, there are pooling equilibria that both consumer types strictly prefer to every separating equilibrium.]
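This comparison can be made concrete numerically. The sketch below (hypothetical parameters w = 100, L = 50, π = 0.2, π̄ = 0.6 and u(x) = √x, as in the earlier illustrations; none are from the text) computes the low-risk type's best separating utility and his best utility on the pooling zero-profit line for two values of α:

```python
import math

# Hypothetical parameters (illustration only, not from the text).
w, L = 100.0, 50.0
pi_low, pi_bar = 0.2, 0.6

def u(x):
    return math.sqrt(x)

def eu(pi, B, p):
    return pi * u(w - p - L + B) + (1 - pi) * u(w - p)

grid = [L * i / 1000 for i in range(1001)]
u_h_c = eu(pi_bar, L, pi_bar * L)

# Low-risk utility in the best separating equilibrium: best policy on the line
# p = pi_low*B that the high-risk type does not prefer to (L, pi_bar*L).
u_sep = max(eu(pi_low, B, pi_low * B) for B in grid
            if eu(pi_bar, B, pi_low * B) <= u_h_c + 1e-12)

# Low-risk type's best utility among policies on the pooling zero-profit line.
def u_pool(alpha):
    pi_hat = alpha * pi_low + (1 - alpha) * pi_bar
    return max(eu(pi_low, B, pi_hat * B) for B in grid)

for alpha in (0.5, 0.99):
    print(alpha, u_pool(alpha) > u_sep)
```

With these numbers the comparison prints False at α = 0.5 and True at α = 0.99: pooling overtakes the best separating outcome for the low-risk type only when high-risk consumers are sufficiently rare, exactly the effect described above.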
The reader may have noticed that in the proofs of Theorems 8.1 and 8.2, there was a
common, and not so appealing, component. In each case, when constructing an equilibrium
assessment, the beliefs assigned to the insurance company were rather extreme.
Recall that in both proofs, the insurance company’s beliefs were constructed so that
every deviation from equilibrium was interpreted as having been proposed by the high-
risk consumer. Although there is nothing formally incorrect about this, it is perhaps worth
considering whether or not such beliefs are reasonable.
Let us be clear before proceeding further. The beliefs constructed in the proofs of
Theorems 8.1 and 8.2 are perfectly in line with our definition of a sequential equilibrium
for the insurance signalling game. What we are about to discuss is whether or not we wish
to place additional restrictions on the insurance company’s beliefs.
A Refinement
Are the beliefs assigned to the insurance company in the proofs of Theorems 8.1 and 8.2
reasonable? To see that they might not be, consider a typical pooling equilibrium policy, ψ′, depicted in Fig. 8.11.
According to the equilibrium constructed in the proof of Theorem 8.2, were the consumer to propose instead the policy ψ″, the insurance company would believe that the consumer had a high accident probability and would reject the proposal. But do such beliefs make sense in light of the equilibrium ψ′? Note that by proposing the equilibrium policy ψ′, the low-risk consumer obtains utility u∗l and the high-risk consumer obtains
[Fig. 8.11. A pooling equilibrium policy ψ′ and a deviation ψ″ that only the low-risk consumer has an incentive to propose.]
utility u∗h . Moreover, u∗l < ul (ψ″), and uh (ψ″) < u∗h . Therefore, whether the insurance company accepts or rejects the proposal ψ″, the high-risk consumer would be worse off making this proposal than making the equilibrium proposal ψ′. On the other hand, were the insurance company to accept the proposal ψ″, the low-risk consumer would be better off having made that proposal than having made the equilibrium proposal ψ′. Simply put, only the low-risk consumer has any incentive at all in making the proposal ψ″, given that ψ′ is the equilibrium proposal.
With this in mind, it seems unreasonable for the insurance company to believe, after
seeing the proposal ψ , that it faces the high-risk consumer. Indeed, it is much more rea-
sonable to insist that it instead believes it faces the low-risk consumer. Accordingly, we
shall add the following restriction to the insurance company's beliefs. It applies to all sequential equilibria, not just pooling ones.

DEFINITION 8.3 (The Intuitive Criterion) A sequential equilibrium satisfies the intuitive criterion if, whenever a policy is proposed that, were it accepted, would make one type of consumer strictly better off than in equilibrium, while the other type would be strictly worse off proposing it whether it were accepted or rejected, the insurance company's beliefs place probability one on the former type.

THEOREM 8.3 A policy pair (ψl , ψh ) can arise in a sequential equilibrium satisfying the intuitive criterion if and only if ψl = ψ̄l and ψh = ψhc ; that is, the outcome is the best separating one for the low-risk consumer.
Proof: We first argue that there are no pooling equilibria satisfying the intuitive crite-
rion. Actually, we have almost already done this in our discussion of Fig. 8.11 preceding
Definition 8.3. There we argued that if ψ′ were a pooling equilibrium outcome, then there would be a policy ψ″ that is preferred only by the low-risk type, which, in addition, lies strictly above the low-risk zero-profit line (see Fig. 8.11). Consequently, if the low-risk type makes this proposal and the intuitive criterion is satisfied, the insurance company must believe that it faces the low-risk consumer. Because ψ″ lies strictly above the low-risk zero-profit line, the insurance company must accept it (by sequential rationality). But this means that the low-risk consumer can improve his payoff by deviating from ψ′ to ψ″. This contradiction establishes the claim: there are no pooling equilibria satisfying the
intuitive criterion.
[Fig. 8.12. The separating outcome (ψ̄l , ψhc ) selected by the intuitive criterion.]
Suppose now that (ψl , ψh , σ (·), β(·)) is a separating equilibrium satisfying the intu-
itive criterion. Then, according to Lemma 8.1, the high-risk consumer’s proposal must be
accepted by the insurance company and his equilibrium utility, u∗h , must be at least uch (see
Fig. 8.12).
Next, suppose, by way of contradiction, that the low-risk consumer's equilibrium utility, u∗l , satisfies u∗l < ul (ψ̄l ). Let ψ̄l = (B̄l , p̄l ) and consider the proposal ψlε ≡ (B̄l − ε, p̄l + ε) for ε positive and small. Then, owing to the continuity of ul (·), the following inequalities hold for ε small enough (see Fig. 8.12):

u∗h ≥ uch > uh (ψlε ),
ul (ψlε ) > u∗l ,
p̄l + ε > π(B̄l − ε).
The first two together with the intuitive criterion imply that on seeing the pro-
posal ψlε , the insurance company believes that it faces the low-risk consumer. The third inequality, together with the sequential rationality property of the assessment, implies that the insurance company must accept the proposal ψlε because it earns positive expected
profits.
Hence, the low-risk consumer can achieve utility ul (ψlε ) > u∗l by proposing ψlε . But
then u∗l cannot be the low-risk consumer’s equilibrium utility. This contradiction estab-
lishes that the low-risk consumer’s equilibrium utility must be at least ul (ψ̄ l ). Thus, we
have shown that the equilibrium utilities of the two consumer types must satisfy

u∗l ≥ ul (ψ̄l ), and
u∗h ≥ uch .
Now, these inequalities imply that the proposals made by both consumer types are
accepted by the insurance company. Consequently, the hypotheses of Theorem 8.1 are
satisfied. But according to Theorem 8.1, these two inequalities can hold in a sequential
equilibrium only if (see Fig. 8.7)
ψl = ψ̄ l , and
ψh = ψhc .
It remains to show that such an equilibrium exists. To this end, let

A ≡ {(B, p) | ul (B, p) > ul (ψ̄l ) and uh (B, p) < uch }.

This is the set of policies that only the low-risk type prefers to his equilibrium policy. We now define σ (·) and β(·) as follows:

β(B, p) = 1, if (B, p) ∈ A ∪ {ψ̄l },
β(B, p) = 0, if (B, p) ∉ A ∪ {ψ̄l }.

σ (B, p) = A, if (B, p) = ψ̄l , or p ≥ π̄B,
σ (B, p) = R, otherwise.
[Figure 8.13. An equilibrium satisfying the intuitive criterion, showing the set A of policies that only the low-risk type prefers to ψ̄l .]
It is straightforward to check that, by construction, the beliefs satisfy the intuitive criterion. In addition, one can virtually mimic the relevant portion of the proof of Theorem 8.1 to conclude that the assessment (ψ̄l , ψhc , σ (·), β(·)) constitutes a separating equilibrium.
8.1.3 SCREENING
When most consumers purchase motor insurance, they do not present the insurance com-
pany with a policy and await a reply, as in the model of the last section. Rather, the
insurance company typically offers the consumer a menu of policies from which to choose,
and the consumer simply makes a choice. By offering consumers a menu of policies, insur-
ance companies are able to (implicitly) screen consumers by tailoring the offered policies
so that high-risk types are induced to choose one particular policy, and low-risk types are
induced to choose another. We now analyse such a model.
Again, we shall formulate the situation as an extensive form game. Although it was possible to illustrate the essential features of signalling using just a single insurance company, there are nuances of screening that can be revealed only when at least two insurance companies compete. Thus, we shall add a second insurance company to the model.9
As before, there will be two consumers, low- and high-risk, occurring with prob-
ability α and 1 − α, respectively. And again, one can interpret this as there being many
consumers, a fraction α of which is low-risk.
So consider the following ‘insurance screening game’ involving two insurance
companies and two consumers. Fig. 8.14 depicts its extensive form.
• The two insurance companies move first by simultaneously choosing a finite list
(menu) of policies.
• Nature moves second and determines which consumer the insurance companies
face. The low-risk consumer is chosen with probability α, and the high-risk
consumer with probability 1 − α.
• The chosen consumer moves last by choosing a single policy from one of the
insurance companies’ lists.
9 We could also have included two insurance companies in the signalling model. This would not have changed
the results there in any significant way.
[Fig. 8.14. The insurance screening game: the two companies simultaneously choose policy menus; Nature then selects the low-risk consumer with probability α or the high-risk consumer with probability 1 − α; the chosen consumer selects a policy.]
Now, because there are only two possible types of consumers, we may restrict the insurance companies to lists with at most two policies. Thus, a pure strategy for insurance company j = A, B is a pair of policies Ψj = (ψlj , ψhj ). We interpret ψlj (resp., ψhj ) as the policy that insurance company j includes in its list for the low- (resp., high-) risk consumer. However, keep in mind that the low- (resp., high-) risk consumer need not choose this policy because the insurance company cannot identify the consumer's risk type. The consumer will choose the policy yielding him the highest utility among those offered by the two insurance companies.
A pure strategy for consumer i = l, h is a choice function ci (·) specifying for each pair of policy pairs, (ΨA , ΨB ), an insurance company and one of its policies or the null policy. Thus, we always give the consumers the option of choosing the null policy from either insurance company even if this policy is not formally on either company's list. This is simply a convenient way to allow consumers the ability not to purchase insurance. Thus, ci (ΨA , ΨB ) = (j, ψ), where j = A or B, and where ψ = ψlj , ψhj , or (0, 0).
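A minimal numerical sketch of such a choice function follows; the menus, parameters, and utility function are all made up for illustration, and the self-selection it happens to display is not a claim about equilibrium play:

```python
import math

# Hypothetical parameters and menus (illustration only).
w, L = 100.0, 50.0

def u(x):
    return math.sqrt(x)

def eu(pi, policy):
    B, p = policy
    return pi * u(w - p - L + B) + (1 - pi) * u(w - p)

def choose(pi, menu_A, menu_B):
    """c_i: pick the (company, policy) pair maximising expected utility;
    the null policy (0, 0) is always available."""
    options = [("A", psi) for psi in menu_A] + [("B", psi) for psi in menu_B]
    options.append(("A", (0.0, 0.0)))     # the null policy
    return max(options, key=lambda jp: eu(pi, jp[1]))

menu_A = [(50.0, 30.0), (4.0, 0.8)]       # full insurance at 0.6*L; a small cheap policy
menu_B = [(10.0, 6.0)]

print(choose(0.6, menu_A, menu_B))        # high-risk type's choice
print(choose(0.2, menu_A, menu_B))        # low-risk type's choice
```

With these made-up menus the two types self-select: the high-risk type takes A's full-insurance policy (50, 30) while the low-risk type takes A's small policy (4, 0.8), illustrating how a single menu can screen the types.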
As is evident from Fig. 8.14, the only non-singleton information set belongs to insur-
ance company B. However, note that no matter what strategies the players employ, this
information set must be reached. Consequently, it is enough to consider the subgame per-
fect equilibria of this game. You are asked to show in an exercise that were the game finite
(so that the sequential equilibrium definition can be applied), its set of sequential equilib-
rium outcomes would be identical to its set of subgame perfect equilibrium outcomes.
Again, we can split the set of pure strategy subgame perfect equilibria into two kinds:
separating and pooling. In a separating equilibrium, the two consumer types make different
policy choices, whereas in a pooling equilibrium, they do not.
Note then that in a pooling equilibrium, although the two types of consumers must
choose to purchase the same policy, they need not purchase it from the same insurance
company.
LEMMA 8.2 Both insurance companies earn zero expected profits in every pure strategy subgame
perfect equilibrium.
Proof: The proof of this result is analogous to that in the model of Bertrand competition
from Chapter 4.
First, note that in equilibrium, each insurance company must earn non-negative prof-
its because each can guarantee zero profits by offering a pair of null policies in which
B = p = 0. Thus, it suffices to show that neither insurance company earns strictly positive
expected profits.
Suppose by way of contradiction that company A earns strictly positive expected
profits and that company B’s profits are no higher than A’s. Let ψl∗ = (B∗l , p∗l ) and ψh∗ =
(B∗h , p∗h ) denote the policies chosen by the low- and high-risk consumers, respectively, in
equilibrium. We then can write the total expected profits of the two firms as strictly positive. Moreover, the single-crossing property implies that at least one of the consumers strictly prefers his own choice to the
other's; i.e., either
ul (ψl∗ ) > ul (ψh∗ ), or (P.1)
uh (ψh∗ ) > uh (ψl∗ ). (P.2)
Suppose then that (P.1) holds. Consider the deviation for company B in which it
offers the pair of policies ψlε = (B∗l + ε, p∗l) and ψhβ = (B∗h + β, p∗h), where ε, β > 0.
Clearly, each consumer i = l, h strictly prefers the deviation policy intended for him to ψi∗, because it pays a strictly higher benefit at the same price. In addition, we
claim that ε and β > 0 can be chosen arbitrarily small so that
ul(ψlε) > ul(ψhβ), and (P.3)
uh(ψhβ) > uh(ψlε). (P.4)
To see this, note that by (P.1), (P.3) will hold as long as ε and β are small enough. Inequality
(P.4) then can be assured by fixing β and choosing ε small enough because, for fixed β > 0,
we have
uh(ψhβ) > uh(ψh∗) ≥ uh(ψl∗) = lim_{ε→0} uh(ψlε),
where the weak inequality follows because, in equilibrium, the high-risk consumer cannot
prefer any other policy choice to his own. See Fig. 8.15.
But (P.3) and (P.4) imply that subsequent to B's deviation, the low-risk consumer
will choose the policy ψlε, and the high-risk consumer will choose the policy ψhβ. For ε
and β small enough, this will yield company B expected profits arbitrarily close to the total expected profits of both companies, and
therefore strictly above B's equilibrium expected profits. But this is again a contradiction.
Pooling Equilibria
One might suspect that the set of pooling equilibria would be whittled down by the cream-
skimming phenomenon. Indeed, the setting seems just right for cream skimming when
both consumer types are treated the same way. This intuition turns out to be correct with a
vengeance. Indeed, cream skimming eliminates the possibility of any pooling equilibrium
at all.
THEOREM 8.4 Non-existence of Pooling Equilibria
There are no pure strategy pooling equilibria in the insurance screening game.
Proof: Suppose, by way of contradiction, that there is a pooling equilibrium in which both consumer types choose the policy ψ∗ = (B∗, p∗).
Consider first the case in which B∗ > 0. Then (P.1) implies that p∗ > 0 as well, so that ψ∗ does not lie on either axis as shown in Fig. 8.16.
By the single-crossing property, there is a region, R (see Fig. 8.16), such that ψ ∗ is the
limit of policies in R. Let ψ′ be a policy in R very close to ψ∗.
Suppose now that insurance company A is offering policy ψ∗ in equilibrium. If insur-
ance company B offers policy ψ′, and only ψ′, then the high-risk consumer will choose
policy ψ∗ (or one he is indifferent to) from the first insurance company, whereas the low-
risk consumer will purchase ψ′ from insurance company B. If ψ′ is close enough to ψ∗,
then by (P.2), insurance company B will earn strictly positive profits from this cream-
skimming deviation, and so must be earning strictly positive profits in equilibrium. But
this contradicts Lemma 8.2.
Consider now the case in which B∗ = 0. By (P.1), this implies that p∗ = 0 as well.
Thus, ψ ∗ is the null policy, as in Fig. 8.17. But either company now can earn positive
profits by offering the single policy (L, π̄L + ε) where ε > 0 is sufficiently small. It earns
strictly positive profits because it earns strictly positive profits on both consumer types (it
is above both the high- and low-risk zero-profit lines), and the high-risk consumer certainly
will choose this policy over the null policy. This final contradiction completes the proof.
Note the importance of cream skimming to the preceding result. This is a typi-
cal feature of competitive screening models wherein multiple agents on one side of a
market compete to attract a common pool of agents on the other side of the market by
simultaneously offering a menu of ‘contracts’ from which the pool of agents may choose.
Separating Equilibria
The competitive nature of our screening model also has an important impact on the set of
separating equilibria, as we now demonstrate.
THEOREM 8.5 Characterisation of Separating Equilibria
In every pure strategy subgame perfect equilibrium of the insurance screening game, the low-risk consumer chooses the policy ψ̄l and the high-risk consumer chooses the policy ψhc.
Note then that the only possible separating equilibrium in the insurance screen-
ing model coincides with the best separating equilibrium for consumers in the insurance
signalling game from section 8.1.1. By Theorem 8.4, this will be the only possible
equilibrium in the game.
Proof: The proof proceeds in a series of claims.
Claim 1. The high-risk consumer must obtain at least utility uch . (See Fig. 8.18.)
By Lemma 8.2, both insurance companies must earn zero profits. Consequently, it
cannot be the case that the high-risk consumer strictly prefers the policy (L, π̄L + ε) to ψh∗ .
Otherwise, one of the insurance companies could offer just this policy and earn positive
profits. (Note that this policy earns positive profits on both consumers.) But this means that
uh(ψh∗) ≥ uh(L, π̄L + ε) for every ε > 0.
The result follows by taking the limit of the right-hand side as ε → 0, because uh(·) is
continuous and ψhc = (L, π̄L).
Claim 2. ψl∗ must lie on the low-risk zero-profit line.
Note that by Claim 1, ψh∗ must lie on or below the high-risk zero-profit line. Thus,
non-positive profits are earned on the high-risk consumer. Because by Lemma 8.2 the
insurance companies’ aggregate profits are zero, this implies that ψl∗ lies on or above the
low-risk zero-profit line.
So, suppose by way of contradiction that ψl∗ = (B∗l , p∗l ) lies above the low-risk zero-
profit line. Then p∗l > 0. But this means that B∗l > 0 as well because the low-risk consumer
would otherwise choose the null policy (which is always available). Thus, ψl∗ is strictly
above the low-risk zero-profit line and not on the vertical axis as shown in Fig. 8.19.
Consequently, region R in Fig. 8.19 is present. Now if the insurance company which
is not selling a policy to the high-risk consumer offers policies only strictly within region R,
then only the low-risk consumer will purchase a policy from this insurance company. This
is because such a policy is strictly preferred to ψl∗ by the low-risk consumer and strictly
worse than ψl∗ (which itself is no better than ψh∗) for the high-risk consumer. This deviation
would then result in strictly positive profits for this insurance company because all such
policies are above the low-risk zero-profit line (see Fig. 8.19, a cream-skimming region). The desired conclusion follows from this
contradiction.
Claim 3. ψh∗ = ψhc .
By Claim 2 and Lemma 8.2, ψh∗ must lie on the high-risk zero-profit line. But by
Claim 1, uh(ψh∗) ≥ uh(ψhc). Together, these imply that ψh∗ = ψhc (see Fig. 8.18).
Claim 4. ψl∗ = ψ̄ l .
Consult Fig. 8.20. By Claim 2, it suffices to show that ψl∗ cannot lie on the low-risk
zero-profit line strictly below ψ̄l (such as ψ″) or strictly above ψ̄l (such as ψ′).
So, suppose first that ψl∗ = ψ′. The high-risk consumer would then strictly prefer
ψ′ to ψhc and thus would not choose ψhc, contrary to Claim 3.
Next, suppose that ψl∗ = ψ″. Then the low-risk consumer obtains utility ul″ in
equilibrium (see Fig. 8.20). Moreover, region R is then present. Consider the insurance
company that does not sell ψhc to the high-risk consumer. Let this insurance company offer
any policy strictly within region R. This policy will be purchased only by the low-risk
consumer and will earn strictly positive profits. This contradiction proves Claim 4 and
completes the proof.
Note that Theorem 8.5 does not claim that a separating screening equilibrium exists.
Together with Theorem 8.4, it says only that if a pure strategy subgame perfect equilibrium
exists, it must be separating and the policies chosen by the consumers are unique.
Cream skimming is a powerful device in this screening model for eliminating equi-
libria. But it can be too powerful. Indeed, there are cases in which no pure strategy subgame
perfect equilibrium exists at all.
Consider Fig. 8.21. Depicted there is a case in which no pure strategy equilibrium
exists. To see this, it is enough to show that it is not an equilibrium for the low- and high-
risk consumers to obtain the policies ψ̄ l and ψhc as described in Theorem 8.5. But this is
indeed the case, because either insurance company can deviate by offering only the policy
ψ′, which will be purchased by both consumer types (because it is strictly preferred by
them to their equilibrium policies). Consequently, this company will earn strictly positive
expected profits because ψ′ is strictly above the pooling zero-profit line (which is the
appropriate zero-profit line to consider because both consumer types will purchase ψ′).
But this contradicts Lemma 8.2.
Thus, when α is close enough to one, so that the pooling zero-profit line intersects the
ūl indifference curve (see Fig. 8.21), the screening model admits no pure strategy subgame
perfect equilibrium.10
Figure 8.21. No equilibrium exists. If the best policies available for the
low- and high-risk consumers are ψ̄l and ψhc, respectively, then offering
the policy ψ′ will attract both consumer types and earn positive profits
because it lies above the pooling zero-profit line. No pure strategy
subgame perfect equilibrium exists in this case.
One can show that there always exists a subgame perfect equilibrium in behavioural strategies, but we shall not pursue this. We are content to note that
non-existence in this model arises only when the extent of the asymmetry of information
is relatively minor, and in particular when the presence of high-risk consumers is small.
We next consider an issue that we have so far ignored. What is the effect of the
availability of insurance on the driving behaviour of the consumer?
10 Even when the pooling zero-profit line does not intersect the ūl indifference curve, an equilibrium is not guar-
anteed to exist. There may still be a pair of policies such that one attracts the low-risk consumers making positive
profits, and the other attracts the high-risk consumers (keeping them away from the first policy) making negative
profits, so that overall expected profits are strictly positive.
The monotone likelihood ratio property says that conditional on observing the
accident loss, l, the relative probability that low effort was expended versus high effort
increases with l. Thus, one would be more willing to bet that the consumer exerted low
effort when the observed accident loss is higher.
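As a quick numerical illustration (the loss probabilities below are hypothetical, chosen only to satisfy the property), one can check the monotone likelihood ratio property directly and compute the posterior probability that effort was low after each observed loss:

```python
from fractions import Fraction as F

# Hypothetical loss probabilities pi_l(e) over losses l = 0, 1, 2 (illustrative only).
pi0 = [F(1, 6), F(2, 6), F(3, 6)]   # low effort,  e = 0
pi1 = [F(3, 6), F(2, 6), F(1, 6)]   # high effort, e = 1

# Monotone likelihood ratio property: pi_l(0)/pi_l(1) strictly increasing in l.
ratios = [p0 / p1 for p0, p1 in zip(pi0, pi1)]
assert all(r < s for r, s in zip(ratios, ratios[1:]))

# Posterior probability that effort was low, given loss l (uniform prior on e).
posterior = [p0 / (p0 + p1) for p0, p1 in zip(pi0, pi1)]
print(posterior)   # [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]: rises with l
```

The posterior rises with l, so a bet that the consumer exerted low effort becomes more attractive the larger the observed loss, exactly as described above.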
As in our previous models, the consumer has a strictly increasing, strictly concave,
von Neumann-Morgenstern utility function, u(·), over wealth, and initial wealth equal to
w > L. In addition, d(e) denotes the consumer’s disutility of effort, e. Thus, for a given
effort level e, the consumer’s von Neumann-Morgenstern utility over wealth is u(·) − d(e),
where d(1) > d(0).11
We assume that the insurance company can observe the amount of loss, l, due to
an accident, but not the amount of accident avoidance effort, e. Consequently, the insur-
ance company can only tie the benefit amount to the amount of loss. Let Bl denote the
benefit paid by the insurance company to the consumer when the accident loss is l. Thus,
a policy is a tuple (p, B0 , B1 , . . . , BL ), where p denotes the price paid to the insurance
company in return for guaranteeing the consumer Bl dollars if an accident loss of l dollars
occurs.
The question of interest is this: what kind of policy will the insurance company offer
the consumer, and what are its efficiency properties?
When effort is observable, then, the insurance company chooses the policy and the effort level to maximise expected profits subject to the consumer's participation constraint:

max_{e, p, B0, . . . , BL}  p − Σ_{l=0}^{L} πl(e)Bl,  subject to  (8.7)

Σ_{l=0}^{L} πl(e)u(w − p − l + Bl) − d(e) ≥ ū,
11 All of the analysis to follow generalises to the case in which utility takes the form u(w, e), where u(w, 0) >
u(w, 1) for all wealth levels w.
12 Because the consumer always can choose not to purchase insurance, ū must be at least as large as
max_{e∈{0,1}} Σ_{l=0}^{L} πl(e)u(w − l) − d(e). However, ū may be strictly larger than this if, for example, there are other
insurance companies offering policies to the consumer as well.
The Lagrangian for this problem is

L = p − Σ_{l=0}^{L} πl(e)Bl − λ[ū − Σ_{l=0}^{L} πl(e)u(w − p − l + Bl) + d(e)].
The first-order conditions with respect to the Bl imply that

u′(w − p − l + Bl) = 1/λ, ∀ l ≥ 0.
13 Indeed, it was clear from the start that setting B0 = 0 was harmless because changes in B0 always can be offset
by corresponding changes in the price p and in the benefit levels B1 , . . . , BL without changing the consumer’s
utility or the insurance company’s profits.
Because u′ is strictly decreasing, this requires that w − p − l + Bl be constant in l, and setting B0 = 0 then yields

Bl = l, for all l = 0, 1, . . . , L.
Therefore, for either fixed effort level e ∈ {0, 1}, the symmetric information solution
provides full insurance to the consumer at every loss level. This is no surprise because
the consumer is strictly risk averse and the insurance company is risk neutral. It is simply
an example of efficient risk sharing. In addition, the price charged by the insurance com-
pany equates the consumer’s utility from the policy at the required effort level with his
reservation utility.
Now that we have determined for each effort level the optimal policy, it is straight-
forward to optimise over the effort level as well. Given e ∈ {0, 1}, the optimal benefit levels
are Bl = l for each l, so using (8.11) the optimal price p(e) is given implicitly by

u(w − p(e)) − d(e) = ū, (8.12)

and the insurance company's expected profits are then p(e) − Σ_{l=0}^{L} πl(e)·l.
Note the trade-off between requiring high versus low effort. Because d(0) < d(1),
(8.12) implies that requiring lower effort allows the insurance company to charge a higher
price, increasing profits. On the other hand, requiring higher effort reduces the expected
loss due to an accident (by the monotone likelihood ratio property; see the exercises), and
so also increases expected profits. One must simply check which effort level is best for the
insurance company in any specific case.
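This check is easy to carry out numerically. The sketch below uses hypothetical numbers, none from the text (u(w) = √w, wealth 100, a single possible loss of 36, and illustrative probabilities and effort costs): for each effort level it computes the full-insurance price p(e) that makes the participation constraint bind, and the resulting expected profit.

```python
import math

W = 100.0                                  # hypothetical initial wealth
LOSSES = [0.0, 36.0]                       # two loss levels (illustrative)
PI = {0: [0.25, 0.75], 1: [0.75, 0.25]}    # pi_l(e): loss probabilities by effort
D = {0: 0.0, 1: 0.5}                       # disutility of effort

u = math.sqrt

# Reservation utility: the best the consumer can do with no insurance at all.
u_bar = max(sum(q * u(W - l) for q, l in zip(PI[e], LOSSES)) - D[e] for e in (0, 1))

def price(e):
    """Full-insurance price making u(W - p) - d(e) = u_bar bind (u = sqrt)."""
    return W - (u_bar + D[e]) ** 2

def profit(e):
    """Expected profit: the price p(e) minus the expected loss under effort e."""
    return price(e) - sum(q * l for q, l in zip(PI[e], LOSSES))

for e in (0, 1):
    print(e, price(e), profit(e))
# Low effort commands the higher price (19.0 vs 9.75), but inducing high
# effort wins here because it cuts the expected loss from 27.0 to 9.0.
```

Both sides of the trade-off appear in these illustrative figures: requiring low effort lets the firm charge more, yet requiring high effort is more profitable because accidents become less likely.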
What is important here is that regardless of which effort level is best for the firm, the
profit-maximising policy always involves full insurance. This is significant and it implies
that the outcome here is Pareto efficient. We have seen this sort of result before, so we
shall not give another proof of it.
Under asymmetric information, the insurance company's problem is

max_{e, p, B0, . . . , BL}  p − Σ_{l=0}^{L} πl(e)Bl  subject to  (8.13)

Σ_{l=0}^{L} πl(e)u(w − p − l + Bl) − d(e) ≥ ū, and  (8.14)

Σ_{l=0}^{L} πl(e)u(w − p − l + Bl) − d(e) ≥ Σ_{l=0}^{L} πl(e′)u(w − p − l + Bl) − d(e′),  (8.15)

where e′ denotes the effort level other than e, and (8.15) is the incentive constraint ensuring that the consumer finds it optimal to choose e.
Now, adding the incentive constraint to the problem cannot increase the insurance
company’s maximised profits. Therefore, if the solution to (8.16) satisfies the incentive
constraint, then it must be the desired optimal policy. But, clearly, the solution does indeed
satisfy (8.15). Given the policy in (8.16), the incentive constraint when e = 0 reduces to
−d(0) ≥ −d(1), which clearly holds.
Consequently, inducing the consumer to exert low effort in a manner that maximises
profits requires the insurance company to offer the same policy as it would were effort
observable.
The Lagrangian for inducing high effort, e = 1, is

L = p − Σ_{l=0}^{L} πl(1)Bl − λ[ū − Σ_{l=0}^{L} πl(1)u(w − p − l + Bl) + d(1)]  (8.17)
  − β[Σ_{l=0}^{L} πl(0)u(w − p − l + Bl) − d(0) − (Σ_{l=0}^{L} πl(1)u(w − p − l + Bl) − d(1))],
where λ and β are the multipliers corresponding to constraints (8.14) and (8.15),
respectively.
The first-order conditions are

∂L/∂p = 1 − Σ_{l=0}^{L} [λπl(1) + β(πl(1) − πl(0))]u′(w − p − l + Bl) = 0,  (8.18)

∂L/∂Bl = −πl(1) + [λπl(1) + β(πl(1) − πl(0))]u′(w − p − l + Bl) = 0, ∀ l,  (8.19)

∂L/∂λ = ū − Σ_{l=0}^{L} πl(1)u(w − p − l + Bl) + d(1) ≤ 0,  (8.20)

∂L/∂β = Σ_{l=0}^{L} (πl(0) − πl(1))u(w − p − l + Bl) − d(0) + d(1) ≤ 0.  (8.21)

Rearranging (8.19) yields

1/u′(w − p − l + Bl) = λ + β[1 − πl(0)/πl(1)], ∀ l.  (8.22)
Suppose that β = 0. Then (8.22) would imply that its left-hand side is constant in l,
which implies that w − p + Bl − l is constant in l. But this cannot hold, because then con-
dition (8.21) fails: its left-hand side reduces to d(1) − d(0), which is strictly positive.
We conclude that β ≠ 0.
To see that λ ≠ 0, first note that the monotone likelihood ratio property implies that
there is an l such that πl(0) ≠ πl(1). Because Σl πl(0) = Σl πl(1) = 1, there must exist
l′ and l″ such that πl′(0) > πl′(1), and πl″(0) < πl″(1). Consequently, the term in square
brackets in (8.22) takes on both positive and negative values.
Now, if λ = 0, then because β ≠ 0, the right-hand side of (8.22) takes on both posi-
tive and negative values. However, the left-hand side is always strictly positive. Therefore,
λ ≠ 0. Indeed, this argument shows that λ > 0.
The fact that both λ and β are non-zero implies that both constraints, (8.20) and
(8.21), are binding in the optimal solution. Thus, the consumer is held down to his
reservation utility, and he is just indifferent between choosing high and low effort.
To gain more insight into the optimal policy for e = 1, it is helpful to show that
β > 0. So suppose that β < 0. The monotone likelihood ratio property then implies that
the right-hand side of (8.22) is strictly increasing in l. Consequently, u′(w − p + Bl − l)
is strictly decreasing in l, so that Bl − l, and therefore u(w − p + Bl − l), are strictly
increasing in l. But the latter, together with the monotone likelihood ratio property, implies
that Σl (πl(1) − πl(0))u(w − p + Bl − l) < 0 (see Exercise 8.13). This contradicts (8.21),
because d(0) < d(1). We conclude that β > 0.
Now, because β > 0, the monotone likelihood ratio property implies that the right-
hand side of (8.22) is strictly decreasing in l, so that u′(w − p + Bl − l) is strictly increasing in l.
Consequently, the optimal policy must display the following feature:
l − Bl is strictly increasing in l = 0, 1, . . . , L. (8.23)
Recall that we may set B0 = 0 without any loss of generality. Consequently, condi-
tion (8.23) indicates that the optimal high-effort policy does not provide full insurance –
rather, it specifies a deductible payment that increases with the size of the loss.
This is, of course, very intuitive. To give the consumer an incentive to choose high
effort, there must be something in it for the consumer. When l − Bl is strictly increasing,
there is a positive utility benefit to exerting high effort, namely,
Σ_{l=0}^{L} (πl(1) − πl(0))u(w − p − l + Bl) > 0.
That this sum is strictly positive follows from (8.23) and the monotone likelihood ratio
property (again, see Exercise 8.13). Of course, there is also a utility cost associated with
high effort, namely, d(1) − d(0) > 0. The optimal policy is crafted so that the utility
benefit of high effort just equals the utility cost.
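With two loss levels, the pair of binding constraints can be solved in closed form, which makes the shape of the optimal high-effort policy easy to see. The numbers below are hypothetical and not from the text (u(w) = √w, wealth 100, one possible loss of 36, illustrative probabilities, effort costs, and reservation utility). Writing a = √(w − p) for the no-loss state and b = √(w − p − l + B) for the loss state, the binding incentive constraint pins down a − b, and the binding participation constraint then determines a and b, hence p and B.

```python
W, L1 = 100.0, 36.0                        # hypothetical wealth and loss size
PI = {0: [0.25, 0.75], 1: [0.75, 0.25]}    # pi_l(e) over losses [0, L1]
D = {0: 0.0, 1: 0.5}                       # disutility of effort
u_bar = 9.0                                # hypothetical reservation utility

# Let a = sqrt(W - p) (no loss) and b = sqrt(W - p - L1 + B) (loss), with B0 = 0.
# Binding incentive constraint: (pi_0(1) - pi_0(0)) * (a - b) = d(1) - d(0).
a_minus_b = (D[1] - D[0]) / (PI[1][0] - PI[0][0])
# Binding participation constraint: pi_0(1)*a + pi_L1(1)*b - d(1) = u_bar.
a = (u_bar + D[1] + PI[1][1] * a_minus_b) / (PI[1][0] + PI[1][1])
b = a - a_minus_b

p = W - a * a               # policy price
B = b * b - (W - p - L1)    # benefit paid when the loss occurs
print(p, B, L1 - B)         # B < L1: the consumer keeps a strictly positive deductible
```

Here B = 17.5 falls short of the loss of 36, so the consumer bears a deductible of 18.5, in line with (8.23); the firm's profit, p − 0.25·B = 0.5625, is below the 0.75 it could earn with the same illustrative numbers when effort is observable.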
The Optimal Policy and Efficiency
As we have seen, the policy that is best for the insurance company differs depending on
whether it wishes to induce the consumer to choose high or low accident avoidance effort.
The overall optimal policy – the one that solves the maximisation problem (8.13) – is
simply the one of these two that yields the larger expected profits.
Now, suppose that in the symmetric information case, the optimal effort level
required of the consumer by the insurance company is low. Then precisely the same (full
insurance) policy will be optimal in the asymmetric information case. This follows because
this policy yields the same expected profits as in the symmetric information case, and the
maximum expected profits when e = 1 is no higher in the asymmetric information case
versus the symmetric information case because there is an additional constraint present
under asymmetric information. Consequently, because the symmetric information out-
come is Pareto efficient, so, too, will be the asymmetric information outcome in this case.
On the other hand, suppose that the optimal effort level required by the insurance
company of the consumer is high in the symmetric information case. It may well be that the
insurance company’s maximised expected profits are substantially lower when it attempts
to induce high effort in the asymmetric information case. Because expected profits condi-
tional on low effort are identical in both the symmetric and asymmetric information cases,
it may then be optimal for the insurance company in the asymmetric information setting
to induce low effort by offering the full insurance policy. Although this would be optimal
for the insurance company, it would not be Pareto efficient. For compared to the sym-
metric information solution, the consumer’s utility is unchanged (and equal to ū), but the
insurance company’s profits are strictly lower.
Thus, once again, the effects of asymmetric information can reveal themselves in
Pareto-inefficient outcomes.
applicable answers, but it is one where all the analyst’s creativity, insight, and logical
rigour can pay handsome dividends.
8.4 EXERCISES
8.1 Consider the insurance model of section 8.1, but treat each insurance company as if it were a risk-
neutral consumer with wealth endowment w̄ ≥ L in every state, where L is the size of the loss should
one of the m risk-averse consumers have an accident. Also assume that the number of risk-neutral
consumers exceeds the number of risk-averse ones. Show that the competitive equilibrium derived
in section 8.1 is a competitive equilibrium in this exchange economy.
8.2 Suppose that in the insurance model with asymmetric information, a consumer’s accident probability
is a function of his wealth. That is, π = f(w). Also suppose that different consumers have different
wealth levels, and that f′ > 0. Does adverse selection necessarily occur here?
8.3 In our insurance model of section 8.1, many consumers may have the same accident probability.
We allowed policy prices to be person specific. Show that, with symmetric information, equilibrium
policy prices depend only on probabilities, not on the particular individuals purchasing them.
8.4 Answer the following questions related to the insurance model with adverse selection.
(a) When there are finitely many consumers, F, the distribution of consumer accident probabilities,
is a step function. Show that g : [0, π̄L] → [0, π̄L] then is also a step function and that it is
non-decreasing.
(b) Show that g must therefore possess a fixed point.
(c) More generally, show that a non-decreasing function mapping the unit interval into itself must
have a fixed point. (Note that the function need not be continuous! This is a special case of a
fixed-point theorem due to Tarski (1955)).
8.5 When analysing our insurance model with adverse selection, we claimed that when the distribution
of accident probabilities is uniform on [0, 1], there can be at most two competitive equilibrium
prices. You will prove this in this exercise. Suppose that f : [a, b] → [a, b] is continuous and
f″ > 0.
(a) Use the fundamental theorem of calculus to argue that if f(x∗) = x∗ and f′(x∗) ≥ 1, then f(x) >
x for every x > x∗.
(b) Using an argument analogous to that in (a), show that if f(x∗) = x∗ and f′(x∗) ≤ 1, then f(x) > x
for every x < x∗.
(c) Conclude from (a) and (b) that f has at most two fixed points.
(d) Conclude that there can be at most two competitive equilibrium prices in our insurance model
with adverse selection when the distribution of accident probabilities is uniform on [0, 1].
8.6 Suppose there are two states, 1 and 2. State 1 occurs with probability π , and wi denotes a consumer’s
wealth in state i.
(a) If the consumer is strictly risk-averse and w1 ≠ w2, show that an insurance company can provide
him with insurance rendering his wealth constant across the two states so that he is better off
and so that the insurance company earns positive expected profits.
(b) Suppose there are many consumers and many insurance companies and that a feasible allocation
is such that each consumer’s wealth is constant across states. Suppose also that in this allocation,
some consumers are insuring others. Show that the same wealth levels for consumers and
expected profits for insurance companies can be achieved by a feasible allocation in which no
consumer insures any other.
8.7 (Akerlof) Consider the following market for used cars. There are many sellers of used cars. Each
seller has exactly one used car to sell and is characterised by the quality of the used car he wishes to
sell. Let θ ∈ [0, 1] index the quality of a used car and assume that θ is uniformly distributed on [0, 1].
If a seller of type θ sells his car (of quality θ ) for a price of p, his utility is us (p, θ ). If he does not sell
his car, then his utility is 0. Buyers of used cars receive utility θ − p if they buy a car of quality θ at
price p and receive utility 0 if they do not purchase a car. There is asymmetric information regarding
the quality of used cars. Sellers know the quality of the car they are selling, but buyers do not know
its quality. Assume that there are not enough cars to supply all potential buyers.
(a) Argue that in a competitive equilibrium under asymmetric information, we must have
E(θ | p) = p.
(b) Show that if us(p, θ) = p − θ/2, then every p ∈ (0, 1/2] is an equilibrium price.
(c) Find the equilibrium price when us(p, θ) = p − √θ. Describe the equilibrium in words. In
particular, which cars are traded in equilibrium?
(d) Find an equilibrium price when us(p, θ) = p − θ³. How many equilibria are there in this case?
(e) Are any of the preceding outcomes Pareto efficient? Describe Pareto improvements whenever
possible.
8.8 Show that in the insurance signalling game, if the consumers have finitely many policies from which
to choose, then an assessment is consistent if and only if it satisfies Bayes’ rule. Conclude that a
sequential equilibrium is then simply an assessment that satisfies Bayes’ rule and is sequentially
rational.
8.9 Analyse the insurance signalling game when benefit B is restricted to being equal to L. Assume
that the low-risk consumer strictly prefers full insurance at the high-risk competitive price to no
insurance.
(a) Show that there is a unique sequential equilibrium when attention is restricted to those in which
the insurance company earns zero profits.
(b) Show that among all sequential equilibria, there are no separating equilibria. Is this intuitive?
(c) Show that there are pooling equilibria in which the insurance company earns positive profits.
8.10 Consider the insurance signalling game.
(a) Show that there are separating equilibria in which the low-risk consumer’s policy proposal is
rejected in equilibrium if and only if MRSl (0, 0) ≤ π̄ .
(b) Given a separating equilibrium in which the low-risk consumer’s policy proposal is rejected,
construct a separating equilibrium in which it is accepted without changing any player’s
equilibrium payoff.
(c) Continue to consider this setting with one insurance company and two types of consumers. Also,
assume low-risk consumers strictly prefer no insurance to full insurance at the high-risk com-
petitive price. Show that when α (the probability that the consumer is low-risk) is low enough,
the only competitive equilibrium under asymmetric information gives the low-risk consumer no
insurance and the high-risk consumer full insurance.
(d) Returning to the general insurance signalling game, show that every separating equilibrium
Pareto dominates the competitive equilibrium described in part (c).
8.11 Consider the insurance screening game. Suppose that the insurance companies had only finitely
many policies from which to construct their lists of policies. Show that a joint strategy is a subgame
perfect equilibrium if and only if there are beliefs that would render the resulting assessment a
sequential equilibrium.
8.12 Consider the insurance screening game.
(a) Suppose there is only one insurance company, not two. Provide a diagram showing the unique
pooling contract that is best for the low-risk consumer subject to non-negative expected profits
for the insurance company.
(b) Prove that the pooling contract from part (a) does not maximise the low-risk consumer’s
expected utility among all menus of pairs of contracts subject to earning non-negative profits for
the insurance company. Among those contracts, find the contract that maximises the low-risk
consumer’s expected utility.
(c) What contract maximises the insurance company’s expected profits?
8.13 Consider the moral hazard insurance model where the consumer has the option of exerting either
high or low accident avoidance effort (i.e., e = 0 or 1). Recall that πl (e) > 0 denotes the probability
that a loss of l dollars is incurred due to an accident. Show that if the monotone likelihood ratio
property holds, so that πl(0)/πl(1) is strictly increasing in l, then Σ_{l=0}^{L} πl(0)xl > Σ_{l=0}^{L} πl(1)xl for
every strictly increasing sequence of real numbers x0 < x1 < · · · < xL.
8.14 Consider the moral hazard insurance model.
(a) Show that when information is symmetric, the profit-maximising policy price is higher when
low effort is induced compared to high effort.
(b) Let the consumer’s reservation utility, ū, be the highest he can achieve by exerting the utility-
maximising effort level when no insurance is available. Suppose that when information is
asymmetric, it is impossible for the insurance company to earn non-negative profits by inducing
the consumer to exert high effort. Show then that if there were no insurance available at all, the
consumer would exert low effort.
8.15 Consider once again the moral hazard insurance model. Let the consumer's von Neumann-
Morgenstern utility of wealth be u(w) = √w, let his initial wealth be w0 = $100, and suppose
that there are but two loss levels, l = 0 and l = $51. As usual, there are two effort levels, e = 0
and e = 1. The consumer’s disutility of effort is given by the function d(e), where d(0) = 0 and
d(1) = 1/3. Finally, suppose that the loss probabilities are given by the following entries, where the
rows correspond to effort and the columns to loss levels.
l=0 l = 51
e=0 1/3 2/3
e=1 2/3 1/3
So, for example, the probability that a loss of $51 occurs when the consumer exerts high effort is 1/3.
(a) Verify that the probabilities given in the table satisfy the monotone likelihood ratio property.
(b) Find the consumer’s reservation utility assuming that there is only one insurance company and
that the consumer’s only other option is to self-insure.
(c) What effort level will the consumer exert if no insurance is available?
(d) Show that if information is symmetric, then it is optimal for the insurance company to offer a
policy that induces high effort.
(e) Show that the policy in part (d) will not induce high effort if information is asymmetric.
(f) Find the optimal policy when information is asymmetric.
(g) Compare the insurance company’s profits in the symmetric and asymmetric information cases.
Also, compare the consumer’s utility in the two cases. Argue that the symmetric information
solution Pareto dominates that with asymmetric information.
8.16 Consider the following principal–agent problem. The owner of a firm (the principal) employs a
worker (the agent). The worker can exert low effort, e = 0, or high effort, e = 1. The resulting
revenue, r, to the owner is random, but is more likely to be high when the worker exerts high effort.
Specifically, if the worker exerts low effort, e = 0, then

r = 0 with probability 2/3, and r = 4 with probability 1/3,

whereas if he exerts high effort, e = 1, then

r = 0 with probability 1/3, and r = 4 with probability 2/3.
The worker's von Neumann-Morgenstern utility from wage w and effort e is u(w, e) = √w − e.
The firm’s profits are π = r − w when revenues are r and the worker’s wage is w. A wage contract
(w0 , w4 ) specifies the wage, wr ≥ 0, that the worker will receive if revenues are r ∈ {0, 4}. When
working, the worker chooses effort to maximise expected utility and always has the option (his only
other option) of quitting his job and obtaining (w, e) = (0, 0).
Find the wage contract (w0 , w4 ) ∈ [0, ∞)2 that maximises the firm’s expected profits in each
of the situations below.
(a) The owner can observe the worker’s effort and so the contract can also be conditioned
on the effort level of the worker. How much effort does the worker exert in the expected
profit-maximising contract?
(b) The owner cannot observe the worker’s effort and so the contract cannot be conditioned on
effort. How much effort does the worker exert in the expected profit-maximising contract now?
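As a rough numerical companion to the unobservable-effort case of part (b), one can search a wage grid for the profit-maximising contract. This is our own sketch, not the book's solution method; the grid resolution and the tie-breaking rule (an indifferent worker works, and chooses the higher effort) are assumptions made only for the illustration.

```python
import itertools

def worker_choice(w0, w4):
    """Worker's best response to contract (w0, w4): 1, 0, or None (quit).
    Tie-breaking assumption: when indifferent, work, and exert high effort."""
    u_quit = 0.0
    u_low = (2/3) * w0**0.5 + (1/3) * w4**0.5        # E[u | e = 0]
    u_high = (1/3) * w0**0.5 + (2/3) * w4**0.5 - 1   # E[u | e = 1]
    if u_high >= max(u_low, u_quit):
        return 1
    if u_low >= u_quit:
        return 0
    return None

def expected_profit(w0, w4):
    e = worker_choice(w0, w4)
    if e is None:
        return 0.0                         # worker quits: no revenue, no wages
    p4 = 2/3 if e == 1 else 1/3            # P(r = 4 | e)
    return p4 * (4 - w4) + (1 - p4) * (0 - w0)

grid = [i / 4 for i in range(41)]          # wages 0, 0.25, ..., 10
best = max(itertools.product(grid, grid), key=lambda c: expected_profit(*c))
print(best, round(expected_profit(*best), 3))   # -> (0.0, 0.0) 1.333
```

The search suggests that when effort is unobservable, inducing high effort is too expensive here, so the profit-maximising contract tolerates low effort.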
8.17 A manager cannot observe the effort, e, of a worker, but can observe the output the worker pro-
duces. There are n effort levels available to the worker, e1 < · · · < en , and there are m output levels,
y1 < · · · < ym . Output depends stochastically on effort and p(y | e) is the probability that the output
level is y given that the worker exerts effort e. The worker’s von Neumann-Morgenstern utility of
receiving wage w when he exerts effort e is u(w, e), strictly increasing in w and strictly decreasing
in e. Note that the worker’s ‘wage’ here is his total compensation.
Assume that p(· | ·) satisfies the strict monotone likelihood ratio property, i.e., that for every
i = 1, 2, . . . , m − 1, the ratio

    p(yi+1 | e) / p(yi | e)

is strictly increasing in e.
(a) The manager wishes to offer the worker a wage contract so as to maximise his expected profits,
where the worker’s only other option is to stay at home and receive a wage of zero, and where the
price per unit of output is fixed at one dollar (wages are also in dollars). Formulate the manager’s
optimisation problem. (What can the worker’s wage depend upon?)
(b) Suppose that the optimal wage contract is such that the worker chooses effort level ei > e1 .
Prove that the wage contract must be somewhere strictly increasing in output (i.e., it must be
the case that w(yi ) < w(yj ) for some yi < yj ). You may find the result from Exercise 8.13 useful
here.
CHAPTER 9

AUCTIONS AND MECHANISM DESIGN
In most real-world markets, sellers do not have perfect knowledge of market demand.
Instead, sellers typically have only statistical information about market demand. Only the
buyers themselves know precisely how much of the good they are willing to buy at a
particular price. In this chapter, we will revisit the monopoly problem under this more
typical circumstance.
Perhaps the simplest situation in which the above elements are present occurs when
a single object is put up for auction. There, the seller is typically unaware of the buyers’
values but may nevertheless have some information about the distribution of values across
buyers. In such a setting, there are a number of standard auction forms that the seller might
use to sell the good – first-price, second-price, Dutch, English. Does each of these standard
auctions raise the same revenue for the seller? If not, which is best? Is there a non-standard
yet even better selling mechanism for the seller? To answer these and other questions, we
will introduce and employ some of the tools from the theory of mechanism design.
Mechanism design is a general theory about how and when the design of appropri-
ate institutions can achieve particular goals. This theory is especially germane when the
designer requires information possessed only by others to achieve his goal. The subtlety
in designing a successful mechanism lies in ensuring that the mechanism gives those who
possess the needed information the incentive to reveal it to the designer. This chapter pro-
vides an introduction to the theory of mechanism design. We shall begin by considering
the problem of designing a revenue-maximising selling mechanism. We then move on to
the problem of efficient resource allocation. In both cases, the design problem will be sub-
ject to informational constraints – the agents possessing private information will have to
be incentivised to report their information truthfully.
1 We shall assume throughout and unless otherwise noted that in all auctions ties in bids are broken at random:
each tied bidder is equally likely to be deemed the winner.
• First-Price, Sealed-Bid: Each bidder submits a sealed bid to the seller. The
highest bidder wins and pays his bid for the good.
• Second-Price, Sealed-Bid: Each bidder submits a sealed bid to the seller. The
highest bidder wins and pays the second-highest bid for the good.
• Dutch Auction: The seller begins with a very high price and begins to reduce it.
The first bidder to raise his hand wins the object at the current price.
• English Auction: The seller begins with a very low price (perhaps zero) and begins
to increase it. Each bidder signals when he wishes to drop out of the auction. Once
a bidder has dropped out, he cannot resume bidding later. When only one bidder
remains, he is the winner and pays the current price.
Can we decide even among these four which is best for the seller? To get a handle
on this problem, we must begin with a model.
2 This amounts to assuming that the object has already been produced and that the seller’s use value for it is zero.
3 Recall that Fi(vi) denotes the probability that i's value is less than or equal to vi, and that fi(vi) = F′i(vi). The
latter relation can be equivalently expressed as Fi(vi) = ∫_0^{vi} fi(x)dx. Consequently, we will sometimes refer to fi
and sometimes refer to Fi since each one determines the other.
4 Although such an outcome is not possible in any one of the four auctions above, there are other auctions (e.g.,
all-pay auctions) in which payments must be made whether or not one wins the object.
5 There are more general models in which buyers with private information would potentially obtain yet additional
information about the value of the object were they to learn another buyer’s private information, but we shall not
consider such models here.
begin to think about how the seller’s profits vary with different auction formats. Note
that with the production decision behind him and his own value equal to zero, profit-
maximisation is equivalent to revenue-maximisation.
Before we can determine the seller’s revenues in each of the four standard auc-
tions, we must understand the bidding behaviour of the buyers across the different auction
formats. Let us start with the first-price auction.
function b̂(·), but he does not know bidder i’s value. Now, if bidder i’s value is v, bidder
i would like his friend to submit the bid b̂(v) on his behalf. His friend can do this for him
once bidder i calls him and tells him his value. Clearly, bidder i has no incentive to lie to
his friend about his value. That is, among all the values r ∈ [0, 1] that bidder i with value
v can report to his friend, his payoff is maximised by reporting his true value, v, to his
friend. This is because reporting the value r results in his friend submitting the bid b̂(r) on
his behalf. But if bidder i were there himself he would submit the bid b̂(v).
Let us calculate bidder i’s expected payoff from reporting an arbitrary value, r, to his
friend when his value is v, given that all other bidders employ the bidding function b̂(·).
To calculate this expected payoff, it is necessary to notice just two things. First, bidder i
will win only when the bid submitted for him is highest. That is, when b̂(r) > b̂(vj) for
all bidders j ≠ i. Because b̂(·) is strictly increasing this occurs precisely when r exceeds
the values of all N − 1 other bidders. Letting F denote the distribution function associated
with f, the probability that this occurs is (F(r))^{N−1}, which we will denote F^{N−1}(r). Second,
bidder i pays only when he wins and he then pays his bid, b̂(r). Consequently, bidder i’s
expected payoff from reporting the value r to his friend when his value is v, given that all
other bidders employ the bidding function b̂(·), can be written

    u(r, v) = F^{N−1}(r)(v − b̂(r)).    (9.1)

Because reporting truthfully is optimal, the derivative of the right-hand side with respect to r must be zero at r = v. That derivative is

    (N − 1)F^{N−2}(r)f(r)(v − b̂(r)) − F^{N−1}(r)b̂′(r).    (9.2)

Evaluating (9.2) at r = v, where it is equal to zero, and rearranging yields

    (N − 1)F^{N−2}(v)f(v)b̂(v) + F^{N−1}(v)b̂′(v) = (N − 1)vf(v)F^{N−2}(v).    (9.3)
Looking closely at the left-hand side of (9.3), we see that it is just the derivative of the
product F N−1 (v)b̂(v) with respect to v. With this observation, we can rewrite (9.3) as
    d[F^{N−1}(v)b̂(v)]/dv = (N − 1)vf(v)F^{N−2}(v).    (9.4)
Now, because (9.4) must hold for every v, it must be the case that
    F^{N−1}(v)b̂(v) = (N − 1) ∫_0^v xf(x)F^{N−2}(x) dx + constant.
Noting that a bidder with value zero must bid zero, we conclude that the constant above
must be zero. Hence, it must be the case that
    b̂(v) = ((N − 1)/F^{N−1}(v)) ∫_0^v x f(x)F^{N−2}(x) dx = (1/F^{N−1}(v)) ∫_0^v x dF^{N−1}(x).    (9.5)
There are two things to notice about the bidding function in (9.5). First, as we had
assumed, it is strictly increasing in v (see Exercise 9.1). Second, it has been uniquely
determined. Hence, in conclusion, we have proven the following.
EXAMPLE 9.1 Suppose that each bidder’s value is uniformly distributed on [0, 1]. Then
F(v) = v and f (v) = 1. Consequently, if there are N bidders, then each employs the
bidding function

    b̂(v) = (1/v^{N−1}) ∫_0^v x dx^{N−1}

         = (1/v^{N−1}) ∫_0^v x(N − 1)x^{N−2} dx

         = ((N − 1)/v^{N−1}) ∫_0^v x^{N−1} dx

         = ((N − 1)/v^{N−1}) (v^N/N)

         = v − v/N.
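The closed form can be checked numerically. The sketch below (ours, not part of the text) evaluates the integral in (9.5) for the uniform case, F(x) = x and f(x) = 1, by the trapezoid rule and compares it with v − v/N:

```python
def bid_uniform_numeric(v, N, steps=50_000):
    """Trapezoid-rule evaluation of (9.5) with F(x) = x, f(x) = 1 on [0, 1]."""
    if v == 0:
        return 0.0
    h = v / steps
    g = [(i * h) * (i * h)**(N - 2) for i in range(steps + 1)]  # x f(x) F(x)^(N-2)
    integral = h * (sum(g) - 0.5 * (g[0] + g[-1]))
    return (N - 1) / v**(N - 1) * integral

for N in (2, 5, 10):
    print(N, bid_uniform_numeric(0.8, N), 0.8 - 0.8 / N)
```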
6 Strictly speaking, we have not shown that this is an equilibrium. We have shown that if a symmetric equilibrium
exists, then this must be it. You are asked to show that this is indeed an equilibrium in an exercise. You might
also wonder about the existence of asymmetric equilibria. It can be shown that there are none, although we shall
not do so here.
So, each bidder shades his bid, by bidding less than his value. Note that as the number of
bidders increases, the bidders bid more aggressively.
Because F^{N−1}(·) is the distribution function of the highest value among a bidder's
N − 1 competitors, the bidding strategy displayed in Theorem 9.1 says that each bidder
bids the expectation of the second highest bidder’s value conditional on his own value
being highest. But, because the bidders use the same strictly increasing bidding func-
tion, having the highest value is equivalent to having the highest bid and so equivalent to
winning the auction. So, we may say:
In the unique symmetric equilibrium of a first-price, sealed-bid auction, each bidder bids
the expectation of the second-highest bidder’s value conditional on winning the auction.
The idea that one ought to bid conditional on winning is very intuitive in a first-
price auction because of the feature that one’s bid matters only when one wins the auction.
Because this feature is present in other auctions as well, this idea should be considered one
of the basic insights of our strategic analysis.
Having analysed the first-price auction, it is an easy matter to describe behaviour in
a Dutch auction.
Clearly then, the first-price and Dutch auctions raise exactly the same revenue for
the seller, ex post (i.e., for every realisation of bidder values v1 , . . . , vN ).
We now turn to the second-price, sealed-bid auction.
7 In fact, even the independence assumption can be dropped. (See Exercise 9.5.)
Given this result, it is easy to see that the bidder with the highest value will win in an
English auction. But what price will he pay for the object? That, of course, depends on the
price at which his last remaining competitor drops out of the auction. But his last remaining
competitor will be the bidder with the second-highest value, and he will, like all bidders,
drop out when the price reaches his value. Consequently, the bidder with highest value
wins and pays a price equal to the second-highest value. Hence, we see that the outcome
of the English auction is identical to that of the second-price auction. In particular, the
English and second-price auctions earn exactly the same revenue for the seller, ex post.
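This equivalence can be illustrated with a toy simulation (ours; the price tick is an arbitrary assumption) of the English auction as a rising price clock. The last bidder standing is the one with the highest value, and the price at which his final rival drops out is, up to the tick, the second-highest value:

```python
import random

def english_auction(values, tick=1e-3):
    """Rising price clock: bidder i stays in while the price is below values[i].
    Returns the winner's index and the price at which the auction ends."""
    price = 0.0
    while sum(1 for v in values if v > price) > 1:
        price += tick
    winner = max(range(len(values)), key=lambda i: values[i])
    return winner, price

rng = random.Random(0)
values = [rng.random() for _ in range(4)]
winner, price = english_auction(values)
second_highest = sorted(values)[-2]
print(winner == values.index(max(values)), abs(price - second_highest) < 2e-3)
```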
8 As in the second-price auction case, this weak dominance result does not rely on the independence of the
bidders' values. It holds even if the values are correlated. However, it is important that the values are private.
We have seen that in a second-price auction, because each bidder bids his value,
the seller receives as price the second-highest value among the N bidder values. So, if
h(v) is the density of the second-highest value, the seller's expected revenue, R_SPA, in a
second-price auction can be written

    R_SPA = ∫_0^1 v h(v) dv.

Because the density of the second-highest value is h(v) = N(N − 1)f(v)F^{N−2}(v)(1 − F(v)),10 this can be written as

    R_SPA = N(N − 1) ∫_0^1 v f(v)F^{N−2}(v)(1 − F(v)) dv.    (9.7)
9 To see this, note that the highest value is less than or equal to v if and only if all N values are, and that this
occurs with probability F^N(v). Hence, the distribution function of the highest value is F^N. Because the density
function is the derivative of the distribution function, the result follows.
We shall now compare the two. From (9.6) and (9.5) we have
    R_FPA = N ∫_0^1 [ (1/F^{N−1}(v)) ∫_0^v x dF^{N−1}(x) ] f(v)F^{N−1}(v) dv

          = N(N − 1) ∫_0^1 [ ∫_0^v xF^{N−2}(x)f(x) dx ] f(v) dv

          = N(N − 1) ∫_0^1 ∫_0^v [xF^{N−2}(x)f(x)f(v)] dx dv

          = N(N − 1) ∫_0^1 ∫_x^1 [xF^{N−2}(x)f(x)f(v)] dv dx

          = N(N − 1) ∫_0^1 xF^{N−2}(x)f(x)(1 − F(x)) dx

          = R_SPA,
where the fourth equality follows from interchanging the order of integration (i.e., from
dxdv to dvdx), and the final equality follows from (9.7).
EXAMPLE 9.2 Consider the case in which each bidder’s value is uniform on [0, 1] so that
F(v) = v and f (v) = 1. The expected revenue generated in a first-price auction is
    R_FPA = N ∫_0^1 b̂(v)f(v)F^{N−1}(v) dv

          = N ∫_0^1 (v − v/N) v^{N−1} dv

          = (N − 1) ∫_0^1 v^N dv

          = (N − 1)/(N + 1).
10 One way to see this is to treat probability density like probability. Then the probability (density) that some
particular bidder’s value is v is f (v) and the probability that exactly one of the remaining N − 1 other bidders’
values is above this is (N − 1)F N−2 (v)(1 − F(v)). Consequently, the probability that this particular bidder’s value
is v and it is second-highest is (N − 1)f (v)F N−2 (v)(1 − F(v)). Because there are N bidders, the probability (i.e.,
density) that the second-highest value is v is then N(N − 1)f (v)F N−2 (v)(1 − F(v)).
Remarkably, the first- and second-price auctions raise the same expected revenue,
regardless of the common distribution of bidder values! So, we may state the following:
If N bidders have independent private values drawn from the common distribution, F, then
all four standard auction forms (first-price, second-price, Dutch, and English) raise the
same expected revenue for the seller.
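The claim is easy to check by simulation. In this sketch (ours, for the uniform-values case only), the first-price winner pays his equilibrium bid v(1 − 1/N) and the second-price winner pays the second-highest value; both averages should be close to (N − 1)/(N + 1):

```python
import random

def simulate(N, trials=100_000, seed=1):
    """Average FPA and SPA revenue over random uniform value draws."""
    rng = random.Random(seed)
    fpa = spa = 0.0
    for _ in range(trials):
        vals = sorted(rng.random() for _ in range(N))
        fpa += vals[-1] * (N - 1) / N   # winner (highest value) bids v(1 - 1/N)
        spa += vals[-2]                 # winner pays the second-highest value
    return fpa / trials, spa / trials

N = 3
r_fpa, r_spa = simulate(N)
print(round(r_fpa, 2), round(r_spa, 2), (N - 1) / (N + 1))  # both ≈ 0.5
```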
This revenue equivalence result may go some way towards explaining why we see
all four auction forms in practice. Were it the case that one of them raised more revenue
than the others on average, then we would expect that one to be used rather than any of
the others. But what is it that accounts for the coincidence of expected revenue in these
auctions? Our next objective is to gain some insight into why this is so.
A direct selling mechanism is a collection of probability assignment functions,
p1(v1, . . . , vN), . . . , pN(v1, . . . , vN), and cost functions, c1(v1, . . . , vN), . . . , cN(v1, . . . , vN),
where pi(v1, . . . , vN) denotes the probability that bidder i receives the object and ci(v1, . . . , vN)
denotes the payment that bidder i must make to the seller. The sum of the probabilities,
p1(v1, . . . , vN) + · · · + pN(v1, . . . , vN), is always no greater than unity.
A direct selling mechanism works as follows. Because the seller does not know
the bidders’ values, he asks them to report them to him simultaneously. He then takes
those reports, v1 , . . . , vN , which need not be truthful, and assigns the object to one of
the bidders according to the probabilities pi (v1 , . . . , vN ), i = 1, . . . , N, keeping the object
with the residual probability, and secures the payment ci (v1 , . . . , vN ) from each bidder
i = 1, . . . , N. It is assumed that the entire direct selling mechanism – the probability
assignment functions and the cost functions – are public information, and that the seller
must carry out the terms of the mechanism given the vector of reported values.
Several points are worthy of note. First, although the sum of probabilities p1 + · · · +
pN can never exceed unity, we allow this sum to fall short of unity because we want to allow
the seller to keep the object.12 Second, a bidder’s cost may be negative. Third, a bidder’s
cost may be positive even when that bidder does not receive the object (i.e., when that
bidder’s probability of receiving the object is zero).
Clearly, the seller’s revenue will depend on the reports submitted by the bidders. Will
they be induced to report truthfully? If not, how will they behave? These are very good
questions, but let us put them aside for the time being. Instead, we introduce what will
turn out to be an extremely important special kind of direct selling mechanism, namely,
those in which the bidders find it in their interest to report truthfully. These mechanisms
are called incentive-compatible. Before introducing the formal definition, we introduce a
little notation.
Consider a direct selling mechanism (pi(·), ci(·))_{i=1}^N. Suppose that bidder i's value
is vi and he considers reporting that his value is ri. If all other bidders always report their
values truthfully, then bidder i's expected payoff is

    ui(ri, vi) = ∫_0^1 · · · ∫_0^1 (pi(ri, v−i)vi − ci(ri, v−i)) f−i(v−i) dv−i,
where f−i (v−i ) = f (v1 ) · · · f (vi−1 )f (vi+1 ) · · · f (vN ) and dv−i = dv1 · · · dvi−1 dvi+1 · · · dvN .
For every ri ∈ [0, 1], let

    p̄i(ri) = ∫_0^1 · · · ∫_0^1 pi(ri, v−i) f−i(v−i) dv−i

and

    c̄i(ri) = ∫_0^1 · · · ∫_0^1 ci(ri, v−i) f−i(v−i) dv−i.
12 This is more generality than we need at the moment because the seller never keeps the object in any of the four
standard auctions. However, this will be helpful a little later.
Therefore, p̄i (ri ) is the probability that i receives the object when he reports ri and c̄i (ri ) is
i’s expected payment when he reports ri , with both of these being conditional on all others
always reporting truthfully. Consequently, bidder i's expected payoff when his value is vi
and he reports ri can be written as

    ui(ri, vi) = p̄i(ri)vi − c̄i(ri).    (9.8)

The direct selling mechanism is incentive-compatible when truth-telling is optimal for every
bidder given that the others report truthfully; that is, when ui(vi, vi) ≥ ui(ri, vi) for every
bidder i and all ri, vi ∈ [0, 1].
Note very carefully what the definition does not say. It does not say that reporting
truthfully is best for a bidder regardless of the others’ reports. It only says that a bidder can
do no better than to report truthfully so long as all other bidders report truthfully. Thus,
although truthful reporting is a Bayesian-Nash equilibrium in an incentive-compatible
mechanism, it need not be a dominant strategy for any player.
You might wonder how all of this is related to the four standard auctions. We
will now argue that each of the four standard auctions can be equivalently viewed
as an incentive-compatible direct selling mechanism. In fact, understanding incentive-
compatible direct selling mechanisms will not only be the key to understanding the
connection between the four standard auctions, but it will be central to our understanding
revenue-maximising auctions as well.
Consider a first-price auction with symmetric bidders. We would like to construct an
‘equivalent’ direct selling mechanism in which truth-telling is an equilibrium. To do this,
we shall employ the first-price auction equilibrium bidding function b̂(·). The idea behind
our construction is simple. Instead of the bidders submitting bids computed by plugging
their values into the equilibrium bidding function, the bidders will be asked to submit their
values and the seller will then compute their equilibrium bids for them. Recall that because
b̂(·) is strictly increasing, a bidder wins the object in a first-price auction if and only if he
has the highest value.
13 This would in fact be a consequence of our Chapter 7 definition of Bayesian-Nash equilibrium but for the fact
that we restricted attention to finite type spaces there.
Consider, then, the following direct selling mechanism, where b̂(·) is the equilibrium
bidding function for the first-price auction given in (9.5):
    pi(v1, . . . , vN) = 1 if vi > vj for all j ≠ i, and 0 otherwise,

and    (9.9)

    ci(v1, . . . , vN) = b̂(vi) if vi > vj for all j ≠ i, and 0 otherwise.
Look closely at this mechanism. Note that the bidder with the highest reported value,
v, receives the object and he pays b̂(v) for it, just as he would have in a first-price auction
equilibrium. So, if the bidders report their values truthfully, then the bidder with the highest
value, v, wins the object and makes the payment b̂(v) to the seller. Consequently, if this
mechanism is incentive-compatible, the seller will earn exactly the same ex post revenue
as he would with a first-price auction.
To demonstrate that this mechanism is incentive-compatible we need to show that
truth-telling is a Nash equilibrium. So, let us suppose that all other bidders report their val-
ues truthfully and that the remaining bidder has value v. We must show that this bidder can
do no better than to report his value truthfully to the seller. So, suppose that this bidder con-
siders reporting value r. He then wins the object and makes a payment of b̂(r) if and only
if r > vj for all other bidders j. Because the other N − 1 bidders’ values are independently
distributed according to F, this event occurs with probability F^{N−1}(r). Consequently, this
bidder’s expected payoff from reporting value r when his true value is v is
But this is exactly the payoff in (9.1), which we already know is maximised when r = v.
Hence, the direct selling mechanism (9.9) is indeed incentive-compatible.
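A quick numerical check (ours, for uniform values) of this incentive compatibility: the expected payoff from reporting r when one's value is v is F^{N−1}(r)(v − b̂(r)) = r^{N−1}(v − r + r/N), and a grid search shows it peaks at the true value.

```python
def payoff(r, v, N):
    """Expected payoff in mechanism (9.9), uniform values: report r, true value v."""
    bid = r - r / N                       # b(r) in the uniform case
    return r**(N - 1) * (v - bid)

N, v = 4, 0.6
grid = [i / 1000 for i in range(1001)]
best_report = max(grid, key=lambda r: payoff(r, v, N))
print(best_report)   # -> 0.6, the true value
```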
Let us reconsider what we have accomplished here. Beginning with the equilibrium
of a first-price auction, we have constructed an incentive-compatible direct selling mecha-
nism whose truth-telling equilibrium results in the same ex post assignment of the object
to bidders and the same ex post payments by them. In particular, it results in the same ex
post revenue for the seller. Moreover, this method of constructing a direct mechanism is
quite general. Indeed, beginning with the equilibrium of any of the four standard auctions,
we can similarly construct an incentive-compatible direct selling mechanism that yields
the same ex post assignment of the object to bidders and the same ex post payments by
them. (You are asked to do this in an exercise.)
In effect, we have shown that each of the four standard auctions is equivalent to some
incentive-compatible direct selling mechanism. Because of this, we can now gain insight
into the former by studying the latter.
THEOREM 9.5  A direct selling mechanism (pi(·), ci(·))_{i=1}^N is incentive-compatible if and only if, for every bidder i, (i) p̄i(·) is non-decreasing, and (ii) c̄i(vi) = c̄i(0) + p̄i(vi)vi − ∫_0^{vi} p̄i(x)dx for every vi ∈ [0, 1].

Proof: Suppose the mechanism is incentive-compatible. We must show that (i) and (ii) hold.
To see that (i) holds, note that by incentive compatibility, for all ri , vi ∈ [0, 1],
p̄i (ri )vi − c̄i (ri ) = ui (ri , vi ) ≤ ui (vi , vi ) = p̄i (vi )vi − c̄i (vi ).
Adding and subtracting p̄i (vi )ri to the right-hand side, this implies
p̄i (ri )vi − c̄i (ri ) ≤ [p̄i (vi )ri − c̄i (vi )] + p̄i (vi )(vi − ri ).
But a careful look at the term in square brackets reveals that it is ui (vi , ri ), bidder i’s
expected payoff from reporting vi when his true value is ri . By incentive compatibil-
ity, this must be no greater than ui (ri , ri ), his payoff when he reports his true value, ri .
Consequently,
p̄i (ri )vi − c̄i (ri ) ≤ [p̄i (vi )ri − c̄i (vi )] + p̄i (vi )(vi − ri )
≤ ui (ri , ri ) + p̄i (vi )(vi − ri )
= [p̄i (ri )ri − c̄i (ri )] + p̄i (vi )(vi − ri ).
That is,

    p̄i(ri)vi − c̄i(ri) ≤ [p̄i(ri)ri − c̄i(ri)] + p̄i(vi)(vi − ri),

or, cancelling c̄i(ri) from both sides and rearranging,

    p̄i(ri)(vi − ri) ≤ p̄i(vi)(vi − ri).

So, when vi > ri, it must be the case that p̄i(vi) ≥ p̄i(ri). We conclude that p̄i(·) is
non-decreasing. Hence, (i) holds. (See also Exercise 9.7.)
To see that (ii) holds, note that because bidder i's expected payoff must be maximised
when he reports truthfully, the derivative of ui(ri, vi) with respect to ri must be zero when
ri = vi.14 By (9.8), this means that for every vi ∈ [0, 1],

    c̄i′(vi) = p̄i′(vi)vi.    (P.1)

Consequently,

    c̄i(vi) − c̄i(0) = ∫_0^{vi} c̄i′(x) dx = ∫_0^{vi} x p̄i′(x) dx = p̄i(vi)vi − ∫_0^{vi} p̄i(x) dx,

where the first equality follows from the fundamental theorem of calculus, the second from
(P.1), and the third from integration by parts. Consequently, for every bidder i and every
vi ∈ [0, 1],
    c̄i(vi) = c̄i(0) + p̄i(vi)vi − ∫_0^{vi} p̄i(x) dx,    (P.2)
proving (ii).
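Formula (P.2) can be verified directly in the first-price case with uniform values, where p̄i(v) = v^{N−1} and c̄i(v) = v^{N−1} b̂(v) with c̄i(0) = 0. The following sketch (ours) checks the identity numerically:

```python
def c_bar(v, N):
    """Expected payment c̄(v) = F(v)^(N-1) b(v) in the uniform first-price case."""
    return v**(N - 1) * (v - v / N)

def p2_rhs(v, N, steps=50_000):
    """Right-hand side of (P.2) with c̄(0) = 0: p̄(v) v - ∫_0^v p̄(x) dx,
    where p̄(x) = x^(N-1); the integral uses the trapezoid rule."""
    h = v / steps
    fs = [(i * h)**(N - 1) for i in range(steps + 1)]
    integral = h * (sum(fs) - 0.5 * (fs[0] + fs[-1]))
    return v**(N - 1) * v - integral

for N in (2, 3, 6):
    print(N, round(c_bar(0.9, N), 6), round(p2_rhs(0.9, N), 6))
```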
We must now show the converse. So, suppose that (i) and (ii) hold. We must show
that ui (ri , vi ) is maximised in ri when ri = vi . To see this, note that substituting (ii) into
(9.8) yields
    ui(ri, vi) = p̄i(ri)vi − { c̄i(0) + p̄i(ri)ri − ∫_0^{ri} p̄i(x) dx }.    (P.3)
This can be rewritten as
    ui(ri, vi) = −c̄i(0) + ∫_0^{vi} p̄i(x) dx − { ∫_{ri}^{vi} (p̄i(x) − p̄i(ri)) dx },
where this expression is valid whether ri ≤ vi or ri ≥ vi .15 Because by (i) p̄i (·) is non-
decreasing, the integral in curly brackets is non-negative for all ri and vi . Consequently,
    ui(ri, vi) ≤ −c̄i(0) + ∫_0^{vi} p̄i(x) dx.    (P.4)
14 We are ignoring two points here. The first is whether ui(ri, vi) is in fact differentiable in ri. Although it need
not be everywhere differentiable, incentive compatibility implies that it must be differentiable almost everywhere
and that the analysis we shall conduct can be made perfectly rigorous. We will not pursue these details here. The
second point we ignore is the first-order condition at the two non-interior values vi = 0 or 1. Strictly speaking,
the derivatives at these boundary points need not be zero. But there is no harm in this because these two values
each occur with probability zero.
15 Recall the convention in mathematics that when a < b, ∫_b^a f(x)dx = −∫_a^b f(x)dx.
But, setting ri = vi in (P.3) shows that the right-hand side of (P.4) is precisely ui(vi, vi). Hence, for all ri, vi ∈ [0, 1],

    ui(ri, vi) ≤ ui(vi, vi),

so that reporting truthfully is optimal and the mechanism is incentive-compatible.
The seller's expected revenue from the mechanism is

    R = ∫_0^1 · · · ∫_0^1 Σ_{i=1}^N ci(v1, . . . , vN) f(v1) · · · f(vN) dv1 · · · dvN

      = Σ_{i=1}^N ∫_0^1 · · · ∫_0^1 ci(v1, . . . , vN) f(v1) · · · f(vN) dv1 · · · dvN

      = Σ_{i=1}^N ∫_0^1 [ ∫_0^1 · · · ∫_0^1 ci(vi, v−i) f−i(v−i) dv−i ] fi(vi) dvi

      = Σ_{i=1}^N ∫_0^1 c̄i(vi) fi(vi) dvi

      = Σ_{i=1}^N ∫_0^1 [ c̄i(0) + p̄i(vi)vi − ∫_0^{vi} p̄i(x)dx ] fi(vi) dvi

      = Σ_{i=1}^N ∫_0^1 [ p̄i(vi)vi − ∫_0^{vi} p̄i(x)dx ] fi(vi) dvi + Σ_{i=1}^N c̄i(0),
where the fourth equality follows from the definition of c̄i (vi ) and the fifth equality follows
from (ii) of Theorem 9.5.
Consequently, the seller’s expected revenue depends only on the probability assign-
ment functions and the amount bidders expect to pay when their values are zero. Because a
bidder’s expected payoff when his value is zero is completely determined by his expected
payment when his value is zero, the desired result follows.
The revenue equivalence theorem provides an explanation for the apparently co-
incidental equality of expected revenue among the four standard auctions. We now see
that this follows because, with symmetric bidders, each of the four standard auctions has
the same probability assignment function (i.e., the object is assigned to the bidder with the
highest value), and in each of the four standard auctions a bidder with value zero receives
expected utility equal to zero.
The revenue equivalence theorem is very general and allows us to add additional
auctions to the list of those yielding the same expected revenue as the four standard ones.
For example, a first-price, all-pay auction, in which the highest among all sealed bids wins
but every bidder pays an amount equal to his bid, also yields the same expected revenue
under bidder symmetry as the four standard auctions. You are asked to explore this and
other auctions in the exercises.
9.3.2 EFFICIENCY
Before closing this section, we briefly turn our attention to the allocative properties of the
four standard auctions. As we have already noted several times, each of these auctions allo-
cates the object to the bidder who values it most. That is, each of these auctions is efficient.
In the case of the Dutch and the first-price auctions, this result relies on bidder symmetry.
Without symmetry, different bidders in a first-price auction, say, will employ different
strictly increasing bidding functions. Consequently, if one bidder employs a lower bidding
function than another, then the one may have a higher value yet be outbid by the other.
Recall that the equilibrium of the first-price auction was exactly replicated in the direct mechanism's truth-telling
equilibrium. As it turns out, the same type of construction can be applied to any selling
procedure. That is, given an arbitrary selling procedure and a Nash equilibrium in which
each bidder employs a strategy mapping his value into payoff-maximising behaviour under
that selling procedure, we can construct an equivalent incentive-compatible direct selling
mechanism. The requisite probability assignment and cost functions map each vector of
values to the probabilities and costs that each bidder would experience according to the
equilibrium strategies in the original selling procedure. So constructed, this direct selling
mechanism is incentive-compatible and yields the same (probabilistic) assignment of the
object and the same expected costs to each bidder as well as the same expected revenue to
the seller.
Consequently, if some selling procedure yields the seller expected revenue equal
to R, then so too does some incentive-compatible direct selling mechanism. But this means
that no selling mechanism among all conceivable selling mechanisms yields more revenue
for the seller than the revenue-maximising, incentive-compatible direct selling mechanism.
We can, therefore, restrict our search for a revenue-maximising selling procedure to the
(manageable) set of incentive-compatible direct selling mechanisms. In this way, we have
simplified our problem considerably while losing nothing.
This simple but extremely important technique for reducing the set of mechanisms
to the set of incentive-compatible direct mechanisms is an instance of what is called the
revelation principle. This principle is used again and again in the theory of mechanism
design and we will see it in action again in Section 9.5 when we consider the problem of
achieving efficient outcomes in a private information setting.
In addition, the seller cannot force a bidder to participate, so each bidder must receive a non-negative expected payoff from truthful reporting:

    ui(vi, vi) = p̄i(vi)vi − c̄i(vi) ≥ 0 for all vi ∈ [0, 1].
The seller's problem, then, is to choose the direct selling mechanism that maximises his expected revenue,

    R = Σ_{i=1}^N ∫_0^1 [ p̄i(vi)vi − ∫_0^{vi} p̄i(x)dx ] fi(vi) dvi + Σ_{i=1}^N c̄i(0),

subject to

    (i) p̄i(·) is non-decreasing for every i,
    (ii) c̄i(vi) = c̄i(0) + p̄i(vi)vi − ∫_0^{vi} p̄i(x)dx for every i and every vi ∈ [0, 1], and
    (iii) ui(vi, vi) = p̄i(vi)vi − c̄i(vi) ≥ 0 for every i and every vi ∈ [0, 1],
where the expression for the seller’s expected revenue follows from incentive compatibility
precisely as in the proof of Theorem 9.6.
It will be helpful to rearrange the expression for the seller’s expected revenue.
    R = Σ_{i=1}^N ∫_0^1 [ p̄i(vi)vi − ∫_0^{vi} p̄i(x)dx ] fi(vi) dvi + Σ_{i=1}^N c̄i(0)

      = Σ_{i=1}^N [ ∫_0^1 p̄i(vi)vi fi(vi) dvi − ∫_0^1 ∫_0^{vi} p̄i(x)fi(vi) dx dvi ] + Σ_{i=1}^N c̄i(0).
By interchanging the order of integration in the iterated integral (i.e., from dxdvi to dvi dx),
we obtain
    R = Σ_{i=1}^N [ ∫_0^1 p̄i(vi)vi fi(vi) dvi − ∫_0^1 ∫_x^1 p̄i(x)fi(vi) dvi dx ] + Σ_{i=1}^N c̄i(0)

      = Σ_{i=1}^N [ ∫_0^1 p̄i(vi)vi fi(vi) dvi − ∫_0^1 p̄i(x)(1 − Fi(x)) dx ] + Σ_{i=1}^N c̄i(0).
Renaming the variable of integration in the second integral from x to vi, we have

    R = Σ_{i=1}^N [ ∫_0^1 p̄i(vi)vi fi(vi) dvi − ∫_0^1 p̄i(vi)(1 − Fi(vi)) dvi ] + Σ_{i=1}^N c̄i(0)

      = Σ_{i=1}^N ∫_0^1 p̄i(vi) [ vi − (1 − Fi(vi))/fi(vi) ] fi(vi) dvi + Σ_{i=1}^N c̄i(0).
Substituting for p̄i(vi) from its definition, we may write
    R = ∫_0^1 · · · ∫_0^1 Σ_{i=1}^N pi(v1, . . . , vN) [ vi − (1 − Fi(vi))/fi(vi) ] f1(v1) · · · fN(vN) dv1 · · · dvN

        + Σ_{i=1}^N c̄i(0).    (9.11)
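Formula (9.11) can be sanity-checked against Example 9.2. With uniform values the bracketed term is vi − (1 − vi)/1 = 2vi − 1, the standard auctions have p̄i(vi) = vi^{N−1} and c̄i(0) = 0, and the resulting expected revenue should reproduce (N − 1)/(N + 1). A numerical sketch (ours):

```python
def revenue_virtual(N, steps=50_000):
    """Trapezoid-rule evaluation of N * ∫_0^1 v^(N-1) (2v - 1) dv."""
    h = 1.0 / steps
    g = [(i * h)**(N - 1) * (2 * i * h - 1) for i in range(steps + 1)]
    return N * h * (sum(g) - 0.5 * (g[0] + g[-1]))

for N in (2, 4, 8):
    print(N, round(revenue_virtual(N), 4), round((N - 1) / (N + 1), 4))
```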
So, our problem is to maximise (9.11) subject to the constraints (i)–(iii) above. For
the moment, let us concentrate on the first term in (9.11), namely
    ∫_0^1 · · · ∫_0^1 { Σ_{i=1}^N pi(v1, . . . , vN) [ vi − (1 − Fi(vi))/fi(vi) ] } f1(v1) · · · fN(vN) dv1 · · · dvN.    (9.12)
Clearly, (9.12) would be maximised if the term in curly brackets were maximised for
each vector of values v1, . . . , vN. Now, because the pi(v1, . . . , vN) are non-negative and
sum to one or less, the N + 1 numbers p1(v1, . . . , vN), . . . , pN(v1, . . . , vN),
1 − Σ_{i=1}^N pi(v1, . . . , vN) are non-negative and sum to one. So, the sum in curly
brackets, which can be rewritten as

    Σ_{i=1}^N pi(v1, . . . , vN) [ vi − (1 − Fi(vi))/fi(vi) ] + ( 1 − Σ_{i=1}^N pi(v1, . . . , vN) ) · 0,

is a weighted average of the N bracketed terms and zero.
But then the sum in curly brackets can be no larger than the largest of these bracketed terms
if one of them is positive, and no larger than zero if all of them are negative. Suppose now
that no two of the bracketed terms are equal to one another. Then, if we define
$$
p_i^*(v_1,\ldots,v_N)=\begin{cases}1,&\text{if }v_i-\dfrac{1-F_i(v_i)}{f_i(v_i)}>\max\!\left(0,\ v_j-\dfrac{1-F_j(v_j)}{f_j(v_j)}\right)\text{ for all }j\neq i,\\[4pt]0,&\text{otherwise},\end{cases}\qquad(9.13)
$$
then, for every vector of values $v_1,\ldots,v_N$,
$$
\sum_{i=1}^{N}p_i(v_1,\ldots,v_N)\left[v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\right]\le\sum_{i=1}^{N}p_i^*(v_1,\ldots,v_N)\left[v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\right].
$$
Therefore, if the bracketed terms are distinct with probability one, we will have
$$
\begin{aligned}
R&=\int_0^1\!\!\cdots\!\int_0^1\left\{\sum_{i=1}^{N}p_i(v_1,\ldots,v_N)\left[v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\right]\right\}f_1(v_1)\cdots f_N(v_N)\,dv_1\cdots dv_N+\sum_{i=1}^{N}\bar c_i(0)\\
&\le\int_0^1\!\!\cdots\!\int_0^1\left\{\sum_{i=1}^{N}p_i^*(v_1,\ldots,v_N)\left[v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\right]\right\}f_1(v_1)\cdots f_N(v_N)\,dv_1\cdots dv_N+\sum_{i=1}^{N}\bar c_i(0),
\end{aligned}
$$
for all incentive-compatible direct selling mechanisms pi (·), ci (·). For the moment, then,
let us assume that the bracketed terms are distinct with probability one. We will introduce
an assumption on the bidders’ distributions that guarantees this shortly.16
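As an illustration (our sketch, not from the text), the allocation rule in (9.13) is easy to compute once each bidder's "virtual value" $v_i-(1-F_i(v_i))/f_i(v_i)$ is known. The function names below are ours, and we assume every bidder's value is uniform on $[0,1]$, so the virtual value is simply $2v_i-1$:

```python
def virtual_value_uniform(v):
    # For F(v) = v and f(v) = 1 on [0, 1]: v - (1 - F(v))/f(v) = 2v - 1.
    return 2.0 * v - 1.0

def optimal_allocation(values):
    """Implements (9.13): give the object to the bidder whose virtual value
    is positive and strictly largest; otherwise the seller keeps the object.
    Returns the winner's index, or None if there is no winner."""
    vvals = [virtual_value_uniform(v) for v in values]
    best = max(range(len(values)), key=lambda i: vvals[i])
    if vvals[best] > 0 and sum(vv == vvals[best] for vv in vvals) == 1:
        return best
    return None

print(optimal_allocation([0.8, 0.6]))  # bidder 0 wins: 0.6 > max(0, 0.2)
print(optimal_allocation([0.4, 0.3]))  # None: all virtual values are negative
```

Note that the seller sometimes keeps the object even though every bidder values it positively, a point taken up again below.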
Because constraint (iii) implies that each c̄i (0) ≤ 0, we can also say that for all
incentive-compatible direct selling mechanisms pi (·), ci (·), the seller’s revenue can be no
larger than the following upper bound:
$$
R\le\int_0^1\!\!\cdots\!\int_0^1\left\{\sum_{i=1}^{N}p_i^*(v_1,\ldots,v_N)\left[v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\right]\right\}f_1(v_1)\cdots f_N(v_N)\,dv_1\cdots dv_N.\qquad(9.14)
$$
Now, because the $\bar c_i^*$ and $\bar p_i^*$ are averages of the $c_i^*$ and $p_i^*$, the required relationship (ii) between these averages will hold if it holds for each and every vector of values $v_1,\ldots,v_N$. That is, (ii) is guaranteed to hold if we define the $c_i^*$ as follows: for every $v_1,\ldots,v_N$,
$$
c_i^*(v_1,\ldots,v_N)=c_i^*(0,v_{-i})+p_i^*(v_1,\ldots,v_N)v_i-\int_0^{v_i}p_i^*(x,v_{-i})\,dx.\qquad(9.15)
$$
To complete the definition of the cost functions and to satisfy constraint (iii), we shall set $c_i^*(0,v_{-i})=0$ for all $i$ and all $v_{-i}$. So, our candidate for a revenue-maximising, incentive-compatible direct selling mechanism is as follows: for every $v_1,\ldots,v_N$,
$$
p_i^*(v_1,\ldots,v_N)=\begin{cases}1,&\text{if }v_i-\dfrac{1-F_i(v_i)}{f_i(v_i)}>\max\!\left(0,\ v_j-\dfrac{1-F_j(v_j)}{f_j(v_j)}\right)\text{ for all }j\neq i,\\[4pt]0,&\text{otherwise},\end{cases}\qquad(9.16)
$$
and
$$
c_i^*(v_1,\ldots,v_N)=p_i^*(v_1,\ldots,v_N)v_i-\int_0^{v_i}p_i^*(x,v_{-i})\,dx.\qquad(9.17)
$$
By construction, this mechanism satisfies constraints (ii) and (iii), and it achieves the
upper bound for revenues in (9.14). To see this, simply substitute the p∗i into (9.11) and
recall that by construction c̄∗i (0) = 0 for every i. The result is that the seller’s revenues are
$$
R=\int_0^1\!\!\cdots\!\int_0^1\left\{\sum_{i=1}^{N}p_i^*(v_1,\ldots,v_N)\left[v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\right]\right\}f_1(v_1)\cdots f_N(v_N)\,dv_1\cdots dv_N,
$$
which is precisely the upper bound in (9.14). It remains only to verify constraint (i). To do so, we shall assume that for every bidder $i$,
$$
v_i-\frac{1-F_i(v_i)}{f_i(v_i)}\ \text{is strictly increasing in }v_i.\qquad(9.18)
$$
This assumption is satisfied by a number of distributions, including the uniform distribution. Moreover, you are asked to show in an exercise that it holds whenever each $F_i$ is convex, not merely in the uniform case.17 Note that in addition to ensuring that (i) holds, this assumption also guarantees that the numbers $v_1-(1-F_1(v_1))/f_1(v_1),\ldots,v_N-(1-F_N(v_N))/f_N(v_N)$ are distinct with probability one, a requirement that we earlier employed but had left unjustified until now.
Let us now see why (9.18) implies that (i) is satisfied. Consider some bidder $i$ and some fixed vector of values, $v_{-i}$, for the other bidders. Now, suppose that $\bar v_i>v_i$ and that $p_i^*(v_i,v_{-i})=1$. Then, by the definition of $p_i^*$, it must be the case that $v_i-(1-F_i(v_i))/f_i(v_i)$ is positive and strictly greater than $v_j-(1-F_j(v_j))/f_j(v_j)$ for all $j\neq i$. Consequently, because $v_i-(1-F_i(v_i))/f_i(v_i)$ is strictly increasing, it must
17 When this assumption fails, the mechanism we have constructed here is not optimal. One can nevertheless
construct the optimal mechanism, but we shall not do so here. Thus, the additional assumption we are making
here is only for simplicity’s sake.
also be the case that $\bar v_i-(1-F_i(\bar v_i))/f_i(\bar v_i)$ is both positive and strictly greater than $v_j-(1-F_j(v_j))/f_j(v_j)$ for all $j\neq i$, which means that $p_i^*(\bar v_i,v_{-i})=1$. Thus, we have shown that if $p_i^*(v_i,v_{-i})=1$, then $p_i^*(\bar v_i,v_{-i})=1$ for all $\bar v_i>v_i$. But because $p_i^*$ takes on either the value 0 or 1, $p_i^*(v_i,v_{-i})$ is non-decreasing in $v_i$ for every $v_{-i}$. This in turn
implies that p̄∗i (vi ) is non-decreasing in vi , so that constraint (i) is indeed satisfied.
In the end then, our hard work has paid off handsomely. We can now state the
following.
The payment portion of the mechanism is a little less transparent. To get a clearer pic-
ture of what is going on, suppose that when the (truthfully) reported values are v1 , . . . , vN ,
bidder i does not receive the object, i.e., that p∗i (vi , v−i ) = 0. What must bidder i pay
according to the mechanism? The answer, according to (9.17), is
$$
\begin{aligned}
c_i^*(v_i,v_{-i})&=p_i^*(v_i,v_{-i})v_i-\int_0^{v_i}p_i^*(x,v_{-i})\,dx\\
&=0\cdot v_i-\int_0^{v_i}p_i^*(x,v_{-i})\,dx.
\end{aligned}
$$
But recall that, by virtue of assumption (9.18), $p_i^*(\cdot,v_{-i})$ is non-decreasing. Consequently, because $p_i^*(v_i,v_{-i})=0$, it must be the case that $p_i^*(x,v_{-i})=0$ for every $x\le v_i$. Hence, the integral above must be zero, so that $c_i^*(v_i,v_{-i})=0$. So, we have shown that according to the optimal mechanism, if bidder $i$ does not receive the object, he pays nothing.
Suppose now that bidder i does receive the object, i.e., that p∗i (vi , v−i ) = 1.
According to (9.17), he then pays
$$
\begin{aligned}
c_i^*(v_i,v_{-i})&=p_i^*(v_i,v_{-i})v_i-\int_0^{v_i}p_i^*(x,v_{-i})\,dx\\
&=v_i-\int_0^{v_i}p_i^*(x,v_{-i})\,dx.
\end{aligned}
$$
Now, because p∗i takes on the value 0 or 1, is non-decreasing and continuous from the left
in i’s value, and p∗i (vi , v−i ) = 1, there must be a largest value for bidder i, ri∗ < vi , such
that p∗i (ri∗ , v−i ) = 0. Note that ri∗ will generally depend on v−i so it would be more explicit
to write ri∗ (v−i ). Note then that by the very definition of ri∗ (v−i ), p∗i (x, v−i ) is equal to 1
for every x > ri∗ (v−i ), and is equal to 0 for every x ≤ ri∗ (v−i ). But this means that
$$
\begin{aligned}
c_i^*(v_i,v_{-i})&=v_i-\int_{r_i^*(v_{-i})}^{v_i}1\,dx\\
&=v_i-(v_i-r_i^*(v_{-i}))\\
&=r_i^*(v_{-i}).
\end{aligned}
$$
So, when bidder i wins the object, he pays a price, ri∗ (v−i ), that is independent of his own
reported value. Moreover, the price he pays is the maximum value he could have reported,
given the others’ reported values, without receiving the object.
Putting all of this together, we may rephrase the revenue-maximising selling
mechanism defined by (9.16) and (9.17) in the following manner.
The presence of inefficiencies is not surprising. After all, the seller is a monopolist
seeking maximal profits. In Chapter 4, we saw that a monopolist will restrict output below
the efficient level so as to command a higher price. The same effect is present here. But,
because there is only one unit of an indivisible object for sale, the seller here restricts sup-
ply by sometimes keeping the object, depending on the vector of reports. But this accounts
for only the first kind of inefficiency. The second kind of inefficiency that arises here did
not occur in our brief look at monopoly in Chapter 4. The reason is that there we assumed
that the monopolist was unable to distinguish one consumer from another. Consequently,
the monopolist had to charge all consumers the same price. Here, however, we are assum-
ing that the seller can distinguish bidder i from bidder j and that the seller knows that
i’s distribution of values is Fi and that j’s is Fj . This additional knowledge allows the
monopolist to discriminate between the bidders, which leads to higher profits.
Let us now eliminate this second source of inefficiency by supposing that bidders
are symmetric. Because the four standard auctions all yield the same expected revenue for
the seller under symmetry, this will also allow us to compare the standard auctions with
the optimal selling mechanism.
How does symmetry affect the optimal selling mechanism? If the bidders are sym-
metric, then fi = f and Fi = F for every bidder i. Consequently, the optimal selling
mechanism is as follows: if the vector of reported values is (v1 , . . . , vN ), the bidder i
with the highest positive vi − (1 − F(vi ))/f (vi ) receives the object and pays the seller ri∗ ,
the largest value he could have reported, given the other bidders' reported values, without
winning the object. If there is no such bidder i, the seller keeps the object and no payments
are made.
But let us think about this for a moment. Because we are assuming that $v-(1-F(v))/f(v)$ is strictly increasing in $v$, the object is actually awarded to the bidder $i$ with the strictly highest value $v_i$, so long as $v_i-(1-F(v_i))/f(v_i)>0$ – that is, so long as $v_i>\rho^*$, where $\rho^*\in[0,1]$ solves
$$
\rho^*-\frac{1-F(\rho^*)}{f(\rho^*)}=0.\qquad(9.19)
$$
(You are asked to show in an exercise that a unique such ρ ∗ is guaranteed to exist.)
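Because assumption (9.18) makes the left-hand side of (9.19) strictly increasing, $\rho^*$ can be found numerically by bisection. A hedged sketch (function names are ours), using the uniform distribution on $[0,1]$, for which the left-hand side is $2\rho-1$ and so $\rho^*=1/2$:

```python
def solve_reserve(F, f, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for rho* solving rho - (1 - F(rho))/f(rho) = 0, eq. (9.19).
    Relies on assumption (9.18): the left-hand side is strictly increasing."""
    psi = lambda r: r - (1.0 - F(r)) / f(r)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rho = solve_reserve(F=lambda v: v, f=lambda v: 1.0)  # uniform on [0, 1]
print(rho)  # ~0.5: the object is sold only if some value exceeds 1/2
```

For instance, with $F(v)=v^2$ (so $f(v)=2v$) the same routine returns $\rho^*=1/\sqrt{3}$.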
Now, how large can bidder i’s reported value be before he is awarded the object?
Well, he does not get the object unless his reported value is strictly highest and strictly
above ρ ∗ . So, the largest his report can be without receiving the object is the largest of the
other bidders’ values or ρ ∗ , whichever is larger. Consequently, when bidder i does receive
the object he pays either ρ ∗ or the largest value reported by the other bidders, whichever
is larger.
Altogether then, the optimal selling mechanism is as follows: the bidder whose
reported value is strictly highest and strictly above ρ ∗ receives the object and pays the
larger of ρ ∗ and the largest reported value of the other bidders.
Remarkably, this optimal direct selling mechanism can be mimicked by running a
second-price auction with reserve price ρ ∗ . That is, an auction in which the bidder with
the highest bid strictly above the reserve price wins and pays the second-highest bid or the
reserve price, whichever is larger. If no bids are above the reserve price, the seller keeps
the object and no payments are made. This is optimal because, just as in a standard second-
price auction, it is a dominant strategy to bid one’s value in a second-price auction with a
reserve price.
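The outcome rule just described can be sketched directly (an illustrative sketch, not the authors' code): the highest bidder wins only if his bid is strictly above the reserve, and he pays the larger of the reserve and the second-highest bid.

```python
def second_price_with_reserve(bids, reserve):
    """Returns (winner_index, price), or (None, 0.0) if no bid beats the reserve."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] <= reserve:
        return None, 0.0             # seller keeps the object, no payments
    others = [b for i, b in enumerate(bids) if i != winner]
    price = max([reserve] + others)  # max of reserve and second-highest bid
    return winner, price

print(second_price_with_reserve([0.8, 0.6], 0.5))  # (0, 0.6)
print(second_price_with_reserve([0.8, 0.3], 0.5))  # (0, 0.5)
print(second_price_with_reserve([0.4, 0.3], 0.5))  # (None, 0.0)
```

With dominant-strategy bidding, `bids` equal values, so this reproduces the optimal mechanism's allocation and payments.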
This is worth highlighting.
You might wonder about the other three standard auctions. Will adding an appropri-
ate reserve price render these auctions optimal for the seller too? The answer is yes, and
this is left for you to explore in the exercises.
So, we have now come full circle. The four standard auctions – first-price, second-
price, Dutch, and English – all yield the same revenue under symmetry. Moreover, by
supplementing each by an appropriate reserve price, the seller maximises his expected
revenue. Is it any wonder then that these auctions are in such widespread use?
18 It is also possible to interpret ‘money’ instead as a separate commodity that individuals directly desire. But we
will stick with the monetary interpretation.
Let us see why this claim is true. Suppose, for example, that the social state happens
to be $x\in X$ but that $y\in X$ satisfies
$$
\sum_{i=1}^{N}v_i(y)>\sum_{i=1}^{N}v_i(x).\qquad(9.21)
$$
We would like to show that a Pareto improvement is available. In fact, we shall show that
a Pareto improvement can be obtained by switching the social state from x to y.
Now, even though (9.21) holds, merely switching the social state from x to y need
not result in a Pareto improvement because some individual utilities may well fall in the
move from x to y. The key idea is to compensate those individuals whose utilities fall
by transferring to them income from individuals whose utilities rise. It is here where the
common rate at which income translates into utility across individuals is absolutely central.
For each individual $i$, define the income transfer, $\tau_i$, as follows:
$$
\tau_i=v_i(x)-v_i(y)+\frac{1}{N}\sum_{j=1}^{N}\bigl(v_j(y)-v_j(x)\bigr).
$$
If $\tau_i>0$, then individual $i$ receives $\tau_i$ dollars, while if $\tau_i<0$ individual $i$ is taxed $|\tau_i|$ dollars.
By construction, the τi sum to zero and so these are indeed income transfers among the N
individuals.
After changing the state from x to y and carrying out the income transfers, the change
in individual i’s utility is,
vi (y) + τi − vi (x),
which, by (9.21), is strictly positive. Hence, each individual is strictly better off after the
change. This proves that the social state x is not Pareto efficient and establishes the ‘only
if’ part of the claim in (9.20).21 You are asked to establish the ‘if’ part of the claim in
Exercise 9.26.
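The transfer construction above is mechanical enough to check numerically. A small sketch (the utility numbers are hypothetical, chosen only so that (9.21) holds) computing the $\tau_i$ and confirming that they sum to zero and that every individual strictly gains from the move to $y$:

```python
def transfers(vx, vy):
    """tau_i = v_i(x) - v_i(y) + (1/N) * sum_j (v_j(y) - v_j(x))."""
    N = len(vx)
    gain = sum(vy) - sum(vx)  # total utility gain from switching x -> y
    return [vx[i] - vy[i] + gain / N for i in range(N)]

# Hypothetical utilities for three individuals in states x and y.
vx = [4.0, 7.0, 2.0]
vy = [9.0, 5.0, 5.0]  # sum 19 > 13, so (9.21) holds
tau = transfers(vx, vy)
print(tau)                                          # [-3.0, 4.0, -1.0], sums to zero
print([vy[i] + tau[i] - vx[i] for i in range(3)])   # each change equals gain/N = 2.0
```

Individual 2's utility falls from 7 to 5 in the move to $y$, but the transfer of 4 more than compensates him, exactly as the argument requires.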
where the maximisation is over all social states. We then also say that x̂(t) is an ex post
efficient social state given t ∈ T.
Thus, x̂ : T → X is ex post Pareto efficient if for each type vector t ∈ T, the social
state, x̂(t), maximises the sum of individual ex post utilities given t.
21 We implicitly assume that individuals whose transfers are negative, i.e., who are taxed, have sufficient income
to pay the tax.
22 We do not instead call this a social choice function, as in Section 6.5 of Chapter 6, because we do not require
the range of x(·) to be all of X.
might ask individuals one at a time to publicly announce their type (of course they might
lie). We might then ask whether anyone believes that someone lied about their type, suit-
ably punishing (via taxes) those whose announcements are doubted by sufficiently many
others – the hope being that this might encourage honest reports. On the other hand, we
might not ask individuals to report their types at all. Rather, we might ask them to vote
directly for the social state they would like implemented. But what voting system ought
we employ? Plurality rule? Pairwise majority with ties broken randomly? Should the votes
be by secret ballot? Or public and sequential? As you can sense, we could go on and on.
There are endless possibilities for designing a system, or mechanism, in the pursuit of
achieving our goal.
Fortunately, just as in the single-good revenue-maximisation setting, the revelation
principle applies here and it allows us to limit our search to the set of incentive-compatible
direct mechanisms. Before we discuss this second application of the revelation princi-
ple any further, it is useful to have on record two definitions. They are the extensions of
Definitions 9.1 and 9.2 to the present more general setup.
c1 (t1 , . . . , tN ), . . . , cN (t1 , . . . , tN ).
Because of the similarity between Definitions 9.1 and 9.4, there is little need for
further discussion except to say that Definition 9.4 becomes equivalent to Definition 9.1
when (i) there is a single object available, (ii) there are N + 1 individuals consisting of
N bidders and one seller, and (iii) the social states are the N + 1 allocations in which,
either, one of the bidders ends up with the good or the seller ends up with the good. (See
Exercise 9.28.)
Given a direct mechanism, p, c1 , . . . , cN , it is useful to define, as in Section 9.3,
ui (ri , ti ) to be individual i’s expected utility from reporting that his type is ri ∈ Ti when his
true type is ti ∈ Ti and given that all other individuals always report their types truthfully.
That is,
$$
u_i(r_i,t_i)=\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\left[\sum_{x\in X}p_x(r_i,t_{-i})v_i(x,t_i)-c_i(r_i,t_{-i})\right],
$$
where $q_{-i}(t_{-i})=\prod_{j\neq i}q_j(t_j)$. As before, we can simplify this formula by defining
$$
\bar p_i^x(r_i)=\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\,p_x(r_i,t_{-i}),
$$
and
$$
\bar c_i(r_i)=\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\,c_i(r_i,t_{-i}).\qquad(9.22)
$$
Then,
$$
u_i(r_i,t_i)=\sum_{x\in X}\bar p_i^x(r_i)v_i(x,t_i)-\bar c_i(r_i).\qquad(9.23)
$$
With these definitions in hand, it is worthwhile to informally discuss how the rev-
elation principle allows us to reduce our search to the set of incentive-compatible direct
mechanisms. So, suppose that we manage to design some, possibly quite complex, exten-
sive form game for individuals in society to play, where the payoffs to the individuals at
the endpoints are defined by the utility they receive from some social state and income
distribution at that endpoint. Because the strategies they choose to adopt may depend upon
their types, any ‘equilibrium’ they play (i.e., Nash, subgame-perfect, sequential) will be
a Bayesian-Nash equilibrium of the game’s strategic form. Suppose that in some such
Bayesian-Nash equilibrium, an ex post efficient social state is always certain to occur. We
would then say that the given extensive form game (mechanism) successfully implements
an ex post efficient outcome. According to the revelation principle, a direct incentive-
compatible mechanism can do precisely the same thing. Here’s how. Instead of having the
individuals play their strategies themselves, design a new (direct) mechanism that simply
plays their strategies for them after they report their types. Consequently, if the other indi-
viduals always report honestly, then, from your perspective, it is as if you are participating
in the original extensive form game against them. But in that game, it was optimal for
you to carry out the actions specified by your strategy conditional on your actual type.
Consequently, it is optimal for you to report your type truthfully in the new direct mech-
anism so that those same actions are carried out on your behalf. Hence, the new direct
23 Because the type spaces $T_i$ are finite here, our Chapter 7 definition of a Bayesian-Nash equilibrium applies.
If the type spaces are infinite we would simply define the truth-telling equilibrium to be a Bayesian-Nash
equilibrium.
mechanism is incentive-compatible and always yields the same ex post efficient social
state and income distribution as would the old. That’s all there is to it!
As a matter of terminology, we call an incentive-compatible direct mechanism ex
post efficient if it assigns probability one to a set of ex post efficient social states given any
vector of reported types t ∈ T, i.e., if for every t ∈ T, px (t) > 0 implies x ∈ X is ex post
efficient when the vector of types is t.
Such a solution always exists because X is finite. If there are multiple solutions choose any
one of them. The ex post efficient allocation function x̂(·) is therefore well-defined and it
will remain fixed for the rest of this chapter.
Let us think about the externality imposed by each individual i on the remaining
individuals under the assumption that ex post efficiency can be achieved. The trick to
computing individual i’s externality is to think about the difference his presence makes to
the total utility of the others.
When individual i is present and the vector of types is t ∈ T, the social state is x̂(t)
and the total utility of the others is,24
$$
\sum_{j\neq i}v_j(\hat x(t),t_j).
$$
24 We can safely ignore the income individuals may have because we will ultimately be interested only in utility
differences between different outcomes and therefore individual incomes will always cancel. In other words, it is
harmless to compute utilities as if initial individual incomes are zero.
That was simple enough. But what is the total utility of the others when individual i is not
present? This too is straightforward if we assume that in the absence of individual i – i.e.,
if society consists only of the N − 1 individuals j = i – the social state is chosen in an ex
post efficient manner relative to those who remain.
For each t−i ∈ T−i , let x̃i (t−i ) ∈ X solve,
$$
\max_{x\in X}\sum_{j\neq i}v_j(x,t_j).
$$
That is, x̃i : T−i → X is an ex post efficient allocation function in the society without
individual i.
It is now a simple matter to compute the difference that i’s presence makes to the
total utility of the others. Evidently, when the type vector is t ∈ T, the difference in the
utility of the others when i is not present as compared to when he is present is,
$$
\sum_{j\neq i}v_j(\tilde x_i(t_{-i}),t_j)-\sum_{j\neq i}v_j(\hat x(t),t_j).
$$
That is, each individual pays his externality based on the reported types. The $c_i^{VCG}$ are called the VCG cost functions.
The key idea behind the VCG mechanism is to define individual costs so that each
individual internalises the externality that, through his report, he imposes on the rest of
society. Let us return to Example 9.3 to see what the VCG mechanism looks like there.
EXAMPLE 9.4 Consider the situation in Example 9.3. If the vector of reported types is $t\in T$, then it is efficient for the town to build the bridge if $\sum_i v_i(B,t_i)>\sum_i v_i(S,t_i)$.26 Given the definition of the $v_i$, this leads to the following ex post efficient allocation function. For each $t\in T$,
$$
\hat x(t)=\begin{cases}B,&\text{if }\sum_{i=1}^{N}(t_i-5)>0,\\ S,&\text{otherwise}.\end{cases}
$$
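To make the VCG costs in this example concrete, here is a small sketch (our illustration; it assumes, consistently with the formulas in this example, that the utilities from Example 9.3 are $v_i(B,t_i)=t_i$ and $v_i(S,t_i)=5$) that computes $c_i^{VCG}(t)$ for two individuals:

```python
def v(state, t):
    # Assumed Example 9.3 utilities: the bridge is worth one's type, the pool is worth 5.
    return t if state == 'B' else 5

def efficient_state(types):
    # x_hat(t) = B iff sum_i (t_i - 5) > 0, as derived above.
    return 'B' if sum(t - 5 for t in types) > 0 else 'S'

def vcg_cost(i, types):
    """c_i^VCG(t): others' total utility without i, minus with i present."""
    others = [t for j, t in enumerate(types) if j != i]
    with_i = efficient_state(types)
    without_i = efficient_state(others)
    return (sum(v(without_i, t) for t in others)
            - sum(v(with_i, t) for t in others))

print(vcg_cost(0, [4, 6]))  # 1: individual 1 is pivotal, flipping B to S for the other
print(vcg_cost(0, [4, 9]))  # 0: the bridge is built with or without individual 1
```

Note that the cost is zero whenever the individual's report does not change the chosen state, exactly as the externality interpretation suggests.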
So far so good, but will the VCG mechanism actually succeed in implementing an
ex post efficient outcome? By construction, the mechanism chooses an outcome that is
ex post efficient based on the reported vector of types. However, individuals are free to
lie about their types, and, if they do, the outcome will typically not be ex post efficient
with respect to the actual vector of types. Hence, for this mechanism to work, it must
induce individuals to report their types truthfully. Our next result establishes that the VCG
mechanism does indeed do so.
Proof: We must show that truthful reporting is a weakly dominant strategy for an arbitrary
individual i. Suppose then that the others report t−i ∈ T−i , which need not be truthful.
Suppose also that individual $i$'s type is $t_i\in T_i$ and that he reports $r_i\in T_i$. His utility would then be,27
$$
v_i(\hat x(r_i,t_{-i}),t_i)-c_i^{VCG}(r_i,t_{-i}).\qquad(\text{P.1})
$$
Note that $\hat x(\cdot)$ and $c_i^{VCG}(\cdot)$ are evaluated at $i$'s reported type, $r_i$, while $v_i(x,\cdot)$ is evaluated at $i$'s actual type, $t_i$. We must show that (P.1) is maximised when individual $i$ reports truthfully, i.e., when $r_i=t_i$.
Substituting the definition of $c_i^{VCG}(r_i,t_{-i})$ into (P.1), $i$'s utility can be written as
$$
\begin{aligned}
v_i(\hat x(r_i,t_{-i}),t_i)-c_i^{VCG}(r_i,t_{-i})&=v_i(\hat x(r_i,t_{-i}),t_i)-\Bigl(\sum_{j\neq i}v_j(\tilde x_i(t_{-i}),t_j)-\sum_{j\neq i}v_j(\hat x(r_i,t_{-i}),t_j)\Bigr)\\
&=\sum_{j=1}^{N}v_j(\hat x(r_i,t_{-i}),t_j)-\sum_{j\neq i}v_j(\tilde x_i(t_{-i}),t_j).\qquad(\text{P.2})
\end{aligned}
$$
Hence, we must show that setting ri = ti maximises the right-hand side of the second
equality (P.2). To see why this is so, note that ri appears only in the first summation there
and so it suffices to show that,
$$
\sum_{j=1}^{N}v_j(\hat x(t_i,t_{-i}),t_j)\ge\sum_{j=1}^{N}v_j(\hat x(r_i,t_{-i}),t_j),\quad\text{for all }t_{-i}\in T_{-i}.\qquad(\text{P.3})
$$
But (P.3) follows at once from the definition of $\hat x(\cdot)$, because, by ex post efficiency,
$$
\sum_{j=1}^{N}v_j(\hat x(t_i,t_{-i}),t_j)\ge\sum_{j=1}^{N}v_j(x,t_j),\quad\text{for all }x\in X.
$$
To test your understanding of this proof and also of the VCG mechanism, you should
try to show, with and without the aid of the proof, that it is a dominant strategy to tell the
truth in the VCG mechanism that is explicitly defined in Example 9.4.
Several remarks are in order. First, because each individual's cost, $c_i^{VCG}(t)$, is always non-negative, the mechanism never runs a deficit and typically runs a surplus.
27 We can safely ignore i’s initial level of income since it simply adds a constant to all of our utility calculations.
Second, one might therefore wonder whether any individual would prefer to avoid
paying his cost by not participating in the mechanism. To properly address this question
we must specify what would happen if an individual were to choose not to participate. An
obvious specification is to suppose that the VCG mechanism would be applied as usual, but
only to those who do participate. With this in mind, we can show that it is an equilibrium
for all N individuals to participate.
If all individuals participate and report truthfully (a dominant strategy), then individual $i$'s payoff when the vector of types is $t$ is
$$
v_i(\hat x(t),t_i)-c_i^{VCG}(t)=\sum_{j=1}^{N}v_j(\hat x(t),t_j)-\sum_{j\neq i}v_j(\tilde x_i(t_{-i}),t_j).\qquad(9.24)
$$
On the other hand, if individual $i$ chooses not to participate, he avoids paying the cost $c_i^{VCG}(t)$, but the social state becomes instead $\tilde x_i(t_{-i})$, i.e., an ex post efficient social state for the $N-1$ participating individuals who report their types. Consequently, if individual $i$ chooses not to participate his utility will be
$$
v_i(\tilde x_i(t_{-i}),t_i).\qquad(9.25)
$$
Now, by the ex post efficiency of $\hat x(t)$,
$$
\sum_{j=1}^{N}v_j(\hat x(t),t_j)\ge\sum_{j=1}^{N}v_j(\tilde x_i(t_{-i}),t_j),
$$
or, equivalently,
$$
\sum_{j=1}^{N}v_j(\hat x(t),t_j)-\sum_{j\neq i}v_j(\tilde x_i(t_{-i}),t_j)\ge v_i(\tilde x_i(t_{-i}),t_i).
$$
Hence, by (9.24), $i$'s utility from participating exceeds (9.25), his utility from not participating. Thus, it is an equilibrium for all individuals to voluntarily participate in the VCG
mechanism.
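In the bridge/pool example, the comparison of (9.24) with (9.25) can be verified exhaustively. A sketch (again under our reading of Example 9.3: $v_i(B,t_i)=t_i$, $v_i(S,t_i)=5$, with types in $\{1,\ldots,9\}$):

```python
def v(state, t):
    return t if state == 'B' else 5          # assumed Example 9.3 utilities

def efficient_state(types):
    return 'B' if sum(t - 5 for t in types) > 0 else 'S'

def payoff_in(i, types):
    # (9.24): total utility under x_hat(t), minus others' utility without i.
    others = [t for j, t in enumerate(types) if j != i]
    x_hat = efficient_state(types)
    x_tilde = efficient_state(others)
    return (sum(v(x_hat, t) for t in types)
            - sum(v(x_tilde, t) for t in others))

def payoff_out(i, types):
    # (9.25): i's utility when the others choose efficiently without him.
    others = [t for j, t in enumerate(types) if j != i]
    return v(efficient_state(others), types[i])

# Participation is weakly better for every two-person type profile.
ok = all(payoff_in(i, [t1, t2]) >= payoff_out(i, [t1, t2])
         for t1 in range(1, 10) for t2 in range(1, 10) for i in (0, 1))
print(ok)  # True
```

The check confirms the general argument: the inequality holds profile by profile, with equality in the knife-edge cases where $i$'s presence does not change the others' efficient choice.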
Third, the dominance of truth-telling in the VCG mechanism might appear to contra-
dict Theorem 6.4 (the Gibbard-Satterthwaite Theorem) of Chapter 6. Indeed, the function
x̂(·) maps vectors of types (which index individual utility functions) into social choices
in such a way that no individual can ever gain by reporting untruthfully. That is, x̂(·) is
strategy-proof. Moreover, because we have assumed nothing about the range of x̂(·), the
range might very well be all of X (if not, simply remove those elements of X that are absent
from the range). In that case, x̂(·) is a strategy-proof social choice function. But it is cer-
tainly not dictatorial! (Consider the one-good case, for example.) But, rest assured, there
is no contradiction because, in contrast to the situation in Chapter 6, we have restricted
the domain of preferences here to those that are quasi-linear. This restriction permits us to
avoid the negative conclusion of the Gibbard-Satterthwaite theorem.
$$
\sum_{i=1}^{N}c_i(t)=0,\quad\text{for every }t\in T.
$$
If a direct mechanism’s cost functions are budget-balanced then we say that the mechanism
is budget-balanced as well.
When the vector of reported types is $t\in T$, individual $i$'s VCG cost, $c_i^{VCG}(t)$, is his externality. Thus, according to the formula in (9.22), the quantity
$$
\begin{aligned}
\bar c_i^{VCG}(t_i)&=\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\,c_i^{VCG}(t_i,t_{-i})\\
&=\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\Bigl(\sum_{j\neq i}v_j(\tilde x_i(t_{-i}),t_j)-\sum_{j\neq i}v_j(\hat x(t),t_j)\Bigr),\qquad(9.26)
\end{aligned}
$$
is i’s expected externality when his type is ti . It turns out that these expected externalities
can be used to define costs in a way that delivers ex post efficiency and a balanced budget.
$$
\bar c_i^{VCG}(t_i)-\bar c_{i+1}^{VCG}(t_{i+1}),
$$
28 Paying one’s externality to a single other individual keeps the formula for the new cost functions simple. But
paying any number of the others one’s expected externality would do just as well. See Exercise 9.29.
29 See Arrow (1979) and d’Aspremont and Gérard-Varet (1979).
Let $u_i^{VCG}(r_i,t_i)$ denote individual $i$'s expected utility in the VCG mechanism when his type is $t_i$ and he reports that it is $r_i$, and when other individuals always report their types truthfully. Then,
$$
u_i^{VCG}(r_i,t_i)=\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\,v_i(\hat x(r_i,t_{-i}),t_i)-\bar c_i^{VCG}(r_i),
$$
because the first term (the summation) is his expected utility from the social state, and the second, with the negative sign, is his expected cost given his report. We already know that truth-telling is a Bayesian-Nash equilibrium in the VCG mechanism (indeed it is a dominant strategy). Hence, $u_i^{VCG}(r_i,t_i)$ is maximised in $r_i$ when $r_i=t_i$.
In the new mechanism, $i$'s expected costs when he reports $r_i$ and the others (in particular individual $i+1$) report truthfully are
$$
\bar c_i^{VCG}(r_i)-\sum_{t_{i+1}\in T_{i+1}}q_{i+1}(t_{i+1})\,\bar c_{i+1}^{VCG}(t_{i+1}).
$$
Hence, his expected utility when he reports $r_i$ in the new mechanism and when all others report truthfully is
$$
\sum_{t_{-i}\in T_{-i}}q_{-i}(t_{-i})\,v_i(\hat x(r_i,t_{-i}),t_i)-\bar c_i^{VCG}(r_i)+\bar c_{i+1},
$$
where $\bar c_{i+1}=\sum_{t_{i+1}\in T_{i+1}}q_{i+1}(t_{i+1})\,\bar c_{i+1}^{VCG}(t_{i+1})$ is a constant. But this last expression is equal to
$$
u_i^{VCG}(r_i,t_i)+\bar c_{i+1},
$$
which, because $u_i^{VCG}(r_i,t_i)$ is maximised in $r_i$ when $r_i=t_i$, is likewise maximised by reporting truthfully.
Note carefully that Theorem 9.11 does not say that truth-telling is a weakly dom-
inant strategy in the new budget-balanced mechanism. It says only that truth-telling is
a Bayesian-Nash equilibrium. Consequently, although we gain a balanced budget (and
hence full efficiency) when we adjust the cost functions of the VCG mechanism, we lose
the otherwise very nice property of dominant strategy equilibrium.30
30 In fact, there are theorems stating that it is impossible to achieve both in a wide variety of circumstances. See
Green and Laffont (1977) and Holmstrom (1979b).
EXAMPLE 9.5 Continuing with Examples 9.3 and 9.4, suppose that there are just two
individuals, i.e., N = 2. As you are asked to show in Exercise 9.30, the cost formula given
in Theorem 9.11 yields, for the two individuals here, budget-balanced cost functions that
can be equivalently described by the following table.
Let us understand why the entries in the table are as they are. The 'circular seating' description following Theorem 9.11 implies that the entries in the second row of the table are simply the expected VCG costs, i.e., the $\bar c_i^{VCG}(t_i)$. In particular, the fourth entry in the second row is $\bar c_1^{VCG}(4)$, individual 1's expected VCG cost when he reports that his type is $t_1=4$. By reporting $t_1=4<5$, he can be pivotal only for the swimming pool, and even then he is pivotal only when individual 2 reports $t_2=6$, in which case his VCG cost, i.e., his externality, is $c_1^{VCG}(4,6)=6-5$ (see Example 9.4). Because individual 2 reports truthfully and the probability that $t_2=6$ is $1/9$, individual 1's expected externality is therefore $\bar c_1^{VCG}(4)=\frac{1}{9}(6-5)=\frac{1}{9}$, as in the table.
Note that one’s payment to the other individual is higher the more extreme is one’s
report. This is in keeping with the idea that, for correct incentives, individuals should
pay their externality (but keep in mind that the amount paid according to the table is not
one’s cost, because each individual also receives a payment from the other individual).
Indeed, the more extreme an individual’s report, the more likely it is that he gets his way,
or, equivalently, the less likely it is that the other individual gets their way. Requiring
individuals to pay more when their reports are more extreme keeps them honest.
Thus, when N = 2, the budget-balanced expected externality mechanism for the
town is as follows. The two individuals are asked to report their types and make pay-
ments to one another according to the table above. The bridge is built if the sum
of the reports exceeds 10 and the swimming pool is built otherwise. This mecha-
nism is incentive-compatible, ex post efficient, budget-balanced, and leads to voluntary
participation.
Theorem 9.11 provides an affirmative answer to the question of whether one can
design a mechanism that ensures an ex post efficient outcome in a quasi-linear utility,
independent private values setting. Thus, we have come quite a long way. But there are
important situations that our analysis so far does not cover and it is now time to get to them.
individuals are willing to participate in the mechanism.31 Indeed, we presumed that when
an individual chooses not to participate, two things are true. First, his income is unchanged,
implying that he cannot be forced to give it up. Second, the set of social states available to
the remaining individuals is also unchanged, implying that the individual himself has no
control – i.e., no property rights – over them.
The ‘no property rights over social states’ assumption sometimes makes perfect
sense. For example, when the mechanism is an auction and the N participating individ-
uals are bidders, it is natural to suppose that no bidder has any effect on the availability of
the good should he decide not to participate. But what if we include the seller as one of the
individuals participating in the mechanism? It typically will not make sense to assume that
the good will remain available to the bidders if the seller chooses not to participate.32 Or,
consider a situation in which a firm-owner has the technology to produce a good (at some
cost) that a consumer might value. In this case too, the set of social states is not the same
for the consumer alone as it is with the consumer and firm-owner together. Or, suppose
one is interested in dissolving a partnership (e.g., a law firm, a marriage, etc.) efficiently,
where each partner has rights to the property that is jointly owned. In order to cover these
and other important situations we must generalise our model.
The key to accommodating property rights over social states is to be more flexible
about individual participation decisions. To get us moving in the right direction, consider
a situation involving a seller who owns an object and a potential buyer. The seller's value
for the object is some vs ∈ [0, 1] known only to the seller, and the buyer’s value for the
object is some vb ∈ [0, 1] known only to the buyer. If we wish to give the seller property
rights over the object, then we cannot force him to trade it away. Consequently, the seller
will participate in a mechanism only if he expects to receive utility at least vs from doing
so, because he can achieve this utility by not participating and keeping the object for him-
self. The notable feature of this example is that the value to the seller of not participating
depends non-trivially on his private type, vs . We will now incorporate this idea into our
general model.
For each individual i, and for each ti ∈ Ti , let IRi (ti ) denote i’s expected utility when
he does not participate in the mechanism and his type is ti . Thus, in the example of the
previous paragraph, letting individual 1 be the seller, we have IR1 (vs ) = vs for each vs ∈
[0, 1], and letting individual 2 be the buyer, we have IR2 (vb ) = 0 for each vb ∈ [0, 1].
EXAMPLE 9.6 Reconsider Example 9.3 but suppose that the town itself must finance the
building of either the bridge or the swimming pool and that building neither (i.e., ‘Don’t
Build’ (D)) is a third social state that is available. The types are as before as are the utilities
for the bridge and pool. But we must specify utilities for building nothing. Suppose that
individual 1 is the only engineer in town and that he would be the one to build the bridge
or the pool. His utility for the social state D is
v_1(D, t_1) = 10 for every t_1 ∈ T_1, while for every other individual i, v_i(D, t_i) = 0 for all t_i ∈ T_i.
You may think of v1 (D, t1 ) = 10 as the engineer’s (opportunity) cost of building either the
bridge or the pool. So, if the engineer cannot be forced to build (i.e., if he has property
rights over the social state D ), then the mechanism must give him at least an expected
utility of 10 because he can ensure this utility simply by not building anything. Hence, for
every t ∈ T, we have IR1 (t1 ) = 10 and IRi (ti ) = 0 for i > 1. As we now show, the expected
externality mechanism that worked so beautifully without property rights no longer works.
Note that it is always efficient to build something, because total utility is equal to 10
if nothing is built, while it is strictly greater than 10 (assuming the engineer is not the only
individual) if the swimming pool is built. Suppose there are just two individuals, the engi-
neer and one other. The expected externality mechanism described in Example 9.5 fails to
work because the engineer will sometimes refuse to build. For instance, if the engineer’s
type is t1 < 4, then whatever are the reports, the mechanism will indicate that either the
bridge or the pool be built and individual 2’s payment to the engineer will be no more than
10/9. Consequently, even ignoring the payment the engineer must make to individual 2, the
engineer’s expected utility if he builds is strictly less than 10 because max(t1 + 5, 2t1 ) +
10/9 < 10 when t1 < 4. The engineer is therefore strictly better off exercising his right
not to build. So, under the expected externality mechanism, the outcome is inefficient
whenever t1 < 4 because the engineer’s individual rationality constraint is violated.
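Taking the example’s numbers at face value – gross utility max(t1 + 5, 2t1) from whichever project is built, and at most 10/9 received from individual 2 – the violation of the engineer’s individual rationality constraint at low types is a one-line check (the integer type grid below is only illustrative):

```python
# Upper bound on the engineer's expected utility from participating when his
# type is t1: gross utility from the built project plus at most 10/9 received,
# ignoring any payment he himself must make.
for t1 in [1, 2, 3]:                # illustrative engineer types below 4
    upper_bound = max(t1 + 5, 2 * t1) + 10 / 9
    assert upper_bound < 10         # IR requires at least IR1(t1) = 10
print("individual rationality fails for every tested t1 < 4")
```

At t1 = 3, for instance, the bound is 8 + 10/9 ≈ 9.11 < 10, so non-participation is strictly better.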
Can the type of difficulty encountered in Example 9.6 be remedied? That is, is
it always possible to design an incentive-compatible, ex post efficient, budget-balanced
direct mechanism that is also individually rational? In general, the answer is ‘No’ (we will
return to the specific case of Example 9.6 a little later). However, we can come to an essen-
tially complete understanding of when it is possible and when it is not. Let us begin by
providing conditions under which it is possible.33
33 The remainder of this chapter draws heavily from Krishna and Perry (1998). Another very nice treatment can
be found in Williams (1999).
AUCTIONS AND MECHANISM DESIGN 473
Note that the VCG mechanism runs an expected surplus because c_i^VCG(t) ≥ 0 for every i and every t. On the other hand, the IR-VCG mechanism may or may not run an expected surplus because it reduces the expected surplus of the VCG mechanism by the amount of the participation subsidies. We can now state the following result, which holds for any incentive-compatible mechanism, not merely for the particular mechanisms we have considered so far.
c_i^B(t) = c̄_i(t_i) − c̄_{i+1}(t_{i+1}) + c̄_{i+1} − (1/N) ∑_{j=1}^{N} c̄_j,

where c̄_i(t_i) is defined by (9.22), c̄_i = ∑_{t∈T} q(t)c_i(t), and i + 1 = 1 when i = N. Then the
resulting mechanism – with the same probability assignment function – is budget-balanced
and remains incentive-compatible. Moreover, the resulting mechanism is weakly preferred
by every type of every individual to the original mechanism. Therefore, if the original
mechanism was individually rational, so is the new budget-balanced mechanism.
Proof: Suppose that individual i’s type is t_i, that he reports r_i, and that all other individuals report truthfully. Then i’s expected cost under the new cost functions is

∑_{t_{−i}∈T_{−i}} q_{−i}(t_{−i})c_i^B(r_i, t_{−i}) = c̄_i(r_i) − c̄_{i+1} + c̄_{i+1} − (1/N) ∑_{j=1}^{N} c̄_j

= c̄_i(r_i) − (1/N) ∑_{j=1}^{N} c̄_j. (P.1)
Hence, individual i’s expected cost when he reports ri differs from the original by a fixed
constant.
Given the original cost functions, ci , let ui (ri , ti ) denote individual i’s expected utility
from reporting ri when his type is ti and when all others report truthfully, and let uBi (ri , ti )
denote the analogous quantity with the new cost functions, cBi . Because the probability
assignment function has not changed, the result from (P.1) and the formula in (9.23) imply,
u_i^B(r_i, t_i) = u_i(r_i, t_i) + (1/N) ∑_{j=1}^{N} c̄_j. (P.2)
Therefore, because ui (ri , ti ) is maximised in ri when ri = ti , the same is true of uBi (ri , ti )
and we conclude that the new mechanism is incentive-compatible.
Finally, the assumption that the original mechanism runs an expected surplus means
precisely that,
∑_{j=1}^{N} c̄_j ≥ 0.
Consequently, evaluating (P.2) at ri = ti , the expected utility of every type of each individ-
ual is at least as high in the truth-telling equilibrium with the new cost functions as with
the old.
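The budget-balancing construction in the proof is easy to check numerically. The sketch below (all type sets and cost numbers are illustrative, and types are drawn independently and uniformly) builds the cost functions c_i^B from arbitrary original cost functions and verifies both exact budget balance at every type profile and the constant shift in expected costs asserted in (P.1):

```python
import itertools
import random

random.seed(0)

N = 3                                   # individuals (indexed 0..N-1 here)
types = [[0, 1], [0, 1, 2], [0, 1]]     # illustrative finite type sets T_i
profiles = list(itertools.product(*types))
q = 1.0 / len(profiles)                 # independent, uniform type profiles

# the accounting identity below holds for ANY original cost functions c_i(t)
c = [{t: random.uniform(-1.0, 1.0) for t in profiles} for _ in range(N)]

def cbar_cond(i, ti):
    """c-bar_i(t_i): i's expected cost conditional on his own type."""
    rel = [t for t in profiles if t[i] == ti]
    return sum(c[i][t] for t in rel) / len(rel)

cbar = [sum(c[i][t] for t in profiles) * q for i in range(N)]  # ex ante costs

def cB(i, t):
    """The budget-balanced cost functions of Theorem 9.12 (i+1 is circular)."""
    j = (i + 1) % N
    return cbar_cond(i, t[i]) - cbar_cond(j, t[j]) + cbar[j] - sum(cbar) / N

# 1) exact budget balance at every type profile
for t in profiles:
    assert abs(sum(cB(i, t) for i in range(N))) < 1e-9

# 2) equation (P.1): i's conditional expected cost shifts by the constant
#    (1/N) * sum_j cbar_j, so incentives are unchanged
for i in range(N):
    for ti in types[i]:
        rel = [t for t in profiles if t[i] == ti]
        exp_cB = sum(cB(i, t) for t in rel) / len(rel)
        assert abs(exp_cB - (cbar_cond(i, ti) - sum(cbar) / N)) < 1e-9
print("budget balanced at every profile; expected costs shifted by a constant")
```

Note how the circular c̄_{i+1}(t_{i+1}) terms cancel when summed across individuals: that is the entire trick behind exact budget balance.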
Let us note a few things about Theorem 9.12. First it provides explicit budget-
balanced cost functions derived from the original cost functions that maintain incentive-
compatibility. Second, not only do we achieve a balanced budget, we do so while ensuring
that every individual, regardless of his type, is at least as well off in the truth-telling equi-
librium of the new mechanism as he was in the old. Thus, if individuals were willing to
participate in the old mechanism they are willing to participate in the new mechanism as
well, regardless of their type.34 Consequently, an immediate implication of Theorem 9.12
is the following.
∑_{t∈T} ∑_{i=1}^{N} q(t)(c_i^VCG(t) − ψ_i*) ≥ 0.
c̄_i^VCG(t_i) − ψ_i* − c̄_{i+1}^VCG(t_{i+1}) + c̄_{i+1}^VCG − (1/N) ∑_{j=1}^{N} (c̄_j^VCG − ψ_j*),

where c̄_j^VCG(t_j) is defined by (9.26) and c̄_j^VCG = ∑_{t_j∈T_j} q_j(t_j)c̄_j^VCG(t_j) is individual j’s ex ante expected VCG cost.
The proof of Theorem 9.13 really is immediate because the IR-VCG mechanism is
incentive-compatible, ex post efficient, and individually rational. So, if it runs an expected
surplus, adjusting its cost functions, c_i^VCG(t) − ψ_i*, according to Theorem 9.12, results
in an incentive-compatible, ex post efficient, budget-balanced, and individually ratio-
nal mechanism. You now need only convince yourself that the resulting mechanism is
precisely that which is defined in Theorem 9.13. (Do convince yourself.)
Theorem 9.13 identifies expected surplus in the IR-VCG mechanism as a suffi-
cient condition for the existence of a mechanism that satisfies all of our demands, i.e.,
incentive compatibility, ex post efficiency, budget-balancedness, and individual rationality.
Moreover, Theorem 9.13 explicitly constructs such a mechanism.
EXAMPLE 9.7 In the light of Theorem 9.13, let us reconsider Example 9.6 when there
are just two individuals, one of whom is the engineer. We know from Example 9.6 that the
budget-balanced expected externality mechanism is not individually rational. In particular,
when the engineer’s type is low, he is better off not participating. Thus, the engineer’s
34 Theorem 9.12 remains true, and the proof is identical to that given here, even when the private value assumption
fails – i.e., even when vi (x, t) depends on t−i as well as on ti . On the other hand, the proof given here depends
crucially on the assumption that the types are independent across individuals.
participation subsidy ψ1∗ must be strictly positive. Let us check whether the IR-VCG
mechanism runs an expected surplus here. According to (9.27), this requires computing the
minimum expected VCG utility over the engineer’s types. It
is not difficult to argue that the higher is the engineer’s type, the better off he must be in
the VCG mechanism (see Exercise 9.32). Hence, the minimum occurs when t1 = 1, and
his expected utility in the VCG mechanism is

U_1^VCG(1) = (1 + 5) − 10/9,

because the pool will be built regardless of 2’s report, and his expected VCG cost when his type is t_1 = 1 is c̄_1^VCG(1) = 10/9 (see Exercise 9.33). Hence, the IR-VCG mechanism does run an expected surplus here, and so Theorem 9.13 guarantees an incentive-compatible, ex post efficient, budget-balanced mechanism that also satisfies the engineer’s individual rationality constraint. Exercise 9.33 asks you to explicitly provide a mechanism that does the job.
p̄_x^i(t_i) = ∫_{T_{−i}} p_x(t_i, t_{−i})q_{−i}(t_{−i})dt_{−i} and c̄_i(t_i) = ∫_{T_{−i}} c_i(t_i, t_{−i})q_{−i}(t_{−i})dt_{−i},

rather than their finite summation counterparts in (9.22). Consequently, individual i’s expected utility from reporting r_i when his type is t_i and the others report truthfully is once again,

u_i(r_i, t_i) = ∑_{x∈X} p̄_x^i(r_i)v_i(x, t_i) − c̄_i(r_i),
Incentive compatibility requires that u_i(r_i, t_i) be maximised in r_i at r_i = t_i. Assuming differentiability wherever we need it, this yields the following first-order condition for every individual i and every t_i ∈ (0, 1):

∂u_i(r_i, t_i)/∂r_i |_{r_i=t_i} = ∑_{x∈X} (p̄_x^i)′(t_i)v_i(x, t_i) − c̄_i′(t_i) = 0,

so that

c̄_i′(t_i) = ∑_{x∈X} (p̄_x^i)′(t_i)v_i(x, t_i). (9.28)
Consequently, if two mechanisms, p, c_1^A, . . . , c_N^A, and p, c_1^B, . . . , c_N^B, have the same probability assignment function, then the derivative of the expected costs in the A mechanism, c̄_i^A′(t_i), must satisfy (9.28), as must the derivative of the expected costs in the B mechanism, c̄_i^B′(t_i). Hence, for all i and all t_i ∈ (0, 1),

c̄_i^A′(t_i) = ∑_{x∈X} (p̄_x^i)′(t_i)v_i(x, t_i) = c̄_i^B′(t_i).

That is, the derivatives of the expected cost functions must be identical. But then, so long as the fundamental theorem of calculus can be applied, the expected cost functions themselves must differ by a constant because,

c̄_i^A(t_i) − c̄_i^A(0) = ∫_0^{t_i} c̄_i^A′(s)ds = ∫_0^{t_i} c̄_i^B′(s)ds = c̄_i^B(t_i) − c̄_i^B(0).
If two incentive-compatible mechanisms have the same expected probability assignment functions, p̄_x^i, and every individual is indifferent between the two mechanisms when his type is zero, then the two mechanisms generate the same expected revenue.
We leave the proof of Theorem 9.15 to you as an exercise (see Exercise 9.35).
Another immediate consequence of Theorem 9.14 is the following. Suppose that for
each t ∈ T there is a unique ex post efficient social state. Then any two ex post efficient
incentive-compatible mechanisms have the same probability assignment functions. Hence,
by Theorem 9.14, because the VCG mechanism is incentive-compatible and ex post efficient, any other incentive-compatible, ex post efficient mechanism must have expected
cost functions that differ from the VCG expected costs (i.e., the expected externalities) by
a constant. Indeed, if you look back at all of the ex post efficient mechanisms we con-
structed, expected costs differ by a constant from the VCG expected costs. This fact is the
basis of our next result.
The assumption that for each t ∈ T there is a unique ex post efficient social state is
very strong when there are finitely many social states.37 Fortunately there are much weaker
assumptions that have the same effect.
Proof: Because, for each t−j ∈ T−j there is, for all but finitely many tj ∈ Tj , a unique ex
post efficient social state given (tj , t−j ), the expected probability assignment functions,
p̄_x^i(t_i), are uniquely determined by ex post efficiency. Consider then some incentive-compatible, ex post efficient, individually rational direct mechanism with cost functions
c1 , . . . , cN . According to the fact just stated, its expected probability assignment functions
must coincide with those of the IR-VCG mechanism. So, by Theorem 9.14, its expected cost functions differ by a constant from the expected cost functions, c̄_i^VCG(t_i) − ψ_i*, of the IR-VCG mechanism.38 Hence, for some constants, k_1, . . . , k_N,

c̄_i(t_i) = c̄_i^VCG(t_i) − ψ_i* − k_i for every t_i. (P.1)
37 Indeed, if each vi (x, ti ) is continuous in ti , then uniqueness implies that x̂(t) must be constant. That is, there
must be a single social state that is ex post efficient regardless of the vector of types. But in that case there is no
problem to begin with since there is no uncertainty about which social state is ex post efficient.
38 Keep in mind that all of our formulae, including those defining the ψ_i*, must be adjusted from sums over types to integrals over types. But otherwise they are the same.
(P.1) says that starting with the VCG mechanism and adjusting it by giving to each indi-
vidual i the participation subsidy ψi∗ + ki renders it individually rational in addition to ex
post efficient. But because the participation subsidies ψi∗ are, by definition, the smallest
such subsidies, it must be the case that k_i ≥ 0 for all i. Hence, by (P.1),

c̄_i(t_i) ≤ c̄_i^VCG(t_i) − ψ_i* for every t_i,

so that each individual, regardless of his type, expects to pay a lower cost in the mechanism with cost functions, c_i, than in the IR-VCG mechanism. Hence, the IR-VCG mechanism generates at least as much expected revenue.
Proof: If such a mechanism exists, then, because it is budget-balanced, its expected rev-
enues are zero. On the other hand, by Theorem 9.16, the IR-VCG mechanism raises at
least as much ex ante expected revenue. Therefore, the ex ante expected revenue raised by
the IR-VCG mechanism must be non-negative.
If t_b > t_s, the buyer’s VCG cost is the externality t_s that his presence imposes on the seller: without the buyer, the seller receives the object and obtains utility t_s from the social state, but with the buyer, the buyer receives the object and the seller obtains utility zero from the social state. On the other hand, if t_b < t_s, the buyer’s externality is zero because with or without him the seller receives the object. Hence,

c_b^VCG(t_b, t_s) = t_s if t_b > t_s, and c_b^VCG(t_b, t_s) = 0 if t_b < t_s.
There is no need to specify who receives the object when tb = ts because this event occurs
with probability zero and will have no effect on expected costs. Indeed, from what we
already know, the buyer’s expected cost given his type is:
c̄_b^VCG(t_b) = ∫_0^{t_b} t_s dt_s = (1/2)t_b².
Similarly, because within the VCG mechanism the buyer and seller are symmetric,39
c̄_s^VCG(t_s) = (1/2)t_s².
Consequently, if UiVCG (ti ) is individual i’s expected utility in the VCG mechanism when
his type is ti , then
U_b^VCG(t_b) = ∫_0^{t_b} t_b dt_s − c̄_b^VCG(t_b)

= t_b² − (1/2)t_b²

= (1/2)t_b²,
where the integral in the first line is the utility the buyer expects from receiving the object
when that is the efficient social state. Similarly, by symmetry,
U_s^VCG(t_s) = (1/2)t_s².
The IR-VCG mechanism runs an expected surplus when the expected revenue from the VCG mechanism exceeds the sum of the participation subsidies, ψ_b* + ψ_s*. The expected revenue from the VCG mechanism is the expectation of c̄_b^VCG(t_b) + c̄_s^VCG(t_s), namely 1/6 + 1/6 = 1/3.
39 The fact that the seller owns the object plays no role in the VCG mechanism, which always operates as if there
are no property rights over social states.
The buyer requires no participation subsidy because U_b^VCG(t_b) = (1/2)t_b² ≥ 0 = IR_b(t_b) for every t_b, so ψ_b* = 0, while the seller’s subsidy must cover the largest shortfall between IR_s(t_s) = t_s and U_s^VCG(t_s) = (1/2)t_s², which occurs at t_s = 1, so ψ_s* = 1/2. Hence

ψ_b* + ψ_s* = 1/2 > 1/3,

and the sum of the participation subsidies exceeds the VCG mechanism’s expected revenue. The IR-VCG mechanism therefore runs an expected deficit, and we conclude that no incentive-compatible, ex post efficient, budget-balanced, and individually rational mechanism exists in this buyer–seller setting.
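Each quantity in this surplus comparison can be checked numerically. In the sketch below, ψ_s* is computed as the seller’s largest shortfall t_s − U_s^VCG(t_s) — our reading of the participation subsidy’s definition — and the integrals are simple midpoint sums:

```python
# Buyer and seller types uniform on [0, 1]; expected VCG costs are t^2/2.
n = 100_000
grid = [(k + 0.5) / n for k in range(n)]            # midpoint rule on [0, 1]

# Expected VCG revenue: E[cbar_b(t_b)] + E[cbar_s(t_s)] = 1/6 + 1/6 = 1/3.
exp_revenue = 2 * sum(0.5 * t * t for t in grid) / n

# Buyer: outside option 0 and U_b = t^2/2 >= 0, so no subsidy is needed.
psi_b = max(0.0, max(0.0 - 0.5 * t * t for t in grid))
# Seller: outside option t_s, so the subsidy must cover max t_s - t_s^2/2.
psi_s = max(t - 0.5 * t * t for t in grid)          # maximised at t_s = 1

assert abs(exp_revenue - 1 / 3) < 1e-6
assert psi_b == 0.0
assert abs(psi_s - 0.5) < 1e-4
assert psi_b + psi_s > exp_revenue                  # 1/2 > 1/3: expected deficit
print("participation subsidies exceed expected VCG revenue")
```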
There are several lessons to draw from Example 9.8. First, the example provides
an explanation for the otherwise puzzling phenomenon of strikes and disagreements in
bargaining situations. The puzzling thing about strikes is that one imagines that whatever
agreement is eventually reached could have been reached without the strike, saving both
sides time and resources. But the result in the example demonstrates that this ‘intuition’
is simply wrong. Sometimes there is no mechanism that can assure ex post efficiency –
inefficiencies must occasionally appear. And one example of such an inefficiency is that
associated with a strike.
Second, the example illustrates that property rights matter. A very famous result in
law and economics is the ‘Coase Theorem’ which states, roughly, that if one’s only interest
is Pareto efficiency, property rights do not matter – e.g., whether a downstream fishery is
given the legal right to clean water or an upstream steel mill is given the legal right to
dump waste into the stream, the two parties will, through appropriate transfer payments to
one another, reach a Pareto-efficient agreement. Our analysis reveals an important caveat,
namely that the Coase Theorem can fail when the parties have private information about
their preferences. If no individual has property rights over social states, we found that
efficiency was always possible. However, when property rights are assigned (as in the
buyer–seller example) an efficient agreement cannot always be guaranteed.
Third, the fact that property rights can get in the way of efficiency provides an impor-
tant lesson for the privatisation of public assets (e.g. government sale of off-shore oil rights,
or of radio spectrum for commercial communication (mobile phones, television, radio)).
If the government’s objective is efficiency, then it is important to design the privatisation
mechanism so that it assigns the objects efficiently, if possible. This is because the assign-
ment, by its nature, creates property rights. If the assignment is inefficient, and private
information remains, the establishment of property rights may well lead to unavoidable,
persistent, and potentially large efficiency losses.
Fourth, the example suggests that the lack of symmetry in ownership may play a
role in the impossibility result. For example, a setting without property rights is one where
property rights are symmetric and also one where it is possible to construct an ex post
efficient budget-balanced mechanism with voluntary participation. In the exercises you
are asked to explore this idea further (see also Cramton et al. (1987)).
An excellent question at this point is, ‘What do we do when there does not exist
an incentive-compatible, ex post efficient, budget-balanced, individually rational mech-
anism?’ This is a terrific and important question, but one we will not pursue in this
introduction to mechanism design. One answer, however, is that we do the next best thing.
We instead search among all incentive-compatible mechanisms for those that cannot be
Pareto improved upon either from the interim perspective (i.e., from the perspective of
individuals once they know their type but no one else’s), or from the ex ante perspective.
An excellent example of this methodology can be found in Myerson and Satterthwaite
(1983).
The theory of mechanism design is rich, powerful, and important, and, while we
have only skimmed the surface here, we hope to have given you a sense of its usefulness
in addressing the fundamental problem of resource allocation in the presence of private
information.
9.6 EXERCISES
9.1 Show that the bidding strategy in (9.5) is strictly increasing.
9.2 Show in two ways that the symmetric equilibrium bidding strategy of a first-price auction with N
symmetric bidders each with values distributed according to F, can be written as
b̂(v) = v − ∫_0^v (F(x)/F(v))^{N−1} dx.
For the first way, use our solution from the text and apply integration by parts. For the second way, use the fact that F^{N−1}(r)(v − b̂(r)) is maximised in r when r = v and then apply the envelope theorem to conclude that d(F^{N−1}(v)(v − b̂(v)))/dv = F^{N−1}(v); now integrate both sides from 0 to v.
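For uniformly distributed values, F(x) = x, the stated integral can be evaluated in closed form — ∫_0^v (x/v)^{N−1} dx = v/N, giving b̂(v) = (N − 1)v/N — and a quick numerical sketch confirms it (the midpoint-rule helper below is ours, not part of the exercise):

```python
N = 4                                   # number of bidders (illustrative)
F = lambda x: x                         # uniform values on [0, 1]

def bid(v, n=200_000):
    """b_hat(v) = v - integral_0^v (F(x)/F(v))^(N-1) dx, via midpoint rule."""
    if v == 0:
        return 0.0
    s = sum((F((k + 0.5) * v / n) / F(v)) ** (N - 1) for k in range(n)) * v / n
    return v - s

for v in [0.2, 0.5, 0.9]:
    assert abs(bid(v) - (N - 1) * v / N) < 1e-6
print("matches (N-1)v/N for uniform values")
```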
9.3 This exercise will guide you through the proof that the bidding function in (9.5) is in fact a symmetric
equilibrium of the first-price auction.
(a) Recall from (9.2) that

du(r, v)/dr = (N − 1)F^{N−2}(r)f(r)(v − b̂(r)) − F^{N−1}(r)b̂′(r).

Using (9.3), show that

du(r, v)/dr = (N − 1)F^{N−2}(r)f(r)(v − b̂(r)) − (N − 1)F^{N−2}(r)f(r)(r − b̂(r))
= (N − 1)F^{N−2}(r)f(r)(v − r).
(b) Use the result in part (a) to conclude that du(r, v)/dr is positive when r < v and negative when
r > v, so that u(r, v) is maximised when r = v.
9.4 Throughout this chapter we have assumed that both the seller and all bidders are risk neutral. In this
question, we shall explore the consequences of risk aversion on the part of bidders.
There are N bidders participating in a first-price auction. Each bidder’s value is independently
drawn from [0,1] according to the distribution function F, having continuous and strictly positive
density f. If a bidder’s value is v and he wins the object with a bid of b < v, then his von Neumann-Morgenstern utility is (v − b)^{1/α}, where α ≥ 1 is fixed and common to all bidders. Consequently,
the bidders are risk averse when α > 1 and risk neutral when α = 1. (Do you see why?) Given
the risk-aversion parameter α, let b̂α (v) denote the (symmetric) equilibrium bid of a bidder when
his value is v. The following parts will guide you toward finding b̂α (v) and uncovering some of its
implications.
(a) Let u(r, v) denote a bidder’s expected utility from bidding b̂α (r) when his value is v, given that
all other bidders employ b̂α (·). Show that
u(r, v) = F^{N−1}(r)(v − b̂_α(r))^{1/α}.
(d) Prove that b̂α (v) is strictly increasing in α ≥ 1. Does this make sense? Conclude that as bidders
become more risk averse, the seller’s revenue from a first-price auction increases.
(e) Use part (d) and the revenue equivalence result for the standard auctions in the risk-neutral case
to argue that when bidders are risk averse as above, a first-price auction raises more revenue for
the seller than a second-price auction. Hence, these two standard auctions no longer generate the
same revenue when bidders are risk averse.
(f) What happens to the seller’s revenue as the bidders become infinitely risk averse (i.e., as
α → ∞)?
9.5 In a private values model, argue that it is a weakly dominant strategy for a bidder to bid his value in
a second-price auction even if the joint distribution of the bidders’ values exhibits correlation.
9.6 Use the equilibria of the second-price, Dutch, and English auctions to construct incentive-compatible
direct selling mechanisms for each of them in which the ex post assignment of the object to bidders
as well as their ex post payments to the seller are unchanged.
9.7 Prove part (i) of Theorem 9.5 under the assumption that both p̄i (vi ) and c̄i (vi ) are differentiable at
every vi ∈ [0, 1].
9.8 In a first-price, all-pay auction, the bidders simultaneously submit sealed bids. The highest bid wins
the object and every bidder pays the seller the amount of his bid. Consider the independent private
values model with symmetric bidders whose values are each distributed according to the distribution
function F, with density f .
9.9 Suppose there are just two bidders. In a second-price, all-pay auction, the two bidders simultaneously
submit sealed bids. The highest bid wins the object and both bidders pay the second-highest bid.
9.10 Consider the following variant of a first-price auction. Sealed bids are collected. The highest bidder
pays his bid but receives the object only if the outcome of the toss of a fair coin is heads. If the
outcome is tails, the seller keeps the object and the high bidder’s bid. Assume bidder symmetry.
(d) Both with and without using the revenue equivalence theorem, show that the seller’s expected
revenue is exactly half that of a standard first-price auction.
9.11 Suppose all bidders’ values are uniform on [0, 1]. Construct a revenue-maximising auction. What is
the reserve price?
9.12 Consider again the case of uniformly distributed values on [0, 1]. Is a first-price auction with the
same reserve price as in the preceding question optimal for the seller? Prove your claim using the
revenue equivalence theorem.
9.13 Suppose the bidders’ values are i.i.d., each according to a uniform distribution on [1, 2]. Construct a
revenue-maximising auction for the seller.
9.14 Suppose there are N bidders with independent private values where bidder i’s value is uniform on
[ai , bi ]. Show that the following is a revenue-maximising, incentive-compatible direct selling mech-
anism. Each bidder reports his value. Given the reported values v1 , . . . , vN , bidder i wins the object if
v_i is strictly larger than the N − 1 numbers of the form max[a_i, b_i/2 + max(0, v_j − b_j/2)] for j ≠ i.
Bidder i then pays the seller an amount equal to the largest of these N − 1 numbers. All other bidders
pay nothing.
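The winner-and-payment rule of this mechanism is mechanical to implement. The sketch below codes it up directly and, as a sanity check of our own (not a claim made in the exercise), verifies that in the symmetric case a_i = 0, b_i = 1 it behaves like a second-price auction with reserve price 1/2:

```python
def run_mechanism(a, b, v):
    """Reported values v; bidder i wins if v[i] strictly exceeds every number
    max(a[i], b[i]/2 + max(0, v[j] - b[j]/2)) for j != i, and pays the largest
    of those numbers. Returns (winner, payment) or (None, 0.0)."""
    n = len(v)
    for i in range(n):
        thresholds = [max(a[i], b[i] / 2 + max(0.0, v[j] - b[j] / 2))
                      for j in range(n) if j != i]
        if all(v[i] > th for th in thresholds):
            return i, max(thresholds)
    return None, 0.0

# symmetric uniform [0, 1] case: second-price auction with reserve 1/2
winner, pay = run_mechanism([0, 0, 0], [1, 1, 1], [0.9, 0.7, 0.3])
assert winner == 0 and abs(pay - 0.7) < 1e-9   # top bidder pays 2nd-highest
winner, pay = run_mechanism([0, 0, 0], [1, 1, 1], [0.45, 0.3, 0.2])
assert winner is None                          # all values below the reserve
print("reduces to second-price with reserve 1/2 when every [a_i, b_i] = [0, 1]")
```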
9.15 A drawback of the direct mechanism approach is that the seller must know the distribution of the
bidders’ values to compute the optimal auction. The following exercise provides an optimal auction
that is distribution-free for the case of two asymmetric bidders, 1 and 2, with independent private
values. Bidder i’s strictly positive and continuous density of values on [0, 1] is fi with distribution
Fi . Assume throughout that vi − (1 − Fi (vi ))/fi (vi ) is strictly increasing for i = 1, 2.
The auction is as follows. In the first stage, the bidders each simultaneously submit a sealed
bid. Before the second stage begins, the bids are publicly revealed. In the second stage, the bidders
must simultaneously declare whether they are willing to purchase the object at the other bidder’s
revealed sealed bid. If one of them says ‘yes’ and the other ‘no’, then the ‘yes’ transaction is carried
out. If they both say ‘yes’ or both say ‘no’, then the seller keeps the object and no payments are
made. Note that the seller can run this auction without knowing the bidders’ value distributions.
(a) Consider the following strategies for the bidders: In the first stage, when his value is vi , bidder
i ≠ j submits the sealed bid b_i*(v_i) = b_i, where b_i solves

b_i − (1 − F_j(b_i))/f_j(b_i) = max{0, v_i − (1 − F_i(v_i))/f_i(v_i)}.
(Although such a bi need not always exist, it will always exist if the functions v1 −
(1 − F1 (v1 ))/f1 (v1 ) and v2 − (1 − F2 (v2 ))/f2 (v2 ) have the same range. So, assume this is the
case.)
In the second stage each bidder says ‘yes’ if and only if his value is above the other bidder’s
first-stage bid.
Show that these strategies constitute an equilibrium of this auction. (Also, note that while the
seller need not know the distribution of values, each bidder needs to know the distribution of the
other bidder’s values to carry out his strategy. Hence, this auction shifts the informational burden
from the seller to the bidders.)
(b) (i) Show that in this equilibrium the seller’s expected revenues are maximised.
(ii) Is the outcome always efficient?
(c) (i) Show that it is also an equilibrium for each bidder to bid his value and then to say ‘yes’ if
and only if his value is above the other’s bid.
(ii) Is the outcome always efficient in this equilibrium?
(d) Show that the seller’s revenues are not maximal in this second equilibrium.
(e) Unfortunately, this auction possesses many equilibria. Choose any two strictly increasing
functions g_i : [0, 1] → R, i = 1, 2, with a common range. Suppose in the first stage that bidder i ≠ j with value v_i bids b̃_i(v_i) = b_i, where b_i solves g_j(b_i) = g_i(v_i) and says ‘yes’ in the
second stage if and only if his value is strictly above the other bidder’s bid. Show that this is an
equilibrium of this auction. Also, show that the outcome is always efficient if and only if gi = gj .
9.16 Show that condition (9.18) is satisfied when each Fi is a convex function. Is convexity of Fi
necessary?
9.17 Consider the independent private values model with N possibly asymmetric bidders. Suppose we
restrict attention to efficient individually rational, incentive-compatible direct selling mechanisms;
i.e., those that always assign the object to the bidder who values it most.
(a) What are the probability assignment functions?
(b) What then are the cost functions?
(c) What cost functions among these maximise the seller’s revenue?
(d) Conclude that among efficient individually rational, incentive-compatible direct selling mech-
anisms, a second-price auction maximises the seller’s expected revenue. (What about the other
three standard auction forms?)
9.18 Call a direct selling mechanism pi (·), ci (·), i = 1, . . . , N deterministic if the pi take on only the
values 0 or 1.
(a) Assuming independent private values, show that for every incentive-compatible deterministic
direct selling mechanism whose probability assignment functions, pi (vi , v−i ), are non-
decreasing in vi for every v−i , there is another incentive-compatible direct selling mechanism
with the same probability assignment functions (and, hence, deterministic as well) whose cost
functions have the property that a bidder pays only when he receives the object and when he
does win, the amount that he pays is independent of his reported value. Moreover, show that the
new mechanism can be chosen so that the seller’s expected revenue is the same as that in the old.
(b) How does this result apply to a first-price auction with symmetric bidders, wherein a bidder’s
payment depends on his bid?
(c) How does this result apply to an all-pay, first-price auction with symmetric bidders wherein
bidders pay whether or not they win the auction?
9.19 Show that it is a weakly dominant strategy for each bidder to report his value truthfully in the
optimal direct mechanism we derived in this chapter.
9.20 Under the assumption that each bidder’s density, fi , is continuous and strictly positive and that each
vi − (1 − Fi (vi ))/fi (vi ) is strictly increasing,
(a) Show that the optimal selling mechanism entails the seller keeping the object with strictly
positive probability.
(b) Show that there is precisely one ρ ∗ ∈ [0, 1] satisfying ρ ∗ − (1 − F(ρ ∗ ))/f (ρ ∗ ) = 0.
9.21 Show that when the bidders are symmetric, the first-price, Dutch, and English auctions all are
optimal for the seller once an appropriate reserve price is chosen. Indeed, show that the optimal
reserve price is the same for all four of the standard auctions.
9.22 You are hired to study a particular auction in which a single indivisible good is for sale. You find out
that N bidders participate in the auction and each has a private value v ∈ [0, 1] drawn independently
from the common density f (v) = 2v, whose cumulative distribution function is F(v) = v2 . All you
know about the auction rules is that the highest bidder wins. But you do not know what he must
pay, or whether bidders who lose must pay as well. On the other hand, you do know that there is
an equilibrium, and that in equilibrium each bidder employs the same strictly increasing bidding
function (the exact function you do not know), and that no bidder ever pays more than his bid.
9.23 We have so far assumed that the seller has a single good for sale. Suppose instead that the seller
has two identical goods for sale. Further, assume that even though there are two identical goods
for sale, each bidder wishes to win just one of them (he doesn’t care which one, since they are
identical). There are N bidders and each bidder’s single value, v, for either one of the goods is drawn
independently from [0, 1] according to the common density function f (·). So, if a bidder’s value
happens to be v = 1/3, he is willing to pay at most 1/3 to receive one of the two goods.
Suppose that the seller employs the following auction to sell the two goods. Each bidder
is asked to submit a sealed bid. The highest two bids win and the two winners each pay the
third-highest bid. Losers pay nothing.
(a) Argue that a bidder can do no better in this auction than to bid his value.
(b) Find an expression for the seller’s expected revenue. You may use the fact that the density, g(v), of the third-highest value among the N bidder values is g(v) = (1/2)N(N − 1)(N − 2)f(v)(1 − F(v))²F^{N−3}(v).
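The order-statistic density quoted in part (b) is easy to sanity-check: it must integrate to one over [0, 1], and its mean must match simulation. A sketch for uniform values and N = 5 (the parameter choices are illustrative):

```python
import random

random.seed(1)

N = 5                                    # bidders; values uniform on [0, 1]
f = lambda v: 1.0                        # density
F = lambda v: v                          # distribution function
g = lambda v: 0.5 * N * (N - 1) * (N - 2) * f(v) * (1 - F(v)) ** 2 * F(v) ** (N - 3)

n = 100_000
grid = [(k + 0.5) / n for k in range(n)]              # midpoint rule on [0, 1]
assert abs(sum(g(v) for v in grid) / n - 1.0) < 1e-6  # a density integrates to 1

# Mean of the third-highest value: analytic (via g) against Monte Carlo.
analytic_mean = sum(v * g(v) for v in grid) / n       # equals 1/2 for N = 5
draws = 200_000
mc_mean = sum(sorted(random.random() for _ in range(N))[-3]
              for _ in range(draws)) / draws
assert abs(analytic_mean - 0.5) < 1e-6
assert abs(mc_mean - analytic_mean) < 5e-3
print("g integrates to one and matches simulation")
```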
9.24 Consider again the situation described in Exercise 9.23, but now suppose that a different mechanism
is employed by the seller to sell the two goods as follows. He randomly separates the N bidders
into two separate rooms of N/2 bidders each (assume that N is even) and runs a standard first-price
auction in each room.
(a) What bidding function does each bidder employ in each of the two rooms? (Pay attention to the
number of bidders.)
(b) Assume that each bidder’s value is uniformly distributed on [0, 1] (therefore f(v) = 1 and F(v) = v for all v ∈ [0, 1]).
(i) Find the seller’s expected revenue as a function of the total number of bidders, N.
(ii) By comparing your result in (i) with your answer to part (b) of Exercise 9.23, show that
the seller’s expected revenue is higher when he auctions the two goods simultaneously
than when he separates the bidders and auctions the two goods separately.
(iii) Would the seller’s expected revenue be any different if he instead uses a second-price
auction in each of the two rooms?
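A numerical sketch (again not part of the exercise) can be used to check your answers for the uniform case. It assumes the standard symmetric first-price equilibrium with n bidders and uniform values, b(v) = (n − 1)v/n, so each room's revenue is (n − 1)/n times that room's highest value.

```python
import random

def two_room_revenue(N, trials=100_000, seed=7):
    """Monte Carlo revenue when N bidders (N even) are split into two rooms
    of n = N/2 and a first-price auction is run in each room.

    Assumes uniform values on [0, 1] and the symmetric equilibrium bid
    b(v) = (n - 1) * v / n in a room of n bidders.
    """
    rng = random.Random(seed)
    n = N // 2
    total = 0.0
    for _ in range(trials):
        values = [rng.random() for _ in range(N)]
        # Revenue per room is (n - 1)/n times the highest value in that room.
        total += (n - 1) / n * (max(values[:n]) + max(values[n:]))
    return total / trials
```

Since E[highest of n uniforms] = n/(n + 1), the analytic value is 2(n − 1)/(n + 1); for N = 10 that is 4/3 ≈ 1.333, which you can compare against your answer to Exercise 9.23(b).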
490 CHAPTER 9
(iv) Use your answer to (iii) to provide an intuitive – although perhaps incomplete – explanation
for the result in (ii).
9.25 Suppose there are N bidders and bidder i’s value is independently drawn uniformly from [ai , bi ].
(a) Prove that the seller can maximise his expected revenue by employing an auction of the
following form. First, the seller chooses, for each bidder i, a possibly distinct reserve price ρi .
(You must specify the optimal value of ρi for each bidder i.) Each bidder’s reserve price is public
information. The bidders are invited to submit sealed bids and the bidder submitting the highest
positive bid wins. Only the winner pays and he pays his reserve price plus the second-highest
bid, unless the second-highest bid is negative, in which case he pays his reserve price.
(b) Prove that each bidder has a weakly dominant bidding strategy in the auction described in (a).
9.26 Establish the ‘if’ part of the claim in (9.20). In particular, prove that if x̂ ∈ X solves

\max_{x \in X} \sum_{i=1}^{N} v_i(x),

then there is no y ∈ X and no income transfers, τ_1, . . . , τ_N, among the N individuals (so that \sum_i τ_i = 0) such that v_i(y) + τ_i ≥ v_i(x̂) for every i, with at least one inequality strict.
9.27 Justify the definition of ex post Pareto efficiency given in Definition 9.3 using arguments similar to
those used to establish (9.20). The transfers may depend on the entire vector of types.
9.28 Show that Definition 9.4 is equivalent to Definition 9.1 when (i) there is a single object available,
(ii) there are N + 1 individuals consisting of N bidders and one seller, and (iii) the social states are
the N + 1 allocations in which, either, one of the bidders ends up with the good or the seller ends
up with the good.
9.29 Consider the ‘circle’ mechanism described in Theorem 9.11. Instead of defining the new cost
functions by paying the individual on one’s right one’s expected externality given one’s type,
suppose one pays each of the other individuals an equal share of one’s expected externality. Prove
that the conclusions of Theorem 9.11 remain valid.
9.30 Consider Examples 9.3–9.5.
(a) Show that the cost formula given in Theorem 9.11 for the budget-balanced expected externality
mechanism yields cost functions that can be equivalently described by the following table.
(b) We know that the mechanism described in Example 9.5 is incentive-compatible. Nonetheless,
show by direct computation that, when individual 1’s type is t1 = 3 and individual 2 always
reports truthfully, individual 1 can do no better than to report his type truthfully.
(c) Construct the table analogous to that in part (a) when the town consists of N = 3 individuals.
AUCTIONS AND MECHANISM DESIGN 491
9.31 Consider Example 9.3. Add the social state ‘Don’t Build’ (D) to the set of social states so that
X = {D, S, B}. Suppose that for each individual i,
vi (D, ti ) = ki
is independent of ti .
(a) Argue that one interpretation of ki is the value of the leisure time individual i must give up
towards the building of either the pool or the bridge. (For example, all the ki might be zero
except k1 > 0, where individual 1 is the town’s only engineer.)
(b) What are the interim individual rationality constraints if individuals have property rights over
their leisure time?
(c) When is it efficient to build the pool? The bridge?
(d) Give sufficient conditions for the existence of an ex post efficient mechanism both when
individuals have property rights over their leisure time and when they do not. Describe the
mechanism in both cases and show that the presence of property rights makes it more difficult
to achieve ex post efficiency.
9.32 Consider Examples 9.3–9.7 and suppose that the VCG mechanism is used there. Without making
any explicit computations, show that each individual i’s expected utility in the truth-telling
equilibrium is non-decreasing in his type.
9.33 Consider Example 9.7.
(a) Compute the expected VCG costs for the engineer, i = 1, and for the other individual, i = 2,
as a function of their types. Argue that the values in the second row of the table in Example 9.5
are valid for the engineer. Why are they not valid for the other individual, i = 2? Show that the
expected VCG costs for i = 2 are given by the following table.
t2 :             1     2     3     4     5     6     7     8     9
c̄2^VCG (t2 ) : 20/9  16/9  13/9  11/9  10/9  10/9  11/9  13/9  16/9
type is zero. Now apply Theorem 9.14 to conclude that the expected cost functions must be identical
in the two mechanisms. Finally, conclude that expected revenues must be identical.
9.36 Reconsider Example 9.8. Suppose that the indivisible object is a business that is jointly owned
by two partners whose preferences and private information are exactly as in Example 9.8. Let
us call the partners individuals 1 and 2 here. Suppose that individual i’s share of the business is
αi ∈ [0, 1] and that α1 + α2 = 1. The significance of the ownership shares is that they translate into
individually rational utilities as follows. For i = 1, 2,
Thus, each individual i has the right to a fraction αi of the business. Suppose now that the partnership
is to be dissolved.
(a) Under what conditions on the ownership shares, αi , can the partnership be dissolved with an
incentive-compatible, ex post efficient, budget-balanced, individually rational mechanism?
Comment on the effect of asymmetric versus symmetric ownership shares.
(b) Can you generalise the result in (a) to N partners?
MATHEMATICAL
APPENDICES
CHAPTER A1
SETS AND MAPPINGS
necessary for B because ‘x is an integer less than 10’ is implied by the statement ‘x is an
integer less than 8’. If we form the contrapositive of these two statements, the statement
∼A becomes ‘x is not an integer less than 10’, and the statement ∼B becomes ‘x is not
an integer less than 8’. Beware that the statement ‘∼A ⇐ ∼B’ is false. The value of x
could well be 9. We must reverse the direction of implication to obtain a contrapositive
statement that is also true. The proper contrapositive statement, therefore, would be ‘x is
not an integer less than 8’ implied by ‘x is not an integer less than 10’, or ∼B ⇐ ∼A.
The notion of necessity is distinct from that of sufficiency. When we say, ‘A is suffi-
cient for B’, we mean that whenever A holds, B must hold. We can say, ‘A is true only if
B is true’, or that ‘A implies B’ (A ⇒ B). Once again, whenever the statement A ⇒ B is
true, the contrapositive statement, ∼B ⇒ ∼A, is also true.
Two implications, ‘A ⇒ B’ and ‘A ⇐ B’, can both be true. When this is so, we say
that ‘A is necessary and sufficient for B’, or that ‘A is true if and only if B is true’, or ‘A
iff B’. When A is necessary and sufficient for B, we say that the statements A and B are
equivalent and write ‘A ⇐⇒ B’.
To illustrate briefly, suppose that A and B are the following statements: A ≡ ‘X
is yellow’, B ≡ ‘X is a lemon’. Certainly, if X is a lemon, then X is yellow. Here, A is
necessary for B. At the same time, just because X is yellow does not mean that it must
be a lemon. It could be a banana. So A is not sufficient for B. Suppose instead that the
statements are A ≡ ‘X is a sour, yellow-skinned fruit’ and B ≡ ‘X is a lemon’. Here, A is
implied by B, or A is necessary for B. If X is a lemon, then it must be a yellow and sour
fruit. At the same time, A implies B, or A is sufficient for B, because if X is a yellow and
sour fruit, it must be a lemon. Because A is necessary and sufficient for B, there must be
an equivalence between lemons and sour, yellow-skinned fruit.
Figure A1.1. Basic sets: the complement Ac, the differences A\B and B\A, the union A ∪ B, and the intersection A ∩ B.
The basic operations on sets are union and intersection. They correspond to the
logical notions of ‘or’ and ‘and’, respectively.1 For two sets S and T, we define the union
of S and T as the set S ∪ T ≡ {x | x ∈ S or x ∈ T}. We define the intersection of S and T as
the set S ∩ T ≡ {x | x ∈ S and x ∈ T}. Some of these sets are illustrated in Fig. A1.1.
Sometimes we want to examine sets constructed from an arbitrary number of other
sets. We could use some notation such as {S1 , S2 , S3 , . . .} to denote the set of all sets that
concern us, but it is more common to collect the necessary (possibly infinite) number of
integers starting with 1 into a set, I ≡ {1, 2, 3, . . .}, called an index set, and denote the
collection of sets more simply as {Si }i∈I . We would denote the union of all sets in the
collection by ∪i∈I Si , and the intersection of all sets in the collection as ∩i∈I Si .
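For finite collections, these operations are mirrored directly by Python's built-in set type. A small illustrative sketch (the three sets are arbitrary choices, not from the text) forms ∪i∈I Si and ∩i∈I Si over an index set I = {1, 2, 3}:

```python
# An indexed collection {S_i}_{i in I} with index set I = {1, 2, 3}.
collection = {
    1: {1, 2, 3, 4},
    2: {2, 3, 4, 5},
    3: {3, 4, 5, 6},
}

# Union over all i in I: elements belonging to at least one S_i.
union_all = set().union(*collection.values())

# Intersection over all i in I: elements belonging to every S_i.
intersection_all = set.intersection(*collection.values())
```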
The product of two sets S and T is the set of ‘ordered pairs’ in the form (s, t), where
the first element in the pair is a member of S and the second is a member of T. The product
of S and T is denoted
S × T ≡ {(s, t) | s ∈ S, t ∈ T}.
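The set product can be sketched with `itertools.product`, which enumerates exactly these ordered pairs (the example sets here are illustrative):

```python
from itertools import product

S = {'a', 'b'}
T = {1, 2, 3}

# S x T: the set of ordered pairs (s, t) with s in S and t in T.
# Order matters: ('a', 1) is in S x T, but (1, 'a') is in T x S instead.
S_x_T = set(product(S, T))
```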
One familiar set product is the ‘Cartesian plane’. This is the plane in which you commonly
graph things. It is the visual representation of a set product constructed from the set of real
numbers. The set of real numbers is denoted by the special symbol R. If we form the set product

R × R ≡ {(x1 , x2 ) | x1 ∈ R, x2 ∈ R},
1 In everyday language, the word ‘or’ can be used in two senses. One, called the exclusive ‘or’, carries the meaning
‘either, but not both’. In mathematics, the word ‘or’ is used in the inclusive sense. The inclusive ‘or’ carries the
meaning ‘either or both’.
Figure A1.2. The Cartesian plane, R2.
then any point in the set (any pair of numbers) can be identified with a point in the Cartesian
plane depicted in Fig. A1.2. The set R × R is sometimes called ‘two-dimensional
Euclidean space’ and is commonly denoted R2 .
More generally, any n-tuple, or vector, is just an n-dimensional ordered tuple
(x1 , . . . , xn ) and can be thought of as a ‘point’ in n-dimensional Euclidean space, or
‘n-space’. As before, n-space is defined as the set product
Rn ≡ R × R × · · · × R (n times) ≡ {(x1 , . . . , xn ) | xi ∈ R, i = 1, . . . , n}.

A subset we will often need is the non-negative orthant of Rn,

Rn+ ≡ {(x1 , . . . , xn ) | xi ≥ 0, i = 1, . . . , n} ⊂ Rn .
We use the notation x ≥ 0 to indicate vectors in Rn+ , where each component xi is greater
than or equal to zero. We use the notation x ≫ 0 to indicate vectors where every component
of the vector is strictly positive. More generally, for any two vectors x and y in Rn , we say
that x ≥ y iff xi ≥ yi , i = 1, . . . , n. We say that x ≫ y iff xi > yi , i = 1, . . . , n.
begin to appreciate the importance of convexity and its role in some fundamental optimisa-
tion problems in microeconomics. For now, we begin with a formal definition of a convex
set and then try to develop a feel for what we have defined.
A set S ⊂ Rn is a convex set if for all x1 ∈ S and x2 ∈ S, we have

tx1 + (1 − t)x2 ∈ S

for all t in the interval 0 ≤ t ≤ 1.
This is not as bad as it seems at first, and you will quickly get used to what it says.
Basically, it says that a set is convex if for any two points in the set, all weighted aver-
ages of those two points (where the weights sum to 1) are also points in the same set. Let
us take the definition piece by piece, then put the pieces together to see what we get.
The kind of weighted average used in the definition is called a convex combination.
We say that z is a convex combination of x1 and x2 if z = tx1 + (1 − t)x2 for some number
t between zero and 1. Because t is between zero and 1, so is (1 − t), and the sum of the
weights, t + (1 − t), will always equal 1. A convex combination z is thus a point that, in
some sense, ‘lies between’ the two points x1 and x2 .
To make this clearer, let us take a simple example. Consider the two points x1 ∈ R
and x2 ∈ R, where x1 = 8 and x2 = 2, represented in Fig. A1.3. The convex combination
z = tx1 + (1 − t)x2 can be multiplied out and rewritten as

z = x2 + t(x1 − x2 ).
If we think of the point x2 as the ‘starting point’, and the difference (x1 − x2 ) as the ‘dis-
tance’ from x2 to x1 , then this expression says that z is a point located at the spot x2 plus
some proportion t, less than or equal to 1, of the distance between x2 and x1 . If we sup-
pose that t takes the value zero, then in our example, z = x2 = 2. If t takes the value 1,
then z = x1 = 8. The extreme values of zero and 1 thus make the convex combination
of any two points coincide with one of those two points. Values of t between zero and 1
Figure A1.3. Convex combinations of x1 = 8 and x2 = 2 in R. The values t = 0, 1/2, 2/3, and 1 give the points z = 2, 5, 6, and 8, respectively.
will make the convex combination take some value in between the two points. If t = 1/2,
then z = x2 + (1/2)(x1 − x2 ) = 2 + 3 = 5, which is the midpoint of the interval between
x1 and x2 . If t = 2/3, then z = x2 + (2/3)(x1 − x2 ) = 2 + 4 = 6, which is the point two-
thirds of the distance between x2 and x1 . You can see that as long as we choose a value of
t in the interval 0 ≤ t ≤ 1, the convex combination will always lie somewhere strictly in
between the two points, or it will coincide with one of the points.
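The arithmetic in this example is easy to replay in a short sketch. The function below simply evaluates z = tx1 + (1 − t)x2 and rejects weights outside [0, 1]:

```python
def convex_combination(x1, x2, t):
    """Return the convex combination z = t*x1 + (1 - t)*x2 for 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return t * x1 + (1 - t) * x2
```

With x1 = 8 and x2 = 2, the weights t = 0, 1/2, 2/3, 1 reproduce the points 2, 5, 6, and 8 from the example.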
The second part of the definition of a convex set refers not just to some convex
combinations of two points, but to all convex combinations of those points. In our example
from the real line, notice that x1 , x2 , and every point in between x1 and x2 can be expressed
as the convex combination of x1 and x2 for some value of t between zero and 1. The set of
all convex combinations of x1 and x2 will therefore be the entire line segment between x1
and x2 , including those two points.
These basic ideas carry over to sets of points in two dimensions as well. Consider
the two vectors in R2 , denoted x1 = (x11 , x21 ) and x2 = (x12 , x22 ). When we form the convex
combination of vectors, we must be careful to observe the rules of scalar multiplication
and vector addition. When a vector is multiplied by a scalar, the product is a vector, with
each component of the vector multiplied by that scalar. When two vectors are added, the
sum is a vector, with each component the sum of the corresponding components from the
two vectors being added. So the convex combination of x1 and x2 will be
z = tx1 + (1 − t)x2
= (tx11 , tx21 ) + ((1 − t)x12 , (1 − t)x22 )
= (tx11 + (1 − t)x12 , tx21 + (1 − t)x22 ).
In Fig. A1.4, x1 and x2 are the points with coordinates (x11 , x21 ) and (x12 , x22 ), respec-
tively. Their convex combination z has horizontal coordinate tx11 + (1 − t)x12 = x12 +
t(x11 − x12 ), or x12 plus the proportion t of the distance between the horizontal coordinates
of x1 and x2 . Similarly, the vertical coordinate of z is tx21 + (1 − t)x22 = x22 + t(x21 − x22 ),
or x22 plus the proportion t of the distance between the vertical coordinates of x2 and x1 .
Because each coordinate is the same proportion t of the distance between the respective
horizontal and vertical coordinates, the point z will lie in that same proportion t of the
distance between x2 and x1 along the chord connecting them. As we vary the proportion t,
we will move the coordinates of z back and forth between x11 and x12 on the horizontal axis,
and between x21 and x22 on the vertical axis, always keeping the horizontal and vertical
coordinates of z at the same proportion of the distance between the respective coordinates
of x1 and x2 . As we vary t, then, we will continue to describe vectors that lie at different
locations along the chord connecting x1 and x2 . Just as before, any point along that chord
can be described as a convex combination of x1 and x2 for some value of t between zero
and 1. The set of all convex combinations of the vectors x1 and x2 is, therefore, precisely
the set of all points on the chord connecting x1 and x2 , including the endpoints.
Now look again at the definition of a convex set. Read it carefully and you will see
that we could just as well have said that a set is convex if it contains all convex combi-
nations of every pair of points in the set. We therefore have a very simple and intuitive
Figure A1.4. The convex combination z = tx1 + (1 − t)x2 lies on the chord connecting x1 and x2, with coordinates z1 = tx11 + (1 − t)x12 and z2 = tx21 + (1 − t)x22.
rule defining convex sets: A set is convex iff we can connect any two points in the set by a
straight line that lies entirely within the set. Examples of convex and non-convex sets are
shown in Fig. A1.5. Notice that convex sets are all ‘nicely behaved’. They have no holes,
no breaks, and no awkward curvatures on their boundaries. They are nice sets.
We will end our discussion of convex sets for now by noting a simple but important
property of sets constructed from convex sets: the intersection of two convex sets is itself a convex set.
Proof: Let S and T be convex sets. Let x1 and x2 be any two points in S ∩ T. Because
x1 ∈ S ∩ T, x1 ∈ S and x1 ∈ T. Because x2 ∈ S ∩ T, x2 ∈ S and x2 ∈ T. Let z = tx1 +
(1 − t)x2 , for t ∈ [0, 1], be any convex combination of x1 and x2 . Because S is a con-
vex set, z ∈ S. Because T is a convex set, z ∈ T. Because z ∈ S and z ∈ T, z ∈ S ∩ T.
Because every convex combination of any two points in S ∩ T is also in S ∩ T, S ∩ T is a
convex set.
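The proof can be spot-checked numerically. The sketch below (illustrative, with two arbitrarily chosen overlapping closed discs playing the roles of S and T) samples points of S ∩ T and verifies that every tested convex combination of them remains in S ∩ T:

```python
import random

def in_disc(p, centre, radius):
    """Membership test for a closed disc in R2 (a convex set)."""
    return (p[0] - centre[0]) ** 2 + (p[1] - centre[1]) ** 2 <= radius ** 2

def intersection_stays_convex(trials=5_000, seed=0):
    """Check convexity of S ∩ T on random samples, for two overlapping discs."""
    rng = random.Random(seed)
    S = ((0.0, 0.0), 1.0)   # disc S: centre and radius (illustrative choice)
    T = ((0.5, 0.0), 1.0)   # disc T
    # Collect sample points lying in the intersection by rejection sampling.
    pts = []
    while len(pts) < 100:
        p = (rng.uniform(-1.0, 1.5), rng.uniform(-1.0, 1.0))
        if in_disc(p, *S) and in_disc(p, *T):
            pts.append(p)
    # Every convex combination of two points of S ∩ T should stay in S ∩ T.
    for _ in range(trials):
        a, b, t = rng.choice(pts), rng.choice(pts), rng.random()
        z = (t * a[0] + (1 - t) * b[0], t * a[1] + (1 - t) * b[1])
        if not (in_disc(z, *S) and in_disc(z, *T)):
            return False
    return True
```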
Suppose that S = {1, 2, . . . , 10}, and consider the relation defined by the statement, ‘is
greater than’. This relation is not complete because one can easily find some x ∈ S and
some y ∈ S where it is neither true that x > y nor that y > x: for example, one could pick
x = y = 1, or x = y = 2, and so on. The definition of completeness does not require the
elements x and y to be distinct, so nothing prevents us from choosing them to be the same.
Because no integer can be either less than or greater than itself, the relation ‘is greater
than’ is not complete. However, the relation on S defined by the statement ‘is at least as
great as’ is complete: for any two integers, whether distinct or not, one will always be at
least as great as the other, as completeness requires.
Both the relations just considered are transitive. If x is greater than y and y is greater than z,
then x is certainly greater than z. The same is true for the relation defined by the statement
‘is at least as great as’.
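These two properties are mechanical enough to check by brute force on a finite set. The sketch below (function names are illustrative) tests completeness and transitivity of a relation given as a boolean predicate:

```python
S = range(1, 11)  # the set S = {1, 2, ..., 10}

def is_complete(S, relates):
    # Complete: for every x and y (not necessarily distinct), x R y or y R x.
    return all(relates(x, y) or relates(y, x) for x in S for y in S)

def is_transitive(S, relates):
    # Transitive: whenever x R y and y R z, also x R z.
    return all(relates(x, z)
               for x in S for y in S for z in S
               if relates(x, y) and relates(y, z))

def greater(x, y):
    return x > y

def at_least(x, y):
    return x >= y
```

As the text argues, `greater` fails completeness precisely at x = y, while `at_least` is complete; both are transitive.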
A function is a very common though very special kind of relation. Specifically, a
function is a relation that associates each element of one set with a single, unique element
of another set. We say that the function f is a mapping from one set D to another set
R and write f : D → R. We call the set D the domain and the set R the range of the
mapping. If y is the point in the range mapped into by the point x in the domain, we write
y = f (x). To denote the entire set of points A in the range that is mapped into by a set of
points B in the domain, we write A = f (B). To illustrate, consider Fig. A1.7. Fig. A1.7(a)
is not a function, because more than one point in the range is assigned to points in the
Figure A1.7. (a) A relation that is not a function; (b) a relation that is a function.
Figure A1.8. (a) The sine function on R, with image I = [−1, 1]; (b) the function y = (1/2)x on [0, 1], with image I = [0, 1/2] and the inverse image f −1 (S) of a set S ⊂ I.
domain, such as x1 . Fig. A1.7(b) does depict a function because every point in the domain
is assigned some unique point in the range.
The image of f is that set of points in the range into which some point in the domain
is mapped, i.e., I ≡ {y | y = f (x), for some x ∈ D} ⊂ R. The inverse image of a set of
points S ⊂ I is defined as f −1 (S) ≡ {x | x ∈ D, f (x) ∈ S}. The graph of the function f
is familiar and is the set of ordered pairs G ≡ {(x, y) | x ∈ D, y = f (x)}. Some of these
ideas are illustrated in Fig. A1.8. In Fig. A1.8(a), we have let R = R, D = R, and have
graphed the function y = sin(x). The sine function, however, never takes values greater
than 1 or less than −1. Its image is therefore the subset of the range consisting of the
interval, I = [−1, 1]. In Fig. A1.8(b), we consider the function f : [0, 1] → [0, 1] given by
y = (1/2)x. Here, we have chosen to restrict both the domain and the range to the unit interval.
Once again, the image is a subset of the range, I = [0, 1/2].
There is nothing in the definition of a function that prohibits more than one element
in the domain from being mapped into the same element in the range. In Fig. A1.7(b), for
example, both x1 and x2 are mapped into y1 , yet the mapping satisfies the requirements
of a function. If, however, every point in the range is assigned to at most a single point in
the domain, the function is called one-to-one. If the image is equal to the range – if every
point in the range is mapped into by some point in the domain – the function is said to be
onto. If a function is one-to-one and onto, then an inverse function f −1 : R → D exists
that is also one-to-one and onto.
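For finite domains, the image and inverse image are easy to compute directly, which makes the definitions concrete. In this sketch (the mapping `halve` is an illustrative many-to-one example, not from the text), the image has fewer elements than the domain, so the function is not one-to-one:

```python
def image(f, domain):
    """The image of f: the set of values f(x) attained over the domain."""
    return {f(x) for x in domain}

def inverse_image(f, domain, S):
    """f^{-1}(S): the set of domain points whose value lands in S."""
    return {x for x in domain if f(x) in S}

def halve(x):
    # A many-to-one mapping: both 2k and 2k + 1 are sent to k.
    return x // 2

D = range(10)  # the domain {0, 1, ..., 9}
```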
Figure A1.9. The distance d(x1 , x2 ) between two points in R2, computed from a right triangle with horizontal leg a = x12 − x11 and vertical leg b = x22 − x21.
While many of the ideas discussed here may be generalised to arbitrary types of sets, we confine ourselves to
considering sets in Rn , i.e., sets that contain real numbers or vectors of real numbers.
We begin by loosely describing the notion of a metric and a metric space. A metric
is simply a measure of distance. A metric space is just a set with a notion of distance
defined among the points within the set. The real line, R, together with an appropriate
function measuring distance, is a metric space. One such distance function, or metric, is
just the absolute-value function. For any two points x1 and x2 in R, the distance between
them, denoted d(x1 , x2 ), is given by
d(x1 , x2 ) = |x1 − x2 |.
The Cartesian plane, R2 , is also a metric space. A natural notion of distance defined
on the plane is inherited from Pythagoras. Choose any two points x1 = (x11 , x21 ) and x2 =
(x12 , x22 ) in R2 , as in Fig. A1.9. Construct the right triangle connecting the two points. If the
horizontal leg is of length a and the vertical leg is of length b, Pythagoras tells us the length
of the hypotenuse – the distance between the points x1 and x2 – is equal to √(a2 + b2 ). Now
a2 is just the square of the difference between the x1 components of the two points, and
b2 is the square of the difference in their x2 components. The length of the hypotenuse, or
d(x1 , x2 ), is therefore
d(x^1, x^2) = \sqrt{a^2 + b^2} = \sqrt{\big(x_1^2 - x_1^1\big)^2 + \big(x_2^2 - x_2^1\big)^2}.
Whether it is obvious at first glance, both of these distance formulae can in fact
be viewed as special cases of the same formula. For x1 and x2 in R, the absolute value
|x1 − x2 | can be expressed as the square root of the product of (x1 − x2 ) with itself.
So we could write d(x1 , x2 ) = |x1 − x2 | = √((x1 − x2 )(x1 − x2 )). For x1 and x2 in R2 , if
we first apply the rules of vector subtraction to obtain the difference (x1 − x2 ) =
(x11 − x12 , x21 − x22 ), then apply the rules of vector (dot) multiplication to multiply this
difference with itself, we obtain
(x^1 - x^2) \cdot (x^1 - x^2) = \big(x_1^1 - x_1^2, \, x_2^1 - x_2^2\big) \cdot \big(x_1^1 - x_1^2, \, x_2^1 - x_2^2\big)
= \big(x_1^1 - x_1^2\big)^2 + \big(x_2^1 - x_2^2\big)^2
= (-1)^2\big(x_1^2 - x_1^1\big)^2 + (-1)^2\big(x_2^2 - x_2^1\big)^2
= \big(x_1^2 - x_1^1\big)^2 + \big(x_2^2 - x_2^1\big)^2.
Notice that this vector product produces a scalar that is precisely the same as the scalar
beneath the radical in our earlier Pythagorean formula. Pythagoras tells us, therefore, that
we can again measure the distance between two points as the square root of a product of
the difference between the two points, this time the vector product of the vector difference
of two points in the plane. Analogous to the case of points on the line, we can therefore
write d(x1 , x2 ) = √((x1 − x2 ) · (x1 − x2 )) for x1 and x2 in R2 .
The distance between any two points in Rn is just a direct extension of these ideas.
In general, for x1 and x2 in Rn ,
d(x^1, x^2) \equiv \sqrt{(x^1 - x^2) \cdot (x^1 - x^2)}
\equiv \sqrt{\big(x_1^1 - x_1^2\big)^2 + \big(x_2^1 - x_2^2\big)^2 + \cdots + \big(x_n^1 - x_n^2\big)^2},
which we summarise with the notation d(x1 , x2 ) ≡ ∥x1 − x2 ∥. We call this formula the
Euclidean metric or Euclidean norm. Naturally enough, the metric spaces Rn that use
this as the measure of distance are called Euclidean spaces.
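The Euclidean metric translates directly into a short sketch. It reduces to the absolute value in R and to the Pythagorean formula in R2:

```python
import math

def euclidean_distance(x, y):
    """d(x, y) = sqrt((x - y) . (x - y)) for points of R^n given as tuples."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

For instance, the distance between 8 and 2 on the line is 6, and the 3-4-5 right triangle gives a distance of 5 in the plane.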
Once we have a metric, we can make precise what it means for points to be ‘near’
each other. If we take any point x0 ∈ Rn , we define the set of points that are less than
a distance ε > 0 from x0 as the open ε-ball with centre x0 . The set of points that are a
distance of ε or less from x0 is called the closed ε-ball with centre x0 . Notice carefully that
any ε-ball is a set of points. Formally:

1. The open ε-ball with centre x0 and radius ε > 0 is the subset of points in Rn :
   Bε (x0 ) ≡ {x ∈ Rn | d(x0 , x) < ε}.
2. The closed ε-ball with centre x0 and radius ε > 0 is the subset of points in Rn :
   B∗ε (x0 ) ≡ {x ∈ Rn | d(x0 , x) ≤ ε}.
Some examples of open and closed balls are provided in Fig. A1.10. On the real line, the
open ball with centre x0 and radius ε is just the open interval Bε (x0 ) = (x0 − ε, x0 + ε).
The corresponding closed ball is the closed interval B∗ε (x0 ) = [x0 − ε, x0 + ε]. In R2 , an
open ball Bε (x0 ) is a disc consisting of the set of points inside, or on the interior of, the
circle of radius ε around the point x0 . The corresponding closed ball in the plane, B∗ε (x0 ), is
the set of points inside and on the edge of the circle. In three-space, the open ball Bε (x0 )
is the set of points inside the sphere of radius ε. The closed ball B∗ε (x0 ) is the set of points
inside and on the surface of the sphere. In R4 and higher dimensions, geometric intuition
is rather difficult, but the idea remains the same.
We have a pretty good intuitive feel for what the difference is between open and
closed intervals on the real line. The concept of the ε-ball allows us to formalise that
difference and make the distinction applicable to sets in higher-dimensional spaces. We
use the ε-ball to define open and closed sets and to establish some important results about
them.
The definition says that a set is open if around any point in it we can draw some open
ball – no matter how small its radius may have to be – so that all the points in that ball
will lie entirely in the set. This way of formalising things captures the essential aspects of
the familiar open interval on the real line. Because an open interval (a, b) includes every
point between a and b, but not a and b themselves, we can choose any point x in that
interval, no matter how close to a or b, and find some small but positive ε such that the
open ball (here the open interval) Bε (x) = (x − ε, x + ε) is contained entirely within the
interval (a, b) itself. Thus, every open interval on the real line is an open set. Likewise,
open discs and open spheres are open sets by this definition. More generally, any open ball
is an open set.
To see this, let S be the open ball with centre x0 and radius ε in Fig. A1.11. If we take
any other point x in S, it will always be possible to draw an open ball around x whose points
all lie within S by choosing the radius of the ball around x carefully enough. Because x is
in S, we know that d(x0 , x) < ε. Thus, ε − d(x0 , x) > 0. If we let ε′ = ε − d(x0 , x) > 0,
then it will always be the case that Bε′ (x) ⊂ S no matter how close we take x to the edge
of the circle, as required. The following theorem is basic.

THEOREM A1.2 On Open Sets in Rn

1. The empty set, ∅, is an open set.
2. The entire space, Rn , is an open set.
3. The union of any (possibly infinite) collection of open sets is an open set.
4. The intersection of any finite number of open sets is an open set.
Proof: The second of these hardly requires any proof. Briefly, if we take any point in Rn
and any ε > 0, then the set Bε (x) will of course consist entirely of points in Rn by the
definition of the open ball. Thus, Bε (x) ⊂ Rn , so Rn is open. Likewise, the first of these is
(vacuously) true. If there are no points in ∅, then it will of course be true that ‘for every
point in ∅, we can find an ε, . . .’ satisfying the definition of an open set.
The last two, however, are worth proving. We will prove (3) here, leaving the proof
of (4) as an exercise.
Let Si be an open set for all i ∈ I, where I is some index set. We must show that
∪i∈I Si is an open set. So let x ∈ ∪i∈I Si . Then x ∈ Si for some i ∈ I. Because Si is open,
Bε (x) ⊂ Si for some ε > 0. Consequently, Bε (x) ⊂ ∪i∈I Si , which shows that ∪i∈I Si is
open.
Open sets have an interesting and useful property. They can always be described,
exactly, as a collection of different open sets. A moment’s reflection (perhaps more) should
convince you of this. Suppose we start with some open set. Since our set is open, we can
take each point in the set and ‘surround’ it with an open ball, all of whose points are
contained within our set. Each of these open balls is itself an open set, as we illustrated in
Fig. A1.11. Think now of the union of all these open balls. By Theorem A1.2, this union
of open balls must be some open set. Can you think of a point in the set we started with
that is not in this union of open balls? Can you think of a point that is in this union of open
balls but that is not in our original set? If you answered ‘no’ to both of these questions,
you have convinced yourself that the two sets are the same! This property of open sets is
important enough to warrant the status of a theorem.
Let S ⊂ Rn be an open set and, for each x ∈ S, choose some εx > 0 such that Bεx (x) ⊂ S. Then

S = ∪x∈S Bεx (x).
Proof: We have already looked at the ideas involved, so we can prove this rather quickly.
Let S ⊂ Rn be open. Then, for each x ∈ S, there exists some εx > 0 such that Bεx (x) ⊂ S
because S is open. We have to show that x ∈ S implies that x ∈ ∪x∈S Bεx (x), and we must
show that x ∈ ∪x∈S Bεx (x) implies that x ∈ S.
If x ∈ S, then x ∈ Bεx (x) by the definition of the open ball around x. But then x is
in any union that includes this open ball, so in particular, we must have x ∈ ∪x∈S Bεx (x),
completing the first part of the proof.
Now, if x ∈ ∪x∈S Bεx (x), then x ∈ Bεs (s), for some s ∈ S. But we chose every ε-ball
so that it was entirely contained in S. Therefore, if x ∈ Bεs (s) ⊂ S, we must have x ∈ S,
completing the second part of the proof.
A set S in Rn is a closed set if its complement, Sc , is an open set.

Figure A1.12. The closed interval [a, b] and the open intervals A and B on either side of it.
It is worth taking a minute to see that this definition ‘works’, giving results that
correspond to our intuition. To see that it does, consider the simplest case. We know that
a closed interval in the real line is (or should be) a closed set. Does a closed interval
satisfy the definition of a closed set? Consider the interval [a, b] = {x | x ∈ R, a ≤ x ≤ b}
in Fig. A1.12. Now consider the two sets A = {x | x ∈ R, −∞ < x < a} and B = {x | x ∈
R, b < x < +∞}. A and B are the open intervals on either side of [a, b]. Because open
intervals are open sets, A and B are open sets. By Theorem A1.2, the union of open sets
is an open set, so A ∪ B = {x | x ∈ R, −∞ < x < a or b < x < +∞} is an open set. But
A ∪ B is the complement of the set [a, b] in R. Because [a, b]c is an open set, [a, b] is a
closed set by the definition of closed sets, as we wanted to show. In higher-dimensional
spaces, any closed ball is a closed set. In R2 , a closed disk is a closed set, and in R3 a
closed sphere is a closed set.
Loosely speaking, a set in Rn is open if it does not contain any of the points on its
boundary, and is closed if it contains all of the points on its boundary. More precisely, a
point x is called a boundary point of a set S in Rn if every ε-ball centred at x contains
points in S as well as points not in S. The set of all boundary points of a set S is denoted ∂S.
A set is open if it contains none of its boundary points; it is closed if it contains all of its
boundary points. Pushing things a bit further, we can define what it means to be an interior
point of a set. A point x ∈ S is called an interior point of S if there is some ε-ball centred
at x that is entirely contained within S, or if there exist some ε > 0 such that Bε (x) ⊂ S.
The set of all interior points of a set S is called its interior and is denoted int S. Looking
at things this way, we can see that a set is open if it contains nothing but interior points,
or if S = int S. By contrast, a set S is closed if it contains all its interior points plus all its
boundary points, or if S = int S ∪ ∂S. Fig. A1.13 illustrates some of these ideas for sets in
R2 . We will complete our discussion of open and closed sets by noting, for closed sets, the
properties corresponding to those we noted for open sets in Theorem A1.2: the empty set and the
entire space Rn are closed sets, the intersection of any collection of closed sets is a closed set,
and the union of any finite number of closed sets is a closed set.
Proof: The empty set and the whole of Rn are the only two sets that are both open and
closed in Rn . We have seen that they are open in Theorem A1.2. To see that they are
closed, notice that each is the complement of the other in Rn . Because ∅ = (Rn )c and
Rn = ∅c , each is the complement of an open set, and so each is closed.
Figure A1.13. A set S in R2 , showing an interior point x ∈ int S, a boundary point x ∈ ∂S, and a point x ∉ ∂S.
Another important concept is that of a bounded set. Very loosely, a set is bounded if
it can be ‘enclosed’ in a ball. The following definition makes things more precise.
By this definition, a set is bounded if we can always draw some ε-ball entirely around
it. The definition becomes more intuitive if we confine ourselves to ε-balls centred at the
origin, 0 ∈ Rn . From this perspective, S is bounded if there exists some finite distance ε
such that every point in S is no farther from the origin than ε.
The open ball Bε (x0 ) in Fig. A1.14 is a bounded set because it can be entirely
contained within the ball centred at x0 but with radius ε′ = ε + 1. Alternatively, we could
say that Bε (x0 ) is bounded because none of its points is farther than a distance ε + ‖x0 ‖ from
the origin. Notice also that every open interval on the real line (a, b) ⊂ R is a bounded
SETS AND MAPPINGS 513
[Figure A1.15. (a) The open interval (a, b). (b) The closed interval [a, b].]
set. Again, in Fig. A1.14, we can entirely contain the open interval (a, b) within the ball
centred at x ∈ R with radius (for example) ε = |x − (a − 1)|.
There is some particular terminology that applies to bounded sets from the real line.
Let S ⊂ R be any non-empty set of real numbers. Any real number l (whether l is in S or
not) for which l ≤ x for all x ∈ S is called a lower bound for the set of numbers S. For
example, if S = {4, 6, 8}, the number 1 is a lower bound for S, as is the number 4. Likewise,
any real number u (whether u is in S or not) for which x ≤ u for all x ∈ S is called an upper
bound for S. In our previous example, 27 ∉ S is an upper bound for S, as is the number 8 ∈
S. A set of numbers S ⊂ R is bounded from below if it has a lower bound, and is bounded
from above if it has an upper bound. The interval (−∞, 4) is bounded from above but is
not bounded from below. Any set of numbers that is both bounded from above and bounded
from below is of course bounded by Definition A1.7. (Make sure you see why.)
We have seen that any subset S of real numbers will generally have many lower
bounds and many upper bounds. The largest number among those lower bounds is called
the greatest lower bound (g.l.b.) of S. The smallest number among the upper bounds is
called the least upper bound (l.u.b.) of S. The basic axioms of the real number system can
be used to show that there will always exist a g.l.b. and a l.u.b. for any bounded subset of R.
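For the reader who likes to experiment, these definitions can be checked mechanically for finite sets. The Python sketch below (our own construction, not part of the text) tests the lower- and upper-bound conditions on the example set S = {4, 6, 8} from above; for a finite set the g.l.b. and l.u.b. are simply its minimum and maximum.

```python
def is_lower_bound(l, S):
    # l is a lower bound for S if l <= x for every x in S
    return all(l <= x for x in S)

def is_upper_bound(u, S):
    # u is an upper bound for S if x <= u for every x in S
    return all(x <= u for x in S)

S = {4, 6, 8}

# 1 and 4 are both lower bounds; 8 and 27 are both upper bounds
assert is_lower_bound(1, S) and is_lower_bound(4, S)
assert is_upper_bound(8, S) and is_upper_bound(27, S)
assert not is_lower_bound(5, S)          # 5 > 4, so not a lower bound

# for a finite set, the g.l.b. is its minimum and the l.u.b. its maximum
glb, lub = min(S), max(S)
assert glb == 4 and lub == 8
```

Note that both the g.l.b. and the l.u.b. happen to belong to S here; for the open interval (a, b), by contrast, neither a nor b is a member of the set.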
Consider the open and closed intervals (a, b) and [a, b] depicted in Fig. A1.15. It is
intuitively obvious from the figure, and easy to prove in general, that any non-empty closed
and bounded subset of R will contain its g.l.b. and its l.u.b. By contrast, any bounded open
subset of R will not contain
its g.l.b. or its l.u.b. In Fig. A1.15(a), a is the g.l.b. and b is the l.u.b. of the open interval
(a, b), and neither is contained within that interval. In Fig. A1.15(b), a is the g.l.b. and b
is the l.u.b. of the closed interval [a, b], and both are contained within that interval. This
result is worth recording for future reference.
THEOREM A1.5 Any non-empty closed and bounded subset of real numbers contains both its g.l.b. and its l.u.b.

Proof: We prove the claim for the g.l.b.; the argument for the l.u.b. is analogous. Let S be
closed and bounded from below, and let a be the g.l.b. of S. Suppose, by way of contradiction,
that a ∉ S. Then a ∈ Sc , and because S is closed, Sc is open, so for some ε > 0 all points of the
open ball Bε (a) = (a − ε, a + ε) are contained in Sc by the definition of open sets. If a < x
for all x ∈ S and Bε (a) ⊂ Sc , then every point in the interval (a − ε, a + ε) must be strictly
less than every point in S, too. [If this were not the case, we would have x ≤ a − ε < a
for some x ∈ S, contradicting the claim that a is a lower bound for S, or we would have
x ∈ Bε (a) ⊂ Sc , contradicting x ∈ S.] In particular, the point a + ε/2 ∈ (a − ε, a + ε) and
a + ε/2 < x for all x ∈ S. But then a + ε/2 is a lower bound for S and a + ε/2 > a, so a is not
the greatest lower bound of S, contradicting our original assumption. We must conclude,
therefore, that a ∈ S.
We have discussed closed sets and bounded sets. Subsets of Rn that are both
closed and bounded are called compact sets, and these are very common in economic
applications. We will note the following for future reference.2
2 Compactness is actually a topological property all its own. However, the Heine-Borel theorem shows that for
sets in Rn that property is equivalent to being closed and bounded.
Any open interval in R is not a compact set. It may be bounded, as we have seen,
but it is not closed. Similarly, an open ball in Rn is not compact. However, every closed
and bounded interval in R is compact, as is every closed ball with finite radius in Rn . All
of Rn is not compact because, although it is closed, it is not bounded.
A1.3.1 CONTINUITY
The last topological concept we will consider is continuity. In most economic applica-
tions, we will either want to assume that the functions we are dealing with are continuous
functions or we will want to discover whether they are continuous when we are unwilling
simply to assume it. In either instance, it is best to have a good understanding of what it
means for a function to be continuous and what the properties of continuous functions are.
Intuitively, we know what a continuous function is. The function in Fig. A1.16(a) is
continuous, whereas the function in Fig. A1.16(b) is not. Basically, a function is continu-
ous if a ‘small movement’ in the domain does not cause a ‘big jump’ in the range. We can,
however, get a bit more precise than that. For simple functions like those in Fig. A1.16,
the following definition of continuity should be familiar from your single-variable calculus
course.
Let us take a moment to see that this definition gives results that correspond to our
intuition about functions such as those in Fig. A1.16. Consider Fig. A1.16(b) first and
we shall see why it fails to be continuous at x0 according to the earlier definition. The
definition requires that for any ε > 0, we be able to find some δ > 0 such that whenever
a point x lies within a distance δ of x0 , its image f (x) will lie within a distance ε of f (x0 ).
Suppose we pick some ε > 0 smaller than the size of the jump at x0 . A moment’s reflection
will convince you that there is no δ > 0 satisfying the condition in the definition. To see this,
notice that within any positive distance δ of x0 , there will be points to the right of x0 , such
as x1 . Every point like x1 is
mapped by f into points like f (x1 ) on the upper segment of the curve, well beyond a
distance ε from the image of x0 , f (x0 ). Because we have found at least one x0 ∈ R and
ε > 0 such that no δ > 0 satisfying the condition exists, the function fails to be continuous
under the definition given.
On the other hand, it is clear that the function in Fig. A1.16(a) does satisfy the
definition. To convince yourself that it does, consider this: suppose we pick some x0 ∈ R
and some ε > 0. The points f (x0 ) + ε and f (x0 ) − ε are mapped into by the points
x0 + a and x0 − b, respectively. Now, if for this ε > 0, we choose δ as the smaller of a > 0
and b > 0, which in this case is a, we can be sure that the image of every point within a
distance δ = a on either side of x0 is mapped into a point no farther than a distance ε of
f (x0 )! Thus, f satisfies the definition of continuity at the point x0 .
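The ε–δ test can also be probed numerically. The sketch below (a helper of our own devising, not part of the text) samples points within δ of x0 and checks whether all of their images stay within ε of f (x0 ). For a jump function no δ works at the jump point once ε is smaller than the jump, while for the identity function δ = ε always works.

```python
def eps_delta_holds(f, x0, eps, delta, samples=1000):
    # check |f(x) - f(x0)| < eps for sampled x with |x - x0| < delta
    for i in range(1, samples + 1):
        for sign in (-1, 1):
            x = x0 + sign * delta * i / (samples + 1)
            if abs(f(x) - f(x0)) >= eps:
                return False
    return True

jump = lambda x: 0.0 if x < 0 else 1.0    # discontinuous at x0 = 0
ident = lambda x: x                       # continuous everywhere

# no delta works for the jump at x0 = 0 with eps = 1/2 ...
assert not any(eps_delta_holds(jump, 0.0, 0.5, d) for d in (1.0, 0.1, 0.001))
# ... while delta = eps works for the continuous function
assert eps_delta_holds(ident, 0.0, 0.5, 0.5)
```

Of course, sampling can only refute the ε–δ condition, never prove it; the check is a numerical illustration of the definition, not a substitute for it.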
The definition of continuity we have been considering is fine for characterising the
simple functions we have just been examining. It captures in a precise logical way how our
intuition tells us a continuous function should behave. If we look at that definition closely,
however, we will see that we can express exactly the same ideas more compactly and in a
language that makes those ideas more easily applied to situations involving a much broader
class of functions than just the very simple ones.
The simple definition of continuity tells us essentially that a function is continuous
at a point x0 in the domain if for all ε > 0, there exists a δ > 0 such that any point less
than a distance δ away from x0 (therefore every point less than a distance δ away from x0 )
is mapped by f into some point in the range that is less than a distance ε away from f (x0 ).
Now, we know how to characterise the entire set of points a distance less than δ from x0
in the domain. That is precisely the open ball centred at x0 with radius δ, Bδ (x0 ). In set
notation, we denote the set of points in the range mapped into by the points in Bδ (x0 ) as
f (Bδ (x0 )). Similarly, if f (x0 ) is the image of the point x0 , we can denote the set of points in
the range that lie a distance less than ε away from f (x0 ) by the open ball centred at f (x0 )
with radius ε, Bε (f (x0 )). To say that every point in Bδ (x0 ) is mapped by f into some point
no farther than ε from f (x0 ) is thus equivalent to saying that every point in f (Bδ (x0 )) is
in the set Bε (f (x0 )), or that f (Bδ (x0 )) ⊂ Bε (f (x0 )). Fig. A1.17, reproducing Fig. A1.16(a),
illustrates how these sets correspond to the more familiar terminology.
Fig. A1.17 is useful, but we need to build on the intuition behind it and to generalise
these ideas about continuity in two directions. First, we want a definition of continuity that
applies to functions from domains in Rm , not merely in R. Second, we need to account
for functions over domains that are subsets of Rm , rather than the whole of the space.3
3 For example, f (x) = √x has domain equal to R+ ⊂ R.
Before, we implicitly assumed that the domain of f was all of R. When that is the case, we
are assured that for any x0 ∈ R and any δ > 0, the ball Bδ (x0 ) is contained entirely within
the domain of f so that f (Bδ (x0 )) is well defined. However, when the domain D of some
function f is only a subset of Rm , we need not concern ourselves with all points in Rm
within distance δ of x0 , but only those in D within distance δ of x0 , namely, Bδ (x0 ) ∩ D.
The following definition generalises the notion of continuity in both these directions.
This definition of continuity focuses entirely on the relation between one set in the
range (the image of a set in the domain) and another open set in the range. It
would be nice to know what, if any, properties of sets are preserved under continuous
mappings when we move back and forth between the image and the domain. Our intuition
suggests to us that a continuous function is a sufficiently ‘regular’ and predictable animal
that basic properties like openness and closedness are probably preserved in moving from
the domain to the range. Unfortunately, this is an instance where intuition fails. Except in a
very particular case to be mentioned later, we cannot take it for granted that every property
of sets in the domain is preserved in the image when those sets are mapped by continuous
functions. In particular, it is not true that a continuous function always maps an open set
in the domain into an open set in the range, or that closed sets are mapped into closed
sets. For example, the continuous function f (x) = a maps every point in the domain, thus
every open set of points in the domain, into the single point a in the range. In Exercise
A1.25, you will convince yourself that a single point is a closed set, not an open set; so,
our intuition fails us.
When, as in Definition A1.9, we allow the domain of a function to be some (possibly
strict) subset D of Rm , it no longer makes sense to define open sets in the domain in terms
of open balls in Rm because these balls may lie partially, or entirely, outside of D. We need
to develop and use an appropriate language that accounts for possibilities of this sort. We
are thus led naturally to the following idea.
Thus, a set is open in D if for every point in the set, all nearby points are either in
the set or outside of D. Note that if D = Rm , this coincides with our definition of an open
set in Rm . Also note that D is always open in D.
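As an illustration (our own example, not from the text), take D = [0, ∞) and S = [0, 1). S is not open in R, since no ε-ball around 0 lies inside S; but S is open in D, because around each x ∈ S the ball of radius ε = 1 − x, once intersected with D, stays inside S. A quick numeric check:

```python
def ball_cap_D_inside_S(x, eps, grid=1000):
    # sample B_eps(x) ∩ D, for D = [0, inf), and test membership in S = [0, 1)
    for i in range(-grid, grid + 1):
        y = x + eps * i / (grid + 1)
        if y >= 0 and not (0 <= y < 1):   # y lies in D but outside S
            return False
    return True

# around every sampled x in S, eps = 1 - x keeps the relative ball inside S
assert all(ball_cap_D_inside_S(x, 1 - x) for x in (0.0, 0.25, 0.5, 0.9))
# a radius that is too large spills outside S
assert not ball_cap_D_inside_S(0.9, 0.5)
```

The point of the example is the role of the intersection with D: the full ball B₁(0) contains negative numbers, but those are outside D and so do not count against openness in D.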
As before, we define closedness in terms of openness.
Although we cannot, in general, be sure of what will happen as we move from the
domain to the range under a continuous mapping, we can say quite a bit about what hap-
pens as we move the other way – from the range to the domain. In fact, there is an intimate
relation between the continuity of a function and the properties of sets in the range and the
properties of their inverse images in the domain. The next theorem establishes a series of
equivalencies between the continuity of a function and the preservation of basic properties
of sets under its inverse mapping.
This is a very general and very powerful theorem. If we know something about
the inverse image of open sets or open balls in the range, we can use this theorem to
conclude whether the function involved is continuous. By the same token, if we know that
the function involved is continuous, we can use this theorem to tell us what properties the
inverse images of open balls and open sets in the range must possess. Still, it would be
nice to be able to say something about what happens to sets in the domain when they are
mapped into sets in the range. As you were warned earlier, we cannot say as much as we
think we would like to. Nonetheless, we can say something. In particular, it can be shown
that if S is a compact subset in the domain, and if f is a continuous function, then the
image set f (S) in the range of f is also a compact set. This, at last, is at least one intuitively
appealing result! Unfortunately, though, the proof takes us farther afield than we care to
go. Because it is an important result, it is worth recording here for future reference. The
interested (and equipped) reader can consult the reference for a proof.
In R1 , for example, the sequence 1, 1/2, 1/3, . . . converges to zero, even though zero
is not a member of the sequence. On the other hand, the sequence 1, 2, 3, . . . does not con-
verge to any real number. Thus, not all sequences are convergent. Indeed, even sequences
whose members are bounded need not converge. For example, the sequence 1, −1, 1, −1,
. . . is bounded, but it does not converge. On the other hand, if we consider only every other
member of this sequence beginning with the first, we would get the (sub)sequence 1, 1,
1, . . . , which clearly converges to 1. This example can be generalised into an important
result. To provide it, we need a few more definitions.
Proof: For a proof, see Royden (1963), or any good text in real analysis.
It turns out that we could have defined open and closed sets in terms of sequences.
Indeed, we end this section with a result that you are invited to prove for yourself.
Proof: Since f is continuous and S is compact, we know by Theorem A1.7 that f (S) is a
compact set. Because f is real-valued, f (S) ⊂ R. Since f (S) is compact, it is closed and
bounded. By Theorem A1.5, any closed and bounded subset of real numbers contains its
g.l.b., call it a, and its l.u.b., call it b. By definition of the image set, there exists some
x∗ ∈ S such that f (x∗ ) = b ∈ f (S) and some x̃ ∈ S such that f (x̃) = a ∈ f (S). Together
with the definitions of the g.l.b. and the l.u.b., we have f (x̃) ≤ f (x) and f (x) ≤ f (x∗ ) for
all x ∈ S.
The sense of Theorem A1.10 is illustrated in Fig. A1.18. In both (a) and (b), f : R →
R is a continuous real-valued function. In Fig. A1.18(a), the subset S = [1, 2] is closed,
bounded, and so compact. Because f is continuous, a minimum, f (x̃), and a maximum,
f (x∗ ), will, respectively, coincide with the g.l.b. and the l.u.b. in the image set f (S). To see
[Figure A1.18. The Weierstrass theorem: (a) a minimum and a maximum are guaranteed to exist; (b) neither a minimum nor a maximum exists.]
what can go wrong, however, consider Fig. A1.18(b). There we let the subset of the domain
be S = (1, 2), which is not compact. It is bounded, but is not closed. Clearly, no minimum
or maximum of f over S exists in this case. Because S is open, we can move closer and
closer to either end of the open interval without ever reaching the endpoint itself. These
movements are mapped into lower or higher values of f , respectively, never reaching either
a minimum or a maximum value.
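To make the failure on the open interval concrete, take f (x) = x on S′ = (1, 2) (our own choice of f , not the text's): its infimum over S′ is 1, but for any candidate minimiser x ∈ S′ the midpoint of 1 and x still lies in S′ and does strictly better, so no minimum exists. On the compact set S = [1, 2] the minimum is attained, at the endpoint x = 1.

```python
def better_point(x):
    # for f(x) = x on the open interval (1, 2): given any x in (1, 2),
    # the midpoint (1 + x) / 2 is still in (1, 2) and has a smaller value
    assert 1 < x < 2
    return (1 + x) / 2

x = 1.5
for _ in range(10):              # the search never terminates at a minimiser
    y = better_point(x)
    assert 1 < y < x < 2
    x = y

# on the compact interval [1, 2], the minimum of f(x) = x is attained at 1
f = lambda x: x
assert min(f(1 + i / 100) for i in range(101)) == f(1)
```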
Next, let us turn our attention to a general method for determining whether sys-
tems of simultaneous equations with as many equations as variables admit at least one
solution. Because the systems we encounter need not be linear, we wish to find a vector
x = (x1 , . . . , xn ) that simultaneously solves each of the n possibly non-linear equations,
g1 (x1 , . . . , xn ) = 0
⋮ (A1.1)
gn (x1 , . . . , xn ) = 0.

If for each i we define fi (x1 , . . . , xn ) ≡ gi (x1 , . . . , xn ) + xi , then x solves the system
(A1.1) if and only if it solves the equivalent fixed-point system

f1 (x1 , . . . , xn ) = x1
⋮ (A1.2)
fn (x1 , . . . , xn ) = xn .
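In one dimension, the passage from a system like (A1.1) to the fixed-point form (A1.2), by setting f (x) ≡ g(x) + x, can be tried out numerically. The function g below is our own example, not the text's, and the naive iteration happens to converge here; convergence of simple iteration is not something Brouwer's theorem itself guarantees.

```python
import math

def g(x):
    return math.cos(x) - x          # we seek a solution of g(x) = 0

def f(x):
    return g(x) + x                 # = cos(x); fixed points of f solve g = 0

x = 1.0
for _ in range(200):                # naive fixed-point iteration
    x = f(x)

assert abs(f(x) - x) < 1e-10        # x is (numerically) a fixed point of f
assert abs(g(x)) < 1e-10            # equivalently, x solves g(x) = 0
```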
Proof: We will restrict our proof of Brouwer’s theorem to the special case in which S is the
unit simplex in Rn+ . That is, S = {(x1 , . . . , xn ) ∈ Rn+ | ∑ni=1 xi = 1}. It is straightforward
to check that this set is non-empty, compact and convex. (But do check this!) For i =
1, . . . , n and x = (x1 , . . . , xn ) ∈ S, let fi (x) denote the ith coordinate of the
vector f (x).
Consequently, because f (x) ∈ S, the n coordinates of f (x) sum to one, i.e., ∑ni=1 fi (x) = 1
for every x ∈ S. Consider the following claim. For every k = 1, 2, . . . , there exist n points,
x1,k , . . . , xn,k ∈ S, all within a single 1/k-ball, such that, writing xii,k for the ith coordinate
of the point xi,k ,

xii,k ≥ fi (xi,k ) for every i = 1, . . . , n. (P.1)
Let us put aside the proof of this claim for the moment and focus instead on what
it says and what it implies. The claim says that no matter how small a radius you have in
mind, there are n points in S, all within a single ball of the radius you specified, with the
property that the ith coordinate of the ith point is weakly greater than the ith coordinate of
its image under f .
What does the claim imply? Clearly, the claim implies that there are n sequences
of points in S, {x1,k }∞k=1 , . . . , {xn,k }∞k=1 , such that (P.1) holds for every k. Because S is
compact, it is bounded. Therefore, by Theorem A1.8, there is a common subsequence
along which each sequence converges. (Why common? Think of {(x1,k , . . . , xn,k )}∞k=1 as
a single sequence in Sn .) Moreover, because the kth points in the sequences are within
distance 1/k from one another, the subsequences must converge to the same point, x∗ ∈ Rn
say. (Prove this!) And because the compactness of S implies that S is closed, we have that
x∗ ∈ S. Summarising, there is an infinite subset K of the indices k = 1, 2, . . . such that
each of the subsequences, {x1,k }k∈K , . . . , {xn,k }k∈K , converges to x∗ ∈ S. Taking the limit
of both sides of the inequality in (P.1) as k ∈ K tends to infinity gives

xi∗ = limk∈K xii,k ≥ limk∈K fi (xi,k ) = fi (x∗ ) for every i = 1, . . . , n,

where the second equality follows from the continuity of f . Hence, x∗ = (x1∗ , . . . , xn∗ ) ∈ S
satisfies

xi∗ ≥ fi (x∗ ) for every i = 1, . . . , n. (P.2)
Because both x∗ and f (x∗ ) are in S, their coordinates sum to one, i.e., both sides of (P.2)
sum to one. But this is possible only if each inequality in (P.2) is in fact an equality. Hence,
we have shown that x∗ = f (x∗ ), as desired!
Therefore (P.1) implies that f has a fixed point and so it suffices to prove (P.1). We
will do so only for the special case of n = 3. The ideas used in our proof generalise to any
number of dimensions. Consequently, the proof given here provides one way to see why
Brouwer’s theorem is true in general.
From this point in the proof onward we set n = 3. Therefore, S = {(x1 , x2 , x3 ) ∈
R3+ | x1 + x2 + x3 = 1} is the unit simplex in R3 , the flat triangular surface shown in
Figure A1.19. Rewriting (P.1) for the present special case in which n = 3, we wish to show
that for every k = 1, 2, . . ., there are three points, a, b, c ∈ S, all within a single 1/k-ball,
[Figure A1.19. The unit simplex S in R3 , with vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1).]
such that,
a1 ≥ f1 (a), (P.3)
b2 ≥ f2 (b), and
c3 ≥ f3 (c),
The vertex x can be assigned the label i only if xi > fi (x). (P.4)
So, for example, if x = (1/4, 1/4, 1/2) is a vertex and f (x) = (0, 2/3, 1/3), then we can
assign x the label 1 or 3, but not 2. If a labelling of each of the vertices in the subdivision
satisfies (P.4), then the labelling is called feasible. As we have just seen, there can be more
than one feasible labelling of a subdivision. But are we sure that there exists at least one
feasible labelling? The answer is yes, because we have assumed that no vertex is a fixed
point of f . Therefore for any vertex x, at least one i ∈ {1, 2, 3} must satisfy xi > fi (x) (Do
you see why?), and so there is at least one feasible label for each vertex.
Fig. A1.21 is an example of a typical feasible labelling. Note that regardless of the
function f , the vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1), of the original triangle S must be
assigned the labels 1, 2, and 3, respectively. Furthermore, any vertex on the bottom edge
4 This can always be done, for example, by dividing each of the three sides of the original triangle into k equal
intervals of length 1/k and then joining ‘opposite’ interval markers with lines that are parallel to the sides of the triangle.
[Figure A1.21. A feasible labelling of the vertices of a subdivision of S.]
(i.e., any vertex that is a convex combination of the vertices (1, 0, 0) and (0, 1, 0)) must be
assigned the label 1 or 2, and cannot be assigned the label 3 because its third coordinate
is zero. Similarly, the labels of left-edge vertices must be either 1 or 3 and the labels of
right-edge vertices must be either 2 or 3. On the other hand, the labels assigned to vertices
in the interior of triangle S can, in principle, be either 1, 2, or 3.
Our objective is to show that for any feasible labelling, at least one of the small
triangles must be completely labelled, i.e., must have vertices labelled 1, 2, and 3. If this
is the case, i.e., if the vertices a, b, and c of some small triangle have labels 1, 2, and 3,
respectively, then according to (P.4),
a1 > f1 (a),
b2 > f2 (b), and
c3 > f3 (c),
[Figure A1.22. A feasibly labelled subdivision of S, exploded into its separate small triangles.]
Consider, then, the labels along the bottom edge of the large triangle, beginning at the
left-most vertex (labelled 1) and moving to the right keeping count of the number of 1–2 edges as we go.5
encounter a 2, the number of 1–2 edges goes from zero to one. We may then, in general,
encounter a number of 2’s in a row, in which case our total count of 1–2 edges does not
change. Our count will increase to two only if we encounter a vertex labelled 1. But if we
do, that cannot be the end of it. We must eventually encounter a vertex labelled 2 because
the right-most vertex has label 2. So, if our 1–2 edge count gets to two it cannot stop
there. It must get at least to three, at which point the current vertex is labelled 2. The same
logic implies that our count can never end at an even number because our count of 1–2
edges becomes even precisely when the previous vertex has label 2 and the current vertex
has label 1. Hence, there must be at least one more 1–2 edge because the last label is 2.
Because the count cannot end with an even number, it must end with an odd number, i.e.,
there are an odd number of 1–2 edges along the bottom edge of the triangle.
Where else can 1–2 edges occur? As we have already observed, they cannot occur on
either of the other two edges of the large triangle. Consequently, the only other place they
can occur is within the interior of S, and we claim that the total number of 1–2 edges in the
interior of S is even. To see why, look now at Fig. A1.22 and note that any interior edge
has a twin adjacent to it with same labels. This is because the two endpoints of any interior
edge and of its twin edge are in fact the same two points in S and hence are assigned the
same pair of labels. Consequently, interior 1–2 edges come in pairs and hence there must
be an even number of such edges.
Altogether then, when the subdivision is exploded into its separate pieces, the total
number of 1–2 edges appearing along the bottom edge of S is odd (there are 3 such edges
in Fig. A1.22) and the total number of 1–2 edges appearing in the interior of S is even
(there are 12 such edges in Fig. A1.22). Since 1–2 edges can appear nowhere else, there
must in total be an odd number of them.
The final step is to argue that if, looking at all the separate small triangles, there are
an odd number of 1–2 edges, then there must be an odd number of completely labelled
5 The order in which the 1–2 labels occur is not important. In particular, a small triangle edge along the bottom
of the large triangle whose left endpoint is 2 and whose right endpoint is 1 is considered a 1–2 edge.
triangles (and hence there must be at least one!). Why? Let us count again the number
of 1–2 edges in a different way. How many 1–2 edges are there if we focus only on the
triangles that are not completely labelled? Some of these triangles have no 1–2 edges. But
all others have exactly two 1–2 edges because their labels must be either 1,1,2; or 1,2,2
(draw such labelled triangles and count the 1–2 edges). Consequently, the total number of
1–2 edges among triangles that are not completely labelled is even. But since we know
there are an odd number of 1–2 edges altogether, there must therefore be an odd number
of completely labelled triangles since each of these has precisely one 1–2 edge.6
[Figure. Brouwer’s theorem when n = 1: the graph of a continuous function f mapping S = [a, b] into itself must cross the 45° line, and it does so at a fixed point x∗ .]
6 The fact that a feasibly labelled subdivision of the simplex must have an odd number of completely labelled
subtriangles is called Sperner’s lemma, and it generalises to any number of dimensions.
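The counting argument can be verified computationally for small subdivisions. The sketch below (our own construction, not part of the text) subdivides the unit simplex in R3 with denominator k, labels each vertex by some i with xi > fi (x) as in (P.4), taking for f the coordinate rotation f (x1 , x2 , x3 ) = (x2 , x3 , x1 ), and counts completely labelled small triangles. Values of k divisible by 3 are avoided because the grid then contains this f's fixed point (1/3, 1/3, 1/3), which admits no feasible label.

```python
from fractions import Fraction

def f(x):
    # an example continuous map of the simplex into itself: rotate coordinates
    return (x[1], x[2], x[0])

def label(x):
    # feasible labelling rule (P.4): assign some i with x_i > f_i(x)
    fx = f(x)
    for i in range(3):
        if x[i] > fx[i]:
            return i + 1
    raise ValueError("vertex is a fixed point of f; no feasible label")

def completely_labelled(k):
    # count completely labelled small triangles in the k-fold subdivision
    def vert(i, j):
        return (Fraction(i, k), Fraction(j, k), Fraction(k - i - j, k))
    count = 0
    for i in range(k):
        for j in range(k - i):
            up = {label(vert(i, j)), label(vert(i + 1, j)), label(vert(i, j + 1))}
            count += up == {1, 2, 3}
            if i + j <= k - 2:            # 'inverted' triangles of the grid
                down = {label(vert(i + 1, j)), label(vert(i, j + 1)),
                        label(vert(i + 1, j + 1))}
                count += down == {1, 2, 3}
    return count

# Sperner's lemma: the number of completely labelled triangles is odd
assert all(completely_labelled(k) % 2 == 1 for k in (1, 2, 4, 5, 7, 8))
```

Exact rational arithmetic (`Fraction`) is used so that the strict inequalities in (P.4) are tested without rounding error.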
Simply stated, f is real-valued if it maps elements of its domain into the real line. If the
domain is a subset of Rn , a real-valued function maps vectors in Rn into points in R.
The functions y = ax1 + bx2 , y = √(z2 + w2 ), and y = ∑ni=1 ai xi2 are all examples of real-
valued functions because in each case, the left-hand side is a real number. The class of
real-valued functions is, of course, extremely broad. In this section, we will introduce
some particular types of real-valued functions and explore their important properties.
The real-valued functions in typical economic applications tend to be ones that either
rise or fall in a regular way over their domain. These are called increasing and decreas-
ing functions and we should define these terms carefully for future reference. Here, we
distinguish between three types of increasing function.
Look carefully at these definitions and recall how we use the symbols ≥ and ≫ in
the case of vector relations. We have defined a function as increasing whenever an increase
in one or more of the components xi of the vector x = (x1 , . . . , xn ) never causes the value
of the function to decrease. We have called the function strictly increasing whenever an
increase in all components of x causes the value of the function to strictly increase. We
have defined a function as strongly increasing whenever an increase in one or more of the
xi causes the value of the function to strictly increase. Before reading on, notice the hier-
archy here: an increasing function need not be strictly increasing, and a strictly increasing
function need not be strongly increasing, but every strongly increasing function is strictly
increasing, and every strictly increasing function is increasing.
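The hierarchy can be illustrated with two familiar functions on R2 (our own examples, not the text's): f (x1 , x2 ) = x1 + x2 is strongly increasing, while f (x1 , x2 ) = min(x1 , x2 ) is strictly increasing but not strongly increasing, since raising only one coordinate can leave the minimum unchanged.

```python
f_sum = lambda x: x[0] + x[1]     # strongly increasing
f_min = lambda x: min(x)          # strictly, but not strongly, increasing

x0, x1 = (1, 2), (1, 3)           # x1 >= x0, with one strictly larger coordinate
assert f_sum(x1) > f_sum(x0)      # a strongly increasing function responds
assert f_min(x1) == f_min(x0)     # the min does not: not strongly increasing

x2 = (2, 3)                       # x2 >> x0: every coordinate strictly larger
assert f_min(x2) > f_min(x0)      # so the strictly increasing property holds
```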
Decreasing functions are defined analogously, and we make similar distinctions. A function
f is decreasing if f (x0 ) ≤ f (x1 ) whenever x0 ≥ x1 , and strictly decreasing if f (x0 ) < f (x1 )
whenever x0 ≫ x1 . If, instead, f (x0 ) < f (x1 ) whenever x0 and x1 are distinct and x0 ≥ x1 ,
then we say that f is strongly decreasing.
For any y0 in the range of f , the level set relative to y0 is L(y0 ) ≡ {x | x ∈ D, f (x) = y0 }.
Notice these are sets in the domain of the function. Because we can construct one
level set for any value in its image, we can completely represent the function by these sets
in its domain, thus reducing by one the number of dimensions needed to represent the
function. It is this characteristic of level sets that you have seen so often exploited in the
construction of indifference maps, isoquant maps, and so forth: level sets allow us to study
functions of three variables, which normally require awkward three-dimensional graphs to
depict, by focusing upon sets in the simple two-dimensional plane. Some level sets for the
function of three variables, y = f (x1 , x2 ), are depicted in Fig. A1.24.
We should note another property of level sets. We saw before that the map f : D → R
is a function if and only if it assigns a single number in the range to each element in the
domain. Therefore, two different level sets of a function can never cross or intersect each
other. If they did, that would mean two different numbers were being assigned to that one
element in the domain where they cross. This, of course, would violate the definition of a
function.
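Because f assigns exactly one value to each point of its domain, distinct level sets are disjoint by construction, which a finite check makes vivid (example of our own):

```python
f = lambda x: x[0] + x[1]                      # a function of two variables
pts = [(i, j) for i in range(5) for j in range(5)]
level = lambda y0: {x for x in pts if f(x) == y0}

# distinct level sets can never intersect
assert level(2) & level(3) == set()
assert all(level(a).isdisjoint(level(b))
           for a in range(9) for b in range(9) if a != b)
```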
[Figure A1.24. Level sets L(y0 ), L(y1 ), and L(y2 ) of the function y = f (x1 , x2 ), where L(y0 ) = {(x1 , x2 ) | f (x1 , x2 ) = y0 }.]

[Figure A1.25. The level set L(y0 ) through the points x0 and x5 , together with points x1 , x2 , x3 , and x4 elsewhere in the domain.]
Consider the level set for f (x) = y0 in Fig. A1.25. Because the point x0 is on the
y0 level set for f (x), we know that f (x0 ) = y0 . What do we know about points elsewhere
in the domain, such as x1 and x2 ? If f (x) is a strictly increasing function, we know that
f (x1 ) > f (x0 ) and that f (x2 ) < f (x0 ). This is clear because the coordinates of the vector
x1 (x2 ) are both strictly greater (smaller) than those of x0 , and a strictly increasing function
assigns larger (smaller) numbers to vectors with larger (smaller) components. This is fairly
straightforward. But what do we know about other points on the same side of L(y0 ) as x1
or x2 , such as x3 or x4 ? Clearly, whether the function is increasing or decreasing, points
like x3 and x4 must give rise to a value of the function that is in the same relation to y0
as those given by the points x1 and x2 , respectively. If f (x) is strictly increasing, f (x1 )
and f (x3 ) must both be greater than y0 , while f (x2 ) and f (x4 ) must both be less than y0 . If
f (x) is strictly decreasing, f (x1 ) < y0 , f (x3 ) < y0 , and f (x2 ) > y0 , f (x4 ) > y0 . This is clear
because, for example, x3 is in the same relation to some other point on L(y0 ), such as x5 , as
x1 is to x0 . Because x0 and x5 are both on L(y0 ), we know that f (x0 ) = f (x5 ) = y0 . We
can then make the same kind of argument as before to determine whether x3 gives a value
of the function greater or less than x5 does, depending on whether f (x) is strictly increasing
or decreasing. If f (x) is strictly increasing, f (x1 ) > f (x0 ) = y0 and f (x3 ) > f (x5 ) = y0 . If
f (x) is strictly decreasing, f (x1 ) < f (x0 ) = y0 and f (x3 ) < f (x5 ) = y0 .
Thinking along these lines, we can define some additional sets to divide up the
domain of a function in useful ways. Relative to any y0 in the range, define the superior set
S(y0 ) ≡ {x | x ∈ D, f (x) ≥ y0 }, the strictly superior set S′(y0 ) ≡ {x | x ∈ D, f (x) > y0 },
the inferior set I(y0 ) ≡ {x | x ∈ D, f (x) ≤ y0 }, and the strictly inferior set
I′(y0 ) ≡ {x | x ∈ D, f (x) < y0 }.
The superior set contains all points in D that give the function a value equal to or
greater than the value y0 , and the strictly superior set contains all points giving a value
strictly greater than y0 . The inferior set contains all points that give the function a value
less than or equal to y0 , and the strictly inferior set contains all points giving a value strictly
less than y0 . Because the level set itself contains all points that give the function the value
y0 , these sets are clearly related. The following theorem makes these relationships clear.
Its proof is left as an exercise.
1. L(y0 ) ⊂ S(y0 ).
2. L(y0 ) ⊂ I(y0 ).
3. L(y0 ) = S(y0 ) ∩ I(y0 ).
4. S′(y0 ) ⊂ S(y0 ).
5. I′(y0 ) ⊂ I(y0 ).
6. S′(y0 ) ∩ L(y0 ) = ∅.
7. I′(y0 ) ∩ L(y0 ) = ∅.
8. S′(y0 ) ∩ I′(y0 ) = ∅.

[Figure A1.26. Level, inferior, and superior sets for (a) an increasing function and (b) a decreasing function.]
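All eight relations can be verified on a finite grid for a concrete function, here f (x1 , x2 ) = x1 + x2 with y0 = 3 (an example of our own, writing Sp and Ip for the strict sets):

```python
f = lambda x: x[0] + x[1]
D = [(i, j) for i in range(5) for j in range(5)]
y0 = 3

L  = {x for x in D if f(x) == y0}       # level set
S  = {x for x in D if f(x) >= y0}       # superior set
Sp = {x for x in D if f(x) > y0}        # strictly superior set
I  = {x for x in D if f(x) <= y0}       # inferior set
Ip = {x for x in D if f(x) < y0}        # strictly inferior set

assert L <= S and L <= I                         # relations 1 and 2
assert L == S & I                                # relation 3
assert Sp <= S and Ip <= I                       # relations 4 and 5
assert Sp.isdisjoint(L) and Ip.isdisjoint(L)     # relations 6 and 7
assert Sp.isdisjoint(Ip)                         # relation 8
```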
Fig. A1.26 illustrates the superior and inferior sets for two different functions, one
increasing and the other decreasing. When f (x) is increasing, S(y0 ) will always lie on and
above the level set L(y0 ), and I(y0 ) will always lie on and below L(y0 ). S (y0 ), if not empty,
will always lie strictly above the level set L(y0 ), and I (y0 ) will always lie strictly below it.
When f (x) is decreasing, S(y0 ) will lie on and below the level set L(y0 ), and I(y0 ) will lie
on and above it. S (y0 ), if not empty, will lie strictly below L(y0 ), and I (y0 ) will lie strictly
above L(y0 ).
[Figure A1.27. A concave function: for any two points on the graph, the chord joining them lies on or below the graph.]

[Figure A1.28. A function that is concave on [0, x1 ] and [x2 , ∞) but not concave on [x1 , x2 ].]
If we now consider all values of t ∈ [0, 1], we trace out every point on the abscissa
between x1 and x2 . For each value of t, the same argument would hold. The vertical dis-
tance to the graph exceeds (or equals) the vertical distance to the chord at every value of
xt . This suggests a very simple and intuitive rule: A function is concave iff for every pair
of points on its graph, the chord joining them lies on or below the graph.
To see what happens when concavity fails to hold, consider the function in Fig.
A1.28. This function is concave over the regions [0, x1 ] and [x2 , ∞), as you can readily
see by drawing chords between points on the graph within each of those regions. It is
not concave, however, over the region [x1 , x2 ]. Here we can construct the chord between
(x1 , f (x1 )) and (x2 , f (x2 )) and find a t (say, t = 1/2) such that the convex combination of
the points on the graph, (xt , yt ), lies strictly above the point (xt , f (xt )). Because we have
found two points in the domain and at least one t ∈ [0, 1] such that f (xt ) < tf (x1 ) + (1 −
t)f (x2 ), the definition of concavity is violated.
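The defining inequality is easy to test numerically. The following sketch (in Python, which the text does not use; the helper name is_concave_on_sample is hypothetical) checks the chord condition over a grid of point pairs and weights. Passing such a grid check is necessary but of course not sufficient for concavity.

```python
# Grid check of the concavity inequality f(xt) >= t*f(x1) + (1-t)*f(x2).
def is_concave_on_sample(f, points, ts):
    """Return True if the concavity inequality holds for every pair in
    `points` and every t in `ts` (a sample-based, necessary check only)."""
    for x1 in points:
        for x2 in points:
            for t in ts:
                xt = t * x1 + (1 - t) * x2
                if f(xt) < t * f(x1) + (1 - t) * f(x2) - 1e-12:
                    return False
    return True

ts = [i / 10 for i in range(11)]
xs = [i / 4 for i in range(-8, 9)]      # grid on [-2, 2]

concave = lambda x: -(x ** 2)           # concave everywhere
not_concave = lambda x: x ** 2          # convex: the chord lies above the graph

print(is_concave_on_sample(concave, xs, ts))      # True
print(is_concave_on_sample(not_concave, xs, ts))  # False
```

For x² the check fails exactly as in the text's argument: with x1 = 0, x2 = 2 and t = 1/2, the chord value 2 exceeds f(1) = 1.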
Look again at Figs. A1.27 and A1.28. Can you sense what it is that distinguishes the
concave function in Fig. A1.27 from the non-concave one in Fig. A1.28? Look at the area
below the graph in Fig. A1.27 and below the concave regions of the graph in Fig. A1.28.
Compare these to the areas below the non-concave region of Fig. A1.28. The points below
the graph of all concave regions appear to be ‘nicely behaved’ in a way which we have
seen before. In particular, the set of points underneath the graph of the concave regions of
both functions are convex sets. The set of points beneath the non-concave region of Fig.
A1.28 is not a convex set. This relationship between a concave function and the set of
points beneath its graph is in fact a very general and intimate one. It holds for all concave
functions, not just for the functions of a single variable. It is important enough to warrant
stating as a theorem.
THEOREM A1.13 Points On and Below the Graph of a Concave Function Form a Convex Set
Let A ≡ {(x, y) | x ∈ D, f(x) ≥ y} be the set of points ‘on and below’ the graph of f: D → R, where D ⊂ Rn is a convex set and R ⊂ R. Then,
f is a concave function ⇐⇒ A is a convex set.
We will shortly encounter several theorems like this one establishing the equivalence
between a certain type of function and related convex sets. The proofs of some will be
omitted and proofs of others will be left as exercises. Because it is important to develop
your intuition for these relationships, we will give a proof for this theorem here. To make
things as clear as possible, we will take an extended and leisurely approach.
Proof: (Extended) Because the theorem asserts an equivalence between concavity of the
function and convexity of the set A, we will have to break up the theorem and give a proof
in ‘both directions’. We will have to show that f concave ⇒ A convex and that A convex
⇒ f concave.
First part: f concave ⇒ A convex.
Assume f is a concave function. Then for xt ≡ tx1 + (1 − t)x2 and by the definition of concave functions,
f(xt) ≥ tf(x1) + (1 − t)f(x2) for all x1, x2 ∈ D, and t ∈ [0, 1]. (P.1)
To prove that A is a convex set, pick any two points in A, say (x1, y1) and (x2, y2), so that x1, x2 ∈ D and
f(x1) ≥ y1 and f(x2) ≥ y2. (P.2)
We must show that the convex combination (xt, yt) ≡ (tx1 + (1 − t)x2, ty1 + (1 − t)y2) is also in A for all t ∈ [0, 1]. Because D is a convex set by assumption, we know xt ∈ D for all t ∈ [0, 1]. Thus, we need only show that f(xt) ≥ yt for all t ∈ [0, 1] to establish (xt, yt) ∈ A. But that is easy. From (P.2), we know that f(x1) ≥ y1 and f(x2) ≥ y2. Multiplying the first of these by t ≥ 0 and the second by (1 − t) ≥ 0 gives us tf(x1) ≥ ty1 and (1 − t)f(x2) ≥ (1 − t)y2 ∀ t ∈ [0, 1]. Adding these last two inequalities together gives us tf(x1) + (1 − t)f(x2) ≥ ty1 + (1 − t)y2 ≡ yt, and so, by (P.1),
f(xt) ≥ yt.
Hence (xt, yt) ∈ A, and A is a convex set.
Second part: A convex ⇒ f concave.
Assume A is a convex set. Let x1 and x2 be any two points in D, and define
y1 ≡ f(x1) and y2 ≡ f(x2). (P.3)
The points (x1, y1) and (x2, y2) are thus in A because they satisfy xi ∈ D and f(xi) ≥ yi for each i. Now form the convex combination of these two points, (xt, yt). Because A is a convex set, (xt, yt) is also in A for all t ∈ [0, 1]. Thus,
f(xt) ≥ yt. (P.4)
Now yt ≡ ty1 + (1 − t)y2, so we can substitute for yi from (P.3) and write
yt = tf(x1) + (1 − t)f(x2). (P.5)
Combining (P.4) and (P.5), we have f(xt) ≥ tf(x1) + (1 − t)f(x2) ∀ t ∈ [0, 1], so f is a concave function.
Because we have established the assertions in both directions (⇒ and ⇐), the proof
is complete.
We now have two equivalent ways of thinking about concave functions: one in terms
of the value the function takes at convex combinations of any two points, and one in
terms of the ‘shape’ of the set inscribed by the graph of the function. Either specification
completely defines a concave function.
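Theorem A1.13 itself can be spot-checked numerically. The sketch below (Python, not from the text; the helper stays_in_A is hypothetical) draws random points of A = {(x, y) | f(x) ≥ y} and tests whether their convex combinations remain in A. A random check is illustrative only, not a proof.

```python
import random

def stays_in_A(f, trials=1000, seed=0):
    """Sample pairs of points in A = {(x, y) | f(x) >= y} and check that
    their convex combinations remain in A (a spot check, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(-3, 3), rng.uniform(-3, 3)
        y1 = f(x1) - rng.uniform(0, 2)   # a point on or below the graph
        y2 = f(x2) - rng.uniform(0, 2)
        t = rng.random()
        xt, yt = t * x1 + (1 - t) * x2, t * y1 + (1 - t) * y2
        if f(xt) < yt - 1e-12:           # (xt, yt) escaped the set A
            return False
    return True

print(stays_in_A(lambda x: -(x ** 2)))   # concave f: True
print(stays_in_A(lambda x: x ** 2))      # convex, non-concave f: violations found
```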
According to the definition of concave functions, the function in Fig. A1.29 is concave. Nothing in the definition, or in Theorem A1.13, prohibits linear segments in the graph of the function.
The set beneath is still convex. At xt , the value of the function is exactly equal to the
convex combination of f (x1 ) and f (x2 ), so the inequality f (xt ) ≥ tf (x1 ) + (1 − t)f (x2 ) still
holds there. Geometrically, the point (xt , f (xt )) simply lies on, rather than strictly above,
the chord connecting x1 and x2 and that is quite all right.
It is sometimes convenient to exclude the possibility of linear segments in the graph
of the function. Strict concavity rules out this kind of thing.
[Figure A1.29: graph omitted in extraction. A concave function with a linear segment over [x1, x2]; at xt the point (xt, f(xt)) lies on the chord connecting x1 and x2.]
Notice very carefully the small but important differences in the definitions of con-
cave and strictly concave functions. First, strict concavity requires f (xt ) to be strictly
greater than the convex combination of f (x1 ) and f (x2 ), rather than greater than or equal
to it, as required for concave functions. Next, the strict inequality must hold for all t in
the open interval (0, 1), rather than the closed interval [0, 1] as before. This makes perfect
sense because if t were either zero or one, the convex combination xt would coincide with
either x2 or x1 , and the strict inequality in the definition could not hold.
Geometrically, these modifications simply require the graph of the function to lie
strictly above the chord connecting any two points on the graph, except at each of the two
points themselves. This serves to rule out flat portions on the graph of the function.
Admittedly, the definition of quasiconcavity, which requires that f(xt) ≥ min[f(x1), f(x2)] for all x1, x2 ∈ D and t ∈ [0, 1], seems rather awkward at first. It says, if we take any two points in the domain and form any convex combination of them, the value of the function must be no lower than the lowest value it takes at the two points. Another way of describing
quasiconcave functions is in terms of their level sets.
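Before turning to level sets, the min-inequality can also be tested directly. A sketch (Python, assumed; the helper quasiconcave_on_sample is hypothetical, and a grid check is necessary rather than sufficient):

```python
# Grid check of the quasiconcavity inequality f(xt) >= min(f(x1), f(x2)).
def quasiconcave_on_sample(f, points, ts):
    """Return True if the min-inequality holds over all sampled pairs and
    weights (a sample-based, necessary check only)."""
    for x1 in points:
        for x2 in points:
            for t in ts:
                xt = t * x1 + (1 - t) * x2
                if f(xt) < min(f(x1), f(x2)) - 1e-12:
                    return False
    return True

grid = [i / 4 for i in range(-8, 9)]    # grid on [-2, 2]
ts = [i / 10 for i in range(11)]

print(quasiconcave_on_sample(lambda x: x ** 3, grid, ts))  # True: monotone, hence quasiconcave
print(quasiconcave_on_sample(lambda x: x ** 4, grid, ts))  # False: fails at x1 = -1, x2 = 1, t = 1/2
```

Any monotone function of one variable passes the check: xt always lies between x1 and x2, so f(xt) lies between f(x1) and f(x2).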
Suppose we have y = f (x1 , x2 ) and pick any two points x1 and x2 in its domain.
Each of these gives rise to some value of the function and so each lies on some level set
in the plane of its domain. When we form any convex combination of the two points, we
get a point xt somewhere on the chord connecting x1 and x2 . The function has some value
at the point xt , too, so xt lies on some level set as well. Now consider the functions whose
level sets are depicted in Fig. A1.30. In each instance, we will assume that f (x1 ) ≥ f (x2 ).
7 The operator min[a, b] simply means ‘the smaller of a and b’. If a > b, then min[a, b] = b. If a = b, then min[a, b] equals a and b.
Figure A1.30. Level sets for quasiconcave functions. (a) The function is quasiconcave and increasing. (b) The function is quasiconcave and decreasing.
When f (x) is an increasing function, it will be quasiconcave whenever the level set
relative to any convex combination of two points, L(xt ), is always on or above the lowest
of the level sets L(x1 ) and L(x2 ). This case is illustrated in Fig. A1.30(a). When f (x) is a
decreasing function, it will be quasiconcave whenever the level set relative to any convex
combination of two points is always on or below the highest of the two level sets. This
case is illustrated in Fig. A1.30(b).
The level sets in Fig. A1.30 were drawn nicely curved for a good reason. Besides
requiring the relative positioning of level sets already noted, quasiconcavity requires very
regular behaviour in its superior sets. As you may have guessed, these must be convex.
THEOREM A1.14 Quasiconcavity and the Superior Sets
f: D → R is a quasiconcave function iff S(y) is a convex set for all y ∈ R.
Proof: Sufficiency: First, we want to show that if f is quasiconcave, then S(y) is a convex set for all y ∈ R. To begin, let y be any point in R, and let S(y) be the superior set relative to y. Let x1 and x2 be any two points in S(y). (If S(y) is empty, our job is immediately done because the empty set is convex.) We need to show that if f is quasiconcave, all points of the form xt ≡ tx1 + (1 − t)x2, t ∈ [0, 1], are also in S(y).
Because x1 ∈ S(y) and x2 ∈ S(y), the definition of the superior set tells us that x1 and x2 are both in D and satisfy
f(x1) ≥ y and f(x2) ≥ y. (P.1)
Because D is convex, xt ∈ D for all t ∈ [0, 1]. Moreover,
f(xt) ≥ min[f(x1), f(x2)] ≥ y. (P.2)
The first inequality is the definition of quasiconcavity, and the second follows from (P.1). But if xt ∈ D and f(xt) ≥ y, then xt satisfies the requirements for inclusion in S(y), so S(y) must be a convex set. This completes the proof of sufficiency.
Necessity: Here we have to show that if S(y) is a convex set for all y ∈ R, then f(x) is a quasiconcave function. To do that, let x1 and x2 be any two points in D. Without loss of generality, assume we have labelled things so that
f(x1) ≥ f(x2). (P.3)
By assumption, S(y) is a convex set for any y ∈ R, so clearly S(f(x2)) must be convex, too. Obviously, x2 ∈ S(f(x2)) and, by (P.3), x1 ∈ S(f(x2)). Then for any convex combination of x1 and x2 we must also have xt ∈ S(f(x2)). From the definition of S(f(x2)), this implies f(xt) ≥ f(x2). But in view of (P.3), this tells us
f(xt) ≥ min[f(x1), f(x2)],
so f is quasiconcave.
A closely related fact is that every concave function is also quasiconcave. To see why, suppose f is concave, and let x1 and x2 be any two points in D, labelled so that
f(x1) ≥ f(x2), and so min[f(x1), f(x2)] = f(x2). (P.1)
By the concavity of f,
f(xt) ≥ tf(x1) + (1 − t)f(x2) for all t ∈ [0, 1]. (P.2)
Factor out t ≥ 0 on the right-hand side, rearrange, and express (P.2) equivalently as
f(xt) ≥ f(x2) + t[f(x1) − f(x2)] for all t ∈ [0, 1]. (P.3)
Now consider the product term on the right-hand side of (P.3). We know that t ≥ 0
and, by (P.1), that f (x1 ) − f (x2 ) ≥ 0, so the whole last term is non-negative and may be
strictly positive. In either case, the whole right-hand side, together, must be greater than
or equal to f(x2). At the same time, we know from (P.1) that f(x2) = min[f(x1), f(x2)]. Therefore, (P.1) and (P.3), together, tell us that
f(xt) ≥ min[f(x1), f(x2)] for all t ∈ [0, 1],
so every concave function is also a quasiconcave function.
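The converse fails: a quasiconcave function need not be concave. A one-line numerical witness (Python, assumed; f(x) = x³ is monotone, hence quasiconcave, yet its graph bends the wrong way over [0, 2]):

```python
# f(x) = x**3 satisfies the min-inequality at this pair but not the chord inequality.
f = lambda x: x ** 3
x1, x2, t = 0.0, 2.0, 0.5
xt = t * x1 + (1 - t) * x2              # xt = 1.0
chord = t * f(x1) + (1 - t) * f(x2)     # chord value = 4.0

quasi_ok = f(xt) >= min(f(x1), f(x2))   # 1.0 >= 0.0 -> True
concave_ok = f(xt) >= chord             # 1.0 >= 4.0 -> False

print(quasi_ok, concave_ok)  # True False
```

One failing pair is conclusive against concavity; the single passing pair merely illustrates (and does not prove) quasiconcavity.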
A function f is convex iff f(xt) ≤ tf(x1) + (1 − t)f(x2) for all x1, x2 ∈ D and t ∈ [0, 1]; geometrically, the chord joining any two points on the graph must lie on or above the point (xt, f(xt)). Some examples of convex and non-convex functions are given in Fig. A1.33 (panels (a) to (d); graphs omitted in extraction). As
the examples show, a convex function may have ‘linear segments’ in its graph. As before,
strict convexity is needed to rule out such things.
Concave and convex functions are very closely related – indeed, we said before that
the one was the ‘flip side’ of the other. More precisely, there is an equivalence between
concavity of a function and convexity of the negative of that function.
THEOREM A1.16 Concave and Convex Functions
f is a (strictly) concave function if and only if −f is a (strictly) convex function.
Proof: The proof just requires manipulating the definitions. We will show sufficiency and leave necessity to the reader.
If f (x) is concave, then f (xt ) ≥ tf (x1 ) + (1 − t)f (x2 ) for all x1 and x2 in D, and
t ∈ [0, 1]. Multiply by −1 and get −f (xt ) ≤ t(−f (x1 )) + (1 − t)(−f (x2 )), so −f (x) is
convex.
Whereas concavity required points below the graph to form a convex set, convexity
requires the set of points on and above the graph of the function to be a convex set.
THEOREM A1.17 Points On and Above the Graph of a Convex Function Form a Convex Set
Let A∗ ≡ {(x, y) | x ∈ D, f(x) ≤ y} be the set of points ‘on and above’ the graph of f: D → R, where D ⊂ Rn is a convex set and R ⊂ R. Then
f is a convex function ⇐⇒ A∗ is a convex set.
Proof: By Theorem A1.16, f (x) is convex iff −f (x) is concave, and by Theorem A1.13, the
latter holds iff the set
A ≡ {(x, y) | x ∈ D, −f (x) ≥ y}
is a convex set. Note that because y may be a positive or negative real number, we may rewrite the set A as
A = {(x, y) | x ∈ D, f(x) ≤ −y} = {(x, −y) | x ∈ D, f(x) ≤ y}.
Hence, we have shown that f (x) is convex iff the set A ≡ {(x, −y) | x ∈ D, f (x) ≤ y} is
convex.
Finally, note that A is convex iff A∗ is convex because (x, y) ∈ A∗ iff (x, −y) ∈ A.
We conclude that f (x) is a convex function iff A∗ is a convex set.
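Theorem A1.17 admits the same kind of numerical spot check as Theorem A1.13, now sampling points on or above the graph. A sketch (Python, assumed; the helper epigraph_convex_sample is hypothetical and illustrative only):

```python
import random

def epigraph_convex_sample(f, trials=1000, seed=1):
    """Sample pairs of points in A* = {(x, y) | f(x) <= y} and check that
    their convex combinations stay in A* (a spot check, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(-3, 3), rng.uniform(-3, 3)
        y1 = f(x1) + rng.uniform(0, 2)   # a point on or above the graph
        y2 = f(x2) + rng.uniform(0, 2)
        t = rng.random()
        if f(t * x1 + (1 - t) * x2) > t * y1 + (1 - t) * y2 + 1e-12:
            return False
    return True

print(epigraph_convex_sample(lambda x: x ** 2))      # convex f: True
print(epigraph_convex_sample(lambda x: -(x ** 2)))   # concave f: violations found
```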
Here again, these definitions may seem rather awkward: quasiconvexity requires that f(xt) ≤ max[f(x1), f(x2)] for all x1, x2 ∈ D and t ∈ [0, 1]. Fortunately, we again know something about the level and other related sets for quasiconvex functions. The results are essentially the opposite of those obtained before. For a quasiconvex function, it is the inferior sets
that are convex sets. If the quasiconvex function is increasing, this will be the set of points
below the level set. If it is decreasing, this will be the set of points above the level set.
These are illustrated in Fig. A1.34 and detailed in what follows. We leave the proof of the
following theorem as an exercise.
8 The operator max[a, b] means ‘the larger of a and b’. If a > b, then max[a, b] = a. If a = b, then max[a, b] equals a and b.
Figure A1.34. Quasiconvex functions have convex inferior sets. Strictly quasiconvex functions have no linear segments in their level sets. (a) Strictly quasiconvex and increasing. (b) Strictly quasiconvex and decreasing.
Proof: Again, we will just show sufficiency. If f(x) is quasiconcave, then f(xt) ≥ min[f(x1), f(x2)]. Multiply by −1 and get
−f(xt) ≤ −min[f(x1), f(x2)] = max[−f(x1), −f(x2)],
so −f(x) is a quasiconvex function.
A1.5 EXERCISES
A1.1 The set operations of union and intersection obey the commutative law and the distributive law. The commutative law for unions states that S ∪ T = T ∪ S, and for intersections that S ∩ T = T ∩ S. The distributive law for intersections says that for three sets R, S, and T, R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T), and for unions that R ∪ (S ∩ T) = (R ∪ S) ∩ (R ∪ T). Verify these laws using diagrams similar to those in Fig. A1.1.
A1.2 The following are intuitively ‘obvious’. Give a proof for each one.
(a) S ⊂ (S ∪ T).
(b) T ⊂ (S ∪ T).
(c) (S ∩ T) ⊂ S.
(d) (S ∩ T) ⊂ T.
A1.3 De Morgan’s laws tell us that
(S ∩ T)c = Sc ∪ T c
(S ∪ T)c = Sc ∩ T c .
Prove them.
A1.4 Extend De Morgan’s laws to the case of an arbitrary collection of sets {Si}i∈I, proving that
(∩i∈I Si)c = ∪i∈I Sic
(∪i∈I Si)c = ∩i∈I Sic .
A1.5 Let A and B be convex sets. Show by counterexample that A ∪ B need not be a convex set.
A1.6 Extend Theorem A1.1 to the case of an arbitrary number of convex sets.
A1.7 Graph each of the following sets. If the set is convex, give a proof. If it is not convex, give a counter-
example.
(a) {(x, y) | y = ex }.
(b) {(x, y) | y ≥ ex }.
(c) {(x, y) | y ≥ 2x − x2 ; x > 0, y > 0}.
(d) {(x, y) | xy ≥ 1; x > 0, y > 0}.
(e) {(x, y) | y ≤ ln(x)}.
A1.8 Let S be the set of all people on earth. Let the relation R be defined by the statement ‘loves’. Is R
complete? Transitive?
A1.9 Let A and B be two sets in domain D, and suppose that B ⊂ A. Prove that f (B) ⊂ f (A) for any
mapping f : D → R.
A1.10 Let A and B be two sets in range R, and suppose that B ⊂ A. Prove that f −1 (B) ⊂ f −1 (A) for any
mapping f : D → R.
A1.11 Consider the function f (x) = x2 . Describe the image set and determine whether the function is one-
to-one and whether it is onto if
(a) D = R, R = R.
(b) D = R, R = R+ .
(c) D = R+ , R = R.
(d) D = R+ , R = R+ .
A1.12 Does an inverse function exist for the function depicted in Fig. A1.8(a)? How about Fig. A1.8(b)?
Why or why not?
A1.13 Let f : D → R be any mapping and let B be any set in the range R. Prove that f −1 (Bc ) = (f −1 (B))c .
A1.14 For any mapping f : D → R and any two sets A and B in the range of f , show that
f −1 (A ∪ B) = f −1 (A) ∪ f −1 (B)
f −1 (A ∩ B) = f −1 (A) ∩ f −1 (B).
A1.15 Let {Ai }i∈I ⊂ R be any (finite or infinite) collection of sets in the range of f . Extend your proof in the
preceding exercise to show that
f −1 (∪i∈I Ai) = ∪i∈I f −1 (Ai)
f −1 (∩i∈I Ai) = ∩i∈I f −1 (Ai).
A1.16 Let S and T be convex sets. Prove that each of the following is also a convex set:
(a) −S ≡ {x | x = −s, s ∈ S}.
(b) S − T ≡ {x | x = s − t, s ∈ S, t ∈ T}.
A1.17 Let Ai ⊂ Rm be a convex set for i = 1, . . . , n. Prove that each of the following is a convex set.
(a) ∩ni=1 Ai .
(b) ×ni=1 Ai (the Cartesian product).
(c) Σni=1 Ai ≡ {Σni=1 ai | ai ∈ Ai, i = 1, . . . , n} (the sum of sets).
(d) Σni=1 αi Ai ≡ {Σni=1 αi ai | αi ∈ R, ai ∈ Ai} (the linear combination of sets).
A1.18 Let f i (x) = ai · x + bi for ai ∈ Rn , bi ∈ R, and consider the inequalities, f i (x) ≥ 0 for i = 1, . . . , n.
Let Ω = {x | f i (x) ≥ 0, i = 1, . . . , n} be the set of solutions to these n linear inequalities. Show that
Ω is a convex set.
A1.19 We sometimes write ‖x‖ ≡ ‖x − 0‖ to denote the distance from the origin in Rn to the point x. Consider any vector tx, where t ≥ 0 is a non-negative scalar. Prove that ‖tx‖ = t‖x‖.
A1.20 Show that the open interval (a, b) ⊂ R equals the open ball Bε(x̄), where x̄ = (a + b)/2 and ε = (b − a)/2.
A1.21 Prove part 4 in Theorem A1.2. Is the intersection of infinitely many open sets also an open set?
A1.22 Consider any two points x1 and x2 in Rn . Let Bε (x1 ) be any open ball centred at x1 .
(a) Let Z ≡ {z | z = tx1 + (1 − t)x2, t ∈ [0, 1]} be the set of all convex combinations of x1 and x2. Prove that Bε(x1) ∩ Z ≠ ∅.
(b) Let Z ∗ ≡ {z | z = tx1 + (1 − t)x2, t ∈ (0, 1)} be the subset of Z that excludes x1 and x2. Prove that Bε(x1) ∩ Z ∗ ≠ ∅.
A1.23 Prove part 4 in Theorem A1.4.
A1.24 Consider intervals in R of the form [a, +∞) and (−∞, b]. Prove that they are both closed sets. Is
the same true for intervals of the form [a, c) and (−c, b] for c finite?
A1.25 Let S ⊂ R be a set consisting of a single point, S = {s}. Prove that S is a closed, convex set.
A1.26 Let (a, b) ⊂ R be any open interval. Prove that its complement, (a, b)c = (−∞, a] ∪ [b, +∞).
Conclude that the complement of every open interval is the union of two closed sets.
A1.27 Any closed set of real numbers possesses a rather special property: it can be viewed as a (possibly
infinite) intersection of unions of simple closed intervals. Specifically, for any closed set S ⊂ R,
S = ∩i∈I ((−∞, ai] ∪ [bi, +∞))
for some real numbers ai < bi and some index set I. Give a proof of this claim.
A1.28 Let D be a subset of Rn . Prove the analogues of Theorems A1.2 and A1.4 for open and closed sets
in D. For example, the analogue of part 3 of Theorem A1.2 would read, ‘The union of open sets in
D is an open set in D’. Similarly for the others.
A1.29 Complete the following.
(a) Show that [0, 1) is open in R+ but not in R.
(b) Part (a) shows that open sets in R+ are not necessarily open in R. Show, however, that closed
sets in R+ are closed in R.
(c) More generally, show that if D is a subset of Rn and D is open (closed) in Rn , then S ⊂ D is
open (closed) in D if and only if it is open (closed) in Rn .
A1.30 Prove that if b is the l.u.b. of S ⊂ R and S is an open set, then b ∉ S. Prove that if b is the l.u.b. of S ⊂ R and S is closed, then b ∈ S.
A1.31 Let α1 > 0, α2 > 0, and β > 0 all be real numbers. Consider the subset of points in R2 given by
Ω ≡ {x ∈ R2+ | α1 x1 + α2 x2 ≤ β}. Prove that Ω is a convex set. Sketch Ω in the plane. If x1 = 0,
what is the largest value that x2 can take? If x2 = 0, what is the largest value x1 can take? Mark these
points on your sketch. (Look familiar?) Prove that Ω is bounded.
A1.32 The set Sn−1 ≡ {x | Σni=1 xi = 1, xi ≥ 0, i = 1, . . . , n} is called the (n − 1)-dimensional unit simplex.
(a) Sketch this set for n = 2.
(b) Prove that Sn−1 is a convex set.
(c) Prove that Sn−1 is a compact set.
A1.33 Prove the analogue of Theorem A1.6 for closed sets. That is, show that the following statements are
equivalent:
(i) f : D → Rn is continuous.
(ii) For every closed ball B in Rn , the inverse image of B under f is closed in D.
(iii) For every closed subset S of Rn , the inverse image of S under f is closed in D.
A1.34 Prove that f : D → Rn is continuous if and only if f −1 (T) is compact in the domain D ⊂ Rm for
every compact set T in the range Rn .
A1.35 To help convince yourself that the conditions of Theorem A1.10 are sufficient, but not necessary,
illustrate a simple case like those in Fig. A1.18 where f is real-valued and continuous, S ⊂ D is
not compact, yet a minimum and a maximum over S both exist. Illustrate a case where neither is S
compact, nor is f continuous, yet both a maximum and a minimum of f over S exist.
A1.36 Every hyperplane divides Rn into two ‘half spaces’: the set of points ‘on and above’ the hyperplane,
H + = {x | a · x ≥ α}, and the set of points ‘on and below’ the hyperplane, H − = {x | a · x ≤ α}.
Prove that each of these two half spaces is a closed, convex set.
A1.37 Convince yourself that the conditions of Brouwer’s theorem are sufficient, but not necessary, for the
existence of a fixed point by illustrating the following situations:
(a) S is compact, S is convex, f is not continuous, and a fixed point of f exists.
(b) S is compact, S is not convex, f is continuous, and a fixed point of f exists.
(c) S is not compact, S is convex, f is continuous, and a fixed point of f exists.
(d) S is not compact, S is not convex, f is not continuous, and a fixed point of f exists.
A1.38 Let f (x) = x2 and suppose that S = (0, 1). Show that f has no fixed point even though it is a
continuous mapping from S to S. Does this contradict Brouwer’s theorem? Why or why not?
A1.39 Use Brouwer’s theorem to show that the equation cos(x) − x − 1/2 = 0 has a solution in the interval
0 ≤ x ≤ π/4.
A1.40 Sketch a few level sets for the following functions:
(a) y = x1 x2 .
(b) y = x1 + x2 .
(c) y = min[x1 , x2 ].
A1.41 Prove Theorem A1.12. Remember for parts 3 and 6 through 8 to prove that A ⊂ B and B ⊂ A.
A1.42 Let D = [−2, 2] and f : D → R be y = 4 − x2 . Carefully sketch this function. Using the definition
of a concave function, prove that f is concave. Demonstrate that the set A is a convex set.
A2.1 CALCULUS
A2.1.1 FUNCTIONS OF A SINGLE VARIABLE
Roughly speaking, a function y = f (x) is differentiable if it is both continuous and
‘smooth’, with no breaks or kinks. The function in Fig. A2.1(b) is everywhere differen-
tiable, whereas the one in Fig. A2.1(a) is not differentiable at x0 . Differentiability is thus
a more stringent requirement than continuity. It is also a requirement we often impose
because it allows us to use familiar tools from calculus.
The concept of the derivative, f ′(x), is no doubt familiar to you. The derivative is a function, too, giving, at each value of x, the slope or instantaneous rate of change in f (x). We therefore sometimes write
dy/dx = f ′(x), (A2.1)
to indicate that f ′(x) gives us the (instantaneous) amount, dy, by which y changes per unit change, dx, in x. If the (first) derivative is a differentiable function, we can take its derivative, too, getting the second derivative of the original function
d2y/dx2 = f ′′(x). (A2.2)
[Figure A2.1: graphs omitted in extraction. (a) A function that is not differentiable at x0. (b) An everywhere differentiable function.]
For constants, α: d(α)/dx = 0.
For sums: d[f (x) ± g(x)]/dx = f ′(x) ± g′(x).
Power rule: d(αxn)/dx = nαxn−1.
Product rule: d[f (x)g(x)]/dx = f (x)g′(x) + f ′(x)g(x).
Quotient rule: d[f (x)/g(x)]/dx = [g(x)f ′(x) − f (x)g′(x)]/[g(x)]2.
Chain rule: d[f (g(x))]/dx = f ′(g(x))g′(x).
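Each of these rules can be verified numerically against a central-difference approximation to the derivative. A sketch (Python, assumed; the helper d and the sample functions are hypothetical choices):

```python
def d(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 3
g = lambda x: 2 * x + 1
fp = lambda x: 3 * x ** 2      # f' by the power rule
gp = lambda x: 2.0             # g'

x0 = 1.3
# Product rule: (fg)' = f g' + f' g
assert abs(d(lambda x: f(x) * g(x), x0) - (f(x0) * gp(x0) + fp(x0) * g(x0))) < 1e-6
# Quotient rule: (f/g)' = (g f' - f g') / g^2
assert abs(d(lambda x: f(x) / g(x), x0) - (g(x0) * fp(x0) - f(x0) * gp(x0)) / g(x0) ** 2) < 1e-6
# Chain rule: (f(g(x)))' = f'(g(x)) g'(x)
assert abs(d(lambda x: f(g(x)), x0) - fp(g(x0)) * gp(x0)) < 1e-6
print("rules verified numerically at x0 =", x0)
```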
the fact that its second derivative is non-positive. (Note that f ′(x0), the slope of the line l0, is greater than f ′(x1), the slope of the line l1.)
From Fig. A2.3, it appears that a function is concave precisely when its second
derivative is always non-positive. In a moment, we shall state a theorem to this effect.
Draw a few concave functions to convince yourself of this.
But something else is also apparent from Fig. A2.3. Note that both tangent lines,
l0 and l1 , lie entirely above (sometimes only weakly above) the function f . Let us focus
on line l0. This is a straight line with slope f ′(x0) that goes through the point (x0, f (x0)). Consequently, the equation describing this straight line is
l0(x) = f (x0) + f ′(x0)(x − x0).
[Figure A2.3: graph omitted in extraction. Tangent lines l0 and l1, drawn at x0 and x1, lie on or above the graph of the concave function f.]
Now, saying that the line l0 lies above f is just saying that l0(x) ≥ f (x) for all x. But this then says that
f (x0) + f ′(x0)(x − x0) ≥ f (x)
for all x. Thus, this inequality seems to follow from the concavity of f .
Theorem A2.1, which we state without proof, puts together the preceding observa-
tions to characterise concave functions of a single variable in two ways: one in terms of the
function’s second derivative, and the other in terms of its first derivative and the tangent
lines it generates.
Because a function is convex if its negative is concave, Theorem A2.1 also gives a
characterisation of convex functions. Simply replace the word ‘concave’ with ‘convex’,
and reverse the sense of all the inequalities. One might think that the converse of statement
4 is true, i.e., that if f is strictly concave, then its second derivative must be strictly negative
everywhere. You are asked in Exercise A2.20 to show that this is not the case.
In the single-variable case, it is easy to think of the derivative of the function giving
us the slope or rate of change in y as x changes. There, the value of y, so any changes or
increments in y, depends only on the value of the single variable x. However, with real-
valued functions of n variables, y depends on the value of all n of the variables x1 , . . . , xn .
It is therefore harder to think of the slope or the rate of change in y in the singular. It is
quite natural, though, to think of the slope as x1 changes and the slope as x2 changes, and
so on, for all n of the variables on which y depends. Rather than having a single slope, a
function of n variables can be thought to have n partial slopes, each giving only the rate
at which y would change if one xi , alone, were to change. Each of these partial slopes is
called a partial derivative. Formally, each partial derivative is defined just like an ordinary derivative, as the limiting value taken by the ratio of the increment in the value of the function from a change in one of its variables to the change in that variable itself:
fi(x) ≡ lim h→0 [f (x1, . . . , xi + h, . . . , xn) − f (x1, . . . , xi, . . . , xn)]/h.
Various other notations are sometimes used to denote partial derivatives. Among the most common are ∂y/∂xi or just fi(x).
Notice some important things about partial derivatives. First, as remarked, there are
n of them, one for each variable xi . Second, like the derivative in the single-variable case,
each partial derivative is itself a function. In particular, each partial derivative is a func-
tion that depends on the value taken by every variable, x1 , . . . , xn . Finally, notice that
the partial derivative is defined at every point in the domain to measure how the value of
the function changes as one xi changes, leaving the values of the other (n − 1) variables
unchanged. Thus, to calculate the partial derivative with respect to, say, xi , one simply
takes the ordinary derivative with respect to xi , treating all other variables xj , j = i, as
constants. Consider the following example of a function of two variables.
EXAMPLE A2.1 Let f (x1 , x2 ) = x12 + 3x1 x2 − x22 . This is a function of two variables, so
there will be two partial derivatives. We obtain the partial with respect to x1 by differenti-
ating with respect to x1 , treating every appearance of x2 as if it were some constant. Doing
that, we obtain
∂f (x1, x2)/∂x1 = 2x1 + 3x2.
In the second term of the original function, we treated both multiplicative terms 3 and x2 as constants. In the third term of the function, x1 does not appear at all, so the entire term was treated like a constant, and its derivative with respect to x1 is zero. Proceeding in the same way, the partial with respect to x2 is
∂f (x1, x2)/∂x2 = 3x1 − 2x2.
Notice that each partial derivative in this example is a function of both x1 and x2 .
The value taken by each partial, therefore, will be different at different values of x1 and
x2 . At the point (1, 2), their values would be f1 (1, 2) = 8 and f2 (1, 2) = −1. At the point
(2, 1), their respective values would be f1 (2, 1) = 7 and f2 (2, 1) = 4.
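These hand computations are easy to confirm by finite differences. A sketch (Python, assumed; the symbolic partials f1 and f2 are those derived in Example A2.1):

```python
# Example A2.1: f(x1, x2) = x1^2 + 3 x1 x2 - x2^2 and its two partials.
f = lambda x1, x2: x1 ** 2 + 3 * x1 * x2 - x2 ** 2
f1 = lambda x1, x2: 2 * x1 + 3 * x2
f2 = lambda x1, x2: 3 * x1 - 2 * x2

h = 1e-6
for (a, b) in [(1.0, 2.0), (2.0, 1.0)]:
    # central differences in each coordinate, holding the other fixed
    num1 = (f(a + h, b) - f(a - h, b)) / (2 * h)
    num2 = (f(a, b + h) - f(a, b - h)) / (2 * h)
    assert abs(num1 - f1(a, b)) < 1e-6 and abs(num2 - f2(a, b)) < 1e-6

print(f1(1, 2), f2(1, 2))  # 8 -1
print(f1(2, 1), f2(2, 1))  # 7 4
```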
It is easy to see that each partial derivative tells us whether the function is rising or
falling as we change one variable alone, holding all others constant. But this is just like
telling us how the value of the function changes as we move in the direction of one of the
n unit vectors. It is sometimes useful to know whether the value of the function is rising or
falling as we move in other directions away from a particular point in the domain.
So fix a point x = (x1 , . . . , xn ) in the domain of f , and suppose we wish to know how
the value of f changes from f (x) as we move away from x in the direction z = (z1 , . . . , zn ).
The function
g(t) = f (x + tz),
defined for t ∈ R, will help us in this regard. Note that g(t) takes on the value f (x) when
t = 0, and as t increases from zero, x + tz moves in the direction z. Consequently, if g(t)
increases as t moves from being zero to being positive, then we know that f increases as
we move from x in the direction z. Thus, we are interested in whether g′(0) is positive, negative, or zero.
We now give a heuristic description of how to calculate g′(0). First, note that by definition, g′(0) is just the rate at which f changes per unit change in t. Now, the ith
coordinate in the domain of f increases at the rate zi per unit change in t. Moreover, the
rate at which f changes per unit change in the ith coordinate in the domain is just fi (x), the
ith partial derivative of f at x. Consequently, the rate at which f changes per unit change in
t due to the change in the ith coordinate is fi (x)zi . The total rate of change of f is then just
the sum of all of the changes induced by each of the n coordinates. That is,
g′(0) = Σni=1 fi(x)zi.
The term on the right-hand side is known as the directional derivative of f at x in the
direction z.1 This can be written more compactly using vector notation.
1 Strictly speaking, for this calculation to be correct, f must be continuously differentiable.
Before doing so, a word on some vector-related conventions. All vectors should be
assumed to be column vectors unless explicitly stated otherwise. In the text, we write x =
(x1 , . . . , xn ) even though x may be a column vector. This saves us from the inconvenient
and constant use of the transpose notation such as x = (x1 , . . . , xn )T .
With this convention in mind, assemble all preceding n partial derivatives into a row
vector ∇f (x) ≡ (f1 (x), . . . , fn (x)). The row vector ∇f (x) is called the gradient of f at x.
The directional derivative of f at x in the direction z can then be written as
g′(0) = ∇f (x)z. (A2.3)
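The identity g′(0) = ∇f(x)z can be spot-checked numerically. A sketch (Python, assumed; it reuses the function and gradient from Example A2.1, with an arbitrarily chosen point and direction):

```python
# f and its gradient from Example A2.1
f = lambda x1, x2: x1 ** 2 + 3 * x1 * x2 - x2 ** 2
grad = lambda x1, x2: (2 * x1 + 3 * x2, 3 * x1 - 2 * x2)

x, z = (1.0, 2.0), (0.5, -1.0)          # point and direction (illustrative choices)
g = lambda t: f(x[0] + t * z[0], x[1] + t * z[1])

h = 1e-6
g_prime_0 = (g(h) - g(-h)) / (2 * h)    # numerical g'(0)
dot = sum(gi * zi for gi, zi in zip(grad(*x), z))  # gradient dotted with z

assert abs(g_prime_0 - dot) < 1e-6
print(round(dot, 6))  # 5.0
```

Here ∇f(1, 2) = (8, −1), so the directional derivative is 8(0.5) + (−1)(−1) = 5.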
Note that the partial derivative of f with respect to xi is then just the directional
derivative of f in the direction (0, . . . , 0, 1, 0, . . . , 0), where the 1 appears in the ith posi-
tion. Thus, all partial derivatives are just special kinds of directional derivatives. On the
other hand, (A2.3) tells us that the rate at which f changes in any direction is determined
by the vector of partial derivatives, i.e., the gradient of f . Thus, it is helpful to think of the
gradient, ∇f , as being analogous to the derivative of a function of one variable. As before,
the gradient is itself a function, because it maps every x in the domain into a vector of
n ‘partial slopes’. Also, as before, we can take the derivative of these derivatives and get
something very much like the second derivative.
Let us consider one of the function’s partial derivatives, for instance, the partial with
respect to x1 . We note first that
∂f (x1 , . . . , xn )
∂x1
is a function of n variables itself. Changes in any of the xi could in principle affect its
value. Thus, f1 (x) itself has n partial derivatives.
There is no particular difficulty in calculating the n partial derivatives of the (first-
order) partial f1 (x). Each is calculated with respect to its given variable by simply treating
all other variables as though they were constants, and applying the familiar rules of single-
variable differentiation. The resulting derivative (itself also a function of n variables) is
called a second-order partial derivative. When f1 (x) is differentiated with respect to xi ,
the result is the second-order partial of f with respect to x1 and xi , denoted
∂(∂f (x)/∂x1)/∂xi, or ∂2f (x)/∂xi∂x1, or f1i(x).
Because there are n of these partials of the partial with respect to x1 , one with respect
to each of the xi , i = 1, . . . , n, they, too, can be arranged into a gradient vector. This time,
though, the vector will be the gradient of the partial with respect to x1 , f1 (x). We can write
this gradient vector as
∇f1(x) = (∂2f (x)/∂x1∂x1, . . . , ∂2f (x)/∂xn∂x1) ≡ (f11(x), . . . , f1n(x)).
Now, there are n first-order partial derivatives in our original gradient vector, ∇f (x).
We can repeat the process we just completed for f1 and get a total of n gradients, ∇fi (x), i =
1, . . . , n. In essence, when we do this, we are taking the ‘gradient of the gradient’ of the
original function f , simply keeping in mind that each partial in the gradient n-vector itself
has n partials. If we arrange all the ∇fi (x) – each a vector of second-order partials – into a
matrix by stacking one on top of the other, we get
H(x) = \begin{pmatrix}
f_{11}(x) & f_{12}(x) & \dots & f_{1n}(x) \\
f_{21}(x) & f_{22}(x) & \dots & f_{2n}(x) \\
\vdots & \vdots & \ddots & \vdots \\
f_{n1}(x) & f_{n2}(x) & \dots & f_{nn}(x)
\end{pmatrix}.
Notice that H(x) contains all possible second-order partial derivatives of the original
function. H(x) is called the Hessian matrix of the function f (x). Now recall the analogy
between the gradient and the first derivative. Remembering that the Hessian was obtained
by taking the gradient of the gradient, we can think of H(x) as analogous to the second
derivative of a function of a single variable.
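As a numerical aside (the helper names here are ours, not the text's), the 'gradient of the gradient' idea can be mimicked with finite differences: approximate each first-order partial, then take partials of those approximations to build an approximate Hessian.

```python
# A numerical sketch: the Hessian as the "gradient of the gradient",
# approximated by central finite differences.

def partial(f, x, i, h=1e-4):
    """Central-difference approximation of the ith partial of f at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2.0 * h)

def hessian(f, x, h=1e-4):
    """Differentiate each first-order partial f_i with respect to each x_j."""
    n = len(x)
    return [[partial(lambda y: partial(f, y, i, h), x, j, h) for j in range(n)]
            for i in range(n)]

# An illustrative function: f(x1, x2) = x1*x2**2 + x1*x2; its exact Hessian
# is [[0, 2*x2 + 1], [2*x2 + 1, 2*x1]].
f = lambda x: x[0] * x[1] ** 2 + x[0] * x[1]
H = hessian(f, [1.0, 2.0])   # exact value here: [[0, 5], [5, 2]]
```

Entry H[i][j] approximates f_ij, so the computed matrix should be (nearly) symmetric, anticipating the theorem on second-order partials below.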
There is an important theorem on second-order partial derivatives to which we
will have occasion to refer. Known as Young's theorem, it says that for a twice
continuously differentiable function, the order in which the partial derivatives are
differentiated makes no difference. The theorem is offered here without proof.

\frac{\partial^2 f(x)}{\partial x_i\,\partial x_j} = \frac{\partial^2 f(x)}{\partial x_j\,\partial x_i} \qquad \forall\, i \text{ and } j.
EXAMPLE A2.2 Consider the function f (x1 , x2 ) = x1 x22 + x1 x2 . Its two first-order par-
tials are
\frac{\partial f}{\partial x_1} \equiv f_1(x) = x_2^2 + x_2 \quad\text{and}\quad \frac{\partial f}{\partial x_2} \equiv f_2(x) = 2x_1 x_2 + x_1.
Differentiating f_1(x) with respect to x_2 gives

\frac{\partial^2 f}{\partial x_2\,\partial x_1} \equiv f_{12}(x) = 2x_2 + 1.

Differentiating f_2(x) with respect to x_1 gives the same result,

\frac{\partial^2 f}{\partial x_1\,\partial x_2} \equiv f_{21}(x) = 2x_2 + 1,

just as the theorem asserts.
Proof: We prove one direction for the concave case only, leaving the rest for you in Exercise
A2.21. Suppose then that f is a concave function. Let x ∈ D and z ∈ Rn. We must show
that g(t) = f(x + tz) is concave on C = {t ∈ R | x + tz ∈ D}.
So, choose t0, t1 ∈ C, and α ∈ [0, 1]. We must show that

g(\alpha t_0 + (1-\alpha)t_1) \ge \alpha g(t_0) + (1-\alpha)g(t_1).

First, note that C is a convex set, so that αt0 + (1 − α)t1 ∈ C and g is therefore defined
there. To establish the desired inequality, we merely apply the definition of g. Indeed,

g(\alpha t_0 + (1-\alpha)t_1) = f\bigl(\alpha(x + t_0 z) + (1-\alpha)(x + t_1 z)\bigr) \ge \alpha f(x + t_0 z) + (1-\alpha)f(x + t_1 z) = \alpha g(t_0) + (1-\alpha)g(t_1),

where the inequality follows from the concavity of f. (Note that because ti ∈ C, x + ti z ∈
D for i = 0, 1.)
Theorem A2.3 says, in effect, that to check that a multivariate function is concave, it
is enough to check, for each point x in the domain, and each direction z, that the function of
a single variable defined by the values taken on by f on the line through x in the direction
z is concave. Because Theorem A2.1 characterises concave functions of a single variable,
we can then put Theorems A2.1 and A2.3 together to characterise concave functions of
many variables.
Before putting the two theorems together, it will be convenient to introduce some
terminology from matrix algebra. An n × n matrix A is called negative semidefinite if for
all vectors z ∈ Rn,

z^{\mathrm{T}} A\, z \le 0.

If the inequality is strict for all non-zero z, then A is called negative definite. The matrix A
is called positive semidefinite (respectively, positive definite) if −A is negative semidefi-
nite (negative definite). Think of negative semidefiniteness as the generalisation to matrices
of the notion of non-positive numbers. Indeed, note that a 1 × 1 matrix (i.e., a number) is
negative semidefinite if and only if its single entry is non-positive.
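To make the definition concrete, here is a small illustrative sketch (the matrices and helper names are ours, not the text's) that samples the quadratic form z'Az over a grid of directions z. A finite sample can refute negative semidefiniteness, but can only suggest that it holds; the determinant tests given later in the chapter are conclusive.

```python
# Sampling z'Az over many directions z to probe the sign of the quadratic form.
import itertools

def quad_form(A, z):
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))

def looks_negative_semidefinite(A, steps=21):
    grid = [-1.0 + 2.0 * k / (steps - 1) for k in range(steps)]
    return all(quad_form(A, z) <= 1e-12
               for z in itertools.product(grid, repeat=len(A)))

A = [[-2.0, 1.0], [1.0, -2.0]]   # z'Az = -(z1 - z2)**2 - z1**2 - z2**2 <= 0
B = [[1.0, 0.0], [0.0, -1.0]]    # indefinite: z'Bz = z1**2 - z2**2
```

For A the quadratic form is non-positive in every sampled direction; for B the direction z = (1, 0) already makes it positive, refuting negative semidefiniteness.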
With this in mind, the analogue of a non-positive second derivative of a function
of one variable would be a negative semidefinite matrix of second-order partial deriva-
tives (i.e., the Hessian matrix) of a function of many variables. We can now put together
Theorems A2.1 and A2.3 to confirm this.
To make use of these results, we will have to calculate the first and second derivatives of g
in terms of f.
Now, g′(t) is simply the directional derivative of f at the point x + tz in the direction
z. Consequently,

g'(t) = \sum_{i=1}^{n} f_i(x + tz)\, z_i. \tag{P.3}

To calculate g″(t), we need only take this expression and differentiate the sum term by term. Now the derivative of f_i(x + tz) with respect
to t is just the directional derivative of f_i at x + tz in the direction z, which can be written as

\sum_{j=1}^{n} f_{ij}(x + tz)\, z_j.

Therefore,

g''(t) = \sum_{i=1}^{n}\sum_{j=1}^{n} z_i\, f_{ij}(x + tz)\, z_j = z^{\mathrm{T}} H(x + tz)\, z. \tag{P.4}
Now, note that 0 ∈ C. Consequently, by (P.1), we must have g″(0) ≤ 0. By using
(P.4), this means that

z^{\mathrm{T}} H(x)\, z \le 0.

But because z and x were arbitrary, this means that H(x) is negative semidefinite for all x.
Thus, we have shown that 1 ⇒ 2.
Note that this also proves statement 4, because if H(x) is negative definite for all x,
then regardless of the chosen x and z, so long as z is non-zero, g″(t) < 0 for all t, so that
by Theorem A2.3, f must be strictly concave.
To see that 1 ⇒ 3, we must use (P.2). Choose any x^0 ∈ D and let the previous z
be given by x^0 − x. (Recall that z ∈ Rn was arbitrary.) Then both 0, 1 ∈ C. Consequently,
(P.2) implies that

g(1) \le g(0) + g'(0)(1 - 0).

But using (P.3) and the definition of g, this just says that

f(x^0) \le f(x) + \nabla f(x)(x^0 - x).

Therefore, statement 3 holds because both x and x^0 were arbitrary. Hence, 1 ⇒ 3. The
proofs that 2 ⇒ 1 and 3 ⇒ 1 are similar, and we leave these as an exercise.
According to the theorem, a function is concave iff its Hessian is negative semidefi-
nite at all points in the domain. It is therefore convex iff its Hessian is positive semidefinite
at all points in the domain. At the same time, we know that the function will be strictly
concave (convex) when the Hessian is negative (positive) definite on the domain, though
the converse of this is not true.
There are many tests one can perform directly on the matrix H(x) to determine
the concavity, convexity, quasiconcavity, or quasiconvexity of the function. The rules and
regulations in this area are notoriously complicated. Their greatest applicability arises in
the context of optimisation problems, to be considered later. It therefore seems best to
postpone the details of these tests until then.
One fairly intuitive relation between the concavity/convexity of a function and its
second partial derivatives does seem worthy of note here. In the single-variable case, a
necessary and sufficient condition for a function to be concave (convex) is that its second
derivative be everywhere non-positive (non-negative). In the multivariable case, we can note a necessary, but not
a sufficient, condition for concavity or convexity in terms of the signs of all 'own' second
partial derivatives f_{ii}(x). The proof is left as an exercise.
Two special cases are worthy of note: f (x) is homogeneous of degree 1, or linear homo-
geneous, if f (tx) ≡ tf (x) for all t > 0; it is homogeneous of degree zero if f (tx) ≡ f (x) for
all t > 0.
Homogeneous functions display very regular behaviour as all variables are increased
simultaneously and in the same proportion. When a function is homogeneous of degree 1,
for example, doubling or tripling all variables doubles or triples the value of the function.
When homogeneous of degree zero, equiproportionate changes in all variables leave the
value of the function unchanged.
The function

f(x_1, x_2) \equiv A x_1^{\alpha} x_2^{\beta}, \qquad A > 0,\; \alpha > 0,\; \beta > 0,
is known as the Cobb-Douglas function. We can check whether this function is homo-
geneous by multiplying all variables by the same factor t and seeing what we get. We
find that
f(t x_1, t x_2) \equiv A(t x_1)^{\alpha}(t x_2)^{\beta} \equiv t^{\alpha} t^{\beta} A x_1^{\alpha} x_2^{\beta} = t^{\alpha+\beta} f(x_1, x_2),

so the Cobb-Douglas function is homogeneous of degree α + β.
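This calculation is easy to confirm numerically. The following sketch uses illustrative parameter values of our own choosing, not values from the text:

```python
# Numerical check that the Cobb-Douglas form satisfies
# f(t*x1, t*x2) = t**(alpha+beta) * f(x1, x2).
A_, alpha, beta = 2.0, 0.3, 0.5    # illustrative values

def f(x1, x2):
    return A_ * x1 ** alpha * x2 ** beta

x1, x2, t = 1.7, 0.9, 3.0
lhs = f(t * x1, t * x2)                  # f evaluated at the scaled point
rhs = t ** (alpha + beta) * f(x1, x2)    # t**(alpha+beta) times f at the original point
```

The two sides agree up to floating-point rounding for any positive inputs.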
Suppose f(x) is homogeneous of degree k, so that f(t\mathbf{x}) \equiv t^k f(\mathbf{x}) for all t > 0. Then each of its partial derivatives is homogeneous of degree k − 1. To see this, differentiate both sides of the identity with respect to x_i. Applying the chain rule on the left-hand side,

\frac{\partial}{\partial x_i}\bigl(f(t\mathbf{x})\bigr) = \frac{\partial f(t\mathbf{x})}{\partial x_i}\,\frac{\partial(t x_i)}{\partial x_i} = \frac{\partial f(t\mathbf{x})}{\partial x_i}\, t, \tag{P.2}

where \partial f(t\mathbf{x})/\partial x_i denotes the partial of f with respect to its ith argument, evaluated at t\mathbf{x}. Differentiating the right-hand side,

\frac{\partial}{\partial x_i}\bigl(t^k f(\mathbf{x})\bigr) = t^k\,\frac{\partial f(\mathbf{x})}{\partial x_i}. \tag{P.3}

Equating (P.2) and (P.3),

\frac{\partial f(t\mathbf{x})}{\partial x_i}\, t = t^k\,\frac{\partial f(\mathbf{x})}{\partial x_i}.

Dividing both sides by t > 0,

\frac{\partial f(t\mathbf{x})}{\partial x_i} = t^{k-1}\,\frac{\partial f(\mathbf{x})}{\partial x_i},

so each partial derivative is indeed homogeneous of degree k − 1.
One frequent application arises in the case of functions that are homogeneous of
degree 1. If f (x) is homogeneous of degree 1, the theorem tells us its partial derivatives
will satisfy
\frac{\partial f(t\mathbf{x})}{\partial x_i} = \frac{\partial f(\mathbf{x})}{\partial x_i} \qquad \forall\, t > 0.
This says that increasing (or decreasing) all variables in the same proportion leaves all n
partial derivatives unchanged. Let us verify this for the Cobb-Douglas form.
EXAMPLE A2.4 Let f(x_1, x_2) \equiv A x_1^{\alpha} x_2^{\beta}, and suppose α + β = 1 so that it is linear
homogeneous. The partial with respect to x_1 is

\frac{\partial f(x_1, x_2)}{\partial x_1} = \alpha A x_1^{\alpha-1} x_2^{\beta}.
Multiply both x1 and x2 by the factor t, and evaluate the partial at (tx1 , tx2 ). We obtain
\frac{\partial f(t x_1, t x_2)}{\partial x_1} = \alpha A (t x_1)^{\alpha-1}(t x_2)^{\beta} = t^{\alpha+\beta-1}\,\alpha A x_1^{\alpha-1} x_2^{\beta} = \frac{\partial f(x_1, x_2)}{\partial x_1},

because t^{\alpha+\beta-1} = t^0 = 1 when α + β = 1. The partial derivative is indeed unchanged.
A function is homogeneous if and only if it can always be written in terms of its own partial derivatives
and the degree of homogeneity. This result is known as Euler's theorem: f(x) is homogeneous of degree k if and only if

k f(\mathbf{x}) = \sum_{i=1}^{n} x_i\,\frac{\partial f(\mathbf{x})}{\partial x_i} \qquad \text{for all } \mathbf{x}.
To prove this, fix \mathbf{x} and define the function of t,

g(t) \equiv f(t\mathbf{x}), \tag{P.1}

and note some of its properties. Specifically, for fixed x, differentiate with respect to t
and obtain³

g'(t) = \sum_{i=1}^{n} \frac{\partial f(t\mathbf{x})}{\partial x_i}\, x_i, \tag{P.2}

which, at t = 1, gives

g'(1) = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x})}{\partial x_i}\, x_i. \tag{P.3}
Now to prove necessity, suppose f(x) is homogeneous of degree k, so that f(t\mathbf{x}) = t^k f(\mathbf{x})
for all t > 0 and any x. From (P.1), we then have g(t) = t^k f(\mathbf{x}). Differentiating gives
g'(t) = k t^{k-1} f(\mathbf{x}) and, evaluating at t = 1, we get g'(1) = k f(\mathbf{x}). Therefore, by using (P.3),
k f(\mathbf{x}) = \sum_{i=1}^{n} x_i\,\frac{\partial f(\mathbf{x})}{\partial x_i}, \tag{P.4}

as we sought to show.
3 In case this is not crystal clear, remember that because g(t) ≡ f (tx1 , . . . , txn ), t multiplies all n variables, so
its effect enters separately through each of them. To get the derivative of g(·) with respect to t, we therefore
have to sum the separate effects a change in t will have on f (·) through all those separate avenues. Moreover, in
computing each of them, we have to remember to apply the chain rule. Thus

g'(t) = \sum_{i=1}^{n} \frac{\partial f(t x_1, \dots, t x_n)}{\partial x_i}\,\frac{\partial(t x_i)}{\partial t}.

But \partial(t x_i)/\partial t = x_i, so (P.2) results.
To prove sufficiency, suppose that (P.4) holds at every point. Then in particular it holds at t\mathbf{x}, so that

k f(t\mathbf{x}) = \sum_{i=1}^{n} t x_i\,\frac{\partial f(t\mathbf{x})}{\partial x_i}. \tag{P.5}

Multiply both sides of (P.2) by t, compare to (P.5), and find that t g'(t) = k f(t\mathbf{x}). Substitute
from (P.1) to get

t g'(t) = k g(t),

a simple differential equation whose solution is g(t) = g(1)\,t^k. Because g(1) = f(\mathbf{x}) and g(t) = f(t\mathbf{x}), this says

f(t\mathbf{x}) = t^k f(\mathbf{x})

for all t > 0, as required.
For a function that is homogeneous of degree 1, Euler's theorem says that

f(\mathbf{x}) = \sum_{i=1}^{n} x_i\,\frac{\partial f(\mathbf{x})}{\partial x_i}.

Let us verify this for the linear homogeneous Cobb-Douglas form f(x_1, x_2) = A x_1^{\alpha} x_2^{\beta} with α + β = 1. Its partials are f_1 = \alpha A x_1^{\alpha-1} x_2^{\beta} and f_2 = \beta A x_1^{\alpha} x_2^{\beta-1}. Multiply the first by x_1, the second by x_2, add, and use the fact that α + β = 1 to get

x_1 f_1 + x_2 f_2 = \alpha A x_1^{\alpha} x_2^{\beta} + \beta A x_1^{\alpha} x_2^{\beta} = (\alpha + \beta) A x_1^{\alpha} x_2^{\beta} = f(x_1, x_2),

as Euler's theorem promises.
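Euler's theorem is likewise easy to check numerically for the Cobb-Douglas form; the parameter values below are illustrative choices of our own, not from the text:

```python
# Euler's theorem for the Cobb-Douglas form, homogeneous of degree
# k = alpha + beta: k*f(x) should equal x1*f1(x) + x2*f2(x).
A_, alpha, beta = 1.5, 0.6, 0.7    # illustrative values
k = alpha + beta

def f(x1, x2):
    return A_ * x1 ** alpha * x2 ** beta

def f1(x1, x2):
    return alpha * A_ * x1 ** (alpha - 1.0) * x2 ** beta

def f2(x1, x2):
    return beta * A_ * x1 ** alpha * x2 ** (beta - 1.0)

x1, x2 = 2.0, 3.0
lhs = k * f(x1, x2)
rhs = x1 * f1(x1, x2) + x2 * f2(x1, x2)
```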
A2.2 OPTIMISATION
This section is devoted to the calculus approach to optimisation problems, the most com-
mon form of problem in microeconomic theory. After a very brief review of familiar results
from single-variable calculus, we will see how they can be extended to the multivariable
context. We then examine techniques to help solve optimisation problems involving con-
straints of the various kinds regularly encountered in theoretical economics. While we will
not dwell here long enough to gain a highly sophisticated command of all the mathematical
fine points in this area, we will strive for something deeper than a mere cookbook under-
standing of the techniques involved. Our goal will be to build from a good understanding
of the single-variable case to a strong intuitive grasp of the principles at work in some
very sophisticated and powerful methods, and to get some practice in their application. We
begin with a review of familiar ground.
Consider the function of a single variable, y = f (x), and assume it is differentiable.
When we say the function achieves a local maximum at x∗ , we mean that f (x∗ ) ≥ f (x)
for all x in some neighbourhood of x∗ . When we say the function achieves a global maxi-
mum at x∗ , we mean that f (x∗ ) ≥ f (x) for all x in the domain of the function. The function
achieves a unique local maximum at x∗ if f(x∗) > f(x) for all x ≠ x∗ in some neighbourhood of x∗. It achieves a unique global maximum when f(x∗) > f(x) for all x ≠ x∗ in the
domain. Similarly, the function achieves a local minimum (unique local minimum) at x̃
whenever f(x̃) ≤ f(x) (f(x̃) < f(x)) for all x ≠ x̃ in some neighbourhood of x̃, and achieves
a global minimum (unique global minimum) at x̃ whenever f(x̃) ≤ f(x) (f(x̃) < f(x)) for
all x ≠ x̃ in the domain.
Various types of optima are illustrated in Fig. A2.4. The function achieves local
maxima at x1 , x3 , and x5 ; a global maximum is achieved at x3 . The global maximum at
x3 , however, is not unique. The local maxima at x1 and x3 are called interior maxima,
because x1 and x3 are in the interior of the domain D, not at its ‘edges’. Maxima such as
the one achieved at x5 are called boundary maxima. Likewise, at x0 , x2 , and x4 , there are
local minima; at x4 , a global minimum. Those at x2 and x4 are interior minima, and that at
x0 is a boundary minimum.
[Figure A2.4. A function on domain D achieving various local and global optima at x0, x1, ..., x5.]
Figure A2.5. (a) f′(x∗) = 0 and f′(x) is decreasing where f(x) achieves a maximum. (b)
f′(x̃) = 0 and f′(x) is increasing where f(x) achieves a minimum.
You are familiar with the calculus approach to problems of maximising or minimis-
ing functions of a single variable. In your calculus courses, you were probably introduced
to the logic of ‘first- and second-derivative tests’, then given practice finding the optima
of many different functions. The emphasis tended to be on applying these tests and learn-
ing how to calculate a function’s optima. In theoretical economics, however, we seldom
actually need to calculate optima. Instead, we usually just want to characterise them – to
spell out the conditions that we know must hold at the optimum, and then work with those
conditions, rather than with specific numbers.
For completeness, we recall the familiar first-order necessary conditions (FONC)
and second-order necessary conditions (SONC) characterising optima of an arbitrary
twice continuously differentiable function of one variable. The geometrical content of this
theorem is contained in Fig. A2.5.
THEOREM A2.8 Necessary Conditions for Local Interior Optima in the Single-Variable Case
Let f(x) be a twice continuously differentiable function of one variable. Then f(x) reaches
a local interior
1. maximum at x∗ ⟹ f′(x∗) = 0 and f″(x∗) ≤ 0,
2. minimum at x̃ ⟹ f′(x̃) = 0 and f″(x̃) ≥ 0.
THEOREM A2.9 First-Order Necessary Condition for Local Interior Optima of Real-Valued Functions
If the differentiable function f (x) reaches a local interior maximum or minimum at x∗ , then
x∗ solves the system of simultaneous equations,
\frac{\partial f(\mathbf{x}^*)}{\partial x_1} = 0

\frac{\partial f(\mathbf{x}^*)}{\partial x_2} = 0

\quad\vdots

\frac{\partial f(\mathbf{x}^*)}{\partial x_n} = 0.
Proof: We suppose that f (x) reaches a local interior extremum at x∗ and seek to show that
∇f (x∗ ) = 0. The proof we give is not the simplest, but it will be useful when we consider
the second-order conditions. To begin, choose any vector z ∈ Rn . Then, for any scalar t,
construct the familiar function of a single variable,

g(t) \equiv f(\mathbf{x}^* + t\mathbf{z}). \tag{P.1}

Carefully recall a few things about g. First, for t ≠ 0, x∗ + tz is just some vector different
from x∗, so g(t) coincides with some value of f. For t = 0, x∗ + tz is the same as x∗, so
g(0) coincides with the value of f at x∗. Because g(t) coincides with some value of f for
every t, and with f(x∗) for t = 0, g(t) must reach a local extremum at t = 0 because we
have assumed that f reaches an extremum at x∗. From Theorem A2.8, we know that if
g(t) reaches a local extremum at t = 0, then g′(0) = 0. As we have done before, we can
differentiate (P.1) using the chain rule to obtain
g'(t) = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x}^* + t\mathbf{z})}{\partial x_i}\, z_i

for any t. If we evaluate this at t = 0 and apply the condition g′(0) = 0, the local extremum
of g at zero implies that

g'(0) = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x}^*)}{\partial x_i}\, z_i = \nabla f(\mathbf{x}^*)\,\mathbf{z} = 0.
Because this must hold for every vector z in Rn – in particular for each of the n unit
vectors – this implies that each of f ’s partials must be zero, or that
∇f (x∗ ) = 0,
as we sought to show.
EXAMPLE A2.6 Let y = x_2 - 4x_1^2 + 3x_1 x_2 - x_2^2. To find a critical point of this function,
take each of its partial derivatives:

\frac{\partial f(x_1, x_2)}{\partial x_1} = -8x_1 + 3x_2, \qquad \frac{\partial f(x_1, x_2)}{\partial x_2} = 1 + 3x_1 - 2x_2.
We will have a critical point at a vector (x1∗ , x2∗ ) where both of these equal zero
simultaneously. To find x1∗ and x2∗ , set each partial equal to zero:
\frac{\partial f(x_1^*, x_2^*)}{\partial x_1} = -8x_1^* + 3x_2^* = 0

\frac{\partial f(x_1^*, x_2^*)}{\partial x_2} = 1 + 3x_1^* - 2x_2^* = 0, \tag{E.1}
or, rearranged as a linear system,

-8x_1^* + 3x_2^* = 0
3x_1^* - 2x_2^* = -1,

that is, A\mathbf{x}^* = \mathbf{b} with

A = \begin{pmatrix} -8 & 3 \\ 3 & -2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 0 \\ -1 \end{pmatrix}.

If we invert A, we get

A^{-1} = \frac{1}{|A|}\begin{pmatrix} -2 & -3 \\ -3 & -8 \end{pmatrix} = \begin{pmatrix} -2/7 & -3/7 \\ -3/7 & -8/7 \end{pmatrix},

so that

\mathbf{x}^* = A^{-1}\mathbf{b} = \begin{pmatrix} -2/7 & -3/7 \\ -3/7 & -8/7 \end{pmatrix}\begin{pmatrix} 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 3/7 \\ 8/7 \end{pmatrix}.
Thus, the function reaches a critical point at x1∗ = 3/7 and x2∗ = 8/7. We do not yet know
whether we have found a maximum or a minimum, though. For that we have to look at the
second-order conditions.
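The inversion above can be mimicked in a few lines of code; the following sketch (the helper name is ours) solves the same 2×2 system by Cramer's rule:

```python
# Cramer's rule for the 2x2 first-order-condition system
#   -8*x1 + 3*x2 = 0
#    3*x1 - 2*x2 = -1.

def solve2x2(a11, a12, a21, a22, b1, b2):
    det = a11 * a22 - a12 * a21
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - b1 * a21) / det
    return x1, x2

x1_star, x2_star = solve2x2(-8.0, 3.0, 3.0, -2.0, 0.0, -1.0)   # (3/7, 8/7)
```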
Intuitively, we know we have a maximum if the function is 'locally concave' there, and we know we have a minimum
if the function is ‘locally convex’. Theorem A2.4 pointed out that curvature depends on
the definiteness property of the Hessian of f . Intuitively, it appears that the function will
be locally concave around x if H(x) is negative semidefinite, and will be locally convex
if it is positive semidefinite. Intuition thus suggests the following second-order necessary
condition for local interior optima.
THEOREM A2.10 Second-Order Necessary Condition for Local Interior Optima of Real-Valued
Functions
Let f (x) be twice continuously differentiable.
1. If f (x) reaches a local interior maximum at x∗ , then H(x∗ ) is negative semidefi-
nite.
2. If f (x) reaches a local interior minimum at x̃, then H(x̃) is positive semidefinite.
Proof: We can build directly from the proof of Theorem A2.9. Recall that we defined the
function

g(t) = f(\mathbf{x}^* + t\mathbf{z})

and found that

g'(t) = \sum_{i=1}^{n} \frac{\partial f(\mathbf{x}^* + t\mathbf{z})}{\partial x_i}\, z_i.
Differentiating once again with respect to t, and again using the chain rule, we obtain the
second derivative,
g''(t) = \sum_{j=1}^{n}\sum_{i=1}^{n} \frac{\partial^2 f(\mathbf{x}^* + t\mathbf{z})}{\partial x_i\,\partial x_j}\, z_i z_j. \tag{P.1}

If f reaches a local interior maximum at x∗, then g reaches a local maximum at t = 0, so Theorem A2.8 gives g″(0) ≤ 0. Evaluating (P.1) at t = 0, this means

g''(0) = \sum_{j=1}^{n}\sum_{i=1}^{n} \frac{\partial^2 f(\mathbf{x}^*)}{\partial x_i\,\partial x_j}\, z_i z_j \le 0,
or that z^{\mathrm{T}} H(\mathbf{x}^*)\,\mathbf{z} \le 0. Because z was arbitrary, this means that H(x∗) is negative semidef-
inite. Similarly, if f is minimised at x = x̃, then g″(0) ≥ 0, so that H(x̃) is positive
semidefinite, completing the proof.
Theorems A2.9 and A2.10 are important and useful. We can use them to characterise
an (interior) optimum whenever we know, or assume, one exists. Both are necessary con-
ditions, allowing us to make statements like, ‘If x∗ maximises f (x), then fi (x∗ ) = 0, i =
1, . . . , n, and H(x∗ ) is negative semidefinite’. These conditions can help in locating poten-
tial maxima (or minima) of specific functions, but to verify that they actually maximise (or
minimise) the function, we need sufficient conditions.
Sufficient conditions allow us to make statements like, ‘If such and such obtains at x,
then x optimises the function’. With conditions like this, we could solve for x and know that
the function is optimised there. Sufficient conditions for optima can be derived, but as one
would suspect, they are more stringent than necessary conditions. Simply stated, sufficient
conditions for interior optima are as follows: (1) If fi (x∗ ) = 0 for i = 1, . . . , n and H(x∗ )
is negative definite at x∗ , then f (x) reaches a local maximum at x∗ ; (2) if fi (x̃) = 0 for
i = 1, . . . , n, and H(x̃) is positive definite at x̃, then f (x) reaches a local minimum at x̃.
The sufficient conditions require the point in question to be a critical point, and require the
curvature conditions to hold in their strict forms. (This serves to rule out the possibility
of mistaking an inflection point for an optimum.) For example, when H(x∗ ) is negative
definite, the function will be strictly concave in some ball around x∗ .
Locating a critical point is easy. We simply set all first-order partial derivatives equal
to zero and solve the system of n equations. Determining whether the Hessian is negative
or positive definite there will generally be less easy.
Various tests for determining the definiteness property of the Hessian key on the sign
pattern displayed by the determinants of certain submatrices formed from it at the point (or
region) in question. These determinants are called the principal minors of the Hessian.
By the first through nth principal minors of H(x) at the point x, we mean the determinants

D_1(\mathbf{x}) = |f_{11}|, \quad D_2(\mathbf{x}) = \begin{vmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{vmatrix}, \quad \dots, \quad D_n(\mathbf{x}) = \begin{vmatrix} f_{11} & \dots & f_{1n} \\ \vdots & \ddots & \vdots \\ f_{n1} & \dots & f_{nn} \end{vmatrix},

where it is understood that f_{ij} is evaluated at x. Each is the determinant of a matrix resulting
when the last (n − i) rows and columns of the Hessian H(x) are deleted, for i = 1, . . . , n.
They are called the principal minors because they are obtained from submatrices formed
as we move down the principal diagonal of the Hessian.
The following theorem gives requirements on its principal minors sufficient to ensure
definiteness of the Hessian.
THEOREM A2.11 Sufficient Conditions for Negative and Positive Definiteness of the Hessian
Let f (x) be twice continuously differentiable, and let Di (x) be the ith-order principal minor
of the Hessian matrix H(x).
1. If (−1)i Di (x) > 0, i = 1, . . . , n, then H(x) is negative definite.
2. If Di (x) > 0, i = 1, . . . , n, then H(x) is positive definite.
If condition 1 holds for all x in the domain, then f is strictly concave. If condition 2 holds
for all x in the domain, then f is strictly convex.
In particular, this theorem says that the function will be strictly concave if the prin-
cipal minors of the Hessian matrix always alternate in sign, beginning with negative. It
will be strictly convex if the principal minors of the Hessian are all positive.
Proof: A completely general proof would invoke part 4 of Theorem A2.4 and then reduce
the problem to one of establishing that if the principal minors of a matrix alternate in sign,
then the corresponding quadratic form is negative definite, and that if the principal minors
are all positive, then the corresponding quadratic form is positive definite. This, in turn, is
a well-known result in linear algebra. The interested reader may consult any standard text
on this point. For example, see Hohn (1973). Here, we will give a simple proof for the case
of two variables.
Suppose that y = f(x_1, x_2) is twice continuously differentiable. The first and second
principal minors of its Hessian are

D_1(\mathbf{x}) = f_{11} \quad\text{and}\quad D_2(\mathbf{x}) = f_{11}f_{22} - (f_{12})^2, \tag{P.1}

where we have used the fact that f_{12} = f_{21}. For \mathbf{z} = (z_1, z_2) \ne (0, 0), z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} can be
written

z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} = \sum_{j=1}^{2}\sum_{i=1}^{2} f_{ij}\, z_i z_j = f_{11}(z_1)^2 + 2f_{12}\, z_1 z_2 + f_{22}(z_2)^2. \tag{P.2}
If we can show that zT H(x)z < 0 whenever those principal minors in (P.1) alternate in
sign, beginning with negative, and that zT H(x)z > 0 whenever they are all positive, the
theorem will be ‘proved’.
Because (z_1, z_2) is not the zero vector, at least one of z_1, z_2 is non-zero. Suppose
z_2 ≠ 0. Note that we can add and subtract the same thing from the right-hand side of (P.2)
without changing anything. Adding and subtracting the quantity (f_{12})^2(z_2)^2/f_{11}, we get

z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} = f_{11}(z_1)^2 + 2f_{12}\, z_1 z_2 + \frac{(f_{12})^2}{f_{11}}(z_2)^2 + f_{22}(z_2)^2 - \frac{(f_{12})^2}{f_{11}}(z_2)^2.
Factoring out f_{11} from the first three terms and (z_2)^2 from the last two, we get

z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} = f_{11}\left[(z_1)^2 + 2\frac{f_{12}}{f_{11}}\, z_1 z_2 + \frac{(f_{12})^2}{(f_{11})^2}(z_2)^2\right] + \left[f_{22} - \frac{(f_{12})^2}{f_{11}}\right](z_2)^2.

Recognising the first term as a square and putting the second term over a common
denominator, we can write

z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} = f_{11}\left(z_1 + \frac{f_{12}}{f_{11}}\, z_2\right)^2 + \frac{f_{11}f_{22} - (f_{12})^2}{f_{11}}(z_2)^2. \tag{P.3}
Suppose that the principal minors in (P.1) alternate in sign, beginning negative. Then
the first product in (P.3) is non-positive and the last is strictly negative because z_2 ≠ 0
and because the numerator and denominator in the expression it multiplies have opposite
signs by assumption. Consequently, z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} < 0. Similarly, if the principal minors in
(P.1) are both positive, then both terms in (P.3) are non-negative and one is positive, so that
z^{\mathrm{T}} H(\mathbf{x})\mathbf{z} > 0.
We are now prepared to state first- and second-order sufficient conditions for local
interior optima. These conditions follow directly from what has already been established,
so they need no further justification. We simply pull the threads together and write the
conditions compactly to facilitate future reference.
THEOREM A2.12 Sufficient Conditions for Local Interior Optima of Real-Valued Functions
Let f (x) be twice continuously differentiable.
1. If fi (x∗ ) = 0 and (−1)i Di (x∗ ) > 0, i = 1, . . . , n, then f (x) reaches a local
maximum at x∗ .
2. If fi (x̃) = 0 and Di (x̃) > 0, i = 1, . . . , n, then f (x) reaches a local minimum at x̃.
EXAMPLE A2.7 Let us check whether the critical point we found in the last example was
a maximum or a minimum. We had
\frac{\partial^2 f}{\partial x_1^2} = -8, \qquad \frac{\partial^2 f}{\partial x_1\,\partial x_2} = 3, \qquad \frac{\partial^2 f}{\partial x_2\,\partial x_1} = 3, \qquad \frac{\partial^2 f}{\partial x_2^2} = -2.
Before, we found a critical point at x∗ = (3/7, 8/7). Checking the principal minors,
we find that
D_1(\mathbf{x}) = |-8| = -8 < 0, \qquad D_2(\mathbf{x}) = \begin{vmatrix} -8 & 3 \\ 3 & -2 \end{vmatrix} = 16 - 9 = 7 > 0.
Because these alternate in sign, beginning with negative, Theorem A2.12 tells us that x∗ =
(3/7, 8/7) is a local maximum.
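The principal-minor test of Theorem A2.11 can also be sketched in code; here it is applied to the constant Hessian of this example (the helper names are ours):

```python
# The alternating-sign test on principal minors, applied to H = [[-8, 3], [3, -2]].

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def principal_minors(H):
    """Determinants D_1, ..., D_n of the upper-left i-by-i submatrices."""
    return [det([row[:i] for row in H[:i]]) for i in range(1, len(H) + 1)]

def negative_definite_by_minors(H):
    """Theorem A2.11, part 1: (-1)**i * D_i > 0 for every i."""
    return all((-1) ** i * d > 0 for i, d in enumerate(principal_minors(H), start=1))

H = [[-8.0, 3.0], [3.0, -2.0]]
minors = principal_minors(H)       # [-8, 7]: alternating, beginning negative
```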
You may have noticed in this example that the Hessian matrix was completely inde-
pendent of x. We would therefore obtain the same alternating sign pattern on the principal
minors regardless of where we evaluated them. In Theorem A2.11, we observed that this
is sufficient to ensure that the function involved is strictly concave. Now try to imagine the
graph of some such strictly concave function in three dimensions. If it has any hills at all,
it would seem that it can have only one and this must have a single highest point.
Indeed, from Fig. A2.5, it is intuitively clear that any local maximum (minimum) of
a concave (convex) function must also be a global maximum (minimum). That intuition
extends to the multivariable case as well. In multivariable (unconstrained) optimisation
problems, local and global optima coincide when the function is either concave or convex.
As usual, we treat only the case of concave functions.
THEOREM A2.13 Let f be a concave, differentiable function, and let x∗ be a point interior to its (convex) domain. Then the following statements are equivalent:

1. ∇f(x∗) = 0.
2. f achieves a local maximum at x∗.
3. f achieves a global maximum at x∗.

Proof: Clearly 3 ⇒ 2, and 2 ⇒ 1 by Theorem A2.9. It therefore remains to show that 1 ⇒ 3, so suppose that

\nabla f(\mathbf{x}^*) = 0.

Because f is concave, Theorem A2.4 implies that for all x in the domain,

f(\mathbf{x}) \le f(\mathbf{x}^*) + \nabla f(\mathbf{x}^*)(\mathbf{x} - \mathbf{x}^*) = f(\mathbf{x}^*),

so f achieves a global maximum at x∗.
Theorem A2.13 says that under convexity or concavity, any local optimum is a global
optimum. Notice, however, that it is still possible that the lowest (highest) value is reached
at more than one point in the domain. If we want the highest or lowest value of the function
to be achieved at a unique point, we have to impose strict concavity or strict convexity.
Proof: Again, we will prove the theorem for strictly concave functions. We again suppose
the contrary and derive a contradiction.
If x∗ is a global maximiser of f but x∗ is not unique, then there exists some other
point x′ ≠ x∗ such that f(x′) = f(x∗). If we let x^t = t\mathbf{x}' + (1 - t)\mathbf{x}^* for t ∈ (0, 1), then strict concavity
requires that

f(\mathbf{x}^t) > t f(\mathbf{x}') + (1 - t) f(\mathbf{x}^*),

or, simply,

f(\mathbf{x}^t) > f(\mathbf{x}^*).

This, however, contradicts the assumption that x∗ is a global maximiser of f. Thus, any
global maximiser of a strictly concave function must be unique.
Proof: This theorem has tremendous intuitive appeal. As usual, we only treat strictly
concave functions, leaving strictly convex functions for you.
If f is strictly concave, then by Theorem A2.13, because ∇f (x∗ ) = 0, f achieves
a global maximum at x∗ . Theorem A2.14 then implies that x∗ is the unique global
maximiser.
Consider the two-variable constrained optimisation problem

\max_{x_1, x_2} f(x_1, x_2) \quad \text{subject to} \quad g(x_1, x_2) = 0.

Here, f(x_1, x_2) is called the objective function, or maximand. The x_1 and x_2 are
called choice variables and are usually written beneath the operator ‘max’ to remind us
that it is values of x1 and x2 we seek. The function g(x1 , x2 ) is called the constraint and
it jointly specifies those values of the choice variables that we are allowed to consider as
feasible or permissible in solving the problem. The set of all x_1 and x_2 that satisfy the
constraint is sometimes called the constraint set or the feasible set.
One way to solve this problem is by substitution. If the constraint function allows us
to solve for one of the xi in terms of the other, we can reduce the constrained problem in
two variables to one without constraints, and with one less variable. For example, suppose
that g(x1 , x2 ) = 0 can be written to isolate x2 on one side as
x2 = g̃(x1 ). (A2.5)
We can substitute this directly into the objective function and it will guarantee that x2 bears
the required relation to x1 . This way, the two-variable constrained maximisation problem
can be rephrased as the single-variable problem with no constraints:

\max_{x_1} f\bigl(x_1, \tilde{g}(x_1)\bigr). \tag{A2.6}

Now we just maximise this by our usual methods. The usual first-order conditions
require that we set the total derivative, df /dx1 , equal to zero and solve for the optimal x1∗ . In
doing that here, we have to keep in mind that x1 now influences f in two ways: ‘directly’
through its ‘own’ position within f , and ‘indirectly’ through the original position of x2 .
Thus, when we differentiate (A2.6), we must remember that f has two partial derivatives,
and we must remember to use the chain rule. Keeping this in mind, we want x_1^* where

\frac{df}{dx_1} = \frac{\partial f\bigl(x_1^*, \tilde{g}(x_1^*)\bigr)}{\partial x_1} + \frac{\partial f\bigl(x_1^*, \tilde{g}(x_1^*)\bigr)}{\partial x_2}\,\frac{d\tilde{g}(x_1^*)}{dx_1} = 0.
When we have found x1∗ , we plug it back into the constraint (A2.5) and find x2∗ = g̃(x1∗ ). The
pair (x1∗ , x2∗ ) then solves the constrained problem, provided the appropriate second-order
condition is also fulfilled.
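The substitution method can be illustrated on a simple hypothetical problem (this example is ours, not the text's): maximise f(x_1, x_2) = x_1 x_2 subject to x_1 + x_2 − 1 = 0. The constraint gives x_2 = 1 − x_1, reducing the problem to one variable; calculus then yields the first-order condition 1 − 2x_1 = 0, so x_1^* = 1/2.

```python
# The substitution method on the hypothetical problem above, solved by a
# crude grid search over the reduced one-variable objective.

def h(x1):
    x2 = 1.0 - x1              # substitute the constraint into the objective
    return x1 * x2             # reduced objective h(x1) = x1*(1 - x1)

value, x1_star = max((h(k / 1000.0), k / 1000.0) for k in range(1001))
x2_star = 1.0 - x1_star        # recover x2* from the constraint
```

The grid search recovers the calculus answer x_1^* = x_2^* = 1/2 with maximised value 1/4.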
Unfortunately, it is easy to imagine cases where the constraint relation is complicated
and where it is not so easy to solve for one variable in terms of the other. What is more,
many interesting problems involve more than two choice variables and more than one
constraint. The substitution method is not well suited to these more complicated problems.
In some cases, substitution would be unnecessarily burdensome. In others, it would simply
be impossible. Fortunately, there is a better way – one capable of handling a much broader
class of problems.
Suppose we multiply the constraint equation by a new variable, call it λ (lambda), which
we simply pull out of the air because it will prove useful. If we subtract this product from
the objective function, we will have constructed a new function, called the Lagrangian
function, or Lagrangian for short, and denoted by a script L(·). This new function has
three variables instead of two: namely, x_1, x_2, and λ:

\mathcal{L}(x_1, x_2, \lambda) \equiv f(x_1, x_2) - \lambda\, g(x_1, x_2).

Now, how would we determine the critical points of L(·) if it were an ordi-
nary (unconstrained) function of three variables? We would take all three of its partial
derivatives and set them equal to zero. Doing this gives

\frac{\partial \mathcal{L}}{\partial x_1} = \frac{\partial f(x_1, x_2)}{\partial x_1} - \lambda\,\frac{\partial g(x_1, x_2)}{\partial x_1} = 0 \tag{A2.7}

\frac{\partial \mathcal{L}}{\partial x_2} = \frac{\partial f(x_1, x_2)}{\partial x_2} - \lambda\,\frac{\partial g(x_1, x_2)}{\partial x_2} = 0 \tag{A2.8}

\frac{\partial \mathcal{L}}{\partial \lambda} = -g(x_1, x_2) = 0. \tag{A2.9}
These are three equations in the three unknowns, x1 , x2 , and λ. Lagrange’s method asserts
that if we can find values x1∗ , x2∗ , and λ∗ that solve these three equations simultaneously,
then we will have a critical point of f (x1 , x2 ) along the constraint g(x1 , x2 ) = 0.
Suppose we can find x1∗ , x2∗ , and λ∗ that solve (A2.7) through (A2.9). Notice some-
thing very important. Because they solve (A2.9), x1∗ and x2∗ must satisfy the constraint
g(x1∗ , x2∗ ) = 0. Showing that they also make f (x1 , x2 ) as large (or small) as possible subject
to that constraint is a little harder, but it can be done.
Consider our contrived function L(·) and take its total differential, remembering that
λ is a full-fledged variable of the function:

d\mathcal{L} = \frac{\partial \mathcal{L}}{\partial x_1}\,dx_1 + \frac{\partial \mathcal{L}}{\partial x_2}\,dx_2 + \frac{\partial \mathcal{L}}{\partial \lambda}\,d\lambda.

By assumption, x_1^*, x_2^*, and λ^* satisfy the first-order conditions (A2.7) through (A2.9) for
an optimum of L, so dL evaluated there must equal zero. Substituting from those first-order
conditions, we have

d\mathcal{L} = \left[\frac{\partial f}{\partial x_1} - \lambda^*\frac{\partial g}{\partial x_1}\right]dx_1 + \left[\frac{\partial f}{\partial x_2} - \lambda^*\frac{\partial g}{\partial x_2}\right]dx_2 - g(x_1^*, x_2^*)\,d\lambda = 0 \tag{A2.10}

(with all partials evaluated at (x_1^*, x_2^*))
for all dx1 , dx2 , and dλ. To convince you that the solutions to the first-order conditions for
an optimum of Lagrange’s function also optimise the objective function f (x1 , x2 ) subject
to the constraint g(x1 , x2 ), we have to show that at (x1∗ , x2∗ , λ∗ ), the total differential of the
objective function f is also equal to zero – at least for all permissible dx1 and dx2 that
satisfy the constraint equation g. In essence, we want to show that dL = 0 for all dx1 , dx2 ,
and dλ implies that df = 0 for the permissible dx1 and dx2 .
One thing we can do immediately to simplify things is to notice again that (A2.9)
tells us the constraint is satisfied at x1∗ and x2∗ , so g(x1∗ , x2∗ ) = 0. This means that the third
term in (A2.10) is zero and our problem can be reduced to showing that
d\mathcal{L} = \frac{\partial f(x_1^*, x_2^*)}{\partial x_1}\,dx_1 + \frac{\partial f(x_1^*, x_2^*)}{\partial x_2}\,dx_2 - \lambda^*\left[\frac{\partial g(x_1^*, x_2^*)}{\partial x_1}\,dx_1 + \frac{\partial g(x_1^*, x_2^*)}{\partial x_2}\,dx_2\right] = 0 \tag{A2.11}
for all dxi implies that df = 0 for those dxi that satisfy the constraint g.
Next, we have to figure out what those permissible values of dx1 and dx2 are. Look
again at the constraint equation. Clearly, if g(x1 , x2 ) must always equal zero, then after x1
and x2 change, it must again be equal to zero. Stated differently, permissible changes in
x1 and x2 are those that lead to no change in the value of the constraint function g(x1 , x2 ).
Let dx1 and dx2 stand for those ‘permissible changes’ in x1 and x2 from their values x1∗
and x2∗ , respectively. To say that there is no change in the value of g is to say that its total
differential dg must equal zero. With this in mind, we can identify those changes dx1 and
dx_2 that make dg = 0 by totally differentiating the constraint equation and setting it equal
to zero. Thus, the relation

dg = \frac{\partial g(x_1^*, x_2^*)}{\partial x_1}\,dx_1 + \frac{\partial g(x_1^*, x_2^*)}{\partial x_2}\,dx_2 = 0 \tag{A2.12}

must hold between permissible changes dx_1 and dx_2 from x_1^* and x_2^*, respectively.
Putting (A2.11) and (A2.12) together gives us the result. If we are only considering
changes in the variables that satisfy (A2.12), then the third term in (A2.11) must be zero.
Therefore, at (x1*, x2*), (A2.11) reduces to

$$df = \frac{\partial f(x_1^*, x_2^*)}{\partial x_1}\,dx_1 + \frac{\partial f(x_1^*, x_2^*)}{\partial x_2}\,dx_2 = 0$$

for all dx1 and dx2 satisfying the constraint. But this is precisely what we want. It says
that the solutions (x1∗ , x2∗ , λ∗ ) to the first-order conditions for an unconstrained optimum of
Lagrange’s function guarantee that the value of the objective function f cannot be increased
or decreased for small changes in x1∗ and x2∗ that satisfy the constraint. Therefore, we must
be at a maximum or a minimum of the objective function along the constraint.
To recapitulate, we have shown that if (x1∗ , x2∗ , λ∗ ) solves dL(x1∗ , x2∗ , λ∗ ) = 0 for all
(dx1 , dx2 , dλ), then df (x1∗ , x2∗ ) = 0 for all dx1 and dx2 that satisfy the constraint. In words,
(x1∗ , x2∗ ) is a critical point of f given that the variables must satisfy the constraint and that
any movement away from (x1∗ , x2∗ ) must be a movement along the constraint. The first-
order conditions (A2.7) through (A2.9) thus characterise the critical points of the objective
function along the constraint. Whether those critical points are constrained maxima or
minima cannot be determined from the first-order conditions alone. To distinguish between
the two requires that we know the ‘curvature’ of the objective and constraint relations at
the critical point in question. We will examine these issues later.
EXAMPLE A2.8 Let us consider a problem of the sort we have been discussing and apply
Lagrange's method to solve it. Suppose our problem is to

$$\max_{x_1, x_2} \; -ax_1^2 - bx_2^2 \quad \text{subject to} \quad x_1 + x_2 - 1 = 0, \tag{E.1}$$

where a > 0 and b > 0. Form the Lagrangian, L = -ax1² - bx2² - λ(x1 + x2 - 1), and set all of its first-order partials equal to zero:

$$\frac{\partial L}{\partial x_1} = -2ax_1 - \lambda = 0 \tag{E.2}$$

$$\frac{\partial L}{\partial x_2} = -2bx_2 - \lambda = 0 \tag{E.3}$$

$$\frac{\partial L}{\partial \lambda} = -(x_1 + x_2 - 1) = 0. \tag{E.4}$$
To solve for x1, x2, and λ, notice that (E.2) and (E.3) imply

$$2ax_1 = 2bx_2,$$

or

$$x_1 = \frac{b}{a}\,x_2. \tag{E.5}$$

By substituting from (E.5) into (E.4),

$$\frac{b}{a}\,x_2 + x_2 = 1,$$

or

$$x_2 = \frac{a}{a+b}. \tag{E.6}$$

Substituting this back into (E.5) gives

$$x_1 = \frac{b}{a+b}. \tag{E.7}$$
To find λ, we can substitute from (E.6) or (E.7) into (E.3) or (E.2), respectively. Either
way, we get

$$-2b\left(\frac{a}{a+b}\right) - \lambda = 0,$$

or

$$\lambda = \frac{-2ab}{a+b}. \tag{E.8}$$

The solutions to (E.2) through (E.4) are therefore the three values

$$x_1 = \frac{b}{a+b}, \qquad x_2 = \frac{a}{a+b}, \qquad \lambda = \frac{-2ab}{a+b}. \tag{E.9}$$
Only x1 and x2 in (E.9) are candidate solutions to the problem (E.1). The additional
bit of information we have acquired – the value of the Lagrangian multiplier there – is only
'incidental'. We may obtain the value the objective function achieves along the constraint
by substituting the values for x1 and x2 into the objective function in (E.1):

$$y^* = -a\left(\frac{b}{a+b}\right)^2 - b\left(\frac{a}{a+b}\right)^2 = \frac{-(ab^2 + a^2 b)}{(a+b)^2}. \tag{E.10}$$
Remember, from the first-order conditions alone, we are unable to tell whether this is a
maximum or a minimum value of the objective function subject to the constraint.
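A quick numerical check of this example is straightforward. The sketch below uses the illustrative values a = 2, b = 3 (our own choice; any positive values work) and confirms that the candidate (E.9) satisfies the first-order conditions (E.2) through (E.4) and delivers the value (E.10):

```python
# Numerical check of Example A2.8 for illustrative values a = 2, b = 3.

def solve_example(a, b):
    """Return (x1, x2, lam) from (E.9): the critical point of the Lagrangian."""
    x1 = b / (a + b)
    x2 = a / (a + b)
    lam = -2 * a * b / (a + b)
    return x1, x2, lam

def check_first_order_conditions(a, b, x1, x2, lam, tol=1e-12):
    """Verify (E.2)-(E.4): both dL/dxi vanish and the constraint holds."""
    eq2 = -2 * a * x1 - lam        # (E.2)
    eq3 = -2 * b * x2 - lam        # (E.3)
    eq4 = x1 + x2 - 1              # (E.4)
    return abs(eq2) < tol and abs(eq3) < tol and abs(eq4) < tol

a, b = 2.0, 3.0
x1, x2, lam = solve_example(a, b)
assert check_first_order_conditions(a, b, x1, x2, lam)

# The constrained value (E.10): y* = -(a*b**2 + a**2*b) / (a + b)**2
y_star = -a * x1**2 - b * x2**2
assert abs(y_star - (-(a * b**2 + a**2 * b) / (a + b)**2)) < 1e-12
```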
The Lagrangian method is quite capable of addressing a much broader class of
problems than we have yet considered. Lagrange’s method ‘works’ for functions with
any number of variables, and in problems with any number of constraints, as long as the
number of constraints is less than the number of variables being chosen.
Suppose we have a function of n variables and we face m constraints, where m < n.
Our problem is

$$\max_{x_1, \ldots, x_n} \; f(x_1, \ldots, x_n) \quad \text{subject to} \quad
\begin{aligned}
g^1(x_1, \ldots, x_n) &= 0\\
g^2(x_1, \ldots, x_n) &= 0\\
&\;\;\vdots\\
g^m(x_1, \ldots, x_n) &= 0.
\end{aligned} \tag{A2.13}$$
To solve this, form the Lagrangian by multiplying each constraint equation gj by a different
Lagrangian multiplier λj and subtracting them all from the objective function f . For x =
(x1 , . . . , xn ) and λ = (λ1 , . . . , λm ), we obtain the function of n + m variables
$$L(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) - \sum_{j=1}^{m} \lambda_j g^j(\mathbf{x}). \tag{A2.14}$$
The first-order conditions again require that all partial derivatives of L be equal to zero at
the optimum. Because L has n + m variables, there will be a system of n + m equations
determining the n + m variables x* and λ*:

$$\frac{\partial L}{\partial x_i} = \frac{\partial f(\mathbf{x}^*)}{\partial x_i} - \sum_{j=1}^{m} \lambda_j^* \frac{\partial g^j(\mathbf{x}^*)}{\partial x_i} = 0, \quad i = 1, \ldots, n,$$
$$g^j(\mathbf{x}^*) = 0, \quad j = 1, \ldots, m. \tag{A2.15}$$

In principle, these can be solved for the n + m values, x* and λ*. All solution vectors x*
will then be candidates for the solution to the constrained optimisation problem in (A2.13).
Lagrange’s method is very clever and very useful. In effect, it offers us an algorithm
for identifying the constrained optima in a wide class of practical problems. Yet the some-
what casual exposition given here presupposes a great deal. There is first the question of
whether solutions to the constrained optimisation problems (A2.4) or (A2.13) even exist.
In many cases, there will at least be an easy answer to this question. If the objective
function is real-valued and continuous (which it must be to be differentiable), and
if the constraint set defined by the constraint equations is compact, we are assured by
the Weierstrass theorem (Theorem A1.10) that optima of the objective function over the
constraint set do exist.
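The existence argument can be illustrated numerically: once attention is restricted to a compact piece of the constraint set, a brute-force grid search must come within any tolerance of the maximum. A minimal sketch, reusing the problem of Example A2.8 with the illustrative values a = 2, b = 3 and restricting x1 to the segment [0, 1] (our own choices, made only to keep the set compact):

```python
# Grid search of the continuous objective f(x1, x2) = -a*x1**2 - b*x2**2
# over the compact segment {(x1, 1 - x1) : 0 <= x1 <= 1} of the
# constraint x1 + x2 = 1. Compactness plus continuity guarantees the
# maximum exists; the grid merely approximates it.

def grid_max(f, lo, hi, n=200001):
    """Maximise f over a uniform grid on [lo, hi]; returns (argmax, max)."""
    best_x, best_v = lo, f(lo)
    step = (hi - lo) / (n - 1)
    for i in range(1, n):
        x = lo + i * step
        v = f(x)
        if v > best_v:
            best_x, best_v = x, v
    return best_x, best_v

a, b = 2.0, 3.0
f_on_constraint = lambda x1: -a * x1**2 - b * (1 - x1)**2

x1_hat, v_hat = grid_max(f_on_constraint, 0.0, 1.0)
# Analytic solution from Example A2.8: x1* = b/(a+b) = 0.6,
# value y* = -ab/(a+b) = -1.2.
assert abs(x1_hat - 0.6) < 1e-4
assert abs(v_hat - (-1.2)) < 1e-6
```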
Once this question is answered, however, there remains a more subtle question so
far left open in our discussion of Lagrange’s method. How, for instance, do we know that
the Lagrangian multipliers we just ‘picked out of thin air’ even exist? More precisely, how
do we know that there exists λ∗ such that the critical points of L(·, λ∗ ) coincide with the
constrained optima of f over the constraint set? Surely, there must be some conditions that
have to be satisfied for this to be so.
In fact, there are some conditions of this sort, and they have primarily to do with
requirements on the constraint set. In the simple two-variable, one-constraint problem,
these conditions boil down to the requirement that at least one of the partial derivatives of
the constraint equation be strictly non-zero. The plausibility of this restriction will become
clearer when we examine the geometry of this simple problem in the next section. In the
general case, this expands to the requirement that the gradient vectors of the m constraint
equations, ∇gj , j = 1, . . . , m, be linearly independent.
For the sake of completeness and to facilitate reference – if not for the sake of
enlightenment – we will state Lagrange’s theorem, which addresses these issues. The proof
requires more advanced methods than we have attempted here, and so will be omitted. The
interested reader can find it in any good text on multivariable calculus.
4 For more geometric intuition, see Fig. A2.12 and the discussion leading to it.
We again consider problem (A2.4). We can represent the objective function geomet-
rically by its level sets, L(y0 ) ≡ {(x1 , x2 )|f (x1 , x2 ) = y0 } for some y0 in the range. (Keep in
mind that there is one of these for every value y that the function can take.) By definition,
all points in the set must satisfy the equation
f (x1 , x2 ) = y0 .
If we change x1 and x2 , and are to remain on that level set, dx1 and dx2 must be such as to
leave the value of f unchanged at y0 . They must therefore satisfy
$$\frac{\partial f(x_1, x_2)}{\partial x_1}\,dx_1 + \frac{\partial f(x_1, x_2)}{\partial x_2}\,dx_2 = 0, \tag{A2.16}$$
which is obtained by totally differentiating each side of the equation for the level set and
remembering that the total differential of the constant y0 is equal to zero. This must hold
at any point along any level set of the function.
We can derive an expression for the slope of any one of these level curves at some
arbitrary point. In the (x1 , x2 ) plane, the slope of any level curve will be ‘rise’ (dx2 ) over
‘run’ (dx1 ) for dx2 and dx1 satisfying (A2.16). By solving (A2.16) for dx2 /dx1 , the slope
of the level set through (x1 , x2 ) will be
$$\left.\frac{dx_2}{dx_1}\right|_{\text{along } L(y^0)} = -\,\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)}. \tag{A2.17}$$
The notation |along... is used to remind you of the very particular sort of changes dx1 and
dx2 we are considering. Thus, as depicted in Fig. A2.6, the slope of the level set through
any point (x1 , x2 ) is given by the (negative) ratio of first-order partial derivatives of f at
(x1 , x2 ).
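The slope formula (A2.17) is easy to verify numerically. The sketch below uses the illustrative function f(x1, x2) = x1x2 (our own choice, not from the text), whose level curves x1x2 = c can be solved explicitly as x2 = c/x1, and compares the explicit slope with −f1/f2 computed by finite differences:

```python
# Verify (A2.17) for the illustrative function f(x1, x2) = x1 * x2.
# Along the level set x1*x2 = c we have x2 = c/x1, so the level curve's
# slope is -c/x1**2, which should equal -f1/f2 = -x2/x1 at each point.

def partials(f, x1, x2, h=1e-6):
    """Central-difference approximations to the two first-order partials."""
    f1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    f2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return f1, f2

f = lambda x1, x2: x1 * x2
c = 2.0
for x1 in (0.5, 1.0, 2.0):
    x2 = c / x1                       # point on the level set f = c
    f1, f2 = partials(f, x1, x2)
    slope_formula = -f1 / f2          # right-hand side of (A2.17)
    slope_explicit = -c / x1**2       # d(c/x1)/dx1
    assert abs(slope_formula - slope_explicit) < 1e-6
```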
By the same token, suppose that the constraint g looks like Fig. A2.7 when plotted
in the same plane. We can think of the constraint function, too, as a kind of level set. It is
[Figure A2.6. The level set L(y⁰) through the point x; its slope there is −f₁(x)/f₂(x).]

[Figure A2.7. The constraint set g(x) = 0; its slope at x is −g₁(x)/g₂(x).]
the set of points (x1, x2) satisfying the equation

$$g(x_1, x_2) = 0.$$
Just as before, we can derive the slope of this constraint set at any point along it by
totally differentiating both sides of this equation, remembering that the differential of (the
constant) zero is zero. For any (x1 , x2 ) satisfying the constraint, the relation
$$\frac{\partial g(x_1, x_2)}{\partial x_1}\,dx_1 + \frac{\partial g(x_1, x_2)}{\partial x_2}\,dx_2 = 0$$
must therefore hold for changes dx1 and dx2 along the constraint. By rearranging terms
again, the slope of the constraint at the point (x1 , x2 ) will be
$$\left.\frac{dx_2}{dx_1}\right|_{\text{along } g(\cdot)=0} = -\,\frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}. \tag{A2.18}$$
Now let us recall our problem (A2.4) and look at the first-order conditions for
a critical point of the Lagrangian function given in (A2.7) through (A2.9). According
to Lagrange’s method, these conditions determine the solution values to our problem,
(x1∗ , x2∗ ), plus an ‘incidental’ Lagrangian multiplier, λ∗ . Because we seek the solution val-
ues of the choice variables alone, and have no direct interest in the value of the Lagrangian
multiplier, we can rewrite (A2.7) through (A2.9) to eliminate λ∗ and get an expression for
x1* and x2* alone. Simple rearrangement of (A2.7) through (A2.9) gives

$$f_1(x_1^*, x_2^*) = \lambda^* g_1(x_1^*, x_2^*)$$
$$f_2(x_1^*, x_2^*) = \lambda^* g_2(x_1^*, x_2^*)$$
$$g(x_1^*, x_2^*) = 0.$$

For the sake of this discussion, suppose λ* ≠ 0. Dividing the first of the equations by
the second eliminates the variable λ* altogether and leaves us with just two conditions to
determine the two variables x1* and x2*:

$$\frac{f_1(x_1^*, x_2^*)}{f_2(x_1^*, x_2^*)} = \frac{g_1(x_1^*, x_2^*)}{g_2(x_1^*, x_2^*)} \tag{A2.19}$$
$$g(x_1^*, x_2^*) = 0. \tag{A2.20}$$
What do these two conditions say? Look again at (A2.17) and (A2.18) and consider
the first condition (A2.19). The left-hand side of (A2.19) is −1 times the slope of the level
set for the objective function through the point (x1∗ , x2∗ ). The right-hand side is −1 times
the slope of the level set for the constraint function. The condition says that the solution
values of x1 and x2 will be at a point where the slope of the level set for the objective
function and the slope of the level set for the constraint are equal. That is not all, though.
The second condition (A2.20) tells us we must also be on the level set of the constraint
equation. A point that is on the constraint and where the slope of the level set of the
objective function and the slope of the constraint are equal is, by definition, a point of
tangency between the constraint and the level set.
The situation for a maximum of the objective function subject to the constraint is
depicted in Fig. A2.8(a). Clearly, the highest value of f along the constraint is the one
achieved at the point of tangency picked out by (A2.19) and (A2.20) and hence, by the
first-order conditions for the unconstrained optimum of the Lagrangian (A2.7) through
(A2.9). The same principles apply in the case of minimisation problems, as depicted in
Fig. A2.8(b).
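At the solution of Example A2.8 the tangency conditions can be checked directly; the sketch below again uses the illustrative values a = 2, b = 3 (our own choice):

```python
# At the solution of Example A2.8 the tangency condition (A2.19),
# f1/f2 = g1/g2, should hold together with the constraint (A2.20).

a, b = 2.0, 3.0
x1, x2 = b / (a + b), a / (a + b)          # (E.7), (E.6)

f1, f2 = -2 * a * x1, -2 * b * x2          # partials of f = -a*x1^2 - b*x2^2
g1, g2 = 1.0, 1.0                          # partials of g = x1 + x2 - 1

assert abs(f1 / f2 - g1 / g2) < 1e-12      # (A2.19): equal level-set slopes
assert abs(x1 + x2 - 1) < 1e-12            # (A2.20): on the constraint
```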
[Figure A2.8. (a) A constrained maximum at x*, where the common tangency slope is −f₁(x*)/f₂(x*) = −g₁(x*)/g₂(x*); (b) a constrained minimum at x̃, where −f₁(x̃)/f₂(x̃) = −g₁(x̃)/g₂(x̃).]
Figure A2.8. The first-order conditions for a solution to Lagrange’s problem identify a point of
tangency between a level set of the objective function and the constraint.
Suppose we use the constraint equation to solve (at least in principle) for x2 as a function of x1, writing x2 = x2(x1), so that

$$g(x_1, x_2(x_1)) \equiv 0.$$

Totally differentiating this identity with respect to x1 and rearranging gives

$$\frac{dx_2}{dx_1} = \frac{-g_1}{g_2} \tag{A2.21}$$
for the slope of the constraint relation in the (x1 , x2 ) plane. Letting y = f (x1 , x2 (x1 )) be
the value of the objective function subject to the constraint, we get y as a function of the
single variable x1 . Differentiating with respect to x1 , we get dy/dx1 = f1 + f2 (dx2 /dx1 ).
Substituting from (A2.21) gives
$$\frac{dy}{dx_1} = f_1 - f_2\,\frac{g_1}{g_2}. \tag{A2.22}$$
Differentiating again, remembering always that x2 is a function of x1 , and that the fi and
gi all depend on x1 both directly and through its influence on x2 , we obtain the second
derivative,
$$\frac{d^2 y}{dx_1^2} = f_{11} + f_{12}\,\frac{dx_2}{dx_1} - \left(f_{21} + f_{22}\,\frac{dx_2}{dx_1}\right)\frac{g_1}{g_2} - f_2\,\frac{g_2\left[g_{11} + g_{12}(dx_2/dx_1)\right] - g_1\left[g_{21} + g_{22}(dx_2/dx_1)\right]}{(g_2)^2}. \tag{A2.23}$$
Second-order necessary conditions for a maximum in one variable require that this
second derivative be less than or equal to zero at the point where the first-order condi-
tions are satisfied. Sufficient conditions require that the inequalities hold strictly at that
point. The first-order conditions (A2.7) through (A2.9) require that f1 = λg1 and f2 = λg2 .
Young’s theorem tells us that f12 = f21 and g12 = g21 . Substituting from these, from
(A2.21), and using some algebra, we can re-express (A2.23) as
$$\frac{d^2 y}{dx_1^2} = \frac{1}{(g_2)^2}\left[(f_{11} - \lambda g_{11})(g_2)^2 - 2(f_{12} - \lambda g_{12})g_1 g_2 + (f_{22} - \lambda g_{22})(g_1)^2\right]. \tag{A2.24}$$
Now look carefully at the terms involving λ inside the brackets. Recall that when
we formed the first-order conditions (A2.7) through (A2.9), we found that the first-order
partials of the Lagrangian function with respect to the xi were

$$L_i = f_i - \lambda g_i.$$

Differentiating once more, the second-order partials of the Lagrangian are

$$L_{ij} = f_{ij} - \lambda g_{ij}, \quad i, j = 1, 2. \tag{A2.25}$$

It is clear that the terms involving λ inside the brackets in (A2.24) are just these second-order partials of the Lagrangian with respect to the xi. The entire bracketed term can now be seen
to involve these second-order partials plus the first-order partials of the constraint. To
the trained eye, the quadratic expression in the bracketed term can be recognised as the
determinant of a symmetric matrix. Suppose we form the symmetric matrix
$$\bar{H} \equiv \begin{pmatrix} 0 & g_1 & g_2 \\ g_1 & L_{11} & L_{12} \\ g_2 & L_{21} & L_{22} \end{pmatrix}.$$
This matrix is called the bordered Hessian of the Lagrangian function, because it
involves the second-order partials of L bordered by the first-order partials of the constraint
equation and a zero. If we take its determinant (e.g., by expanding along the last column),
we see that
$$\bar{D} \equiv \begin{vmatrix} 0 & g_1 & g_2 \\ g_1 & L_{11} & L_{12} \\ g_2 & L_{21} & L_{22} \end{vmatrix} = -\left[L_{11}(g_2)^2 - 2L_{12}\,g_1 g_2 + L_{22}(g_1)^2\right]. \tag{A2.26}$$
By combining (A2.24), (A2.25), and (A2.26), the second derivative of the objective function subject to the constraint can be written in terms of the determinant of the bordered Hessian:

$$\frac{d^2 y}{dx_1^2} = \frac{-1}{(g_2)^2}\,\bar{D}. \tag{A2.27}$$
Thus, the curvature of the objective function along the constraint, indicated by the
sign of the second derivative d²y/dx₁², can be inferred directly from the sign of the determinant of the bordered Hessian of the Lagrangian function (assuming that g₂ ≠ 0). Care is in
order because the sign of the one will always be opposite the sign of the other, because the
determinant in (A2.27) is multiplied by −1. We are now in a position to state a sufficient
condition for the two-variable, one-constraint problem.
THEOREM A2.17 A Sufficient Condition for a Local Optimum in the Two-Variable, One-Constraint
Optimisation Problem
If (x1∗ , x2∗ , λ∗ ) solves the first-order conditions (A2.7) through (A2.9), and if D̄ >
0 (< 0) in (A2.26) when evaluated at (x1∗ , x2∗ , λ∗ ), then (x1∗ , x2∗ ) is a local maximum
(minimum) of f (x1 , x2 ) subject to the constraint g(x1 , x2 ) = 0.
EXAMPLE A2.9 Let us consider whether the critical point we obtained in Example A2.8
is a minimum or a maximum. Referring back, it is easy to see that L11 =
−2a, L12 = 0, L21 = 0, and L22 = −2b. From the constraint equation, g1 = 1 and g2 =
1. Constructing the bordered Hessian, its determinant will be
$$\bar{D} = \begin{vmatrix} 0 & 1 & 1 \\ 1 & -2a & 0 \\ 1 & 0 & -2b \end{vmatrix} = 2(a+b) > 0. \tag{A2.28}$$
Because here, D̄ > 0 for all values of x1 , x2 , and λ, it must be so at the solution (E.9) to
the first-order conditions in Example A2.8. The value of the objective function in (E.10)
must therefore be a maximum subject to the constraint.
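Both (A2.27) and the conclusion of Example A2.9 can be confirmed numerically; the sketch below again uses the illustrative values a = 2, b = 3 (our own choice):

```python
# Check Example A2.9: the bordered-Hessian determinant (A2.28) should
# equal 2(a+b) > 0, and by (A2.27) the second derivative of the objective
# along the constraint is -D/(g2**2) = -2(a+b) < 0: a constrained maximum.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

a, b = 2.0, 3.0
g1, g2 = 1.0, 1.0
L11, L12, L21, L22 = -2 * a, 0.0, 0.0, -2 * b

H_bar = [[0.0, g1, g2],
         [g1, L11, L12],
         [g2, L21, L22]]
D_bar = det3(H_bar)
assert abs(D_bar - 2 * (a + b)) < 1e-12     # (A2.28)

# (A2.27): curvature along the constraint, where y(x1) = -a*x1^2 - b*(1-x1)^2
d2y = -D_bar / g2**2
assert abs(d2y - (-2 * (a + b))) < 1e-12
assert d2y < 0                               # confirms a local maximum
```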
With n variables and m < n constraints, the second-order sufficient conditions again
tell us we will have a maximum (minimum) if the second differential of the objective
function is less than zero (greater than zero) at the point where the first-order conditions
are satisfied. The sign of the second differential can once more be reduced to knowing
the definiteness of the bordered Hessian of the Lagrangian. In the multivariable, multicon-
straint case, the bordered Hessian is again formed by bordering the matrix of second-order
partials of L by all the first-order partials of the constraints and enough zeros to form a
symmetric matrix. The test for definiteness then involves checking the sign pattern on the
principal minors of this bordered Hessian.
Its principal minors are the determinants of submatrices obtained by moving down the
principal diagonal. The n − m principal minors of interest here are those beginning with
the (2m + 1)-st and ending with the (n + m)-th, i.e., the determinant of H̄. That is, the
principal minors
$$\bar{D}_k \equiv \begin{vmatrix}
0 & \cdots & 0 & g_1^1 & \cdots & g_k^1 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & g_1^m & \cdots & g_k^m \\
g_1^1 & \cdots & g_1^m & L_{11} & \cdots & L_{1k} \\
\vdots & & \vdots & \vdots & & \vdots \\
g_k^1 & \cdots & g_k^m & L_{k1} & \cdots & L_{kk}
\end{vmatrix}, \qquad k = m+1, \ldots, n, \tag{A2.29}$$

where $g_i^j \equiv \partial g^j / \partial x_i$.
We can summarise the sufficient conditions for optima in the general case with the
following theorem.
THEOREM A2.18 Sufficient Conditions for Local Optima with Equality Constraints
Let the objective function be f (x) and the m < n constraints be gj (x) = 0, j = 1, . . . , m.
Let the Lagrangian be given by (A2.14). Let (x∗ , λ∗ ) solve the first-order conditions in
(A2.15). Then
1. x∗ is a local maximum of f (x) subject to the constraints if the n − m principal
minors in (A2.29) alternate in sign beginning with positive D̄m+1 > 0, D̄m+2 <
0, . . . , when evaluated at (x∗ , λ∗ ).
2. x∗ is a local minimum of f (x) subject to the constraints if the n − m principal
minors in (A2.29) are all negative D̄m+1 < 0, D̄m+2 < 0, . . . , when evaluated
at (x∗ , λ∗ ).
[Figure. The three cases for maximising f(x) subject to x ≥ 0: (a) Case 1, a boundary solution x* = 0; (b) Case 2, a boundary solution x* = 0 with f′(x*) = 0; (c) Case 3, an interior solution x* > 0, together with a boundary point x̃ = 0 at which f′(x̃) > 0.]

the feasible region. Once more, two things characterise the solution in case 3: x* > 0 and
f′(x*) = 0.
At this point, the question arises: is there a convenient set of conditions that we can
use to summarise all three possibilities? Again, the three possibilities are:

Case 1. x* = 0 and f′(x*) < 0.
Case 2. x* = 0 and f′(x*) = 0.
Case 3. x* > 0 and f′(x*) = 0.

Is there something common to all three? Look carefully. In each case, multiply the two
conditions together and notice that the product will always be zero! Thus, in all three
cases, x*[f′(x*)] = 0.
This alone, however, is not quite enough. Look again at case 3. Clearly, x̃ = 0 does
not give a maximum of the function in the feasible region. There, f′(x̃) > 0, so as we
increase x away from the boundary and into the feasible region, the value of the function
will increase. Nonetheless, the product x̃[f′(x̃)] = 0 even though x̃ is a minimum, not a
maximum, subject to x ≥ 0. We can rule out this unwanted possibility by simply requiring
the function to be non-increasing as we increase x.
All together, we have identified three conditions that characterise the solution to the
simple maximisation problem with non-negativity constraints. If x∗ solves (A2.30), then
all three of the following must hold:
Condition 1. f′(x*) ≤ 0
Condition 2. x*[f′(x*)] = 0   (A2.31)
Condition 3. x* ≥ 0.
Notice that these three conditions, together, rule out the ‘minimum’ problem just
described. At x̃ = 0, even though x̃[f′(x̃)] = 0, Condition 1 is violated because f′(x̃) > 0.

Consider, for example, the problem

$$\max_{x} \; 6 - x^2 - 4x \quad \text{subject to} \quad x \ge 0.$$

Here f′(x) = −2x − 4, so conditions (A2.31) require

1. −2x* − 4 ≤ 0
2. x*[−2x* − 4] = 0
3. x* ≥ 0.
Trying to solve conditions like these can sometimes get messy. A rule of thumb that
usually works is to focus on the product term, condition 2. Solve that first, then make sure the other
conditions are satisfied. Here, we can multiply through by −1, factor out a 2, and get

$$2x^*[x^* + 2] = 0.$$
The only values that satisfy this are x = 0 and x = −2. However, condition 3 rules out
x = −2, leaving only x = 0 as a candidate. Making sure this satisfies condition 1 as well,
we get 0 − 4 = −4 ≤ 0, so the solution must be x∗ = 0.
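The three conditions can be checked mechanically. A minimal sketch for the example just solved:

```python
# Conditions (A2.31) for max 6 - x**2 - 4*x subject to x >= 0,
# checked directly using f'(x) = -2*x - 4.

def fprime(x):
    return -2 * x - 4

def satisfies_A2_31(x, tol=1e-12):
    c1 = fprime(x) <= tol                 # Condition 1: f'(x*) <= 0
    c2 = abs(x * fprime(x)) <= tol        # Condition 2: x*[f'(x*)] = 0
    c3 = x >= -tol                        # Condition 3: x* >= 0
    return c1 and c2 and c3

assert satisfies_A2_31(0.0)       # x* = 0 passes all three conditions
assert not satisfies_A2_31(-2.0)  # x = -2 solves condition 2 but violates 3
assert not satisfies_A2_31(1.0)   # interior points fail condition 2

# Sanity check that x* = 0 really is the constrained maximum on a grid
f = lambda x: 6 - x**2 - 4 * x
assert all(f(0.0) >= f(k / 100) for k in range(0, 1001))
```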
The conditions for a minimum of f (x) subject to x ≥ 0 can also be easily derived. The
reasoning is just as before, except that this time the troublesome case arises if the function
is decreasing at the boundary of the feasible set. We rule this out by requiring the derivative
to be non-negative at the point in question. If x∗ solves the minimisation problem with
non-negativity constraints, then
Condition 1. f′(x*) ≥ 0
Condition 2. x*[f′(x*)] = 0   (A2.32)
Condition 3. x* ≥ 0.
In quite sensible ways, (A2.31) and (A2.32) generalise to the case of optimising
real-valued functions of any number of variables subject to non-negativity constraints on
all of them. In the multivariable case, the three conditions must hold for each variable
separately, with the function’s partial derivatives being substituted for the single derivative.
The following theorem is straightforward and its proof is left as an exercise.

THEOREM A2.19 Necessary Conditions for Optima of Real-Valued Functions Subject to Non-Negativity Constraints

Let f(x) be continuously differentiable. If x* maximises f(x) subject to x ≥ 0, then x* satisfies

$$\text{(i)} \quad \frac{\partial f(\mathbf{x}^*)}{\partial x_i} \le 0, \quad i = 1, \ldots, n$$
$$\text{(ii)} \quad x_i^*\left[\frac{\partial f(\mathbf{x}^*)}{\partial x_i}\right] = 0, \quad i = 1, \ldots, n$$
$$\text{(iii)} \quad x_i^* \ge 0, \quad i = 1, \ldots, n.$$
If x* minimises f(x) subject to x ≥ 0, then x* satisfies

$$\text{(i)} \quad \frac{\partial f(\mathbf{x}^*)}{\partial x_i} \ge 0, \quad i = 1, \ldots, n$$
$$\text{(ii)} \quad x_i^*\left[\frac{\partial f(\mathbf{x}^*)}{\partial x_i}\right] = 0, \quad i = 1, \ldots, n$$
$$\text{(iii)} \quad x_i^* \ge 0, \quad i = 1, \ldots, n.$$
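A small two-variable illustration of the multivariable conditions (the objective below is our own example, chosen so that one non-negativity constraint binds and the other does not):

```python
# Maximise f(x1, x2) = -(x1 - 1)**2 - (x2 + 1)**2 subject to x1, x2 >= 0.
# The unconstrained peak (1, -1) is infeasible in x2, so the solution is
# x* = (1, 0), with the x2-constraint binding. Conditions (i)-(iii) hold
# coordinate by coordinate.

x1, x2 = 1.0, 0.0
f1 = -2 * (x1 - 1)            # = 0  (x1 interior, so this partial vanishes)
f2 = -2 * (x2 + 1)            # = -2 (x2 at the boundary, partial may be < 0)

for xi, fi in ((x1, f1), (x2, f2)):
    assert fi <= 1e-12                # (i)   each partial <= 0
    assert abs(xi * fi) <= 1e-12      # (ii)  complementary slackness
    assert xi >= 0                    # (iii) non-negativity
```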
[Figure. The feasible set defined by the two constraints g¹(x₁, x₂) ≤ 0 and g²(x₁, x₂) ≤ 0, with boundaries g¹(x₁, x₂) = 0 and g²(x₁, x₂) = 0.]
which, after multiplying by −1 and taking reciprocals, can be equivalently written as (A2.34), an expression that sandwiches the slope of the gradient of f between the slopes of the gradients of the two constraints.
Let us now recall a bit of geometry with vectors. A vector (z1 , z2 ), being the
line segment from the origin to the point (z1 , z2 ), has slope z2 /z1 . Thus, the first term
in (A2.34) is the slope of the vector ∇g1 (x∗ ) = (∂g1 (x∗ )/∂x1 , ∂g1 (x∗ )/∂x2 ) which is
the gradient of g1 at x∗ , the second term is the slope of the gradient vector ∇f (x∗ ) =
(∂f (x∗ )/∂x1 , ∂f (x∗ )/∂x2 ), and the third term is the slope of the gradient vector ∇g2 (x∗ ) =
(∂g2 (x∗ )/∂x1 , ∂g2 (x∗ )/∂x2 ).
[Figure. The solution x* lies at the intersection of the constraint boundaries g¹ = 0 and g² = 0, on the highest attainable level set of f.]
Consequently, (A2.34) tells us that the slope of the gradient vector ∇f (x∗ ) lies
between the slopes of the gradient vectors ∇g1 (x∗ ) and ∇g2 (x∗ ). Because a gradient vec-
tor is perpendicular to its level set,5 the situation is therefore as shown in Fig. A2.12, where
we have drawn each gradient vector as if x∗ is the origin.
The north-east shaded cone in Fig. A2.12 is the set of all vectors (once again, think
of x∗ as the origin) that can be written as a non-negative linear combination of ∇g1 (x∗ )
and ∇g2 (x∗ ). Evidently, ∇f (x∗ ) lies in this set and so we may conclude that if x∗ solves
(A2.33), there exist real numbers λ1*, λ2* such that

$$\nabla f(\mathbf{x}^*) = \lambda_1^* \nabla g^1(\mathbf{x}^*) + \lambda_2^* \nabla g^2(\mathbf{x}^*), \qquad \lambda_1^* \ge 0, \; \lambda_2^* \ge 0,$$

and of course, in our example, the constraints are satisfied with equality, i.e.,

$$g^1(\mathbf{x}^*) = 0 \quad \text{and} \quad g^2(\mathbf{x}^*) = 0.$$
5 We have just shown that the slopes of the gradient vectors are the negative reciprocals of the slopes of their level sets. Hence, they are perpendicular to their level sets.
[Figure A2.12. At x*, the gradient ∇f(x*) lies in the cone of non-negative linear combinations of ∇g¹(x*) and ∇g²(x*); each gradient is drawn perpendicular to its own level set, g¹ = 0 or g² = 0.]
In general, not all constraints need be satisfied with equality at the optimal solution.
The Kuhn-Tucker theorem below provides necessary conditions of the sort we have just
derived, while also handling situations in which some constraints are not binding at the
optimum. The theorem shows that, at the optimum, the gradient of f can be expressed as
a non-negative linear combination of the gradients of the gj associated with binding con-
straints j. We state the theorem for maximisation problems only because any minimisation
problem can be solved by maximising the negative of the objective function subject to the
same constraints.
THEOREM A2.20 (Kuhn-Tucker) Necessary Conditions for Maxima of Real-Valued Functions Subject to
Inequality Constraints
Let f(x) and gj(x), j = 1, ..., m be continuous real-valued functions defined over some
domain D ⊆ Rn. Let x* be an interior point of D and suppose that x* maximises f(x) on D
subject to the constraints gj (x) ≤ 0, j = 1, . . . , m, and that f and each gj are continuously
differentiable on an open set containing x*. If the gradient vectors ∇gj(x*) associated
with constraints j that bind at x* are linearly independent, then there is a unique vector λ* = (λ1*, ..., λm*) such that

$$\nabla f(\mathbf{x}^*) - \sum_{j=1}^{m} \lambda_j^* \nabla g^j(\mathbf{x}^*) = 0,$$

with λj* ≥ 0 and λj* gj(x*) = 0 for every j = 1, ..., m.
Proof: Without loss, suppose that the first K ≥ 0 constraints are binding and the remainder
are not binding. Define λj* = 0 for j = K + 1, ..., m. Hence, regardless of the values of
λ1*, ..., λK*, it will be the case that λj* gj(x*) = 0, j = 1, ..., m. Define

$$B = \left\{\mathbf{b} \in \mathbb{R}^n \;\middle|\; \mathbf{b} = \sum_{j=1}^{K} \lambda_j \nabla g^j(\mathbf{x}^*) \text{ for some } \lambda_1 \ge 0, \ldots, \lambda_K \ge 0\right\},$$

and note that B is convex. It can also be shown that B is closed. See Exercise A2.29.

If ∇f(x*) ∈ B, then $\nabla f(\mathbf{x}^*) - \sum_{j=1}^{K} \lambda_j^* \nabla g^j(\mathbf{x}^*) = 0$ for some λ1* ≥ 0, ..., λK* ≥ 0. Moreover, such λj* are unique, since if also $\nabla f(\mathbf{x}^*) - \sum_{j=1}^{K} \hat{\lambda}_j \nabla g^j(\mathbf{x}^*) = 0$, then subtracting the two equalities gives $\sum_{j=1}^{K} (\lambda_j^* - \hat{\lambda}_j)\nabla g^j(\mathbf{x}^*) = 0$. The linear independence of ∇g¹(x*), ..., ∇gᴷ(x*) implies λj* = λ̂j for j = 1, ..., K. Therefore, it suffices to show
that ∇f(x*) is contained in B.
Let a* = ∇f(x*). Suppose, by way of contradiction, that a* ∉ B. Then the two
closed convex sets A = {a*} and B are disjoint. By Theorem A2.24, there exists p ∈ Rn
such that,
p · a∗ > p · b (P.1)
for every b ∈ B. In particular, p · a∗ > 0 because 0 ∈ B. Also, p · ∇gj (x∗ ) ≤ 0 for every
j = 1, 2, . . . , K, since if this fails for some such j, (P.1) would be violated by setting b =
λ∇gj(x*) for λ > 0 large enough. Thus, we have

$$\mathbf{p} \cdot \mathbf{a}^* > 0 \quad \text{and} \quad \mathbf{p} \cdot \nabla g^j(\mathbf{x}^*) \le 0, \quad j = 1, \ldots, K.$$
Because ∇g¹(x*), ..., ∇gᴷ(x*) are linearly independent vectors in Rn, the K × n
matrix G whose jth row is ∇gj(x*) has range equal to all of RK.⁶ In particular, if w ∈ RK
is the column vector (−1, −1, ..., −1), there exists z ∈ Rn such that Gz = w. So, in
particular,
6 This is a basic fact from linear algebra. However, it can be proven quite directly using Theorem A2.24, a proof you might wish to try.
Consequently, for ε > 0 small enough, x* + ε(p + δz) is feasible and must therefore yield
a value of f that is no greater than the maximum value f(x*). We must therefore have

$$\left.\frac{df(\mathbf{x}^* + \varepsilon(\mathbf{p} + \delta \mathbf{z}))}{d\varepsilon}\right|_{\varepsilon = 0} \le 0.$$
x1 ≥ 0, . . . , xn ≥ 0,
given explicitly. Theorem A2.20 still applies to such a situation since each non-negativity
constraint can be written as a constraint function gj. Indeed, if the above non-negativity
constraints are the only constraints, then Theorem A2.20 reduces to Theorem A2.19. See
Exercise A2.30.
The conclusion that λ∗j gj (x∗ ) = 0 for j = 1, . . . , m is called complementary slack-
ness. It says that if a constraint is slack its associated Lagrange multiplier must be zero,
while if a Lagrange multiplier is positive its associated constraint must be binding. As you
will be asked to show in Exercise A2.33, the Lagrange multiplier λj* can be interpreted as
the marginal increase in the objective function when the jth constraint is relaxed.
The linear independence condition in Theorem A2.20 is one among a variety of possible constraint qualifications. To see that some such qualification is needed, consider
the problem of maximising f(x) = x subject to g(x) = x³ ≤ 0, where D = (−∞, ∞). In
this case, x∗ = 0, ∇g(x∗ ) = 0 and ∇f (x∗ ) = 1. Hence, the conclusion of Theorem A2.20
fails. This does not contradict the theorem of course, because the singleton set of gradi-
ents {∇g(x∗ )} corresponding to the single binding constraint is not linearly independent.
Thus, one cannot simply remove the constraint qualification. Exercise A2.31 provides sev-
eral constraint qualifications, each of which can replace the linear independence condition
given in Theorem A2.20 without changing the conclusion of the theorem except insofar as
the uniqueness of the Lagrange multipliers is concerned.
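The failure in this example is easy to exhibit numerically: at x* = 0 the constraint gradient vanishes, so no multiplier can make the Kuhn-Tucker equation hold:

```python
# Constraint-qualification failure: maximise f(x) = x subject to
# g(x) = x**3 <= 0. The solution is x* = 0, but g'(0) = 0, so there is
# no lambda >= 0 with f'(0) = lambda * g'(0).

x_star = 0.0
f_grad = 1.0                 # f'(x*) = 1
g_grad = 3 * x_star**2       # g'(x*) = 0: the constraint gradient vanishes

# For every lambda, lambda * g_grad == 0 != f_grad: no multiplier exists.
assert g_grad == 0.0
assert all(abs(f_grad - lam * g_grad) > 0.5 for lam in (0.0, 1.0, 10.0, 1e6))

# Yet x* = 0 really is the maximum: feasibility (x**3 <= 0) means x <= 0,
# so f(x) = x <= 0 = f(x_star) for every feasible x.
assert all(x <= x_star for x in (-2.0, -0.5, 0.0))
```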
$$V(\mathbf{a}) \equiv \max_{\mathbf{x}} f(\mathbf{x}, \mathbf{a}) \quad \text{subject to} \quad g^j(\mathbf{x}, \mathbf{a}) \le 0, \quad j = 1, \ldots, m, \tag{A2.35}$$

whenever the maximum exists. The maximum value of an objective function, f, subject to
a single (binding) constraint, g, is illustrated in Fig. A2.13.
Clearly, the solutions to (A2.35) will depend in some way on the vector of parameters, a ∈ A. Do the solutions vary continuously with a? Does the maximised value V(a)
vary continuously with a ∈ A? We will provide answers to both of these questions.
To ensure continuity of the value function or of the solution in the vector of param-
eters a, we not only need to ensure that the objective function f is continuous, we need
also to ensure that small changes in a have only a small effect on the set of feasible values
of x. There are essentially two ways this can fail. The set of feasible values of x might
[Figure A2.13. The constrained maximum: the level set L(y*), with y* = f(x(a), a), is tangent to the constraint g(x, a) = 0 at the solution (x₁(a), x₂(a)).]
dramatically shrink or expand. Continuity of the gj functions ensures that dramatic expan-
sions cannot occur when a changes only slightly. To ensure that dramatic shrinkages do
not occur requires an additional condition. Both conditions are contained in the following
definition.
(i) A solution to (A2.35) exists for every a ∈ A, and therefore the value function V(a)
is defined on all of A.
(ii) The value function, V : A → R, is continuous.
(iii) Suppose that (xk , ak ) is a sequence in Rn × A converging to (x∗ , a∗ ) ∈ Rn × A.
If for every k, xk is a solution to (A2.35) when a = ak , then x∗ is a solution to
(A2.35) when a = a∗ .
(iv) If for every a ∈ A the solution to (A2.35) is unique and given by the function x(a),
then x : A → Rn is continuous.
7 This definition is equivalent to notions of upper and lower semicontinuity in the theory of correspondences.
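The continuity conclusion can be illustrated with a simple parameterised family (the problem below is our own example): maximise f(x) = x(1 − x) over the compact, continuously varying constraint set [0, a]. Analytically V(a) = a(1 − a) for a ≤ 1/2 and V(a) = 1/4 thereafter, and a grid approximation shows the value function varies continuously through the kink at a = 1/2:

```python
# Value-function continuity: V(a) = max x*(1 - x) subject to 0 <= x <= a.
# Analytically V(a) = a*(1 - a) for a <= 1/2 and V(a) = 1/4 for a >= 1/2.

def V(a, n=100000):
    """Approximate the value function by searching a uniform grid on [0, a]."""
    return max((k * a / n) * (1 - k * a / n) for k in range(n + 1))

for a, expected in ((0.25, 0.25 * 0.75), (0.5, 0.25), (1.5, 0.25)):
    assert abs(V(a) - expected) < 1e-4

# Continuity at the kink a = 1/2: nearby parameters give nearby values.
assert abs(V(0.499) - V(0.501)) < 1e-2
```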
Proof: Part (i) follows immediately from Theorem A1.10 because the compactness of S
and the continuity of each gj imply that for every a ∈ A the set of x ∈ Rn satisfying the
m constraints g¹(x, a) ≤ 0, ..., gᵐ(x, a) ≤ 0 is compact, and because we have assumed
throughout that it is non-empty.
Let us prove part (iii) next. Suppose, by way of contradiction, that (iii) fails. Then x∗
is not a solution to (A2.35) when a = a∗ . This means that there is some x̂ ∈ Rn such that
(x̂, a∗ ) ∈ S and f (x̂, a∗ ) > f (x∗ , a∗ ). Because ak converges to a∗ , constraint-continuity
applied to (x̂, a∗ ) implies there is a sequence x̂k in Rn converging to x̂ such that (x̂k , ak )
satisfies the constraints for every k. The continuity of f implies that (see Theorem A1.9)
f(x̂k, ak) converges to f(x̂, a*) and that f(xk, ak) converges to f(x*, a*). Consequently,
because f(x̂, a*) > f(x*, a*), we have

$$f(\hat{\mathbf{x}}_k, \mathbf{a}_k) > f(\mathbf{x}_k, \mathbf{a}_k) \quad \text{for all } k \text{ sufficiently large.}$$

But this contradicts the fact that xk solves (A2.35) when a = ak, and completes the proof
of part (iii).
of part (iii).
To prove part (ii), suppose that {ak}k∈I is a sequence in A converging to a* ∈ A. By
Theorem A1.9 it suffices to show that V(ak) converges to V(a*). Suppose, by way of contradiction, that V(ak) does not converge to V(a*). Then for some ε > 0, there is an infinite
subset I′ of I such that for every k ∈ I′, V(ak) fails to be within ε of V(a*). By definition of
the value function, for each k ∈ I′ there is a solution xk of (A2.35) when a = ak such that
V(ak) = f(xk, ak). Because each (xk, ak) is in the compact set S, Theorem A1.8 implies
that the sequence {(xk, ak)}k∈I′ has a convergent subsequence, {(xk, ak)}k∈I″, converging
to, say, (x̂, â), where I″ is an infinite subset of I′. Because {ak}k∈I converges to a*, the subsequence {ak}k∈I″ also converges to a*; hence â = a*. Consequently, {V(ak) = f(xk, ak)}k∈I″ converges
to f(x̂, a*) by the continuity of f. But because for k ∈ I″ each xk solves (A2.35) when
a = ak, part (iii) implies that x̂ solves (A2.35) when a = a*. Hence, V(a*) = f(x̂, a*),
from which we conclude that {V(ak)}k∈I″ converges to V(a*). But this contradicts the fact
that V(ak) fails to be within ε > 0 of V(a*) for every k ∈ I″ ⊆ I′.
To prove part (iv), suppose that {ak }k∈I is a sequence in A converging to a∗ ∈ A.
By Theorem A1.9 it suffices to show that x(ak ) converges to x(a∗ ). Suppose by way of
contradiction, that x(ak) does not converge to x(a*). Then for some ε > 0, there is an
infinite subset I′ of I such that for every k ∈ I′, x(ak) fails to be within ε of x(a*). Defining
xk = x(ak ) for every k, the proof now proceeds as in the proof of part (ii) and is left as an
exercise.
If the solution to (A2.35) is always unique and the objective function, constraint, and
solutions are differentiable in the parameter, a, there is a very powerful theorem that can
be used to analyse the behaviour of the value function V(a) as the vector of parameters,
a, changes. This is known as the Envelope theorem. To keep the notation simple, we will
prove the theorem when there is just a single constraint, i.e., when m = 1. You are invited
to generalise the result to the case of many constraints in the exercises.
$$\frac{\partial V(\mathbf{a})}{\partial a_j} = \left.\frac{\partial L}{\partial a_j}\right|_{(\mathbf{x}(\mathbf{a}), \lambda(\mathbf{a}))},$$

where the right-hand side denotes the partial derivative of the Lagrangian function with
respect to the parameter aj evaluated at the point (x(a), λ(a)).
The theorem says that the total effect on the optimised value of the objective function
when a parameter changes (and so, presumably, the whole problem must be reoptimised)
can be deduced simply by taking the partial of the problem’s Lagrangian with respect to
the parameter and then evaluating that derivative at the solution to the original problem’s
first-order Kuhn-Tucker conditions. Although we have confined ourselves in the statement
of the theorem to the case of a single constraint, the theorem applies regardless of the
number of constraints, with the usual proviso that there be fewer constraints than choice
variables. Because of the importance of this theorem, and because it is not so obviously
true, we will work through a rather extended proof of the version given here.
Proof: By hypothesis, x(a) and λ(a) satisfy the first-order Kuhn-Tucker conditions given in
Theorem A2.20. Therefore for every a ∈ U, and because the constraint is binding, we
have

    ∂f(x(a), a)/∂xi − λ(a) ∂g(x(a), a)/∂xi = 0,   i = 1, . . . , n,
    g(x(a), a) = 0.                                                  (P.1)
The Lagrangian for (A2.35) is L(x, λ, a) ≡ f(x, a) − λ g(x, a), so its partial derivative with
respect to aj is

    ∂L/∂aj = ∂f(x, a)/∂aj − λ ∂g(x, a)/∂aj.                          (P.2)
If we can show that the partial derivative of the maximum-value function with respect to
aj is equal to the right-hand side of (P.2), we will have proved the theorem.
We begin by directly differentiating V(a) with respect to aj . Because aj affects
f directly and indirectly through its influence on each variable xi (a), we will have to
remember to use the chain rule. We get
    ∂V(a)/∂aj = Σ_{i=1}^n [∂f(x(a), a)/∂xi][∂xi(a)/∂aj] + ∂f(x(a), a)/∂aj,

where the summation term comes from the chain rule.
Now, go back to the first-order conditions (P.1). Rearranging the first one gives
    ∂f(x(a), a)/∂xi ≡ λ(a) ∂g(x(a), a)/∂xi,   i = 1, . . . , n.
Substituting into the bracketed term of the summation, we can rewrite the partial derivative
of V(a) as
    ∂V(a)/∂aj = λ(a) Σ_{i=1}^n [∂g(x(a), a)/∂xi][∂xi(a)/∂aj] + ∂f(x(a), a)/∂aj.   (P.3)
The final ‘trick’ is to go back again to the first-order conditions (P.1) and look at
the second identity in the system. Because g(x(a), a) ≡ 0, we can differentiate both sides
of this identity with respect to aj and they must be equal. Because the derivative of the
constant zero is zero, we obtain
    Σ_{i=1}^n [∂g(x(a), a)/∂xi][∂xi(a)/∂aj] + ∂g(x(a), a)/∂aj ≡ 0,

where the summation term is again due to the chain rule.
Rearranging yields
    ∂g(x(a), a)/∂aj ≡ − Σ_{i=1}^n [∂g(x(a), a)/∂xi][∂xi(a)/∂aj].
Moving the minus sign into the brackets, we can substitute the left-hand side of this identity
for the entire summation term in (P.3) to get

    ∂V(a)/∂aj = ∂f(x(a), a)/∂aj − λ(a) ∂g(x(a), a)/∂aj.              (P.4)

The right-hand side of (P.4) is the same as the right-hand side of (P.2). Thus,
    ∂V(a)/∂aj = ∂L/∂aj |_(x(a), λ(a)),
as we wanted to show.
EXAMPLE A2.11 Let us see if we can verify the Envelope theorem. Suppose we have
f(x1, x2) ≡ x1x2 and a simple constraint g(x1, x2) ≡ 2x1 + 4x2 − a. We are given the
problem

    max_{x1, x2} x1x2   subject to   2x1 + 4x2 − a = 0,

and would like to know how the maximum value of the objective function varies with
the (single, scalar) parameter a. We will do this two ways: first, we will derive the func-
tion V(a) explicitly and differentiate it to get our answer. Then we will use the Envelope
theorem to see if we get the same thing.
To form V(a), we must first solve for the optimal values of the choice variables in
terms of the parameter. We would then substitute these into the objective function as in
(A2.36) to get an expression for V(a). Notice that this problem differs slightly from the
one in (A2.35) because we do not require non-negativity on the choice variables. Thus, we
can dispense with the Kuhn-Tucker conditions and just use the simple Lagrangian method.
Forming the Lagrangian, we get

    L = x1x2 + λ(a − 2x1 − 4x2),

with first-order conditions

    L1 = x2 − 2λ = 0
    L2 = x1 − 4λ = 0                                                 (E.1)
    Lλ = a − 2x1 − 4x2 = 0.
These can be solved to find x1 (a) = a/4, x2 (a) = a/8, and λ(a) = a/16. We form the
maximum-value function by substituting the solutions for x1 and x2 into the objective
function. Thus,
    V(a) = x1(a)x2(a) = (a/4)(a/8) = a²/32.
Differentiating V(a) with respect to a will tell us how the maximised value of the objective
function varies with a. Doing that we get
    dV(a)/da = a/16.
Now let us verify this using the Envelope theorem. The theorem tells us that to see
how the maximised value of the function varies with a parameter, simply differentiate the
Lagrangian for the maximisation problem with respect to the parameter and evaluate that
derivative at the solution to the first-order conditions (E.1). Applying the theorem, we first
obtain
    dV(a)/da = ∂L/∂a = λ.
We then evaluate this at the solution to (E.1), where λ(a) = a/16. This gives us
    dV(a)/da = λ(a) = a/16,
which checks.
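The check can also be run numerically. The following sketch (illustrative, not from the text; the grid resolution and step size h are arbitrary choices) recomputes V(a) by brute force and compares a central-difference estimate of dV(a)/da with the multiplier λ(a) = a/16 from (E.1).

```python
# Numerical check of the Envelope theorem for the problem of Example A2.11:
# maximise x1*x2 subject to 2*x1 + 4*x2 = a. Analytically V(a) = a**2/32,
# so dV/da = a/16 = lambda(a).

def V(a):
    # Substitute the constraint (x2 = (a - 2*x1)/4) and maximise over a
    # fine grid of x1 values; the true maximiser is x1 = a/4.
    n = 100_000
    return max(x1 * (a - 2 * x1) / 4
               for x1 in (a * i / n for i in range(n + 1)))

a, h = 8.0, 1e-4
dV = (V(a + h) - V(a - h)) / (2 * h)   # central-difference estimate of dV/da
lam = a / 16                           # multiplier from the FOCs (E.1)
print(dV, lam)                         # both close to 0.5
```

The brute-force derivative agrees with λ(a) to several decimal places, exactly as the theorem predicts.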
Besides verifying that the Envelope theorem ‘works’, this example has also given
us some insight into what interpretation we can give to those ‘incidental’ variables, the
Lagrangian multipliers. This is pursued further in the exercises.
Although we have confined attention here to maximisation problems and their asso-
ciated value functions, it should be clear that we could also construct value functions for
minimisation problems analogously, and that the Envelope theorem would apply for them
as well.
p1 x1 + p2 x2 = I,
[Figure: the sets A and B in the (x1, x2) plane, separated by the line p1x1 + p2x2 = I.]
where p1, p2, and I are positive constants, then every point (a1, a2) ∈ A is such that
p1a1 + p2a2 > I, and every point (b1, b2) ∈ B is such that p1b1 + p2b2 < I.
Imagine now two disjoint convex sets in R3 , say a sphere and a box with the sphere
entirely outside the box. Again, it is obvious that we can separate the two sets, this time
with a plane, and an identical analytic expression, but now with all vectors in R3 , describes
the situation.
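A quick numerical illustration of the sphere-and-box picture (the particular sphere, box, and separating direction below are assumed for the example, not taken from the text):

```python
import math

# A unit sphere centred at (4, 4, 4) lies entirely outside the unit box
# [0,1]^3. A plane normal to the segment joining their nearest points
# separates them: every box point lies on one side, every sphere point on
# the other.

box_corner = (1.0, 1.0, 1.0)          # nearest box point to the sphere
center = (4.0, 4.0, 4.0)
radius = 1.0

# Unit normal pointing from the box toward the sphere.
diff = tuple(c - b for c, b in zip(center, box_corner))
norm = math.sqrt(sum(d * d for d in diff))      # 3*sqrt(3)
p = tuple(d / norm for d in diff)               # (1, 1, 1)/sqrt(3)

# Since p has positive components, max of p.b over the box is at (1, 1, 1);
# min of p.s over the sphere is p.center - radius.
max_box = sum(pi * bi for pi, bi in zip(p, box_corner))
min_sphere = sum(pi * ci for pi, ci in zip(p, center)) - radius
print(max_box < min_sphere)   # True: any alpha between them separates
```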
The separation theorems below generalise this to any number of dimensions. We will
provide two theorems. The second theorem strictly generalises the first and allows the sets
A and B to be, for example, open and ‘tangent’ to one another.
p · c ≥ α, for every c ∈ C.
If the closed set C were bounded, it would be compact. Then, because ‖c‖ is a
continuous real-valued function of c, we could apply Theorem A1.10 and conclude that
a solution to (P.1) exists. But C need not be bounded. However, choose any c0 ∈ C, and
consider the problem of minimising ‖c‖ over the compact set of points c ∈ C satisfying
‖c‖ ≤ ‖c0‖. A solution, ĉ, to this problem exists, and it clearly solves (P.1) as well. Now
fix any c ∈ C. Because C is convex, ĉ + t(c − ĉ) ∈ C for every t ∈ [0, 1], and because ĉ
solves (P.1), ‖ĉ + t(c − ĉ)‖² ≥ ‖ĉ‖² for every such t. Expanding and letting t → 0 yields

    2ĉ · c − 2ĉ · ĉ ≥ 0,
Because ĉ ∈ C and 0 ∉ C imply that ĉ ≠ 0, we may conclude that

    ĉ · c ≥ ‖ĉ‖² > 0, for every c ∈ C.

Hence, setting p = ĉ/‖ĉ‖ and α = ‖ĉ‖ > 0, we have

    p · c ≥ α, for every c ∈ C,
as desired.
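The construction in this proof is easy to visualise numerically. In the sketch below (illustrative; the particular set C is an assumed example), C is a closed ball that misses the origin, ĉ is its nearest point to 0, and p = ĉ/‖ĉ‖ with α = ‖ĉ‖ indeed satisfy p · c ≥ α on all of C:

```python
import math

# Nearest-point construction from the proof: C is the closed ball of
# radius 1 centred at (3, 4), which does not contain the origin.

center, radius = (3.0, 4.0), 1.0

# The nearest point of C to the origin lies on the segment from 0 to the
# centre, one radius short of the centre.
norm_center = math.hypot(*center)                             # 5.0
c_hat = tuple(x * (1 - radius / norm_center) for x in center) # (2.4, 3.2)
norm_c_hat = math.hypot(*c_hat)                               # 4.0

p = tuple(x / norm_c_hat for x in c_hat)  # unit normal (0.6, 0.8)
alpha = norm_c_hat

# p . c >= alpha should hold for every c in C; the minimum of p . c over
# the ball is attained on the boundary, so check many boundary points.
worst = min(
    p[0] * (center[0] + radius * math.cos(t)) +
    p[1] * (center[1] + radius * math.sin(t))
    for t in (2 * math.pi * k / 1000 for k in range(1000))
)
print(round(worst, 6), alpha)   # worst never falls below alpha
```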
If, in addition, the sets A and B are closed and at least one is bounded, p ∈ Rn can be
chosen so that for some α > 0,

    p · a ≥ α + p · b, for every a ∈ A and every b ∈ B.
Proof: Let us begin with the second part of the theorem, where it is assumed in addition
that both A and B are closed and one is bounded. Define C to be the set difference A − B
consisting of all points of the form a − b where a ∈ A and b ∈ B. It is not difficult to argue
that C is convex (try it!). With a little more effort it can also be shown that C is closed (see
Exercise A2.37). Moreover, because A and B are disjoint, C does not contain the origin, 0.
Hence, we may apply Theorem A2.23 and conclude that there is a vector p ∈ Rn of length
one and α > 0 such that p · c ≥ α for every c ∈ C. But, by the definition of C, this means
that

    p · a ≥ α + p · b, for every a ∈ A and every b ∈ B,

as desired.
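The set-difference construction can be illustrated numerically. In the sketch below (an assumed example, not from the text), A and B are disjoint closed balls in R2; C = A − B is again a ball, and the p and α obtained as in Theorem A2.23 applied to C separate A from B:

```python
import math

# A is the closed unit ball at the origin, B the closed unit ball at (4, 0).
# Then C = A - B is the ball of radius 2 centred at (-4, 0); it misses 0.

a_center, b_center, r = (0.0, 0.0), (4.0, 0.0), 1.0
c_center = (a_center[0] - b_center[0], a_center[1] - b_center[1])  # (-4, 0)
c_radius = 2 * r

# Nearest point of C to the origin, and the resulting p and alpha.
norm_c = math.hypot(*c_center)                                  # 4.0
c_hat = tuple(x * (1 - c_radius / norm_c) for x in c_center)    # (-2, 0)
alpha = math.hypot(*c_hat)                                      # 2.0
p = tuple(x / alpha for x in c_hat)                             # (-1, 0)

# p.a >= alpha + p.b for all a in A, b in B reduces here to
# min_A p.a - max_B p.b >= alpha, with both extremes in closed form.
min_A = p[0] * a_center[0] + p[1] * a_center[1] - r     # -1.0
max_B = p[0] * b_center[0] + p[1] * b_center[1] + r     # -3.0
print(min_A - max_B, alpha)   # the separation holds with equality here
```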
Let us turn now to the first part of the theorem, where neither A nor B need be
closed or bounded. Once again letting C = A − B, it is still the case that 0 ∈ / C and C
is convex, but C need no longer be closed. Thus, we cannot appeal directly to Theorem
A2.23. Instead, let C̄ be the set of all limits of convergent sequences of points in C. The
set C̄ is closed and convex (convince yourself!). Moreover, C̄ contains C because every
point c in C is the limit of the constant sequence c, c, . . . . If 0 ∈ / C̄, then we may apply
Theorem A2.23 exactly as when C is closed, and we are done. So, suppose that 0 ∈ C̄. By
Exercise A2.38, it suffices to show that 0 is a member of ∂ C̄, the boundary of C̄, because
then there is a vector p ∈ Rn of length one such that p · c ≥ 0 for every c ∈ C and the
desired conclusion follows from the definition of C. Hence, it remains only to show that
0 ∈ ∂ C̄.
Suppose, by way of contradiction, that 0 ∈ C̄ but that 0 ∈ / ∂ C̄. Consequently, by the
definition of the boundary of C̄, 0 ∈ C̄ is not the limit of any sequence of points outside of
C̄. Therefore, there exists ε > 0 such that Bε (0), the ε-ball with centre 0, is contained in
C̄. (Think about why this must be so.) Let ei denote the ith unit vector in Rn and let 1 be
the n-vector of 1’s. Choose δ > 0 small enough so that δei and −δ1 are in Bε (0) ⊂ C̄ for
every i = 1, . . . , n. By the definition of C̄, for each i = 0, 1, . . . , n, there is a sequence,
{c_k^i}_{k=1}^∞, of points in C such that

    c_k^0 → −δ1  and  c_k^i → δe^i (i = 1, . . . , n)  as k → ∞.     (P.1)

For each k, let C_k be the set of all convex combinations of c_k^0, c_k^1, . . . , c_k^n. That is,
C_k = {c ∈ Rn | c = Σ_{i=0}^n λi c_k^i for some non-negative λ0, λ1, . . . , λn summing to one}.
The set C_k is closed and convex (check this!). Moreover, C_k is contained in C because every
point in C_k is a convex combination of points in the convex set C. Consequently, 0 ∉ C_k.
We may therefore appeal to Theorem A2.23 to conclude that there is a vector p^k ∈ Rn of
length one and α_k > 0 such that p^k · c ≥ α_k for every c ∈ C_k. In particular, because each
c_k^i is in C_k,

    p^k · c_k^i > 0, for every i = 0, 1, . . . , n.                   (P.2)
Because the sequence {p^k} is bounded, Theorem A1.8 implies that it has a convergent
subsequence, {p^k}_{k∈K}, where K is an infinite subset of the indices 1, 2, . . .. Let p̂
be the limit of this subsequence and note that ‖p̂‖ = 1, being the limit of vectors whose
length is one. Taking the limit in (P.2) as k ∈ K tends to infinity and using (P.1) gives

    p̂ · (−δ1) ≥ 0  and  p̂ · δe^i ≥ 0, i = 1, . . . , n.             (P.3)
The last n inequalities in (P.3) imply that p̂i ≥ 0 for i = 1, . . . , n. Together with the first
inequality in (P.3), this implies that p̂ = 0, contradicting the fact that p̂ has length one and
completing the proof.
The two separation theorems presented here are sufficient for most purposes. One
might wonder about other such theorems. For example, can a point on the boundary of a
convex set be separated from the set? Exercise A2.39 explores this question.
A2.6 EXERCISES
A2.1 Differentiate the following functions. State whether the function is increasing, decreasing, or
constant at the point x = 2. Classify each as locally concave, convex, or linear at the point x = 2.
(a) 11x³ − 6x + 8.
(b) (3x² − x)(6x + 1).
(c) x² − (1/x³).
(d) (x² + 2x)³.
(e) [3x/(x³ + 1)]².
(f) [(1/x² + 2) − (1/x − 2)]⁴.
(g) ∫_x^1 e^(t²) dt.
A2.2 Find all first-order partial derivatives.
(a) f (x1, x2) = 2x1 − x1² − x2².
(b) f (x1, x2) = x1² + 2x2² − 4x2.
(c) f (x1, x2) = x1³ − x2² − 2x2.
    ∂y/∂x1 + ∂y/∂x2 + ∂y/∂x3 = (x1 + x2 + x3)².
A2.5 Find the Hessian matrix and construct the quadratic form, zᵀH(x)z, when
(a) y = 2x1 − x1² − x2².
(b) y = x1² + 2x2² − 4x2.
(c) y = x1³ − x2² + 2x2.
(d) y = 4x1 + 2x2 − x1² + x1x2 − x2².
(e) y = x1³ − 6x1x2 + x2³.
A2.6 Prove that the second-order own partial derivatives of a convex function must always be
non-negative.
A2.7 Complete Example A2.4 for the partial with respect to x2 .
A2.8 Suppose f (x1, x2) = x1² + x2².
(a) Show that f (x1, x2) is homogeneous of degree 1.
(b) According to Euler's theorem, we should have f (x1, x2) = (∂f /∂x1)x1 + (∂f /∂x2)x2. Verify this.
A2.9 Suppose f (x1, x2) = (x1x2)² and g(x1, x2) = (x1²x2)³.
(a) f (x1, x2) is homogeneous. What is its degree?
(b) g(x1, x2) is homogeneous. What is its degree?
(c) h(x1, x2) = f (x1, x2)g(x1, x2) is homogeneous. What is its degree?
(d) k(x1, x2) = g( f (x1, x2), f (x1, x2)) is homogeneous. What is its degree?
(e) Prove that whenever f (x1, x2) is homogeneous of degree m and g(x1, x2) is homogeneous of
degree n, then k(x1, x2) = g( f (x1, x2), f (x1, x2)) is homogeneous of degree mn.
A2.10 A real-valued function h on D ⊂ Rn is called homothetic if it can be written in the form g( f (x)),
where g : R → R is strictly increasing and f : D → R is homogeneous of degree 1. Show that if the
partial derivatives of h exist, then for every x in the domain and every i and j, the ratio

    [∂h(tx)/∂xi] / [∂h(tx)/∂xj]

is constant in t > 0. What does this say about the level sets of the function h?
A2.11 Let F(z) be an increasing function of the single variable z. Form the composite function, F(f (x)).
Show that x∗ is a local maximum (minimum) of f (x) if and only if x∗ is a local maximum (minimum)
of F(f (x)).
A2.12 Suppose that f (x) is a concave function and M is the set of all points in Rn that give global maxima
of f . Prove that M is a convex set.
A2.13 Let f (x) be a convex function. Prove that f (x) reaches a local minimum at x̃ if and only if f (x)
reaches a global minimum at x̃.
A2.14 Prove that if f (x) is strictly convex, and if x̃ is a global minimiser of f (x), then x̃ is the unique global
minimiser of f (x).
A2.15 Check the calculations in Example A2.6 by using the substitution method to solve the system of
first-order partials. Then evaluate the function at x1∗ = 3/7 and x2∗ = 8/7 and find y∗ . Verify what we
found in Example A2.7 by evaluating the function at any other point and comparing to y∗ .
A2.16 Find the critical points when
A2.17 Prove Theorem A2.15 for the case of strictly convex functions.
A2.18 Let f (x) be a real-valued function defined on Rn+, and consider the matrix

    H∗ ≡ ⎡ 0    f1   · · ·  fn  ⎤
         ⎢ f1   f11  · · ·  f1n ⎥
         ⎢ ⋮     ⋮            ⋮  ⎥
         ⎣ fn   fn1  · · ·  fnn ⎦ .

This is a different sort of bordered Hessian than we considered in the text. Here, the matrix of
second-order partials is bordered by the first-order partials and a zero to complete the square matrix.
The principal minors of this matrix are the determinants

    D2 = det ⎡ 0    f1  ⎤ ,    D3 = det ⎡ 0    f1   f2  ⎤ ,    . . . ,    Dn = |H∗|.
             ⎣ f1   f11 ⎦                ⎢ f1   f11  f12 ⎥
                                         ⎣ f2   f21  f22 ⎦
Arrow and Enthoven (1961) use the sign pattern of these principal minors to establish the following
useful results:
(i) If f (x) is quasiconcave, these principal minors alternate in sign as follows: D2 ≤ 0, D3 ≥ 0, . . . .
(ii) If for all x ≥ 0, these principal minors (which depend on x) alternate in sign beginning with
strictly negative: D2 < 0, D3 > 0, . . . , then f (x) is quasiconcave on the non-negative orthant.
Further, it can be shown that if, for all x ≫ 0, we have this same alternating sign pattern on
those principal minors, then f (x) is strictly quasiconcave on the (strictly) positive orthant.
(a) The function f (x1 , x2 ) = x1 x2 + x1 is quasiconcave on R2+ . Verify that its principal minors
alternate in sign as in (ii).
(b) Let f (x1, x2) = a ln(x1 + x2) + b, where a > 0. Is this function strictly quasiconcave for x ≫ 0?
Is it quasiconcave? How about for x ≥ 0, but not equal to zero? Justify.
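For part (a), the sign pattern is quick to check by machine. The sketch below (illustrative; the sample points are arbitrary) evaluates D2 and D3 in closed form for f (x1, x2) = x1x2 + x1 and confirms D2 < 0, D3 > 0 at strictly positive points:

```python
# Bordered-principal-minor check for f(x1, x2) = x1*x2 + x1.

def minors(x1, x2):
    # Partials of f, in closed form.
    f1, f2 = x2 + 1.0, x1
    f11, f12, f22 = 0.0, 1.0, 0.0
    # D2 = det [[0, f1], [f1, f11]]
    d2 = 0.0 * f11 - f1 * f1
    # D3 = det of the 3x3 bordered Hessian, expanded along the first row
    d3 = (0.0 * (f11 * f22 - f12 * f12)
          - f1 * (f1 * f22 - f12 * f2)
          + f2 * (f1 * f12 - f11 * f2))
    return d2, d3

for x1, x2 in [(0.5, 0.5), (1.0, 2.0), (3.0, 0.1)]:
    d2, d3 = minors(x1, x2)
    print(d2 < 0, d3 > 0)   # the alternating pattern holds at each point
```

Algebraically, D2 = −(x2 + 1)² and D3 = 2x1(x2 + 1), so the pattern holds everywhere on the strictly positive orthant.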
A2.19 Let f (x1, x2) = (x1x2)². Is f (x) concave on R2+? Is it quasiconcave on R2+?
A2.20 Show that the converses of statement 4 of Theorems A2.1 and A2.4 are not true, by showing that
f (x) = −x⁴ is strictly concave on R, but its second derivative is not everywhere strictly negative.
A2.21 Complete the proof of Theorem A2.3.
A2.22 Complete the proof of Theorem A2.4.
A2.23 Use part 2 of Theorem A2.4 to prove Theorem A2.5. In particular, consider the product zᵀH(x)z
when z is one of the n unit vectors in Rn.
A2.24 Find the local extreme values and classify the stationary points as maxima, minima, or neither.
(a) f (x1, x2) = 2x1 − x1² − x2².
(b) f (x1, x2) = x1² + 2x2² − 4x2.
(c) f (x1, x2) = x1³ − x2² + 2x2.
(d) f (x1, x2) = 4x1 + 2x2 − x1² + x1x2 − x2².
(e) f (x1, x2) = x1³ − 6x1x2 + x2³.
A2.25 Solve the following problems. State the optimised value of the function at the solution.
(a) min_{x1,x2} x1² + x2² s.t. x1x2 = 1.
(b) min_{x1,x2} x1x2 s.t. x1² + x2² = 1.
(c) max_{x1,x2} x1x2² s.t. x1²/a² + x2²/b² = 1.
(d) max_{x1,x2} x1 + x2 s.t. x1⁴ + x2⁴ = 1.
(e) max_{x1,x2,x3} x1x2²x3³ s.t. x1 + x2 + x3 = 1.
A2.26 Graph f (x) = 6 − x² − 4x. Find the point where the function achieves its unconstrained (global)
maximum and calculate the value of the function at that point. Compare this to the value it achieves
when maximised subject to the non-negativity constraint x ≥ 0.
A2.27 In minimising f (x), subject to x ≥ 0, there are three possible cases that could arise. The constraint
could be binding, binding but irrelevant, or not binding. Construct three graphs like those in Fig.
A2.9 to illustrate these three cases. Convince yourself that the three conditions in (A2.32) account
for all three cases. Construct a fourth case showing the ‘troublesome’ case alluded to in the text and
explain why it would be ruled out by the conditions in (A2.32).
A2.28 State the Kuhn-Tucker theorem for the following minimisation problem
A2.29 In the proof of Theorem A2.20 we used the fact that the set of non-negative linear combinations of
finitely many vectors in Rn is a closed set. This exercise will guide you towards a proof of this. Let
a^1, . . . , a^N be vectors in Rn and let B = {b ∈ Rn | b = Σ_{i=1}^N λi a^i, for some λ1 ≥ 0, . . . , λN ≥ 0}.
Suppose that the sequence b^1, b^2, . . . of points in B converges to b∗. You must show that b∗
is in B.
(a) Argue that any b in B can always be written as a minimal non-negative linear combination of
the a^i, where minimal means that the number of a^i's given positive weight by the λi's cannot be
reduced.
(b) Prove that if b^k = Σ_{i=1}^N λ_i^k a^i for each k = 1, 2, . . . , and for each i the non-negative sequence
{λ_i^k}_{k=1}^∞ is bounded, then b∗ is in B.
(c) Suppose that b^k = Σ_{i=1}^N λ_i^k a^i for each k = 1, 2, . . . , and that for some i the non-negative
sequence {λ_i^k}_{k=1}^∞ is unbounded.
(i) Divide b^k by the sum λ_1^k + . . . + λ_N^k and conclude that Σ_{i=1}^N β_i^∗ a^i = 0, the zero vector, for
some non-negative β_1^∗, . . . , β_N^∗ summing to one.
(ii) Argue that there exists β_i^∗ > 0 and k such that,

    λ_j^k / β_j^∗ ≥ λ_i^k / β_i^∗ > 0, for all j such that β_j^∗ > 0.

(iii) Conclude from (i) and (ii) that for the i and k identified there,

    b^k = Σ_{i=1}^N λ_i^k a^i = Σ_{j≠i} ( λ_j^k − (λ_i^k / β_i^∗) β_j^∗ ) a^j,

so that Σ_{i=1}^N λ_i^k a^i does not express b^k as a minimal non-negative linear combination of
the a^i.
(d) Conclude from (a)–(c) that because each term in the sequence b^1, b^2, . . . can be written as a
minimal non-negative linear combination of the a^i, the sequences of weights in those linear
combinations must be bounded and therefore that b∗ is in B as desired.
A2.30 Show that Theorem A2.20 reduces to Theorem A2.19 when the only constraints are x1 ≥
0, . . . , xn ≥ 0.
A2.31 Let f (x) and gj(x), j = 1, . . . , m, be real-valued functions over some domain D ⊂ Rn. Let x∗ be an
interior point of D and suppose that x∗ maximises f (x) on D subject to the constraints gj(x) ≤ 0,
j = 1, . . . , m. Assume that at the optimum, x∗, f and each gj are continuously differentiable, that
constraints j = 1, . . . , K are binding, and that constraints j = K + 1, . . . , m are not binding. Call
constraint j linear if gj(x) = aj + bj · x for some aj ∈ R and bj ∈ Rn. Otherwise, constraint j is non-linear.
Consider the following collection of constraint qualification conditions.
(i) ∇g1 (x∗ ), . . . , ∇gK (x∗ ) are linearly independent.
(ii) No convex combination of ∇g1 (x∗ ), . . . , ∇gK (x∗ ) is the zero vector.
(iii) There exists z ∈ Rn such that ∇gj(x∗) · z < 0 for every j = 1, . . . , K.
(iv) There exists z ∈ Rn such that ∇gj (x∗ ) · z ≤ 0 for every j = 1, . . . , K with the inequality strict
for non-linear constraints.
(v) For every p ∈ Rn such that ∇g1 (x∗ ) · p ≤ 0, . . . ,∇gK (x∗ ) · p ≤ 0 and for every δ > 0, there
exists ε > 0 and a continuously differentiable function h : (−ε, ε) → Rn such that h(0) = x∗ ,
∇h(0) is within δ of p, and gj (h(s)) ≤ 0 for every s ∈ (−ε, ε) and every j = 1, . . . , K.
(a) Show that (i)⇒(ii)⇒(iii)⇒(iv)⇒(v).
(b) Show that (iv) is always satisfied if all the constraints are linear. Conclude, by (a), that (iv) and
(v) are always satisfied if all the constraints are linear.
(c) Using the proof of Theorem A2.20 as a guide, prove that if (v) holds, there exist Lagrange
multipliers, λ∗1 , . . . , λ∗K , all non-negative, such that,
    ∇f (x∗) − Σ_{j=1}^K λ∗_j ∇gj(x∗) = 0.
You need not prove that the λ∗j are unique. Note that, by (a), you will then have proved that
such λ∗j exist when any one of the constraint qualification conditions (i)–(v) holds. (Of course,
Theorem A2.20 covers the case when (i) holds, and in that particular case the λ∗j are unique.)
You will therefore have generalised Theorem A2.20.
A2.32 Arrow and Enthoven (1961) consider the quasiconcave programming problem
It is obvious that increasing a cannot reduce the maximised value of f because the feasible set
increases. Prove this another way by appealing to the envelope and Kuhn-Tucker theorems. (This
second proof is of course not as good as the first, both because it is not as simple and because it
requires additional assumptions.)
A2.35 Complete the proof of Theorem A2.21.
A2.36 Generalise the Envelope theorem to the case of many constraints. Assume that, locally (i.e., for all
a ∈ U), some constraints are always binding and the remainder are always non-binding.
A2.37 Suppose that A and B are closed subsets of Rn and that A is bounded.
(a) Prove that A − B is closed.
(b) Let A be the subset of R2 weakly below the horizontal axis, and let B be the subset of R2 weakly
above the hyperbola in the positive orthant defined by y = 1/x. Show that A and B are closed,
but that A − B is not.
A2.38 Suppose that A is a closed convex subset of Rn and that a∗ is an element of the boundary of A.
(a) Use the definition of the boundary of a set to show that there is a sequence of points a1 , a2 , . . .
not contained in A and converging to a∗ .
(b) For each k, use Theorem A2.23 to establish the existence of a vector pk of length one satisfying,
pk · a ≥ pk · ak for every a ∈ A.
(c) Show that a convergent subsequence of {pk} has a limit, p̂, of length one satisfying
p̂ · a ≥ p̂ · a∗ for every a ∈ A.
A2.39 Repeat Exercise A2.38 without assuming that A is closed. In part (b) use Theorem A2.24 rather than
Theorem A2.23.
LIST OF THEOREMS
CHAPTER 1
1.1 Existence of a Real-Valued Function Representing the Preference Relation 14
1.2 Invariance of the Utility Function to Positive Monotonic Transforms 17
1.3 Properties of Preferences and Utility Functions 17
1.4 Sufficiency of Consumer’s First-Order Conditions 24
1.5 Differentiable Demand 27
1.6 Properties of the Indirect Utility Function 29
1.7 Properties of the Expenditure Function 37
1.8 Relations Between Indirect Utility and Expenditure Functions 42
1.9 Duality Between Marshallian and Hicksian Demand Functions 45
1.10 Homogeneity and Budget Balancedness 49
1.11 The Slutsky Equation 53
1.12 Negative Own-Substitution Terms 55
1.13 The Law of Demand 56
1.14 Symmetric Substitution Terms 56
1.15 Negative Semidefinite Substitution Matrix 57
1.16 Symmetric and Negative Semidefinite Slutsky Matrix 58
1.17 Aggregation in Consumer Demand 61
CHAPTER 2
2.1 Constructing a Utility Function from an Expenditure Function 75
2.2 The Expenditure Function of Derived Utility, u, is E 76
CHAPTER 3
3.1 (Shephard) Homogeneous Production Functions are Concave 131
3.2 Properties of the Cost Function 138
3.3 Properties of Conditional Input Demands 139
3.4 Cost and Conditional Input Demands when Production is Homothetic 140
3.5 Recovering a Production Function from a Cost Function 144
3.6 Integrability for Cost Functions 144
3.7 Properties of the Profit Function 148
3.8 Properties of Output Supply and Input Demand Functions 149
3.9 The Short-Run, or Restricted, Profit Function 152
CHAPTER 4
CHAPTER 5
5.1 Basic Properties of Demand 203
5.2 Properties of Aggregate Excess Demand Functions 204
5.3 Aggregate Excess Demand and Walrasian Equilibrium 207
5.4 Utility and Aggregate Excess Demand 209
5.5 Existence of Walrasian Equilibrium 211
5.6 Core and Equilibria in Competitive Economies 215
5.7 First Welfare Theorem 217
5.8 Second Welfare Theorem 218
Cor. 5.1 Another Look at the Second Welfare Theorem 219
5.9 Basic Properties of Supply and Profits 221
5.10 Properties of Y 222
CHAPTER 6
6.1 Arrow’s Impossibility Theorem 272
6.2 Rawlsian Social Welfare Functions 282
6.3 Utilitarian Social Welfare Functions 284
6.4 The Gibbard-Satterthwaite Theorem 291
CHAPTER 7
7.1 Simplified Nash Equilibrium Tests 315
7.2 (Nash) Existence of Nash Equilibrium 317
7.3 Existence of Bayesian-Nash Equilibrium 323
7.4 (Kuhn) Backward Induction and Nash Equilibrium 336
Cor. 7.1 Existence of Pure Strategy Nash Equilibrium 337
7.5 Subgame Perfect Equilibrium Generalises Backward Induction 342
7.6 (Selten) Existence of Subgame Perfect Equilibrium 346
7.7 (Kreps and Wilson) Existence of Sequential Equilibrium 363
CHAPTER 8
8.1 Separating Equilibrium Characterisation 392
8.2 Pooling Equilibrium Characterisation 397
8.3 Intuitive Criterion Equilibrium 401
8.4 Non-existence of Pooling Equilibria 408
8.5 Separating Equilibrium Characterisation 409
CHAPTER 9
9.1 First-Price Auction Symmetric Equilibrium 431
9.2 Dutch Auction Symmetric Equilibrium 432
9.3 Second-Price Auction Equilibrium 434
9.4 English Auction Equilibrium 434
9.5 Incentive-Compatible Direct Selling Mechanisms 441
9.6 Revenue Equivalence 443
9.7 An Optimal Selling Mechanism 451
9.8 The Optimal Selling Mechanism Simplified 453
9.9 An Optimal Auction Under Symmetry 455
9.10 Truth-Telling is Dominant in the VCG Mechanism 463
9.11 The Budget-Balanced Expected Externality Mechanism 467
9.12 Achieving a Balanced Budget 474
9.13 IR-VCG Expected Surplus: Sufficiency 476
9.14 Costs Differ by a Constant 479
9.15 A General Revenue Equivalence Theorem 479
9.16 Maximal Revenue Subject to Efficiency and Individual Rationality 480
9.17 IR-VCG Expected Surplus: Necessity 481
MATHEMATICAL APPENDIX
CHAPTER A1
A1.1 The Intersection of Convex Sets is Convex 502
A1.2 On Open Sets in Rn 509
A1.3 Every Open Set is a Collection of Open Balls 510
A1.4 On Closed Sets in Rn 511
A1.5 Upper and Lower Bounds in Subsets of Real Numbers 514
A1.6 Continuity and Inverse Images 518
A1.7 The Continuous Image of a Compact Set is a Compact Set 519
A1.8 On Bounded Sequences 520
A1.9 Sequences, Sets, and Continuous Functions 520
A1.10 (Weierstrass) Existence of Extreme Values 521
A1.11 The Brouwer Fixed-Point Theorem 523
CHAPTER A2
A2.1 Concavity and First and Second Derivatives 553
A2.2 Young’s Theorem 557
A2.3 Single-Variable and Multivariable Concavity 558
A2.4 Slope, Curvature, and Concavity in Many Variables 559
A2.5 Concavity, Convexity, and Second-Order Own Partial Derivatives 561
A2.6 Partial Derivatives of Homogeneous Functions 562
A2.7 Euler’s Theorem 564
A2.8 Necessary Conditions for Local Interior Optima in the Single-Variable Case 567
A2.9 First-Order Necessary Condition for Local Interior Optima of Real-Valued Functions 568
A2.10 Second-Order Necessary Condition for Local Interior Optima of Real-Valued Functions 571
A2.11 Sufficient Conditions for Negative and Positive Definiteness of the Hessian 573
A2.12 Sufficient Conditions for Local Interior Optima of Real-Valued Functions 574
A2.13 (Unconstrained) Local–Global Theorem 575
A2.14 Strict Concavity/Convexity and the Uniqueness of Global Optima 576
A2.15 Sufficient Condition for Unique Global Optima 577
A2.16 Lagrange’s Theorem 584
A2.17 A Sufficient Condition for a Local Optimum in the Two-Variable, One-Constraint Optimisation
Problem 590
A2.18 Sufficient Conditions for Local Optima with Equality Constraints 591
A2.19 Necessary Conditions for Optima of Real-Valued Functions Subject to Non-Negativity
Constraints 594
A2.20 (Kuhn-Tucker) Necessary Conditions for Maxima of Real-Valued Functions Subject to Inequality
Constraints 598
LIST OF DEFINITIONS
CHAPTER 1
1.1 Preference Relation 6
1.2 Strict Preference Relation 6
1.3 Indifference Relation 6
1.4 Sets in X Derived from the Preference Relation 7
1.5 A Utility Function Representing the Preference Relation 13
1.6 Demand Elasticities and Income Shares 60
CHAPTER 2
2.1 Weak Axiom of Revealed Preference (WARP) 92
2.2 Simple Gambles 98
2.3 Expected Utility Property 102
2.4 Risk Aversion, Risk Neutrality, and Risk Loving 111
2.5 Certainty Equivalent and Risk Premium 113
2.6 The Arrow-Pratt Measure of Absolute Risk Aversion 113
CHAPTER 3
3.1 Separable Production Functions 128
3.2 The Elasticity of Substitution 129
3.3 (Global) Returns to Scale 133
CHAPTER 4
CHAPTER 5
5.1 Pareto-Efficient Allocations 199
5.2 Blocking Coalitions 200
5.3 The Core of an Exchange Economy 201
5.4 Excess Demand 204
5.5 Walrasian Equilibrium 206
5.6 Walrasian Equilibrium Allocations (WEAs) 214
5.7 The Set of WEAs 215
5.8 WEAs in a Production Economy 232
5.9 Pareto-Efficient Allocation with Production 233
5.10 An r-Fold Replica Economy 241
CHAPTER 6
6.1 A Social Preference Relation 269
6.2 Measurability, Comparability, and Invariance 281
6.3 Two More Ethical Assumptions on the Social Welfare Function 282
6.4 Dictatorial Social Choice Function 290
6.5 Strategy-Proof Social Choice Function 291
6.6 Pareto-Efficient Social Choice Function 292
6.7 Monotonic Social Choice Function 292
CHAPTER 7
7.1 Strategic Form Game 307
7.2 Strictly Dominant Strategies 309
CHAPTER 8
8.1 Signalling Game Pure Strategy Sequential Equilibrium 387
8.2 Separating and Pooling Signalling Equilibria 392
8.3 (Cho and Kreps) An Intuitive Criterion 401
8.4 Separating and Pooling Screening Equilibria 405
CHAPTER 9
9.1 Direct Selling Mechanism 437
9.2 Incentive-Compatible Direct Selling Mechanisms 439
9.3 Ex Post Pareto Efficiency 458
9.4 Direct Mechanisms 459
MATHEMATICAL APPENDIX
CHAPTER A1
A1.1 Convex Sets in Rn 500
A1.2 Completeness 503
A1.3 Transitivity 504
A1.4 Open and Closed ε-Balls 507
A1.5 Open Sets in Rn 508
A1.6 Closed Sets in Rn 510
A1.7 Bounded Sets 512
A1.8 (Heine-Borel) Compact Sets 515
A1.9 (Cauchy) Continuity 517
A1.10 Open Sets in D 518
A1.11 Closed Sets in D 518
A1.12 Sequences in Rn 519
A1.13 Convergent Sequences 520
A1.14 Bounded Sequences 520
A1.15 Subsequences 520
A1.16 Real-Valued Functions 529
A1.17 Increasing, Strictly Increasing and Strongly Increasing Functions 529
A1.18 Decreasing, Strictly Decreasing and Strongly Decreasing Functions 529
A1.19 Level Sets 530
A1.20 Level Sets Relative to a Point 531
A1.21 Superior and Inferior Sets 532
A1.22 Concave Functions 534
A1.23 Strictly Concave Functions 538
A1.24 Quasiconcave Functions 538
A1.25 Strictly Quasiconcave Functions 541
CHAPTER A2
A2.1 Partial Derivatives 554
A2.2 Homogeneous Functions 561
A2.3 Constraint-Continuity 602
HINTS AND ANSWERS
CHAPTER 1
1.2 Use the definitions.
1.4 To get you started, take the indifference relation. Consider any three points xi ∈ X, i = 1, 2, 3, where
x1 ∼ x2 and x2 ∼ x3. We want to show that x1 ∼ x2 and x2 ∼ x3 ⇒ x1 ∼ x3. By definition of
∼, x1 ∼ x2 ⇒ x1 ≿ x2 and x2 ≿ x1. Similarly, x2 ∼ x3 ⇒ x2 ≿ x3 and x3 ≿ x2. By transitivity of
≿, x1 ≿ x2 and x2 ≿ x3 ⇒ x1 ≿ x3. Keep going.
1.16 For (a), suppose there is some other feasible bundle x , where x ∼ x∗ . Use the fact that B is convex,
together with strict convexity of preferences, to derive a contradiction. For (b), suppose not. Use
strict monotonicity to derive a contradiction.
1.22 Use a method similar to that employed in (1.11) to eliminate the Lagrangian multiplier and reduce
(n + 1) conditions to only n conditions.
1.23 For part (2), see Axiom 5′. Note that the sets ≿(x) are precisely the superior sets for the function
u(x). Recall Theorem A1.14.
1.27 Sketch out the indifference map.
1.28 For part (a), suppose by way of contradiction that the derivative is negative.
1.29 Set down all first-order conditions. Look at the one for choice of x0∗ . Use the constraint, and find a
geometric series. Does it converge?
1.32 Feel free to assume that any necessary derivatives exist.
1.33 Roy’s identity.
1.41 Theorem A2.6.
1.46 Euler’s theorem and any demand function, xi (p, y).
1.47 For part (a), start with the definition of e(p, 1). Multiply the constraint by u and invoke homogeneity.
Let z ≡ ux and rewrite the objective function as a choice over z.
Integrate both sides of the inequality from ȳ to y and look for logs. Take it from there.
1.54 For part (b),

    v(p, y) = A∗ y Π_{i=1}^n pi^(−αi),

where A∗ = A Π_{i=1}^n αi^(αi).
1.60 Use Slutsky.
1.63 No hints on this.
1.66 For (b), u0 must be v(p0 , y0 ), right? Rewrite the denominator.
1.67 For (a), you need the expenditure function and you need to figure out u0 . For (b), I = (u0 − 1/8)/
(2u0 − 1). For (c), if you could show that the expenditure function must be multiplicatively separable
in prices and utility, the rest would be easy.
CHAPTER 2
2.3 It should be a Cobb-Douglas form.
2.9 Use a diagram.
2.10 To get you started, x2 is revealed preferred to x1 .
2.12 For (a), use GARP to show that, unless φ(xj) is zero, there is a minimising sequence of distinct
numbers k1, . . . , km defining φ(xj) such that none of k1, . . . , km is equal to j. Hence, k1, . . . , km, j is a
feasible sequence for the minimisation problem defining φ(xk). For (b), use (a). For (c), recall that each
pk ∈ Rn++. For (e), the minimum of quasiconcave functions is quasiconcave.
2.13 Let x0 = x(p0 , 1), x1 = x(p1 , 1), and consider f (t) ≡ (p0 − p1 ) · x(p1 + t(p0 − p1 ), (p1 + t(p0 −
p1 )) · x0 ) for t ∈ [0, 1]. Show that if x0 is revealed preferred to x1 at (p0 , 1), then f attains a
maximum uniquely at 0 on [0, 1].
2.14 In each of the two gambles, some of the outcomes in A will have zero probability.
2.16 Remember that each outcome in A is also a gamble in G , offering that outcome with probability 1.
2.17 Axiom G4.
2.19 Which of the other axioms would be violated by the existence of two unequal indifference
probabilities for the same gamble?
2.28 Risk averse.
2.32 Rearrange the definition and see a differential equation. Solve it for u(w).
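One way to carry the hint through, assuming (as is standard for this exercise) that the Arrow–Pratt measure of absolute risk aversion is a constant α > 0:

```latex
-\frac{u''(w)}{u'(w)} = \alpha
\quad\Longrightarrow\quad
\frac{d}{dw}\,\ln u'(w) = -\alpha
\quad\Longrightarrow\quad
u'(w) = C e^{-\alpha w}
\quad\Longrightarrow\quad
u(w) = -\frac{C}{\alpha}\,e^{-\alpha w} + D ,
```

with C > 0 and D constants of integration, so that u is a positive affine transformation of −e^{−αw}.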
2.33 If you index his utility function by his initial wealth, then given two distinct wealth levels, how must
the two utility functions be related to one another?
2.34 u(w) = w^{α+1}/(α + 1).
2.38 For (a), x0 = x1 = 1. For (b), the agent faces two constraints, and x0 = 1, x1H = 3/2 and x1L = 1/2.
For (c), note that future income in the certainty case is equal to the expected value of income in the
uncertain case.
CHAPTER 3
3.16 First find MRTSij and write it as a function of r = xj /xi . Take logs and it should be clear.
3.17 For (a), first take logs to get

    ln(y) = (1/ρ) ln( ∑_{i=1}^{n} α_i x_i^ρ ).

Note that lim_{ρ→0} ln(y) = 0/0, so L'Hôpital's rule applies. Apply that rule to find lim_{ρ→0} ln(y), then
convert to an expression for lim_{ρ→0} y. Part (b) is tough. If you become exasperated, try consulting
Hardy, Littlewood, and Pólya (1934), Theorem 4.
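The limit in part (a) can be checked numerically; a minimal Python sketch, assuming the CES form y = (∑_i α_i x_i^ρ)^{1/ρ} with weights normalised to sum to one (the α_i and x_i below are arbitrary test values):

```python
import math

# CES output y = (sum_i alpha_i * x_i^rho)^(1/rho); as rho -> 0 (with the
# alpha_i summing to one) this approaches the Cobb-Douglas prod_i x_i^alpha_i.
alpha = [0.25, 0.75]          # arbitrary weights summing to one
x = [4.0, 9.0]                # arbitrary input bundle

def ces(x, rho):
    return sum(a * xi ** rho for a, xi in zip(alpha, x)) ** (1.0 / rho)

cobb_douglas = math.prod(xi ** a for a, xi in zip(alpha, x))
assert abs(ces(x, 1e-6) - cobb_douglas) < 1e-4
```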
3.20 Just work with the definitions and the properties of the production function.
3.23 For the second part, let z2 = z1 ≥ 0.
3.32 c(y) ≡ atc(y)y.
3.43 Equations (3.3) and (3.4).
3.45 Work from the first-order conditions.
3.50 Define the variable profit function πv(p, w, x̄) as the firm's maximised profit when the fixed inputs x̄
are treated as costless, and note that πv(p, w, x̄) = π(p, w, w̄, x̄) + w̄ · x̄. Note that πv possesses every
property listed in Theorem 3.7, and that the partial derivatives of πv and π(p, w, w̄, x̄) with respect to p
and w are equal to each other.
3.55 K ∗ = 5 wf /wk .
CHAPTER 4
4.1 Exercise 1.65.
4.2 Try to construct a counterexample.
4.9 In part (b), q1 = 215/6, q2 = 110/6, and p = 275/6.
4.13 p∗1 = p∗2 = 80/3.
CHAPTER 5
5.2 Differentiate the indirect utility function with respect to the price that rises and use Roy’s identity.
5.10 Don’t use fancy maths. Just think clearly about what it means to be Pareto efficient and what it means
to solve the given set of problems.
5.12 Use x2 as numéraire. For (b), remember that neither consumption nor prices can be negative.
5.15 Derive the consumers’ demand functions.
5.16 The function u2 (x) is a Leontief form.
5.17 The relative price of x1 will have to be α/(1 − α).
5.18 For (a), x1 = (10/3, 40/3).
5.19 Calculate z(p) and convince yourself that if p∗ is a Walrasian equilibrium, then p∗ ≫ 0. Solve the
system of excess demand functions.
5.20 For (b), remember that total consumption of each good must equal the total endowment. Suppose
that p̄ is a market-clearing relative price of good x1 , but that p̄ = p∗ . Derive a contradiction.
5.21 Consider the excess demand for good 2 when the price of good one is positive, and consider the
excess demand for good one when the price of good one is zero.
5.22 For part (a), show first that if u(·) is strictly increasing and quasiconcave, then for α > 0,
v(x) = u(x1 + α ∑_{i=1}^{n} xi, . . . , xn + α ∑_{i=1}^{n} xi) is strongly increasing and quasiconcave. Show
next that if u(·) is strongly increasing and quasiconcave, then for ε ∈ (0, 1), v(x) = u(x1^ε, . . . , xn^ε) is
strongly increasing and strictly quasiconcave. Now put the two together. For part (c), equilibrium prices
can always be chosen to be non-negative and sum to one and hence contained in a compact set. Hence,
any such sequence has a convergent subsequence.
5.23 See Assumption 5.2 for a definition of strong convexity.
5.26 (py/ph)∗ = 4√2 and he works an 8-hour day.
5.27 To show proportionality of the gradients, suppose they are not proportional. Let
z = (∇ui(x̄i)/‖∇ui(x̄i)‖) − (∇uj(x̄j)/‖∇uj(x̄j)‖), and show that ui(x̄i + tz) and uj(x̄j − tz) are both
strictly increasing in t at t = 0. You may use the Cauchy–Schwarz inequality here, which says that for any
two vectors v and w, v · w ≤ ‖v‖‖w‖, with equality if and only if the two vectors are proportional.
5.38 Look carefully at the proof in the text. Construct the coalition of worst off members of every type.
Give each coalition member the ‘average’ allocation for his type.
5.39 For (b), translate into terms of these utility functions and these endowments what it means to be
(1) ‘in the box’, (2) ‘inside the lens’, and (3) ‘on the contract curve’. For (d), consider the coalition
S = {11, 12, 21} and find a feasible assignment of goods to consumers that the consumers in S prefer.
5.40 Redistribute endowments equally. This will be envy-free. Invoke Theorem 5.5 and consider the
resulting WEA, x∗ . Invoke Theorem 5.7. Now prove that x∗ is also envy-free.
5.41 For (b), see the preceding exercise.
5.42 Fair allocations are defined in Exercise 5.40.
5.43 For (a), indifference curves must be tangent and all goods allocated. For (b), not in general.
5.46 Exercises 1.65 and 4.1 [Actually, this problem only tells you half the story. It follows from Antonelli’s
theorem that z(p) is both independent of the distribution of endowments and behaves like a single
consumer’s excess demand system if and only if preferences are identical and homothetic. See Shafer
and Sonnenschein (1982) for a proof.]
CHAPTER 6
6.2 Show that VWP and IIA together imply WP.
6.4 Here is the proof mentioned in the stem of the question: Suppose we want u(xk) = ak for k =
1, 2, . . . , m, where the xk are distinct members of X. Let 2ε > 0 be the minimum Euclidean distance
between any distinct pair of the xk. Letting ‖·‖ denote Euclidean distance, define u(x) = 0 if
‖x − xk‖ ≥ ε for every k, and define u(x) = (1 − ‖x − xk‖/ε) ak if ‖x − xk‖ < ε for some k
(by the triangle inequality there can be at most one such k). Prove that u(·) is continuous and that
u(xk) = ak for every k.
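The construction is easy to implement and test directly; a minimal Python sketch, where the points xk, the targets ak, and hence ε are arbitrary illustrations:

```python
import math

# Target points (distinct) and desired utility values u(xk) = ak.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
targets = [3.0, -1.0, 5.0]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# eps = half the minimum distance between distinct target points.
eps = 0.5 * min(dist(p, q) for i, p in enumerate(points)
                for q in points[i + 1:])

def u(x):
    # u is a 'tent' of height ak over each xk, and zero elsewhere.
    for xk, ak in zip(points, targets):
        d = dist(x, xk)
        if d < eps:
            return (1.0 - d / eps) * ak
    return 0.0

for xk, ak in zip(points, targets):
    assert u(xk) == ak          # hits every target exactly
assert u((10.0, 10.0)) == 0.0   # zero far from every xk
```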
6.5 For part (c), what assumptions did we make about X? What additional assumptions did we make
about social preferences? Do our additional assumptions about individual preferences play a role?
6.8 In part (b), notice that for ε > 0 small enough, u j − ε < u j < α + ε < ui . Now apply HE.
6.9 For part (e), consider changing individual 2’s profile in (a) so that it becomes identical to 3’s profile.
What happens to the social preference between x and z?
6.11 For (a), if x∗ ≫ 0 is a WEA, there must exist n prices (p∗1, . . . , p∗n) such that every (xi)∗ maximises
agent i's utility over their budget set. Look at these first-order conditions and remember that the
Lagrangian multiplier for agent i will be equal to the marginal utility of income for agent i at the
WEA, ∂vi(p∗, p∗ · ei)/∂y. Next, note that W must be strictly concave. Thus, if we have some set of
weights αi for i ∈ I and an n-vector of numbers θ = (θ1, . . . , θn) such that αi∇ui((xi)∗) = θ and x∗
satisfies the constraints, then x∗ maximises W subject to the constraints. What if we choose the αi to
be equal to the reciprocal of the marginal utility of income for agent i at the WEA? What could we
use for the vector θ? Pull the pieces together.
6.12 For (b), consider this three-person, three-alternative case due to Sen (1970a). First, let
xP1 yP1 z, zP2 xP2 y, and zP3 xP3 y. Determine the ranking of x versus z under the Borda rule. Next,
let the preferences of 2 and 3 remain unchanged, but suppose those of 1 become xP̄1 zP̄1 y. Now
consider the same comparison between x and z and make your argument.
6.13 Why can’t (x, y) and (z, w) be the same pair? If x = z, invoke U and suppose that xPk y, wPj x, and
yPi w for all i. Use L∗ and WP to show that transitivity is violated. If x, y, z, and w are all distinct, let
xPk y, zPj w, and suppose that wPi x and yPi z for all i. Take it from here.
6.15 For (b) and (c), see Exercise A2.10 for the necessary definition. For (e),

    E(w, y) = [ (1/N) ∑_{i=1}^{N} (y_i/μ)^ρ ]^{1/ρ}.
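The formula in (e) is simple to compute; a minimal Python sketch with an arbitrary income vector and an arbitrary ρ < 1, ρ ≠ 0 (for a perfectly equal distribution the index equals one):

```python
# E(w, y) = [ (1/N) * sum_i (y_i / mu)^rho ]^(1/rho), mu = mean income.
def equality_index(y, rho):
    mu = sum(y) / len(y)
    return (sum((yi / mu) ** rho for yi in y) / len(y)) ** (1.0 / rho)

rho = 0.5                       # arbitrary rho < 1, rho != 0
equal = [10.0, 10.0, 10.0]
unequal = [1.0, 9.0, 20.0]

assert abs(equality_index(equal, rho) - 1.0) < 1e-12
assert equality_index(unequal, rho) < 1.0   # inequality pulls the index below one
```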
CHAPTER 7
7.3 For part (b), show first that if a strategy is strictly dominated in some round, then it remains strictly
dominated by some remaining strategy in every subsequent round.
7.5 For (c), when is 99 a best response? To find Wi1 , follow these steps. Step 1: Show that if 14 results
in a tie, then 15, 16, . . . , 100 either lose or tie. Step 2: Show that 14 wins if all other players choose
a number strictly above 14. Step 3: Show that 14 loses only if one-third the average of the numbers
is strictly less than 14. Conclude from steps 2 and 3 that if 14 loses, so do 15, 16, . . . , 100.
7.7 For (a), use the Nash equilibrium existence theorem.
7.8 Employ a fixed-point function similar to the one used to prove the existence of a Nash equilibrium
to prove the existence of a strategy m∗ ∈ M1 for player 1, which maximises u1 (m, m∗ , . . . , m∗ ) over
m ∈ M1 . Then invoke symmetry.
7.9 See Exercise 7.7, part (c), for the definition of a game’s value. See Exercise 7.8 for the definition of
a symmetric game.
7.21 Would a player ever choose a strictly dominated strategy?
7.22 Do not rule out weakly dominated strategies. Are all of these equilibria in pure strategies? Verify
that the high-cost type of firm 2 earns zero profits in every Bayesian–Nash equilibrium.
7.32 Allow information sets to ‘cross’ one another.
7.42 Can 3’s beliefs about 2 be affected by 1’s choice?
CHAPTER 8
8.1 Recall that wealth in one state is a different commodity from wealth in another state and that the
state in which consumer i alone has an accident is different from that in which only consumer j ≠ i
does. Verify the hypotheses of the First Welfare Theorem and conclude that the competitive outcome
of Section 8.1 is efficient in the sense described there.
8.5 For part (c), suppose there are at least three fixed points. Let x∗ be a fixed point between two others
and think about what would be implied if f′(x∗) ≥ 1 and what would be implied if f′(x∗) ≤ 1.
8.7 For (a), suppose not. Could demand for used cars equal supply?
8.12 For part (c), it is not a pooling contract.
8.13 First show that ∑_{l=k}^{L} (πl(0) − πl(1)) > 0 for all k > 0 by writing the sum instead as
∑_{l=k}^{L} πl(1)[πl(0)/πl(1) − 1] and arguing by contradiction. Finally, apply the following identity:
∑_{l=0}^{L} al bl ≡ ∑_{k=0}^{L} (∑_{l=k}^{L} al)(bk − b_{k−1}) for every pair of real sequences
{al}_{l=0}^{L} and {bl}_{l=0}^{L}, where b_{−1} ≡ 0.
8.16 For part (a), use the fact that the owner is risk neutral and the worker is risk-averse.
8.17 (a) Because the manager observes only output and not effort, the wage can depend only on output.
Let w(y) denote the worker's wage when output is y. The manager's problem is therefore as
follows:

    max_{e, w(y1), ..., w(ym)}  ∑_{i=1}^{m} p(yi|e)(yi − w(yi)),

subject to

    ∑_{i=1}^{m} p(yi|e)u(w(yi), e) ≥ ∑_{i=1}^{m} p(yi|ej)u(w(yi), ej), for all j = 1, . . . , n,

and

    ∑_{i=1}^{m} p(yi|e)u(w(yi), e) ≥ u(0, e1).
(b) Let e∗ > e1 denote the effort level chosen by the worker in the optimal solution. Suppose, by
way of contradiction, that the wage contract is non-increasing, i.e., that w(yi) ≥ w(yi+1) for all i.
Then, by the monotone likelihood ratio property, and Exercise 8.13,

    ∑_{i=1}^{m} p(yi|e∗)u(w(yi), e∗) ≤ ∑_{i=1}^{m} p(yi|e1)u(w(yi), e∗),

and, because utility is strictly decreasing in effort,

    ∑_{i=1}^{m} p(yi|e1)u(w(yi), e∗) < ∑_{i=1}^{m} p(yi|e1)u(w(yi), e1).

Together these give

    ∑_{i=1}^{m} p(yi|e∗)u(w(yi), e∗) < ∑_{i=1}^{m} p(yi|e1)u(w(yi), e1),

in violation of the incentive compatibility constraint. This contradiction proves the result.
CHAPTER 9
9.3 Note that (9.3) holds for all v, including v = r.
9.7 What are the first- and second-order conditions for bidder i implied by incentive compatibility?
Because the first-order condition must hold for all vi , it may be differentiated. Use the derivative to
substitute into the second-order condition.
9.13 Did any of our results depend on the values being in [0, 1]?
9.15 (b) What is the induced direct selling mechanism?
9.17 (b) Use Theorem 9.5, and don’t forget about individual rationality.
9.19 You will need to use our assumption that each vi − (1 − Fi (vi ))/fi (vi ) is strictly increasing.
MATHEMATICAL APPENDIX
CHAPTER A1
A1.2 Just use the definitions of subsets, unions, and intersections.
A1.3 To get you started, consider the first one. Pick any x ∈ (S ∩ T)ᶜ. If x ∈ (S ∩ T)ᶜ, then x ∉ S ∩ T.
If x ∉ S ∩ T, then x ∉ S or x ∉ T. (Remember, this is the inclusive 'or'.) If x ∉ S, then x ∈ Sᶜ. If
x ∉ T, then x ∈ Tᶜ. Because x ∈ Sᶜ or x ∈ Tᶜ, x ∈ Sᶜ ∪ Tᶜ. Because x was chosen arbitrarily, what we
have established holds for all x ∈ (S ∩ T)ᶜ. Thus, x ∈ (S ∩ T)ᶜ ⇒ x ∈ Sᶜ ∪ Tᶜ, and we have shown
that (S ∩ T)ᶜ ⊂ Sᶜ ∪ Tᶜ. To complete the proof of the first law, you must now show that Sᶜ ∪ Tᶜ ⊂
(S ∩ T)ᶜ.
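The De Morgan law being proved here can be checked mechanically on finite sets; a minimal Python sketch in which the universe U and the sets S and T are arbitrary:

```python
# (S ∩ T)^c = S^c ∪ T^c, complements taken within a finite universe U.
U = set(range(10))
S = {0, 1, 2, 3}
T = {2, 3, 4, 5}

def comp(A):
    return U - A

assert comp(S & T) == comp(S) | comp(T)   # first De Morgan law
assert comp(S | T) == comp(S) & comp(T)   # second law, by the same argument
```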
A1.13 To get you started, let x ∈ f⁻¹(Bᶜ). By definition of the inverse image, x ∈ D and f(x) ∈ Bᶜ. By
definition of the complement of B in R, x ∈ D and f(x) ∉ B. Again, by the definition of the inverse
image, x ∈ D and x ∉ f⁻¹(B). By the definition of the complement of f⁻¹(B) in D, x ∈ D and
x ∈ (f⁻¹(B))ᶜ, so f⁻¹(Bᶜ) ⊂ (f⁻¹(B))ᶜ. Complete the proof.
A1.18 Let Ωi = {x | ai · x + bi ≥ 0}. Use part (b) of Exercise A1.17.
A1.21 First, model your proof after the one for part 3. Then consider ⋂_{i=1}^{∞} Ai, where Ai = (−1/i, 1/i).
CHAPTER A2
A2.1 For (g), f(x) = −exp(x²) < 0.
A2.2 For (a), f1 = 2 − 2x1 and f2 = −2x2. For (e), f1 = 3x1² − 6x2 and f2 = −6x1 + 3x2².
A2.3 Chain rule.
A2.5 For (a),

    H(x) = [ −2   0
              0  −2 ].
A2.11 Use the definition of an increasing function and the definitions of local optima.
A2.19 Strict quasiconcavity implies quasiconcavity.
A2.24 For (a), x∗ = (1, 0) is a maximum. For (b), x∗ = (0, 1) is a minimum.
A2.25 (a) (1, 1) and (−1, −1); f(1, 1) = f(−1, −1) = 2; (b) (−√(1/2), √(1/2)) and (√(1/2), −√(1/2));
(c) (√(a²/3), √(2b²/3)) and (√(a²/3), −√(2b²/3)); (d) ((1/2)^{1/4}, (1/2)^{1/4}); (e) (1/6, 2/6, 3/6).
A2.37 Use the fact that sequences in A are bounded and therefore have convergent subsequences.
REFERENCES
Afriat, S. (1967). ‘The Construction of a Utility Function from Expenditure Data’, International Economic Review, 8:
67–77.
Akerlof, G. (1970). ‘The Market for Lemons: Quality Uncertainty and the Market Mechanism’, Quarterly Journal of
Economics, 84: 488–500.
Antonelli, G. (1886). ‘Sulla Teoria Matematica della Economia Politica’, Pisa. Translated as ‘On the Mathematical Theory
of Political Economy’, in J. S. Chipman et al. (eds.), Preferences, Utility, and Demand: A Minnesota Symposium. New
York: Harcourt Brace Jovanovich, 333–364.
Arrow, K. (1951). Social Choice and Individual Values. New York: John Wiley.
——— (1970). ‘The Theory of Risk Aversion’, in K. Arrow (ed.), Essays in the Theory of Risk Bearing. Chicago: Markham,
90–109.
——— (1973). ‘Some Ordinalist Utilitarian Notes on Rawls’ Theory of Justice’, Journal of Philosophy, 70: 245–263.
——— (1979). ‘The Property Rights Doctrine and Demand Revelation under Incomplete Information’, in M. Boskin (ed.),
Economics and Human Welfare, New York: Academic Press, 23–39.
Arrow, K., and G. Debreu (1954). ‘Existence of Equilibrium for a Competitive Economy’, Econometrica, 22: 265–290.
Arrow, K., and A. Enthoven (1961). ‘Quasi-Concave Programming’, Econometrica, 29: 779–800.
Atkinson, A. (1970). ‘On the Measurement of Inequality’, Journal of Economic Theory, 2: 244–263.
Aumann, R. J. (1964). ‘Markets with a Continuum of Traders’, Econometrica, 32: 39–50.
Barten, A., and V. Böhm (1982). ‘Consumer Theory’, in K. Arrow and M. Intrilligator (eds.), Handbook of Mathematical
Economics. Amsterdam: North Holland, 2: 382–429.
Bertrand, J. (1883). ‘Review of Théorie Mathématique de la Richesse Sociale and Recherches sur les Principes
Mathématiques de la Théorie des Richesses’, Journal des Savants, 499–508.
Black, D. (1948). ‘On the Rationale of Group Decision-Making’, Journal of Political Economy, 56: 23–34.
Blackorby, C., and D. Donaldson (1978). ‘Measures of Relative Equality and Their Meaning in Terms of Social Welfare’,
Journal of Economic Theory, 18: 59–80.
Blackorby, C., D. Donaldson, and J. Weymark (1984). ‘Social Choice with Interpersonal Utility Comparisons: A
Diagrammatic Introduction’, International Economic Review, 25: 327–356.
Cho, I. K., and D. M. Kreps (1987). ‘Signaling Games and Stable Equilibria’, Quarterly Journal of Economics, 102: 179–
221.
Clarke, E. (1971). ‘Multipart Pricing of Public Goods’, Public Choice, 2: 19–33.
Cornwall, R. (1984). Introduction to the Use of General Equilibrium Analysis. Amsterdam: North Holland.
Cournot, A. (1838). Recherches sur les principes mathématiques de la théorie des richesses. Paris, Hachette. English trans.
N. Bacon (1960). Researches into the Mathematical Principles of the Theory of Wealth. New York: Kelley.
Cramton, P., R. Gibbons, and P. Klemperer (1987). ‘Dissolving a Partnership Efficiently’, Econometrica, 55: 615–632.
d’Aspremont, C., and L. Gevers (1977). ‘Equity and the Informational Basis of Collective Choice’, Review of Economic
Studies, 44: 199–209.
d’Aspremont, C. and L. A. Gérard-Varet (1979). ‘Incentives and Incomplete Information’, Journal of Public Economics,
11: 25–45.
Debreu, G. (1954). ‘Representation of a Preference Ordering by a Numerical Function’, in R. M. Thrall et al. (eds.), Decision
Processes. New York: John Wiley, 159–165.
——— (1959). Theory of Value. New York: John Wiley.
——— (1972). ‘Smooth Preferences’, Econometrica, 40: 603–615.
Debreu, G., and H. Scarf (1963). ‘A Limit Theorem on the Core of an Economy’, International Economic Review, 4:
235–246.
Diewert, W. E. (1974). ‘Applications of Duality Theory’, in M. D. Intrilligator and D. A. Kendrick (eds.), Frontiers of
Quantitative Economics. Amsterdam: North Holland, 2: 106–199.
Edgeworth, F. Y. (1881). Mathematical Psychics. London: Paul Kegan.
Eisenberg, B. (1961). ‘Aggregation of Utility Functions’, Management Science, 7: 337–350.
Geanakoplos, J. (1996). ‘Three Brief Proofs of Arrow’s Impossibility Theorem’, mimeo, Cowles Foundation, Yale
University.
Gibbard, A. (1973). ‘Manipulation of Voting Schemes: A General Result’, Econometrica, 41: 587–601.
Goldman, S. M., and H. Uzawa (1964). ‘A Note on Separability in Demand Analysis’, Econometrica, 32: 387–398.
Green, J., and J-J. Laffont (1977). ‘Characterization of Satisfactory Mechanisms for the Revelation of Preferences for Public
Goods’, Econometrica, 45: 727–738.
Grossman, S. J., and O. D. Hart (1983). ‘An Analysis of the Principal-Agent Problem’, Econometrica, 51: 7–45.
Groves, T. (1973). ‘Incentives in Teams’, Econometrica, 41: 617–631.
Hammond, P. J. (1976). ‘Equity, Arrow’s Conditions, and Rawls’ Difference Principle’, Econometrica, 44: 793–804.
Hardy, G., J. Littlewood, and G. Pólya (1934). Inequalities. Cambridge: Cambridge University Press.
Harsanyi, J. (1953). ‘Cardinal Utility in Welfare Economics and in the Theory of Risk-Taking’, Journal of Political
Economy, 61: 434–435.
——— (1955). ‘Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility’, Journal of Political
Economy, 63: 309–321.
——— (1967–1968). ‘Games with Incomplete Information Played by “Bayesian” Players, Parts I, II, and III’, Management
Science, 14: 159–182, 320–334, and 486–502.
——— (1975). ‘Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls’s Theory’, American
Political Science Review, 69: 594–606.
Hicks, J. (1939). Value and Capital. Oxford: Clarendon Press.
——— (1956). A Revision of Demand Theory. Oxford: Clarendon Press.
Hildenbrand, W. (1974). Core and Equilibria of a Large Economy. Princeton: Princeton University Press.
Hohn, F. (1973). Elementary Matrix Algebra, 3rd ed. New York: Macmillan.
Holmstrom, B. (1979a). ‘Moral Hazard and Observability’, Bell Journal of Economics, 10: 74–91.
——— (1979b). ‘Groves’ Scheme on Restricted Domains’, Econometrica, 47: 1137–1144.
——— (1982). ‘Moral Hazard in Teams’, Bell Journal of Economics, 13: 324–340.
Houthakker, H. (1950). ‘Revealed Preference and the Utility Function’, Economica, 17(66): 159–174.
Hurwicz, L., and H. Uzawa (1971). ‘On the Integrability of Demand Functions’, in J. S. Chipman et al. (eds.), Preferences,
Utility, and Demand: A Minnesota Symposium. New York: Harcourt Brace Jovanovich, 114–148.
Kakutani, S. (1941). ‘A Generalization of Brouwer’s Fixed-Point Theorem’, Duke Mathematical Journal, 8: 451–459.
Knoblauch, V. (1992). ‘A Tight Upper Bound on the Money Metric Utility Function’, American Economic Review, 82(3):
660–663.
Kohlberg, E., and P. J. Reny (1997). ‘Independence on Relative Probability Spaces and Consistent Assessments in Game
Trees’, Journal of Economic Theory, 75: 280–313.
Kreps, D. M., and B. D. Wilson (1982). ‘Sequential Equilibria’, Econometrica, 50: 863–894.
Krishna, V. and M. Perry (1998). ‘Efficient Mechanism Design’, mimeo, http://economics.huji.ac.il/facultye/perry/cv.html.
Kuhn, H. (1953). ‘Extensive Games and the Problem of Information’, in H. W. Kuhn and A. W. Tucker (eds.), Contributions
to the Theory of Games, Volume II (Annals of Mathematics Studies 28). Princeton: Princeton University Press, 2:
193–216.
Kuhn, H., and A. W. Tucker (1951). ‘Nonlinear Programming’, in J. Neyman (ed.), Proceedings of the Second Berkeley
Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 481–492.
Luenberger, D. G. (1973). Introduction to Linear and Nonlinear Programming. New York: John Wiley.
McKenzie, L. (1954). ‘On Equilibrium in Graham’s Model of World Trade and Other Competitive Systems’, Econometrica,
22: 147–161.
Muller, E., and M. A. Satterthwaite (1977). ‘The Equivalence of Strong Positive Association and Strategy-Proofness’,
Journal of Economic Theory, 14: 412–418.
Murata, Y. (1977). Mathematics for Stability and Optimization of Economic Systems. New York: Academic Press.
Myerson, R. (1981). ‘Optimal Auction Design’, Mathematics of Operations Research, 6: 58–73.
Myerson, R. and M. A. Satterthwaite (1983). ‘Efficient Mechanisms for Bilateral Trading’, Journal of Economic Theory,
29: 265–281.
Nash, J. (1951). ‘Non-cooperative Games’, Annals of Mathematics, 54: 286–295.
Nikaido, H. (1968). Convex Structures and Economic Theory. New York: Academic Press.
Osborne, M. J., and A. Rubinstein (1994). A Course in Game Theory. Cambridge, MA: The MIT Press.
Pareto, V. (1896). Cours d’économie politique. Lausanne: Rouge.
Pratt, J. (1964). ‘Risk Aversion in the Small and in the Large’, Econometrica, 32: 122–136.
Rawls, J. (1971). A Theory of Justice. Cambridge, MA: Harvard University Press.
Reny, P. J. (1992). ‘Rationality in Extensive-Form Games’, Journal of Economic Perspectives, 6: 103–118.
——– (2001). ‘Arrow’s Theorem and the Gibbard-Satterthwaite Theorem: A Unified Approach’, Economics Letters, 70:
99–105.
Richter, M. (1966). ‘Revealed Preference Theory’, Econometrica, 34(3): 635–645.
Roberts, K. W. S. (1980). ‘Possibility Theorems with Interpersonally Comparable Welfare Levels’, Review of Economic
Studies, 47: 409–420.
Rothschild, M., and J. E. Stiglitz (1976). ‘Equilibrium in Competitive Insurance Markets: An Essay in the Economics of
Imperfect Information’, Quarterly Journal of Economics, 80: 629–649.
Roy, R. (1942). De l’utilité: contribution à la théorie des choix. Paris: Hermann.
Royden, H. (1963). Real Analysis. New York: Macmillan.
Samuelson, P. A. (1947). Foundations of Economic Analysis. Cambridge, MA: Harvard University Press.
Satterthwaite, M. A. (1975). ‘Strategy-Proofness and Arrow’s Conditions: Existence and Correspondence Theorems for
Voting Procedures and Social Welfare Functions’, Journal of Economic Theory, 10: 187–217.
Selten, R. (1965). ‘Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit’, Zeitschrift für die gesamte
Staatswissenschaft, 121: 301–324.
——— (1975). ‘Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games’, International
Journal of Game Theory, 4: 25–55.
Sen, A. (1970a). Collective Choice and Social Welfare. Amsterdam: North Holland.
——— (1970b). ‘The Impossibility of a Paretian Liberal’, Journal of Political Economy, 78: 152–157.
——— (1984). ‘Social Choice Theory’, in K. Arrow and M. Intrilligator (eds.), Handbook of Mathematical Economics.
Amsterdam: North Holland, 3, 1073–1181.
Shafer, W., and H. Sonnenschein (1982). ‘Market Demand and Excess Demand Functions’, in K. Arrow and M. Intrilligator
(eds.), Handbook of Mathematical Economics. Amsterdam: North Holland, 2, 671–693.
Shephard, R. W. (1970). Theory of Cost and Production Functions. Princeton: Princeton University Press.
Slutsky, E. (1915). ‘Sulla Teoria del Bilancio del Consumatore’, Giornale degli Economisti, 51: 1–26. English translation
‘On the Theory of the Budget of the Consumer’, in G. J. Stigler and K. E. Boulding (eds.) (1953), Readings in Price
Theory. London: Allen and Unwin, 27–56.
Smith, A. (1776). The Wealth of Nations, Cannan Ed. (1976) Chicago: University of Chicago Press.
Spence, A. M. (1973). ‘Job Market Signaling’, Quarterly Journal of Economics, 87: 355–374.
Tarski, A. (1955). ‘A Lattice-Theoretical Fixpoint Theorem and its Applications’, Pacific Journal of Mathematics, 5: 285–
309.
Varian, H. (1982). ‘The Nonparametric Approach to Demand Analysis’, Econometrica, 50(4): 945–973.
Vickrey, W. (1961). ‘Counterspeculation, Auctions and Competitive Sealed Tenders’, Journal of Finance, 16: 8–37.
Von Neumann, J., and O. Morgenstern (1944). Theory of Games and Economic Behavior. Princeton: Princeton University
Press.
Wald, A. (1936). ‘Über einige Gleichungssysteme der mathematischen Ökonomie’, Zeitschrift für Nationalökonomie, 7:
637–670. English translation ‘Some Systems of Equations of Mathematical Economics’, Econometrica, 19: 368–403.
Walras, L. (1874). Eléments d’économie politique pure. Lausanne: L. Corbaz. English trans. William Jaffé (1954) Elements
of Pure Economics. London: Allen and Unwin.
Williams, S. (1999). ‘A Characterization of Efficient Bayesian Incentive Compatible Mechanisms’, Economic Theory, 14:
155–180.
Willig, R. D. (1976). ‘Consumer’s Surplus without Apology’, American Economic Review, 66: 589–597.
Wilson, C. (1977). ‘A Model of Insurance Markets with Incomplete Information’, Journal of Economic Theory, 16:
167–207.
INDEX
Note: Figures are indicated by italic page numbers in the index, footnotes by suffix ‘n’
constant returns to scale 133, 169, 232
constrained optimisation 577–601
  equality constraints 577–9
  geometric interpretation 584–7
  inequality constraints 591–5
  Kuhn–Tucker conditions 595–601
  Lagrange’s method 579–84
  second-order conditions 588–91
constraint 578
constraint constant 616
constraint-continuity 602
constraint function 578
  formula for slope 586
constraint qualifications 601, 616
constraint set 578
constructive proof 496
consumer surplus 183, 186
  and compensating variation 183
consumer theory 3–63
  implications for observable behaviour 91
  integrability in 85–91
  revealed preference 91–7
  under uncertainty 97–118
consumer welfare
  price and 179–83
  in product selection 192
consumers
  ability to rank alternatives 6
  expenditure-minimisation problem 35–7, 39, 43, 44, 47
  preferences see preferences
  in production economy
  utility-maximisation problem 31, 42–3, 44, 47
  see also demand
consumption bundle 3–4
consumption plan 3
consumption set 3, 19, 227
  properties 4
contingent plans 238
continuity 515–20
  axiom of preference over gambles 100
  axiom on preference relation 8, 12
  Cauchy continuity 517
  of excess demand function 203
  and inverse images 518–19
  of utility function 29
continuum economies 251
contract curve 197
contradiction
  proof by 496
contrapositive form
  of logical statement 496
convergent sequences 210, 519–20
convex combination 500–2
convex functions 542–3
  and concave functions 543
  examples 543
  first derivative 553
  and Hessian matrix 561
  linear segments in graph 543
  locally convex 571
  second derivative 553
  and second-order own partial derivatives 561
  see also quasiconvex functions
convex sets 207, 499–503
  definition 500
  examples 502, 535
  and graph of concave function 535–7
  and graph of convex function 543–4
  intersection of 502–3
  separation of 607–8
  strongly convex set 220, 221
convexity
  axiom on preference relation 11–12, 78
core
  equal treatment theorem 242–4
  and equilibria 239–51
  of exchange economy 201
  limit theorem 247–9
  and WEAs 215–16, 239, 251
cost function
  CES form 159
  Cobb–Douglas form 138, 157, 159
  definition 136
  and expenditure function 138
  with homothetic production 140
  production function derived from 144, 160
  properties 138
  short-run (restricted) 141
  translog form 158
cost minimisation
  and profit maximisation 135
cost-minimisation problem
  solution to 136, 137
cost-of-living index 70
counterexample
  proof by 497
Cournot aggregation 61, 62
Cournot–Nash equilibrium 174, 188
Cournot oligopoly 174–5, 184, 187–8, 189
cream skimming 406, 408, 411, 412
cross-price elasticity 60
De Morgan’s laws 512, 546
dead weight loss 188
decentralised competitive market mechanism 216
decreasing absolute risk aversion (DARA) 115–17
decreasing functions 530
decreasing returns to scale 133
demand
  compensated see Hicksian demand
  demand elasticity 59–63
  differentiable 27
  excess 204–5
  inverse 83–5, 119, 168
  law of 4, 5
    modern version 55–6
  market demand 165–6, 204
    with profit shares 224
  properties in consumer system 62
derivatives 551
  see also directional derivatives; partial derivatives
differentiable demand 27
differentiable functions 551, 552
differentiation rules 552
direct mechanisms 459–60
direct proof 496
  with subgame perfect equilibrium 338–47
  ‘take-away’ game 325, 327–8, 333
  used-car example 325, 328
feasible allocation 199, 233
feasible labelling 525–6
feasible set 4
firm
  objectives 125
  theory of 125–61
  see also producer theory
first-price, sealed-bid auction 370
  bidding behaviour in 429–32
First Separation Theorem 608–9
First Welfare Theorem
  exchange economy 217
  production economy 233–4
first-order necessary conditions (FONC) for optima 567
  for local interior optima of real-valued functions 568–9
first-order partial derivatives 554–6
fixed cost 141
fixed point 523
fixed-point theorems 523
  applications 208, 232, 384n, 421
  see also Brouwer’s fixed-point theorem
Frobenius’ theorem 89
functions 504–5
  inverse 505
  real-valued 521, 529–45
‘Fundamental Equation of Demand Theory’ 53
  see also Slutsky equation
gambles
  best–worst gamble 105, 106
  certainty equivalent of 112, 113
  compound gamble 99
  preference relation over 98
  simple gamble 98
  see also game theory
game of incomplete information 321–2
game theory 305–77
  extensive form games 325–64
  imperfect information games 330, 337–47
  mixed strategies 314, 343
  perfect information games 330, 333–4
  strategic decision making 305–7
  strategic form games 307–25
  see also extensive form games; strategic form games
game trees 328–30
  examples 329, 334, 335, 338, 339, 340, 341, 343, 344, 345, 348, 349, 350, 351, 352, 356, 357, 359
general equilibrium
  in competitive market systems 201–19
  contingent-commodities interpretation 237–9, 257–8
  and excess demand 204
  in exchange economy 196–201
    existence 196
  in production economy 220–36
    existence 225–6
  in Robinson Crusoe economy 226–31
generalised axiom of revealed preference (GARP) 96–7, 120–1
Gibbard–Satterthwaite theorem 291–6
  contradiction of 465
Giffen’s paradox 56
global optima
  global maximum 566
  global minimum 566
  unique
    and strict concavity and convexity 576–7
    sufficient condition for 577
Gorman polar form 69
gradient vector 556
greatest lower bound (g.l.b.) 513
half-space 74
Hammond equity 282
Heine–Borel (compact) sets 515
Heine–Borel theorem 514n
Hessian matrix 557, 558
  applications 58, 89, 150
  bordered (for Lagrangian functions) 589, 590
  principal minors 572–3
  sufficient conditions for definiteness 573–4
Hicksian decomposition of price change 51–3
Hicksian demand 35–6, 41
  and compensating variation 181
  curves 55, 182
  and Marshallian demand 41, 44–8, 53, 182–3
  properties 55–8, 62
Hicks’ Third Law 70
homogeneous functions 131, 561–5
  definition 561–2
  excess demand function 204
  partial derivatives 562–3
  production function 131–2, 155
homothetic functions 612
  production function 140, 155
  real-valued function 612
  social welfare function 287
Hotelling’s lemma
  applications 148, 149, 150
hyperplane 549
hypotenuse 506
image of a mapping 505
imperfect competition 170–9
imperfect information games 330, 337–47
incentive-compatible direct mechanisms 438–40, 460
  characterisation of 441–3
  definition 439
  expected surplus 473
  individual rationality in 445–6, 470–4
income see real income
income effect 51
  in Hicksian decomposition 51–3, 183
income effect (continued)
  in Slutsky equation 53
income elasticity 60, 191
income share 60
incomplete information game 321–2
  and Bayesian–Nash equilibrium 323, 324
  strategic form game(s) associated with 322, 323–4
increasing function 529
increasing returns to scale 133
independence axiom
  on preferences over gambles 101n, 121
independence (in game theory) 352
independence of irrelevant alternatives 271
independent private values
  in auctions 431, 432, 434
independent private values model 428–37, 456
index set 498
indifference curves 20, 28, 47, 79, 197
  insurance customers 388–9, 391, 394, 396, 398, 399, 400, 402, 403
indifference probabilities 106
indifference relation
  definition 6
  see also preference relation
indirect utility function 28–33
  CES form 43–4
  and direct utility 81–4
  and expenditure function 41–8
  properties 29–32
individual rationality 445–6, 470–4
individually rational Vickrey–Clarke–Groves (IR-VCG) mechanism 473–4
  expected surplus
    necessity 478–83
    sufficiency 476–8
inferior good 56
inferior sets 532
  and quasiconvex functions 544, 545
information economics 379–425
  adverse selection 380–413
  asymmetric information 379, 382–5
  market failure 379
  moral hazard 413–20
  principal–agent problem 413–20
  screening 404–13
  signalling 385–404
  symmetric information 380–2
information set 328, 330
input demand function
  properties 149–50
insurance
  adverse selection 380–5
  moral hazard in 413–20
  principal–agent problem in 413–20
  and risk aversion 117–18
  screening in 404–13
  signalling in 385–404
insurance game 370, 371
insurance screening game 404–5, 423
  analysing 406–8
  cream skimming in 406, 408, 411, 412
  separating equilibria in 409–12
insurance signalling game 386–7, 422–3
  analysing 388–91
  pooling equilibria in 396–9
  pure strategy sequential equilibrium 387–8
  separating equilibria in 392–5
integrability
  for cost functions 144–5
  for demand functions 85–91
integrability problem 87
integrability theorem 87–90
interior maxima 566
interior minima 566
interior point of a set 511
intersection of sets 498
  distributive law for 546
intertemporal budget constraint 123
intertemporal utility function 123
intuitive criterion 401–4
inverse demand function 83–5, 119, 168
inverse function 505
inverse images 16, 518
inverse mapping 518
isocost line 137, 141, 142
isoexpenditure curves 34, 47
iso-profit line 230
isoquants 127, 129, 136, 141, 142
Jensen’s inequality 115
justice 288–90
  see also social welfare
‘kicker–goalie duel’ (in soccer) 367
Kuhn–Tucker conditions 23, 25, 595–601
Kuhn–Tucker theorem 598–600
Lagrange’s method (in constrained optimisation) 579–84
Lagrange’s theorem 584
  applications 30, 38, 136
Lagrangian conditions
  first-order 25, 137, 579
  second-order 588–91
Lagrangian functions 579
  bordered Hessian matrix for 589, 590
  examples 30, 84, 415, 418
‘Law of Demand’ 4, 5
  modern version 55–6
least upper bound (l.u.b.) 513
Lebesgue measures 19n
Leontief production function 131, 156
level sets 530–2
  formula for slope 585
  for quasiconcave functions 538–9
  relative to a point 531–2
lexicographic dictatorship 296, 297
lexicographic preferences 64
local–global theorem 575–6
local interior optima
  first-order necessary condition for 568–9
sequences (continued)
  sets and continuous functions 520
sequential equilibrium 347–64
  definition 358
  existence 363
  in insurance signalling game 387–99
sequential rationality 355–7, 360
set difference 497
set theory 497–505
sets
  basic concepts 497–9
  bounded 512–14
  closed 510–12, 518
  compact 514–15
  complement of 497, 511
  convex 499–503
  empty 497
  equal 497
  feasible 4
  inferior 532
  intersection of 498, 546
  level 530–2
  meaning of term 497
  open 508–10, 518
  superior 532–3
  union of 498, 546
Shephard’s lemma
  applications 37, 66, 88, 138, 159
short-run
  average and marginal costs in 159
  cost function 141
  equilibrium in monopolistic competition 178–9
  equilibrium in perfect competition 166, 167
  market supply function 166
  output supply function 152
  period
    definition 132, 166
  profit function 152
    for Cobb–Douglas technology 152–3
signalling 385–404
  insurance signalling game 386–7
  pooling equilibria in 396–9
  separating equilibria in 392–5, 396
simple gambles 99, 101
simplex see unit simplex
single-crossing property 388–9
single-variable concavity 558
Slutsky compensation
  in demand 67
  in income 93
Slutsky equation 53–5, 62, 182
Slutsky matrix 58–9
  symmetry 86, 87, 89, 95
social choice function 290
  dictatorial 290–1, 296
  monotonic 292–3
  Pareto-efficient 292, 293
  strategy-proof 291
social indifference curves
  dictatorship 279
  radially parallel 286–7
  Rawlsian 283, 284
  utilitarian 284–5
social preference relation 269, 275
social state(s) 267
  ranking of 279, 282, 284, 289
social utility function 274, 275
social welfare function 270
  anonymity 282
  Arrow’s requirements 271–2
  CES form 287
  ethical assumptions 281, 282
  flexible forms 285–7
  and generalised utilitarian ordering 285
  and Hammond equity 282
  homothetic 287
  maximin form 289
  Rawlsian form 282–3
  strong separability 287
  under strict welfarism 281
  utilitarian form 284–5
  utility-difference invariant 281
  utility-level invariant 280, 281, 283
  utility-percentage invariant 286
Stackelberg duopoly 189
Stackelberg warfare 189
Stag Hunt game 366–7
Stone–Geary utility function 69
strategic form of extensive form game 333
strategic form games 307–25
  definition 307–8
  dominant strategies 308–11
  incomplete information and 319–25
  Nash equilibrium 311–18
strategy-proofness 291, 292
strict concavity 537
strict convexity
  axiom on preference relation 11
strict monotonicity
  axiom on preference relation 10
strict preference relation
  definition 6
strict welfarism 281
strictly concave functions 538
  and Hessian matrix 561
  strict quasiconcavity 541–2
  unique global maximum 576–7
strictly convex functions 542–3
  and Hessian matrix 561
  unique global minimum 576–7
strictly decreasing functions 530
strictly dominant strategies 308, 309
strictly dominated strategies 309–10
strictly increasing functions 19, 214, 277, 284, 529
strictly inferior set 532
strictly quasiconcave functions 19, 226, 541
strictly superior set 532
strikes 483
strong axiom of revealed preference (SARP) 96, 120
strong-positive association 292n
strong separability 287
strongly convex set 220, 221
strongly decreasing functions 530
strongly increasing functions 226, 529
strongly separable production function 129
subgame 340
subgame perfect equilibrium
  and backward induction 342