Maximum Likelihood Estimator (MLE)

Suppose $X_1, X_2, \ldots, X_n$ are random samples with the joint probability density function $f_{X_1, X_2, \ldots, X_n / \theta}(x_1, x_2, \ldots, x_n)$, which depends on an unknown nonrandom parameter $\theta$. $f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n / \theta)$ is called the likelihood function. If $X_1, X_2, \ldots, X_n$ are discrete, the likelihood function is a joint probability mass function. We represent the random variables and their values in vector notation by $\mathbf{X} = [X_1 \; X_2 \; \ldots \; X_n]$ and $\mathbf{x} = [x_1 \; x_2 \; \ldots \; x_n]$ respectively. Note that $L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$ is the log-likelihood function. As functions of the random variables, the likelihood and the log-likelihood are themselves random variables.

The maximum likelihood estimator $\hat{\theta}_{MLE}$ is the estimator such that

$$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n / \hat{\theta}_{MLE}) \ge f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n / \theta) \quad \text{for all } \theta.$$

If the likelihood function is differentiable with respect to $\theta$, then $\hat{\theta}_{MLE}$ is given by

$$\left. \frac{\partial}{\partial \theta} f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \right|_{\theta = \hat{\theta}_{MLE}} = 0
\quad \text{or} \quad
\left. \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_{MLE}} = 0.$$

Thus the MLE is given by the solution of the likelihood equation above.
If we have $k$ unknown parameters given by

$$\boldsymbol{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix},$$

then the MLE is given by the set of conditions

$$\left. \frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_1} \right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left. \frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_2} \right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left. \frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_k} \right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0.$$
Since $\ln(\cdot)$ is a monotonic function of its argument, it is convenient to express the MLE conditions in terms of the log-likelihood function. The conditions are then

$$\left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_1} \right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_2} \right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_k} \right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0.$$
Example 10:
Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed sequence of $N(\mu, \sigma^2)$ random variables. Find the MLE for $\mu$ and $\sigma^2$.

$$f_{\mathbf{X}/\mu,\sigma^2}(x_1, x_2, \ldots, x_n / \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}$$

$$L(\mathbf{x}/\mu, \sigma^2) = \ln f_{\mathbf{X}/\mu,\sigma^2}(\mathbf{x}/\mu, \sigma^2) = -n \ln \sqrt{2\pi} - n \ln \sigma - \frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$$

Setting the partial derivatives to zero,

$$\left. \frac{\partial L}{\partial \mu} \right|_{\hat{\mu}_{MLE}} = 0 \;\Rightarrow\; \frac{1}{\hat{\sigma}^2_{MLE}} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE}) = 0$$

$$\left. \frac{\partial L}{\partial \sigma^2} \right|_{\hat{\sigma}^2_{MLE}} = 0 \;\Rightarrow\; -\frac{n}{2\hat{\sigma}^2_{MLE}} + \frac{\sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2}{2\hat{\sigma}^4_{MLE}} = 0.$$

Solving, we get

$$\hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad \hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2.$$
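These closed-form estimators can be checked numerically. The following sketch (illustrative data; NumPy assumed available) verifies that the closed-form $\hat{\mu}_{MLE}$ and $\hat{\sigma}^2_{MLE}$ do maximize the log-likelihood:

```python
import numpy as np

# Sketch: check the closed-form Gaussian MLEs against direct evaluation
# of the log-likelihood (data and parameters are made up for illustration).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)
n = len(x)

# Closed-form MLEs derived above
mu_mle = x.mean()                      # (1/n) sum x_i
var_mle = ((x - mu_mle) ** 2).mean()   # (1/n) sum (x_i - mu_mle)^2; note /n, not /(n-1)

def log_lik(mu, var):
    return -0.5 * n * np.log(2 * np.pi * var) - ((x - mu) ** 2).sum() / (2 * var)

# The closed-form MLE pair should beat any nearby (mu, var) pair
for dmu in (-0.1, 0.1):
    for dvar in (-0.1, 0.1):
        assert log_lik(mu_mle, var_mle) >= log_lik(mu_mle + dmu, var_mle + dvar)
```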
Example 11:
Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random samples with

$$f_X(x) = \frac{1}{2} e^{-|x - \mu|}, \quad -\infty < x < \infty.$$

Show that $\mathrm{median}(X_1, X_2, \ldots, X_n)$ is the MLE for $\mu$.

$$f_{X_1, X_2, \ldots, X_n / \mu}(x_1, x_2, \ldots, x_n) = \frac{1}{2^n} \, e^{-\sum_{i=1}^{n} |x_i - \mu|}$$

$$L(\mathbf{x}/\mu) = \ln f_{\mathbf{X}/\mu}(\mathbf{x}/\mu) = -n \ln 2 - \sum_{i=1}^{n} |x_i - \mu|$$

$\sum_{i=1}^{n} |x_i - \mu|$ is minimized by $\mu = \mathrm{median}(x_1, x_2, \ldots, x_n)$, so

$$\hat{\mu}_{MLE} = \mathrm{median}(x_1, x_2, \ldots, x_n).$$
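The claim that the sample median minimizes $\sum_i |x_i - \mu|$, and hence maximizes the Laplace likelihood, can be verified numerically (a small sketch with made-up data):

```python
import numpy as np

# Sketch: verify that the sample median minimizes sum |x_i - mu|,
# i.e. maximizes the Laplace log-likelihood (illustrative data).
rng = np.random.default_rng(1)
x = rng.laplace(loc=0.7, size=201)  # odd n, so the minimizer is a sample point

def neg_log_lik(mu):
    # up to the constant n*ln 2,  -L(x/mu) = sum |x_i - mu|
    return np.abs(x - mu).sum()

med = np.median(x)
grid = np.linspace(med - 2, med + 2, 2001)
best = grid[np.argmin([neg_log_lik(m) for m in grid])]
assert abs(best - med) < 2e-3  # grid optimum coincides with the median
```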
Properties of MLE

(1) The MLE may be biased or unbiased. In Example 10, $\hat{\mu}_{MLE}$ is unbiased whereas $\hat{\sigma}^2_{MLE}$ is a biased estimator.

(2) If an efficient estimator exists, the MLE is that efficient estimator. Suppose an efficient estimator $\hat{\theta}$ exists. Then

$$\frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} = c(\theta)\,(\hat{\theta} - \theta).$$

At $\theta = \hat{\theta}_{MLE}$,

$$\left. \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_{MLE}} = 0 \;\Rightarrow\; c(\hat{\theta}_{MLE})\,(\hat{\theta} - \hat{\theta}_{MLE}) = 0 \;\Rightarrow\; \hat{\theta} = \hat{\theta}_{MLE}.$$
(3) The MLE is asymptotically unbiased and efficient. Thus for large $n$, the MLE is approximately efficient.

(4) Invariance property of the MLE: this is a remarkable property of the MLE, not shared by other estimators. If $\hat{\theta}_{MLE}$ is the MLE of $\theta$ and $h(\theta)$ is a function of $\theta$, then $h(\hat{\theta}_{MLE})$ is the MLE of $h(\theta)$.
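A small numerical sketch of the invariance property (illustrative data): the MLE of $\sigma = \sqrt{\sigma^2}$ obtained by plugging $\hat{\sigma}^2_{MLE}$ into the square root agrees with a direct maximization of the log-likelihood over $\sigma$:

```python
import numpy as np

# Sketch of the invariance property: the MLE of sigma = h(sigma^2) = sqrt(sigma^2)
# is obtained by plugging the MLE of sigma^2 into h (illustrative data).
rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=1000)
var_mle = ((x - x.mean()) ** 2).mean()
sigma_by_invariance = np.sqrt(var_mle)

# Direct maximization over sigma (mu held at its MLE) gives the same answer
def log_lik(sigma):
    return -len(x) * np.log(sigma) - ((x - x.mean()) ** 2).sum() / (2 * sigma**2)

grid = np.linspace(0.5, 4.0, 7001)
sigma_direct = grid[np.argmax([log_lik(s) for s in grid])]
assert abs(sigma_direct - sigma_by_invariance) < 1e-3
```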
(5) $\hat{\theta}_{MLE}$ is a consistent estimator of $\theta$.
Proof:

Suppose $\theta_0$ is the true value of $\theta$. Given the iid observations $X_1, X_2, X_3, \ldots, X_n$, the sample average of the log-likelihood function is

$$L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln f_{X/\theta}(X_i),$$

and $\hat{\theta}_{MLE}$ maximizes $L_n(\theta)$. Let

$$L(\theta) = E_{\theta_0} \ln f_{X/\theta}(X_i) = \int \ln f_{X/\theta}(x_i) \, f_{X/\theta_0}(x_i) \, dx_i.$$

Note that the expectation is carried out with respect to the true value $\theta_0$. According to the weak law of large numbers (WLLN),

$$L_n(\theta) \xrightarrow{p} L(\theta). \quad (1)$$

Now we show that $L(\theta) \le L(\theta_0)$:

$$L(\theta) - L(\theta_0) = E_{\theta_0} \ln f_{X/\theta}(X) - E_{\theta_0} \ln f_{X/\theta_0}(X) = E_{\theta_0} \ln \frac{f_{X/\theta}(X)}{f_{X/\theta_0}(X)}$$

$$\le E_{\theta_0} \left( \frac{f_{X/\theta}(X)}{f_{X/\theta_0}(X)} - 1 \right) \qquad (\text{since } \ln t \le t - 1)$$

$$= \int f_{X/\theta}(x) \, dx - 1 = 1 - 1 = 0.$$

In other words, $L(\theta)$ is maximum at $\theta = \theta_0$. Combining this with (1), the maximizer $\hat{\theta}_{MLE}$ of $L_n(\theta)$ converges to the maximizer $\theta_0$ of $L(\theta)$:

$$\hat{\theta}_{MLE} \xrightarrow{p} \theta_0.$$
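Consistency can also be illustrated by simulation. Using the Laplace-location model of Example 11, whose MLE is the sample median, the estimation error shrinks as $n$ grows (an illustrative sketch, not part of the proof):

```python
import numpy as np

# Sketch: the Laplace-location MLE (the sample median, per Example 11)
# concentrates around the true value theta_0 as n grows.
rng = np.random.default_rng(3)
theta0 = 1.5
errors = []
for n in (10, 100, 10_000):
    est = np.array([np.median(rng.laplace(theta0, size=n)) for _ in range(200)])
    errors.append(np.mean(np.abs(est - theta0)))  # mean absolute error

# the error decreases monotonically with the sample size
assert errors[0] > errors[1] > errors[2]
```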
Bayesian Estimators

We may have some prior information about $\theta$, in the sense that some values of $\theta$ are more likely than others (a priori information). We can represent this prior information in the form of a prior density function. In what follows we omit the subscripts on density functions for notational simplicity. The parameter $\theta$ is now a random variable, and the likelihood function is the conditional density $f(\mathbf{x}/\theta)$, so that

$$f(\mathbf{x}, \theta) = f(\theta) \, f(\mathbf{x}/\theta).$$

Also we have the Bayes rule

$$f(\theta/\mathbf{x}) = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{f(\mathbf{x})} = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{\int f(\theta) \, f(\mathbf{x}/\theta) \, d\theta},$$

where $f(\theta/\mathbf{x})$ is the a posteriori density function.

Example: Let $X$ be a Gaussian random sample with unknown mean $\theta$ and variance 1, where $\theta \sim N(0, 1)$. Find the a posteriori PDF $f(\theta/x)$ for a single observation $x$.
Solution: We have

$$f(\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{\theta^2}{2}}, \qquad f(x/\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x-\theta)^2}{2}}.$$

(Figure: block diagram showing the parameter $\theta$, with density $f(\theta)$, generating the observation $x$ through $f(x/\theta)$.)
$$f(\theta/x) = \frac{f(\theta) \, f(x/\theta)}{\int_{-\infty}^{\infty} f(\theta) \, f(x/\theta) \, d\theta}
= \frac{\frac{1}{2\pi} \, e^{-\frac{\theta^2}{2}} \, e^{-\frac{(x-\theta)^2}{2}}}{\int_{-\infty}^{\infty} \frac{1}{2\pi} \, e^{-\frac{\theta^2}{2}} \, e^{-\frac{(x-\theta)^2}{2}} \, d\theta}.$$

Completing the square in the exponent,

$$\frac{\theta^2}{2} + \frac{(x-\theta)^2}{2} = \theta^2 - x\theta + \frac{x^2}{2} = \left(\theta - \frac{x}{2}\right)^2 + \frac{x^2}{4},$$

and the factor $e^{-x^2/4}$ cancels between numerator and denominator, giving

$$f(\theta/x) = \frac{e^{-\left(\theta - \frac{x}{2}\right)^2}}{\int_{-\infty}^{\infty} e^{-\left(\theta - \frac{x}{2}\right)^2} \, d\theta} = \frac{1}{\sqrt{\pi}} \, e^{-\left(\theta - \frac{x}{2}\right)^2}.$$

Thus the a posteriori density is Gaussian with mean $\frac{x}{2}$ and variance $\frac{1}{2}$.
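The posterior can be cross-checked by evaluating the Bayes-rule denominator numerically (a sketch; the observed value $x = 1.3$ is arbitrary):

```python
import numpy as np

# Sketch: numerically normalize f(theta) f(x/theta) for one observation x
# and compare against the closed-form posterior N(x/2, 1/2).
x = 1.3  # arbitrary observed value for illustration
theta = np.linspace(-8, 8, 20001)
dtheta = theta[1] - theta[0]

prior = np.exp(-theta**2 / 2) / np.sqrt(2 * np.pi)       # f(theta)
lik = np.exp(-(x - theta) ** 2 / 2) / np.sqrt(2 * np.pi)  # f(x/theta)
post = prior * lik
post /= post.sum() * dtheta          # numeric Bayes-rule denominator

closed = np.exp(-((theta - x / 2) ** 2)) / np.sqrt(np.pi)  # N(x/2, 1/2) density
assert np.max(np.abs(post - closed)) < 1e-5
```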
The parameter $\theta$ is a random variable and the estimator $\hat{\theta}(\mathbf{X})$ is another random variable. The estimation error is $\epsilon = \hat{\theta} - \theta$.

We associate a cost function (or loss function) $C(\hat{\theta}, \theta)$ with every estimator $\hat{\theta}$. It represents the positive penalty incurred by each wrong estimate; thus $C(\hat{\theta}, \theta)$ is a non-negative function. The three most popular cost functions are:

Quadratic cost function: $C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$

Absolute cost function: $C(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$

Hit-or-miss cost function (also called the uniform cost function):
$$C(\hat{\theta}, \theta) = \begin{cases} 0 & |\hat{\theta} - \theta| \le \Delta/2 \\ 1 & |\hat{\theta} - \theta| > \Delta/2 \end{cases}$$

Since $\theta$ and $\mathbf{X}$ are both random, minimizing the cost means minimizing it on average. The Bayesian risk function, or average cost, is

$$\bar{C} = E\, C(\hat{\theta}, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} C(\hat{\theta}, \theta) \, f_{\mathbf{X},\theta}(\mathbf{x}, \theta) \, d\mathbf{x} \, d\theta.$$

The estimator seeks to minimize the Bayesian risk.
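As a concrete sketch of the Bayes risk, consider the single-observation Gaussian model above ($X \sim N(\theta, 1)$, $\theta \sim N(0, 1)$) and the candidate estimator $\hat{\theta} = x/2$; the average cost is estimated by Monte Carlo over joint draws of $(\theta, X)$ (illustrative code):

```python
import numpy as np

# Sketch: Bayes risk E C(theta_hat, theta) as a Monte Carlo average over
# joint draws (theta, X) for X ~ N(theta, 1), theta ~ N(0, 1).
rng = np.random.default_rng(6)
theta = rng.normal(0, 1, size=100_000)  # draws from the prior
x = rng.normal(theta, 1)                # one observation per draw

theta_hat = x / 2                       # candidate estimator
quad_risk = np.mean((theta_hat - theta) ** 2)   # quadratic cost, averaged
abs_risk = np.mean(np.abs(theta_hat - theta))   # absolute cost, averaged

# For this model the quadratic risk of x/2 equals the posterior variance 1/2
assert abs(quad_risk - 0.5) < 0.02
```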
Case I. Quadratic Cost Function and the Minimum Mean-Square Error Estimator

$$C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$$

The estimation problem is: minimize

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\mathbf{x}, \theta) \, d\mathbf{x} \, d\theta$$

with respect to $\hat{\theta}$. Writing $f(\mathbf{x}, \theta) = f(\theta/\mathbf{x}) f(\mathbf{x})$, this is equivalent to minimizing

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\theta/\mathbf{x}) \, d\theta \right) f(\mathbf{x}) \, d\mathbf{x}.$$

Since $f(\mathbf{x})$ is always non-negative, the above integral is minimized if the inner integral is minimized for each $\mathbf{x}$. This results in the problem: minimize

$$\int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\theta/\mathbf{x}) \, d\theta$$

with respect to $\hat{\theta}$. Setting the derivative with respect to $\hat{\theta}$ to zero,

$$\frac{\partial}{\partial \hat{\theta}} \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\theta/\mathbf{x}) \, d\theta = 0$$

$$\Rightarrow\; 2 \int_{-\infty}^{\infty} (\hat{\theta} - \theta) \, f(\theta/\mathbf{x}) \, d\theta = 0$$

$$\Rightarrow\; \hat{\theta} \int_{-\infty}^{\infty} f(\theta/\mathbf{x}) \, d\theta = \int_{-\infty}^{\infty} \theta \, f(\theta/\mathbf{x}) \, d\theta$$

$$\Rightarrow\; \hat{\theta} = \int_{-\infty}^{\infty} \theta \, f(\theta/\mathbf{x}) \, d\theta = E(\theta/\mathbf{X} = \mathbf{x}).$$

Thus $\hat{\theta}$ is the conditional mean, i.e. the mean of the a posteriori density. Since we are minimizing the quadratic cost, it is also called the minimum mean-square error (MMSE) estimator.
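A Monte Carlo sketch of this result: for the single-observation Gaussian example, where the posterior works out to $N(x/2, 1/2)$, the average quadratic cost over the posterior is smallest at the conditional mean $x/2$ (illustrative numbers):

```python
import numpy as np

# Sketch: the inner integral  int (that - theta)^2 f(theta/x) dtheta
# is minimized at the conditional mean, here x/2 for posterior N(x/2, 1/2).
rng = np.random.default_rng(4)
x = 0.8
theta_post = rng.normal(x / 2, np.sqrt(0.5), size=200_000)  # samples from f(theta/x)

def risk(that):
    return np.mean((that - theta_post) ** 2)  # Monte Carlo estimate of inner integral

candidates = np.linspace(-1.0, 2.0, 301)
best = candidates[np.argmin([risk(c) for c in candidates])]
assert abs(best - x / 2) < 0.05  # minimizer is (close to) the conditional mean
```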
Salient Points

- We are given the a priori density function $f(\theta)$ and the conditional PDF $f(\mathbf{x}/\theta)$.
- We have to determine the a posteriori density $f(\theta/\mathbf{x})$. This is determined from the Bayes rule:

$$f(\theta/\mathbf{x}) = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{f(\mathbf{x})} = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{\int f(\theta) \, f(\mathbf{x}/\theta) \, d\theta}.$$
Example: Let $X_1, X_2, \ldots, X_n$ be samples of $X \sim N(\theta, 1)$ with unknown mean $\theta \sim N(0, 1)$. Find $\hat{\theta}_{MMSE}$.

We are given

$$f(\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{\theta^2}{2}}, \qquad
f(\mathbf{x}/\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x_i - \theta)^2}{2}}
= \frac{1}{(2\pi)^{n/2}} \, e^{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2}.$$

Also,

$$f(\theta/\mathbf{x}) = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{f(\mathbf{x})},$$

where

$$f(\mathbf{x}) = \int_{-\infty}^{\infty} f(\theta) \, f(\mathbf{x}/\theta) \, d\theta
= \frac{1}{(2\pi)^{(n+1)/2}} \int_{-\infty}^{\infty} e^{-\frac{\theta^2}{2} - \frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2} \, d\theta.$$

Completing the square in $\theta$, the exponent is

$$-\frac{n+1}{2}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2 - \frac{1}{2}\left(\sum_{i=1}^{n} x_i^2 - \frac{(n\bar{x})^2}{n+1}\right), \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

and the terms not involving $\theta$ cancel between numerator and denominator, so that

$$f(\theta/\mathbf{x}) = \sqrt{\frac{n+1}{2\pi}} \; e^{-\frac{n+1}{2}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2}.$$

That is, the a posteriori density is Gaussian with mean $\frac{n\bar{x}}{n+1}$ and variance $\frac{1}{n+1}$. Therefore

$$\hat{\theta}_{MMSE} = E(\theta/\mathbf{X} = \mathbf{x}) = \frac{n}{n+1} \, \bar{x} = \frac{1}{n+1} \sum_{i=1}^{n} x_i.$$
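A quick Monte Carlo check of this estimator (illustrative code): drawing $\theta$ from its prior and then the samples, $\hat{\theta}_{MMSE} = \frac{1}{n+1} \sum_i x_i$ attains a lower average squared error than the plain sample mean, with Bayes risk close to the posterior variance $\frac{1}{n+1}$:

```python
import numpy as np

# Sketch: Monte Carlo check that theta_hat = sum(x)/(n+1) beats the plain
# sample mean on average when theta ~ N(0,1) and X_i ~ N(theta, 1).
rng = np.random.default_rng(5)
n, trials = 5, 50_000
theta = rng.normal(0.0, 1.0, size=trials)              # prior draws
x = rng.normal(theta[:, None], 1.0, size=(trials, n))  # X_i ~ N(theta, 1)

mmse = x.sum(axis=1) / (n + 1)   # (1/(n+1)) sum x_i = (n/(n+1)) x_bar
xbar = x.mean(axis=1)            # plain sample mean, for comparison

mse_mmse = np.mean((mmse - theta) ** 2)
mse_xbar = np.mean((xbar - theta) ** 2)
assert mse_mmse < mse_xbar
# theoretical Bayes risk is the posterior variance 1/(n+1)
assert abs(mse_mmse - 1.0 / (n + 1)) < 0.01
```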
