Maximum Likelihood Estimator (MLE)

Suppose $X_1, X_2, \ldots, X_n$ are random samples with the joint probability density function $f_{X_1, X_2, \ldots, X_n / \theta}(x_1, x_2, \ldots, x_n)$, which depends on an unknown nonrandom parameter $\theta$. $f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n / \theta)$ is called the likelihood function. If $X_1, X_2, \ldots, X_n$ are discrete, the likelihood function is a joint probability mass function. We represent the random variables and their values in vector notation by $\mathbf{X} = [X_1 \; X_2 \; \ldots \; X_n]$ and $\mathbf{x} = [x_1 \; x_2 \; \ldots \; x_n]$ respectively. Note that $L(\mathbf{x}/\theta) = \ln f_{\mathbf{X}/\theta}(\mathbf{x}/\theta)$ is the log-likelihood function. As functions of the random variables, the likelihood and the log-likelihood are themselves random variables.

The maximum likelihood estimator $\hat{\theta}_{MLE}$ is the estimator such that

$$f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n / \hat{\theta}_{MLE}) \ge f_{\mathbf{X}/\theta}(x_1, x_2, \ldots, x_n / \theta) \quad \text{for all } \theta.$$

If the likelihood function is differentiable with respect to $\theta$, then $\hat{\theta}_{MLE}$ is given by

$$\left. \frac{\partial}{\partial \theta} f_{\mathbf{X}/\theta}(\mathbf{x}/\theta) \right|_{\theta = \hat{\theta}_{MLE}} = 0
\quad \text{or} \quad
\left. \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_{MLE}} = 0.$$

Thus the MLE is given by the solution of the likelihood equation above.
If we have $k$ unknown parameters given by

$$\boldsymbol{\theta} = \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_k \end{bmatrix},$$

then the MLE is given by the set of conditions

$$\left. \frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_1} \right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left. \frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_2} \right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left. \frac{\partial f_{\mathbf{X}/\boldsymbol{\theta}}(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_k} \right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0.$$
Since $\ln(\cdot)$ is a monotonic function of its argument, it is convenient to express the MLE conditions in terms of the log-likelihood function. The conditions are then

$$\left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_1} \right|_{\theta_1 = \hat{\theta}_{1,MLE}} = \left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_2} \right|_{\theta_2 = \hat{\theta}_{2,MLE}} = \cdots = \left. \frac{\partial L(\mathbf{x}/\boldsymbol{\theta})}{\partial \theta_k} \right|_{\theta_k = \hat{\theta}_{k,MLE}} = 0.$$
Example 10:
Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed sequence of $N(\mu, \sigma^2)$ random variables. Find the MLE for $\mu$ and $\sigma^2$.

$$f_{\mathbf{X}/\mu,\sigma^2}(x_1, x_2, \ldots, x_n / \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}$$

$$L(\mathbf{x}/\mu, \sigma^2) = \ln f_{\mathbf{X}/\mu,\sigma^2}(\mathbf{x}/\mu, \sigma^2) = -n \ln \sqrt{2\pi} - n \ln \sigma - \frac{1}{2} \sum_{i=1}^{n} \left(\frac{x_i - \mu}{\sigma}\right)^2$$

Setting the partial derivatives to zero,

$$\left. \frac{\partial L}{\partial \mu} \right|_{\hat{\mu}_{MLE}} = 0 \;\Rightarrow\; \frac{1}{\hat{\sigma}^2_{MLE}} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE}) = 0$$

$$\left. \frac{\partial L}{\partial \sigma^2} \right|_{\hat{\sigma}^2_{MLE}} = 0 \;\Rightarrow\; -\frac{n}{2\hat{\sigma}^2_{MLE}} + \frac{\sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2}{2\hat{\sigma}^4_{MLE}} = 0.$$

Solving, we get

$$\hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad \hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2.$$
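These closed-form estimators can be checked numerically. The following sketch (illustrative data; NumPy assumed available) verifies that the closed-form $\hat{\mu}_{MLE}$ and $\hat{\sigma}^2_{MLE}$ do maximize the log-likelihood:

```python
import numpy as np

# Sketch: check the closed-form Gaussian MLEs against direct evaluation
# of the log-likelihood (data and parameters are made up for illustration).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)
n = len(x)

# Closed-form MLEs derived above
mu_mle = x.mean()                      # (1/n) sum x_i
var_mle = ((x - mu_mle) ** 2).mean()   # (1/n) sum (x_i - mu_mle)^2; note /n, not /(n-1)

def log_lik(mu, var):
    return -0.5 * n * np.log(2 * np.pi * var) - ((x - mu) ** 2).sum() / (2 * var)

# The closed-form MLE pair should beat any nearby (mu, var) pair
for dmu in (-0.1, 0.1):
    for dvar in (-0.1, 0.1):
        assert log_lik(mu_mle, var_mle) >= log_lik(mu_mle + dmu, var_mle + dvar)
```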
Example 11:
Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random samples with

$$f_X(x) = \frac{1}{2} e^{-|x - \mu|}, \quad -\infty < x < \infty.$$

Show that $\mathrm{median}(X_1, X_2, \ldots, X_n)$ is the MLE for $\mu$.

$$f_{X_1, X_2, \ldots, X_n / \mu}(x_1, x_2, \ldots, x_n) = \frac{1}{2^n} \, e^{-\sum_{i=1}^{n} |x_i - \mu|}$$

$$L(\mathbf{x}/\mu) = \ln f_{\mathbf{X}/\mu}(\mathbf{x}/\mu) = -n \ln 2 - \sum_{i=1}^{n} |x_i - \mu|$$

$\sum_{i=1}^{n} |x_i - \mu|$ is minimized by $\mu = \mathrm{median}(x_1, x_2, \ldots, x_n)$, so

$$\hat{\mu}_{MLE} = \mathrm{median}(x_1, x_2, \ldots, x_n).$$
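The claim that the sample median minimizes $\sum_i |x_i - \mu|$, and hence maximizes the Laplace likelihood, can be verified numerically (a small sketch with made-up data):

```python
import numpy as np

# Sketch: verify that the sample median minimizes sum |x_i - mu|,
# i.e. maximizes the Laplace log-likelihood (illustrative data).
rng = np.random.default_rng(1)
x = rng.laplace(loc=0.7, size=201)  # odd n, so the minimizer is a sample point

def neg_log_lik(mu):
    # up to the constant n*ln 2,  -L(x/mu) = sum |x_i - mu|
    return np.abs(x - mu).sum()

med = np.median(x)
grid = np.linspace(med - 2, med + 2, 2001)
best = grid[np.argmin([neg_log_lik(m) for m in grid])]
assert abs(best - med) < 2e-3  # grid optimum coincides with the median
```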
Properties of MLE

(1) The MLE may be biased or unbiased. In Example 10, $\hat{\mu}_{MLE}$ is unbiased whereas $\hat{\sigma}^2_{MLE}$ is a biased estimator.

(2) If an efficient estimator exists, the MLE is that efficient estimator. Suppose an efficient estimator $\hat{\theta}$ exists. Then

$$\frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} = c(\theta)\,(\hat{\theta} - \theta).$$

At $\theta = \hat{\theta}_{MLE}$,

$$\left. \frac{\partial L(\mathbf{x}/\theta)}{\partial \theta} \right|_{\theta = \hat{\theta}_{MLE}} = 0 \;\Rightarrow\; c(\hat{\theta}_{MLE})\,(\hat{\theta} - \hat{\theta}_{MLE}) = 0 \;\Rightarrow\; \hat{\theta} = \hat{\theta}_{MLE}.$$
(3) The MLE is asymptotically unbiased and efficient. Thus for large $n$, the MLE is approximately efficient.

(4) Invariance property of the MLE: this is a remarkable property of the MLE, not shared by other estimators. If $\hat{\theta}_{MLE}$ is the MLE of $\theta$ and $h(\theta)$ is a function of $\theta$, then $h(\hat{\theta}_{MLE})$ is the MLE of $h(\theta)$.
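A small numerical sketch of the invariance property (illustrative data): the MLE of $\sigma = \sqrt{\sigma^2}$ obtained by plugging $\hat{\sigma}^2_{MLE}$ into the square root agrees with a direct maximization of the log-likelihood over $\sigma$:

```python
import numpy as np

# Sketch of the invariance property: the MLE of sigma = h(sigma^2) = sqrt(sigma^2)
# is obtained by plugging the MLE of sigma^2 into h (illustrative data).
rng = np.random.default_rng(2)
x = rng.normal(0.0, 2.0, size=1000)
var_mle = ((x - x.mean()) ** 2).mean()
sigma_by_invariance = np.sqrt(var_mle)

# Direct maximization over sigma (mu held at its MLE) gives the same answer
def log_lik(sigma):
    return -len(x) * np.log(sigma) - ((x - x.mean()) ** 2).sum() / (2 * sigma**2)

grid = np.linspace(0.5, 4.0, 7001)
sigma_direct = grid[np.argmax([log_lik(s) for s in grid])]
assert abs(sigma_direct - sigma_by_invariance) < 1e-3
```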
(5) $\hat{\theta}_{MLE}$ is a consistent estimator of $\theta$.
Proof:

Suppose $\theta_0$ is the true value of $\theta$. Given the iid observations $X_1, X_2, X_3, \ldots, X_n$, the sample average of the log-likelihood function is

$$L_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ln f_{X/\theta}(X_i),$$

and $\hat{\theta}_{MLE}$ maximizes $L_n(\theta)$. Let

$$L(\theta) = E_{\theta_0} \ln f_{X/\theta}(X_i) = \int \ln f_{X/\theta}(x_i) \, f_{X/\theta_0}(x_i) \, dx_i.$$

Note that the expectation is carried out with respect to the true value $\theta_0$. According to the weak law of large numbers (WLLN),

$$L_n(\theta) \xrightarrow{p} L(\theta). \quad (1)$$

Now we show that $L(\theta) \le L(\theta_0)$:

$$L(\theta) - L(\theta_0) = E_{\theta_0} \ln f_{X/\theta}(X) - E_{\theta_0} \ln f_{X/\theta_0}(X) = E_{\theta_0} \ln \frac{f_{X/\theta}(X)}{f_{X/\theta_0}(X)}$$

$$\le E_{\theta_0} \left( \frac{f_{X/\theta}(X)}{f_{X/\theta_0}(X)} - 1 \right) \qquad (\text{since } \ln t \le t - 1)$$

$$= \int f_{X/\theta}(x) \, dx - 1 = 1 - 1 = 0.$$

In other words, $L(\theta)$ is maximum at $\theta = \theta_0$. Combining this with (1), the maximizer $\hat{\theta}_{MLE}$ of $L_n(\theta)$ converges to the maximizer $\theta_0$ of $L(\theta)$:

$$\hat{\theta}_{MLE} \xrightarrow{p} \theta_0.$$
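Consistency can also be illustrated by simulation. Using the Laplace-location model of Example 11, whose MLE is the sample median, the estimation error shrinks as $n$ grows (an illustrative sketch, not part of the proof):

```python
import numpy as np

# Sketch: the Laplace-location MLE (the sample median, per Example 11)
# concentrates around the true value theta_0 as n grows.
rng = np.random.default_rng(3)
theta0 = 1.5
errors = []
for n in (10, 100, 10_000):
    est = np.array([np.median(rng.laplace(theta0, size=n)) for _ in range(200)])
    errors.append(np.mean(np.abs(est - theta0)))  # mean absolute error

# the error decreases monotonically with the sample size
assert errors[0] > errors[1] > errors[2]
```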
Bayesian Estimators

We may have some prior information about $\theta$, in the sense that some values of $\theta$ are more likely than others (a priori information). We can represent this prior information in the form of a prior density function. In what follows we omit the subscripts on density functions for notational simplicity. The parameter $\theta$ is now a random variable, and the likelihood function is the conditional density $f(\mathbf{x}/\theta)$, so that

$$f(\mathbf{x}, \theta) = f(\theta) \, f(\mathbf{x}/\theta).$$

Also we have the Bayes rule

$$f(\theta/\mathbf{x}) = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{f(\mathbf{x})} = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{\int f(\theta) \, f(\mathbf{x}/\theta) \, d\theta},$$

where $f(\theta/\mathbf{x})$ is the a posteriori density function.

Example: Let $X$ be a Gaussian random sample with unknown mean $\theta$ and variance 1, where $\theta \sim N(0, 1)$. Find the a posteriori PDF $f(\theta/x)$ for a single observation $x$.
Solution: We have

$$f(\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{\theta^2}{2}}, \qquad f(x/\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x-\theta)^2}{2}}.$$

(Figure: block diagram showing the parameter $\theta$, with density $f(\theta)$, generating the observation $x$ through $f(x/\theta)$.)
$$f(\theta/x) = \frac{f(\theta) \, f(x/\theta)}{\int_{-\infty}^{\infty} f(\theta) \, f(x/\theta) \, d\theta}
= \frac{\frac{1}{2\pi} \, e^{-\frac{\theta^2}{2}} \, e^{-\frac{(x-\theta)^2}{2}}}{\int_{-\infty}^{\infty} \frac{1}{2\pi} \, e^{-\frac{\theta^2}{2}} \, e^{-\frac{(x-\theta)^2}{2}} \, d\theta}.$$

Completing the square in the exponent,

$$\frac{\theta^2}{2} + \frac{(x-\theta)^2}{2} = \theta^2 - x\theta + \frac{x^2}{2} = \left(\theta - \frac{x}{2}\right)^2 + \frac{x^2}{4},$$

and the factor $e^{-x^2/4}$ cancels between numerator and denominator, giving

$$f(\theta/x) = \frac{e^{-\left(\theta - \frac{x}{2}\right)^2}}{\int_{-\infty}^{\infty} e^{-\left(\theta - \frac{x}{2}\right)^2} \, d\theta} = \frac{1}{\sqrt{\pi}} \, e^{-\left(\theta - \frac{x}{2}\right)^2}.$$

Thus the a posteriori density is Gaussian with mean $\frac{x}{2}$ and variance $\frac{1}{2}$.
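The posterior can be cross-checked by evaluating the Bayes-rule denominator numerically (a sketch; the observed value $x = 1.3$ is arbitrary):

```python
import numpy as np

# Sketch: numerically normalize f(theta) f(x/theta) for one observation x
# and compare against the closed-form posterior N(x/2, 1/2).
x = 1.3  # arbitrary observed value for illustration
theta = np.linspace(-8, 8, 20001)
dtheta = theta[1] - theta[0]

prior = np.exp(-theta**2 / 2) / np.sqrt(2 * np.pi)       # f(theta)
lik = np.exp(-(x - theta) ** 2 / 2) / np.sqrt(2 * np.pi)  # f(x/theta)
post = prior * lik
post /= post.sum() * dtheta          # numeric Bayes-rule denominator

closed = np.exp(-((theta - x / 2) ** 2)) / np.sqrt(np.pi)  # N(x/2, 1/2) density
assert np.max(np.abs(post - closed)) < 1e-5
```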
The parameter $\theta$ is a random variable and the estimator $\hat{\theta}(\mathbf{X})$ is another random variable. The estimation error is $\epsilon = \hat{\theta} - \theta$.

We associate a cost function (or loss function) $C(\hat{\theta}, \theta)$ with every estimator $\hat{\theta}$. It represents the positive penalty incurred by each wrong estimate; thus $C(\hat{\theta}, \theta)$ is a non-negative function. The three most popular cost functions are:

Quadratic cost function: $C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$

Absolute cost function: $C(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$

Hit-or-miss cost function (also called the uniform cost function):
$$C(\hat{\theta}, \theta) = \begin{cases} 0 & |\hat{\theta} - \theta| \le \Delta/2 \\ 1 & |\hat{\theta} - \theta| > \Delta/2 \end{cases}$$

Since $\theta$ and $\mathbf{X}$ are both random, minimizing the cost means minimizing it on average. The Bayesian risk function, or average cost, is

$$\bar{C} = E\, C(\hat{\theta}, \theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} C(\hat{\theta}, \theta) \, f_{\mathbf{X},\theta}(\mathbf{x}, \theta) \, d\mathbf{x} \, d\theta.$$

The estimator seeks to minimize the Bayesian risk.
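As a concrete sketch of the Bayes risk, consider the single-observation Gaussian model above ($X \sim N(\theta, 1)$, $\theta \sim N(0, 1)$) and the candidate estimator $\hat{\theta} = x/2$; the average cost is estimated by Monte Carlo over joint draws of $(\theta, X)$ (illustrative code):

```python
import numpy as np

# Sketch: Bayes risk E C(theta_hat, theta) as a Monte Carlo average over
# joint draws (theta, X) for X ~ N(theta, 1), theta ~ N(0, 1).
rng = np.random.default_rng(6)
theta = rng.normal(0, 1, size=100_000)  # draws from the prior
x = rng.normal(theta, 1)                # one observation per draw

theta_hat = x / 2                       # candidate estimator
quad_risk = np.mean((theta_hat - theta) ** 2)   # quadratic cost, averaged
abs_risk = np.mean(np.abs(theta_hat - theta))   # absolute cost, averaged

# For this model the quadratic risk of x/2 equals the posterior variance 1/2
assert abs(quad_risk - 0.5) < 0.02
```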
Case I. Quadratic Cost Function and the Minimum Mean-Square Error Estimator

$$C(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$$

The estimation problem is: minimize

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\mathbf{x}, \theta) \, d\mathbf{x} \, d\theta$$

with respect to $\hat{\theta}$. Writing $f(\mathbf{x}, \theta) = f(\theta/\mathbf{x}) f(\mathbf{x})$, this is equivalent to minimizing

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\theta/\mathbf{x}) \, d\theta \right) f(\mathbf{x}) \, d\mathbf{x}.$$

Since $f(\mathbf{x})$ is always non-negative, the above integral is minimized if the inner integral is minimized for each $\mathbf{x}$. This results in the problem: minimize

$$\int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\theta/\mathbf{x}) \, d\theta$$

with respect to $\hat{\theta}$. Setting the derivative with respect to $\hat{\theta}$ to zero,

$$\frac{\partial}{\partial \hat{\theta}} \int_{-\infty}^{\infty} (\hat{\theta} - \theta)^2 \, f(\theta/\mathbf{x}) \, d\theta = 0$$

$$\Rightarrow\; 2 \int_{-\infty}^{\infty} (\hat{\theta} - \theta) \, f(\theta/\mathbf{x}) \, d\theta = 0$$

$$\Rightarrow\; \hat{\theta} \int_{-\infty}^{\infty} f(\theta/\mathbf{x}) \, d\theta = \int_{-\infty}^{\infty} \theta \, f(\theta/\mathbf{x}) \, d\theta$$

$$\Rightarrow\; \hat{\theta} = \int_{-\infty}^{\infty} \theta \, f(\theta/\mathbf{x}) \, d\theta = E(\theta/\mathbf{X} = \mathbf{x}).$$

Thus $\hat{\theta}$ is the conditional mean, i.e. the mean of the a posteriori density. Since we are minimizing the quadratic cost, it is also called the minimum mean-square error (MMSE) estimator.
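A Monte Carlo sketch of this result: for the single-observation Gaussian example, where the posterior works out to $N(x/2, 1/2)$, the average quadratic cost over the posterior is smallest at the conditional mean $x/2$ (illustrative numbers):

```python
import numpy as np

# Sketch: the inner integral  int (that - theta)^2 f(theta/x) dtheta
# is minimized at the conditional mean, here x/2 for posterior N(x/2, 1/2).
rng = np.random.default_rng(4)
x = 0.8
theta_post = rng.normal(x / 2, np.sqrt(0.5), size=200_000)  # samples from f(theta/x)

def risk(that):
    return np.mean((that - theta_post) ** 2)  # Monte Carlo estimate of inner integral

candidates = np.linspace(-1.0, 2.0, 301)
best = candidates[np.argmin([risk(c) for c in candidates])]
assert abs(best - x / 2) < 0.05  # minimizer is (close to) the conditional mean
```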
Salient Points

- We are given the a priori density function $f(\theta)$ and the conditional PDF $f(\mathbf{x}/\theta)$.
- We have to determine the a posteriori density $f(\theta/\mathbf{x})$. This is determined from the Bayes rule:

$$f(\theta/\mathbf{x}) = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{f(\mathbf{x})} = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{\int f(\theta) \, f(\mathbf{x}/\theta) \, d\theta}.$$
Example: Let $X_1, X_2, \ldots, X_n$ be samples of $X \sim N(\theta, 1)$ with unknown mean $\theta \sim N(0, 1)$. Find $\hat{\theta}_{MMSE}$.

We are given

$$f(\theta) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{\theta^2}{2}}, \qquad
f(\mathbf{x}/\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x_i - \theta)^2}{2}}
= \frac{1}{(2\pi)^{n/2}} \, e^{-\frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2}.$$

Also,

$$f(\theta/\mathbf{x}) = \frac{f(\theta) \, f(\mathbf{x}/\theta)}{f(\mathbf{x})},$$

where

$$f(\mathbf{x}) = \int_{-\infty}^{\infty} f(\theta) \, f(\mathbf{x}/\theta) \, d\theta
= \frac{1}{(2\pi)^{(n+1)/2}} \int_{-\infty}^{\infty} e^{-\frac{\theta^2}{2} - \frac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2} \, d\theta.$$

Completing the square in $\theta$, the exponent is

$$-\frac{n+1}{2}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2 - \frac{1}{2}\left(\sum_{i=1}^{n} x_i^2 - \frac{(n\bar{x})^2}{n+1}\right), \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$

and the terms not involving $\theta$ cancel between numerator and denominator, so that

$$f(\theta/\mathbf{x}) = \sqrt{\frac{n+1}{2\pi}} \; e^{-\frac{n+1}{2}\left(\theta - \frac{n\bar{x}}{n+1}\right)^2}.$$

That is, the a posteriori density is Gaussian with mean $\frac{n\bar{x}}{n+1}$ and variance $\frac{1}{n+1}$. Therefore

$$\hat{\theta}_{MMSE} = E(\theta/\mathbf{X} = \mathbf{x}) = \frac{n}{n+1} \, \bar{x} = \frac{1}{n+1} \sum_{i=1}^{n} x_i.$$
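A quick Monte Carlo check of this estimator (illustrative code): drawing $\theta$ from its prior and then the samples, $\hat{\theta}_{MMSE} = \frac{1}{n+1} \sum_i x_i$ attains a lower average squared error than the plain sample mean, with Bayes risk close to the posterior variance $\frac{1}{n+1}$:

```python
import numpy as np

# Sketch: Monte Carlo check that theta_hat = sum(x)/(n+1) beats the plain
# sample mean on average when theta ~ N(0,1) and X_i ~ N(theta, 1).
rng = np.random.default_rng(5)
n, trials = 5, 50_000
theta = rng.normal(0.0, 1.0, size=trials)              # prior draws
x = rng.normal(theta[:, None], 1.0, size=(trials, n))  # X_i ~ N(theta, 1)

mmse = x.sum(axis=1) / (n + 1)   # (1/(n+1)) sum x_i = (n/(n+1)) x_bar
xbar = x.mean(axis=1)            # plain sample mean, for comparison

mse_mmse = np.mean((mmse - theta) ** 2)
mse_xbar = np.mean((xbar - theta) ** 2)
assert mse_mmse < mse_xbar
# theoretical Bayes risk is the posterior variance 1/(n+1)
assert abs(mse_mmse - 1.0 / (n + 1)) < 0.01
```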
