Regional Income Convergence

Regional Income Convergence: A Spatial
Analysis Approach
Prepared by César R. Sobrino
Universidad del Turabo
November 27, 2017

Outline
1 Regression Analysis
OLS regression
Assumptions and Tests
2 Spatial Econometrics
Spatial Dependence & Spatial Heterogeneity
Spatial Matrix (W) and Moran’s I statistic
3 Income Convergence
σ-convergence
β -convergence and speed of convergence (θ)
4 GeoDa
Managing shapeﬁles
Creating Ws and Moran’s I statistic
Regression

OLS: Ordinary Least Squares
Parameters
The coeﬃcients in an equation that determine the
exact mathematical relation among the variables
(growth rate and initial income)
Unknowns.
Parameter estimation
The process of ﬁnding estimates of the numerical
values of the parameters of an equation

OLS
OLS
The general purpose of linear regression is to ﬁnd a
(linear) relationship between the dependent variable
and a set of explanatory variables.
The data can be cross-sectional or time series.
Bivariate form
Y = a + bX + ε
Intercept parameter (a) gives the value of Y where the
regression line crosses the Y-axis (the value of Y when X is
zero).
Slope parameter (b) gives the change in Y associated
with a one-unit change in X : ∆Y /∆X

OLS
Two objectives:
Find a good match (fit) between a + bX and observed
values of Y ( a and b are the regression coefficients).
Discover which of the explanatory variables (Xs)
contribute significantly to the linear relationship
OLS accomplishes both stated objectives in an optimal
fashion according to some criteria, and is referred to as
a Best Linear Unbiased Estimator (BLUE)
OLS estimates (a and b) are found minimizing the sum
of the squared prediction errors (hence least squares).

OLS
The OLS regression line (the red one) is the line that minimizes the
sum of the squared prediction errors
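The least-squares idea can be sketched in a few lines. This is a minimal illustration, assuming the bivariate model Y = a + bX + ε; the function names and the data are made up for the example and are not part of the original slides.

```python
def ols_bivariate(x, y):
    """Return (a, b) that minimize the sum of squared errors of y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared prediction errors for the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = ols_bivariate(x, y)
print(a, b, sum_squared_errors(x, y, a, b))
```

Any other line, for instance one with a slightly different intercept, gives a larger sum of squared errors on the same data.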

OLS
In order to obtain the BLUE property and to be able to
make statistical inferences about the population
parameters (a and b) by means of your estimates (â
and b̂), you need to make certain assumptions about
the random part of the regression equation (the
random error ε)
Two of these assumptions are crucial to obtain the
unbiasedness and eﬃciency of the OLS estimates.

OLS
Assumptions
The random error (ε) has mean zero (there is no
systematic misspeciﬁcation or bias in the regression
equation).
Expected value: E(ε) = 0
If E(ε) = 0 does not hold, estimators are biased
The random error terms are uncorrelated and have a
constant variance (they are homoskedastic).
Variance: E(εε′) = σ²I
If E(εε′) = σ²I does not hold, either
autocorrelation or heteroskedasticity is present, so
the estimators are inefficient.

Hypothesis Tests
Null hypothesis H0: a = 0 or H0: b = 0.
Alternative hypothesis H1: a ≠ 0 or H1: b ≠ 0.
If you reject H0, the parameter (a or b) is
statistically different from zero.

Individual statistical significance
Must determine if there is sufficient statistical evidence
to indicate that Y is truly related to X (i.e., b ≠ 0)
Even if b = 0, it is possible that the sample will
produce an estimate b̂ that is different from zero
Test for statistical significance using t-tests or p-values

Individual significance - t-Test
First determine the level of significance (0.1%, 1%, 5%,
10%)
Probability of finding a parameter estimate to be
statistically different from zero when, in fact, it is zero
(alpha). α = 0.001, 0.01, 0.05, or 0.1, respectively.
Probability of a Type I Error (alpha).
1 – level of significance (alpha) = level of confidence
t-ratio is computed as t = b̂/Sb,
where Sb is the standard error of the estimate b̂
Use t-table to choose critical t-value with n – k
degrees of freedom for the chosen level of significance
n = number of observations
k = number of parameters estimated.

Individual significance-t-Test
If the absolute value of t-ratio is greater than the
critical t , the parameter estimate is statistically
significant at the given level of significance.
Rule of thumb: if the t-ratio (in absolute value) is equal to
or greater than 2, you can reject H0 at roughly the 5%
significance level.
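As a small worked example of the t-ratio, the sketch below uses the LINC29 coefficient and standard error from the 1929-94 OLS output reported later in this deck. The function names and the default critical value of 2 (the rule of thumb) are assumptions made for illustration; an exact critical value would come from a t-table with n – k degrees of freedom.

```python
def t_ratio(b_hat, std_error):
    """t-ratio: estimate divided by its standard error."""
    return b_hat / std_error


def is_significant(t, critical_t=2.0):
    """Reject H0 when |t| reaches the critical value (default: rule of thumb)."""
    return abs(t) >= critical_t


# Slope estimate and standard error for LINC29 (1929-94 OLS output).
t = t_ratio(-0.732026, 0.0322159)
print(round(t, 4), is_significant(t))
```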

Individual significance - p-Values
Treat as statistically significant only those parameter
estimates with p-values smaller than the maximum
acceptable significance level. p-value gives exact level
of significance.
Also the probability of finding significance when none
exists
Significance levels (alpha)
α = 0.001, or 0.1% significance level
α = 0.01, or 1% significance level
α = 0.05, or 5% significance level
α = 0.1, or 10 % significance level
E.g. if p-value = 0.00001, you reject H0 at 0.1%
significance level, if p-value = 0.08, you reject H0 at
10% significance level, and, if p-value = 0.14, you
cannot reject H0 at 10% significance level

Joint significance -F-test
Used to test for significance of overall regression
equation
Compare F-statistic to critical F-value from F-table
Two degrees of freedom: k – 1 (numerator) and n – k (denominator)
Level of significance
If the F-statistic exceeds the critical F (about 4 at the 5%
level for a bivariate regression), the regression
equation overall is statistically significant at the
specified level of significance.

Coefficient of Determination: R²
R² measures the percentage of total variation in the
dependent variable (Y) that is explained by the
regression equation
Ranges from 0 to 1
A high R² indicates that Y and X are highly correlated
E.g. R² = 0.8 means that 80% of the variation in Y is
explained by the regression equation.
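The definition can be sketched as one minus the ratio of unexplained to total variation. The function name and the inputs are illustrative assumptions.

```python
def r_squared(y, y_hat):
    """Share of the total variation in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total
    return 1.0 - ss_res / ss_tot


# A perfect fit explains all of the variation in y.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Predicting the mean of Y for every observation explains none of the variation, so R² is zero in that case.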

Spatial Analysis: Motivation
Diagnosis
The assumptions of normal, homoskedastic and
uncorrelated error terms that lead to the BLUE
property of OLS estimators are not necessarily
satisfied by real models and data.
When dealing with spatial data you must give special
attention to the possibility that the errors or the
variables (Xs) in the model show spatial
dependence.

Why is spatial autocorrelation (dependence)
important?
We need to examine the inﬂuences of spatial
autocorrelation upon the inferences that may be
drawn from statistical tests.
As these inferences are based on independence
assumptions (OLS assumptions), the presence of
spatial autocorrelation is likely to bias any resultant
inference.
Dependence among the error terms makes OLS
estimates inefficient: Spatial Error (SEM).
Dependence in the dependent variable makes OLS estimates
biased, and thus inferences based on the regression
model will be incorrect: Spatial Lag (SAR).

Applied work in regional science (economics, health,
demographics, etc.) makes use of spatial data.
Spatial data: Data collected with reference to location.
Administrative spatial units (states, districts,
counties, etc.).
Functional regions (E.g. labour market regions).
Points in space (E.g. cities, municipalities, plants) .
Using spatial data, model estimation, hypothesis
testing and prediction have to allow for spatial
eﬀects.

Spatial Dependence
Lack of independence among spatial data,
Observations at location i depend on other
observations at locations j (j ≠ i).
Spatial dependence is associated with the notion of
relative space (location)
Neighbouring regions are expected to be more alike
than arbitrary regions.
Spatial dependence is expected to diminish with
increasing distance.
Spatial dependence is multidirectional by nature.
Time series is unidirectional.

Spatial Dependence: Causes
Nuisance:
The delineation of spatial units is somewhat arbitrary.
Spatial data are usually collected for administrative
units (states, districts, counties, etc.).
If the correspondence between the spatial scale of a
phenomenon under study and the delineation of the
spatial units of observation is not strong,
measurement errors are to be expected.
OLS models can be corrected by including a spatial
error speciﬁcation in the model (SEM).

Spatial Dependence: Causes
Substantive:
Interaction and dependence at the regional level may
itself be a modelling problem because it generates
model bias.
Location and distance are important forces at work in
human geography and market activity, e.g. spatial
spillovers, hierarchy of places, etc.
This can be corrected by including an explicit spatial
lag term as an explanatory variable in the model
(SAR).

Spatial Heterogeneity
It refers to varying economic relationships or
disturbances over space.
A different relationship may hold for every spatial
unit. This situation characterizes the case of structural
instability.
In case of structural instability, the regression
coefficients are not constant across the spatial units.
E.g. Sample: 35,000 homes sold within the last 5 years
in Lucas County, Ohio.
3 distinct distributions, with low-priced homes nearest
to the Central Business District (CBD) and high-priced
homes farthest away from the CBD.
This suggests different relationships may be at work
to describe home prices in different locations.

Spatial Weight Matrix (W)
Quantify location for analyzing spatial eﬀects
Contiguity (neighbourhood)
The relative location among spatial units. Usually
established from a map.
Units near each other should reflect a greater degree of spatial
dependence than those more distant from each
other. For spatial heterogeneity, relationships may
be similar for neighbouring units.
Distance
Latitude and longitude allow us to calculate distances
from any point in space.
Spatial dependence will decline with distance.
For spatial heterogeneity, closer units should
exhibit similar relationships.

W
In a regular grid, neighbours can be defined in a
number of ways, among others:
By analogy with chess, rook contiguity, bishop
contiguity and queen contiguity are distinguished.
Inverse distance raised to a power.

W: Rook contiguity
A spatial unit is a neighbour of another unit if both
areas share a common edge (side). In the next ﬁgure,
the units B1, B2, B3 and B4 are neighbours of unit A
according to the rook criterion.
B2
B1 A B3
B4

W:Queen contiguity
A spatial unit is a neighbour of another unit if both
areas share a common edge or vertex. In the next
ﬁgure the units B1, B2, B3 and B4 as well as C1, C2,
C3 and C4 are neighbours of unit A according to the
queen criterion.
C1 B2 C2
B1 A B3
C3 B4 C4
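The rook and queen criteria can be sketched for a regular grid as follows. This is an illustrative helper, assuming cells are indexed by (row, column); the function name and the grid layout are not from the original slides.

```python
ROOK_STEPS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # cells sharing an edge
BISHOP_STEPS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # cells sharing only a vertex


def grid_neighbors(row, col, n_rows, n_cols, criterion="rook"):
    """Return the neighbours of cell (row, col) on an n_rows x n_cols grid."""
    if criterion == "rook":
        steps = ROOK_STEPS
    elif criterion == "bishop":
        steps = BISHOP_STEPS
    elif criterion == "queen":
        steps = ROOK_STEPS + BISHOP_STEPS
    else:
        raise ValueError("criterion must be 'rook', 'bishop' or 'queen'")
    neighbors = []
    for d_row, d_col in steps:
        r, c = row + d_row, col + d_col
        # Keep only cells that fall inside the grid.
        if 0 <= r < n_rows and 0 <= c < n_cols:
            neighbors.append((r, c))
    return neighbors


# Unit A is the centre cell of a 3 x 3 grid, as in the figures.
print(grid_neighbors(1, 1, 3, 3, "rook"))
print(grid_neighbors(1, 1, 3, 3, "queen"))
```

For the centre cell A, the rook criterion returns the four B units and the queen criterion returns the four B units plus the four C units.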

W:Distance-based spatial weight matrix
Spatial interaction will decline with increasing distance
due to increasing geographical impediments.
Nearer regions have a greater potential inﬂuence.
Power function: Wij = 1/dij^γ, where
γ is a power parameter
Wij is the element of matrix W at row i and column j
(i ≠ j)
dij: distance between region i and region j
The distances, dij, are usually measured between the
centres of the regions (latitude and longitude).
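A distance-based W can be sketched from region centres as below. This is a simplified illustration that assumes planar coordinates and straight-line (Euclidean) distance; with latitude and longitude a great-circle distance would normally be used instead. The function name and the coordinates are made up.

```python
import math


def inverse_distance_weights(centres, gamma=1.0):
    """Build W with W[i][j] = 1 / d_ij**gamma for i != j and 0 on the diagonal."""
    n = len(centres)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d_ij = math.dist(centres[i], centres[j])  # Euclidean distance
                w[i][j] = 1.0 / d_ij ** gamma
    return w


# Three hypothetical region centres on a plane.
centres = [(0.0, 0.0), (3.0, 4.0), (0.0, 10.0)]
W = inverse_distance_weights(centres, gamma=1.0)
print(W[0])
```

A larger γ makes the weights fall off faster with distance, so nearby regions dominate.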

W: Representing 5 regions
Rook
0 1 1 0 0
1 0 1 1 0
1 1 0 1 0
0 1 1 0 1
0 0 0 1 0
Standardized (each row of the rook matrix divided by its row sum)
0    1/2  1/2  0    0
1/3  0    1/3  1/3  0
1/3  1/3  0    1/3  0
0    1/3  1/3  0    1/3
0    0    0    1    0
Distance-based
0      1/d12  1/d13  1/d14  1/d15
1/d21  0      1/d23  1/d24  1/d25
1/d31  1/d32  0      1/d34  1/d35
1/d41  1/d42  1/d43  0      1/d45
1/d51  1/d52  1/d53  1/d54  0
γ = 1 and dij is the distance between i and j, i ≠ j
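Row standardization of the 5-region rook matrix can be sketched as follows; the function name is an assumption made for illustration.

```python
# Rook contiguity matrix for the 5 regions shown on this slide.
ROOK = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]


def row_standardize(w):
    """Divide each row by its sum so that every non-empty row adds up to 1."""
    standardized = []
    for row in w:
        total = sum(row)
        # A region without neighbours keeps a row of zeros.
        standardized.append([value / total if total else 0.0 for value in row])
    return standardized


W_STD = row_standardize(ROOK)
for row in W_STD:
    print(row)
```

With a row-standardized W, the spatial lag of a variable is simply the average of that variable over the neighbours of each region.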

Testing Spatial Autocorrelation
Moran’s I - statistic: test for spatial dependence.
Pearson correlation:
\[ \rho_{xy} = \frac{S_{xy}}{S_x S_y}, \]
where Sxy is the covariance between x and y, Sx is the
standard deviation of x, and Sy is the standard
deviation of y
Covariance formula:
\[ S_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}, \]
then
\[ \rho_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{S_x S_y (n - 1)} \]

Moran’s I - statistic
Similarities between units i and j are calculated as the
product of the diﬀerences between xi (variable of
interest) and xj (spatial lag) with the overall mean (¯x),
divided by the sample variance. This ratio has to be
adjusted for the spatial weights used.
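\[ I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \]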
where xi is the i-th observation, n is the sample size,
and Wij is the spatial weight between i and j.
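The statistic can be computed directly from its definition. The sketch below is illustrative: the function name is an assumption, the weights are the 5-region rook matrix from the earlier slide, and the values of x are made up.

```python
def morans_i(x, w):
    """Moran's I for values x and spatial weight matrix w (list of lists)."""
    n = len(x)
    x_bar = sum(x) / n
    z = [xi - x_bar for xi in x]                  # deviations from the mean
    s0 = sum(sum(row) for row in w)               # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    total = sum(zi ** 2 for zi in z)
    return (n / s0) * (cross / total)


# 5-region rook contiguity matrix and made-up values of the variable.
ROOK_W = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(morans_i(values, ROOK_W))
```

In this made-up case neighbouring regions tend to have similar values, so the statistic is positive.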

Moran’s I - statistic
The expected value of Moran’s I statistic under no spatial
autocorrelation: −1/(n − 1)
E.g. if n = 48 regions ⇒ −1/(48 − 1) = −0.0213, which is
close to zero.
Then, H0: I = 0 (no spatial autocorrelation) and H1: I ≠ 0.
A standardized matrix bounds I between -1 and 1.
-1 means perfect clustering of dissimilar values (perfect
dispersion).
0 is no autocorrelation (perfect randomness)
1 means perfect clustering of similar values (spatial
autocorrelation).

Spatial Lag (SAR)
1 OLS regression: Y = a + bX + ε
2 SAR (including W): Y = ρWY + a + bX + ε
3 Y = (I − ρW)⁻¹a + (I − ρW)⁻¹bX + (I − ρW)⁻¹ε,
where I is the identity matrix
4 Here ρ is a scalar parameter that indicates the effect
of the dependent variable in the neighbors on Y in the
focal area; the intercept is (I − ρW)⁻¹a, the slope is
(I − ρW)⁻¹b, and the error term is (I − ρW)⁻¹ε
5 GeoDa reports 3) & ρ
6 Omitting ρWY yields biased estimates, and thus
inferences based on an OLS model will be incorrect
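The step from the SAR equation to its reduced form can be checked numerically. This is a sketch only, assuming NumPy is available; the function name, the parameter values and the simulated data are illustrative assumptions, and the weights are the row-standardized 5-region rook matrix from the earlier slide.

```python
import numpy as np


def simulate_sar(w, x, a, b, rho, eps):
    """Solve (I - rho*W) y = a + b*x + eps for y, i.e. the SAR reduced form."""
    n = w.shape[0]
    return np.linalg.solve(np.eye(n) - rho * w, a + b * x + eps)


# Row-standardized rook matrix for 5 regions.
W = np.array([
    [0, 1 / 2, 1 / 2, 0, 0],
    [1 / 3, 0, 1 / 3, 1 / 3, 0],
    [1 / 3, 1 / 3, 0, 1 / 3, 0],
    [0, 1 / 3, 1 / 3, 0, 1 / 3],
    [0, 0, 0, 1, 0],
])

rng = np.random.default_rng(seed=0)
x = rng.normal(size=5)               # made-up explanatory variable
eps = rng.normal(scale=0.1, size=5)  # made-up error term
y = simulate_sar(W, x, a=1.0, b=-0.5, rho=0.3, eps=eps)

# The solution satisfies the structural equation Y = rho*W*Y + a + b*X + eps.
print(np.allclose(y, 0.3 * W @ y + 1.0 - 0.5 * x + eps))
```

Setting rho to zero removes the spatial lag, and the simulated Y reduces to the ordinary a + bX + ε.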

Spatial Error Model (SEM)
1 OLS regression: Y = a + bX + ε
2 SEM (including W): Y = a + bX + ε & ε = λWε + µ
3 Y = a + bX + (I − λW)⁻¹µ, where I is the identity matrix
4 Here λ is the autoregressive coefficient and µ is
another error term; the intercept is a, the slope is b,
and the error term is (I − λW)⁻¹µ
5 GeoDa reports 3) & λ
6 Omitting λW yields unbiased estimates but
biased standard errors and, consequently, t-tests &
p-values will be misleading.

Income Convergence
Robert Solow (1956) “Capital should ﬂow from
countries with a high capital-to-output ratio to
countries with a low capital-to-output ratio ”
“Poor” countries/regions/states should have higher
growth rates.
“Rich” countries/regions/states should have lower
growth rates.
The analysis using regions is called Regional Income
Convergence.

Sigma Convergence, (σ- convergence)
It refers to decreasing variance of variables over time.
This is measured by the coefficient of variation (CV),
which is the standard deviation relative to the
mean (the standard deviation divided by the mean).
Since CV is mean standardized, it controls for
increasing averages over time and can be directly
compared across diﬀerent variables.
When the CV of real per capita income across regions
falls over time, there is σ-convergence .
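The σ-convergence check can be sketched as follows. The function names are illustrative assumptions; the CV values by year are the ones reported in the descriptive statistics table of this deck, and the sample standard deviation is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def sigma_convergence(cv_by_year):
    """True if the CV falls from each observed year to the next."""
    years = sorted(cv_by_year)
    return all(cv_by_year[later] < cv_by_year[earlier]
               for earlier, later in zip(years, years[1:]))


# CVs of log real per capita income across US states, from the deck's table.
cv_by_year = {1929: 0.06, 1945: 0.03, 1994: 0.01}
print(sigma_convergence(cv_by_year))
```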

Beta Convergence, (β- convergence)
It considers the mobility of countries (regions).
It is deﬁned as a negative correlation between the
position of individual countries (regions) at the
beginning of an observation period and the changes or
growth rates over this period.
It assumes that growth from a low base is faster than
growth from high levels.

β- convergence
OLS regression model
LINC1i − LINC0i = a + βLINC0i + ε0i
Where:
LINC1i is the ﬁnal(1) per capita income for region i
in logs.
LINC0i is the initial(0) per capita income for region i
in logs.
LINC1i − LINC0i is the growth rate between the
ﬁnal year and the initial year.
L stands for logs
ε0i is an error term

β values
−1 < β < 0 (β ∈ ]−1, 0[) and significant means
β-convergence.
β > 0 (β ∈ ]0, +∞[) and significant means “divergence”
β = 0: neither “convergence” nor “divergence”
β not significant: neither “convergence” nor
“divergence”

Convergence rate (θ)
θ = ln(1 + β)/(−k)
where
k is the number of years between the final and initial
periods (e.g. k = 1945 − 1929 = 16)
E.g. if β = −0.2 and k = 16:
θ = ln(−0.2 + 1)/(−16)
θ = ln(0.8)/(−16)
θ ≈ −0.2231/−16 ≈ 0.0139, or 1.4% (speed of
convergence).
This means that regions converge at a speed of about 1.4
percent per year.
Note: ln(1) = 0 and ln(0) does not exist, so if β = −1,
θ does not exist, and if β = 0, θ = 0.
The logarithm is not defined for zero or negative arguments,
so θ can be computed only when β > −1.
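The formula for θ can be wrapped in a small function. The function name is an assumption; the first call reproduces the worked example (β = −0.2, k = 16) and the second uses the 1929-94 OLS slope reported later in the deck, with k = 1994 − 1929 = 65.

```python
import math


def convergence_rate(beta, k):
    """theta = ln(1 + beta) / (-k); defined only for beta > -1."""
    if beta <= -1:
        raise ValueError("theta does not exist when beta <= -1")
    return math.log(1 + beta) / (-k)


print(convergence_rate(-0.2, 16))       # worked example from the slide
print(convergence_rate(-0.732026, 65))  # 1929-94 OLS slope
```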

Rey and Montouri (1999) (R & M)
This is an article on regional income convergence
Their data cover 48 states and four years
(1929, 1945, 1946, and 1994). They
included neither Hawaii nor Alaska.
Three periods: 1929-94, 1929-45, and 1946-94. For each
period, they run a cross-sectional analysis.

GeoDa
GeoDa is a free and open source software tool that
serves for spatial data analysis.
You may download it from
http://geodacenter.github.io/download.html
The shapefiles (shp) are the most used files.
A shapefile stores nontopological geometry and
attribute information for the spatial features in a data
set. It includes an ID variable to identify regions.
A shapefile consists of several actual files: the main
geometry file (shp), an index file (shx), a database
table (dbf) and, usually, a projection file (prj).

GeoDa
For your research paper, first choose a
country and gather your data in Excel (or in an Open
Office spreadsheet).
Download Open Office from
https://www.openoffice.org/download/index.html
Next, look for a shapefile of the regions of that
country. This link is helpful:
http://www.gadm.org/country.
Open that shapefile in GeoDa.
Create the variables that you will use: Table/Add
variable, and set integer, length 10, and 3 decimals.
GeoDa will create empty columns.
Click “Table” and select (if needed) the
regions that you will use. Do not include isolated
regions such as islands.

GeoDa
Save as a new shapefile (create a new directory).
GeoDa automatically creates the dbf, shx, and prj files.
To add your data to the new shapefile, you have to
open the new dbf file using Open Office
(spreadsheet/international/OK).
Check the correspondence between the regions of the
new dbf file and the regions of your data. The order of
your data has to be equal to the order of the new dbf
file.
Copy your data and paste it on the new dbf file. Fill
the empty columns.
Save (Keep the current format)

GeoDa: US states
Open the new shapefile with your gathered data
The variables that I have gathered are:
INC29: 1929 real per capita income
Three different sample periods 1929-94, 1929-45, and,
1946-94
The initial year is 1929, the break year is 1945, and the
final year is 1994.

GeoDa: Calculating variables
Creating variables in logs Table/Add Variables
LINC29, LINC45, LINC46, and , LINC94. GeoDa
will create empty columns.
Table/Variable calculation/univariate set “LINC29” ,
operator “log (base e)” and variable “INC29”. Do the
same for the other variables.
Creating growth rates Table/Add Variables:
dI94I29, dI45I29, and, dI94I46, GeoDa will create
empty columns.
Table/Variable calculation/bivariate: set “dI94I29”,
variable “LINC94”, operator “subtract”, variable
“LINC29”. Do the same for the other growth rates.
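The same two calculations (natural logs, then the log difference as the growth rate) can be sketched outside GeoDa. The function name and the income figures are made up for illustration and are not the US state data.

```python
import math


def log_growth(initial_income, final_income):
    """Return (log initial, log final, growth), with growth as the log difference."""
    log_initial = math.log(initial_income)  # e.g. LINC29 from INC29
    log_final = math.log(final_income)      # e.g. LINC94 from INC94
    return log_initial, log_final, log_final - log_initial


# One hypothetical region with made-up incomes for the initial and final years.
linc29, linc94, di94i29 = log_growth(700.0, 21000.0)
print(linc29, linc94, di94i29)
```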

GeoDa US states - Descriptive Statistics
Click on Explore/Boxplot

US states: Exploring β-convergence
Explore/Scatter plot
1929-94: Y: “dI94I29”, X: “LINC29”, OK
1929-45: Y: “dI45I29”, X: “LINC29”, OK
1946-94: Y: “dI94I46”, X: “LINC46”, OK
You should get negative relationships

US states: Exploring β- convergence
[Figure: scatter plots for 1929-94, 1929-45 and 1946-94]
X: Initial income & Y: Growth rate.
At first glance, β-convergence holds.

GeoDa: Exploring Spatial Dependence
Map/Quantile Map/5 to check if there are spatial
patterns. Do you ﬁnd any?
[Figure: quantile maps of 1929, 1945 and 1994 per capita income]

GeoDa: Creating W and Moran scatterplots
Spatial Matrix
Tools/Weights manager/create
Weights File ID variable: “GEOID”.
Your shapeﬁle must have one ID variable
Queen Contiguity
Create/Save
Moran’s I
Space/
Univariate Moran’s I/
Set the variable you want to analyze/
Set W/Queen
The scatterplot enables you to assess how similar a
spatial unit is to its neighbors.

GeoDa: Moran scatterplot- state per capita income
X: the variable of interest for each spatial unit; Y: the
weighted average (spatial lag) of the neighbouring values
for the corresponding observation on the X axis.
[Figure: Moran scatterplots for 1929, 1945 and 1994]
They show spatial dependence because there is a
positive correlation (see page 146, R & M)

GeoDa US states - Exploring data
The CV of real per capita income in logs across US
states falls over time, so σ-convergence holds
According to Moran’s I, data shows spatial
dependence.
Table: Descriptive Statistics
         Mean   Median   SD     CV     Moran’s I
LINC29   6.35   6.38     0.38   0.06   0.65
LINC45   7.02   7.03     0.23   0.03   0.57
LINC94   9.96   9.95     0.13   0.01   0.35

GeoDa: OLS regression
See slide 37 and R & M page 148, equation 4
Click on Regression
Dependent variable/ growth rate (E.g. dI94I29 )
Independent variable/ initial income (E.g. LINC29 )
Weights ﬁle/
Classic: This will run classical OLS regression with
spatial dependence diagnostics, click Run.
Three regressions, one per sample period. For 1929-94:
1 dI94I29i = a + βLINC29i + ε29i
where i = 1, 2, 3, ..., 48

GeoDa: OLS regression-Outcomes: 1994-29
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES ESTIMATION
Dependent Variable : DIN94IN29 Number of Observations: 48
Mean dependent var : 3.61054 Number of Variables : 2
S.D. dependent var : 0.284673 Degrees of Freedom : 46
R-squared : 0.918195 F-statistic : 516.314
Adjusted R-squared : 0.916417 Prob(F-statistic) : 1.20184e-26
Sum squared residual: 0.318208 Log likelihood : 52.281
Sigma-square : 0.00691757 Akaike info criterion : -100.562
S.E. of regression : 0.0831719 Schwarz criterion : -96.8195
Sigma-square ML : 0.00662934
S.E of regression ML: 0.0814207
-----------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
-----------------------------------------------------------------------------
CONSTANT 8.25684 0.204832 40.3104 0.00000
LINC29 -0.732026 0.0322159 -22.7225 0.00000
-----------------------------------------------------------------------------
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER 34.095520 (Extreme
Multicollinearity)
TEST ON NORMALITY OF ERRORS
TEST DF VALUE PROB
Jarque-Bera 2 1.0399 0.59456
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST DF VALUE PROB
Breusch-Pagan test 1 0.0012 0.97181
Koenker-Bassett test 1 0.0013 0.97079
DIAGNOSTICS FOR SPATIAL DEPENDENCE FOR WEIGHT MATRIX : nuevoq
(row-standardized weights)
TEST MI/DF VALUE PROB
Moran's I (error) 0.1509 1.9658 0.04932
Lagrange Multiplier (lag) 1 3.5538 0.05941
Robust LM (lag) 1 1.9997 0.15733
Lagrange Multiplier (error) 1 2.1903 0.13888
Robust LM (error) 1 0.6362 0.42511
Lagrange Multiplier (SARMA) 2 4.1900 0.12307
============================== END OF REPORT
================================

GeoDa: OLS regression-Outcomes: 1945-29
-----------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
-----------------------------------------------------------------------------
CONSTANT 3.39038 0.180291 18.8051 0.00000
LINC29 -0.427882 0.0283561 -15.0896 0.00000
----------------------------------------------------------------------
-------
MULTICOLLINEARITY CONDITION NUMBER 34.095520
TEST DF VALUE PROB
Jarque-Bera 2 0.2160 0.89762
RANDOM COEFFICIENTS
TEST DF VALUE PROB
SPECIFICATION ROBUST TEST
TEST DF VALUE PROB
White 2 2.3107 0.31495
DIAGNOSTICS FOR SPATIAL DEPENDENCE FOR WEIGHT MATRIX : nuevoq
(row-standardized weights)
Moran's I (error) 0.3815 4.3930 0.00001
Robust LM (lag) 1 2.3441 0.12576
Robust LM (error) 1 5.2958 0.02138
============================== END OF REPORT
================================

GeoDa: OLS regression-Outcomes: 1994-46
Data set : nuevo
-----------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
-----------------------------------------------------------------------------
CONSTANT 7.07005 0.374654 18.8709 0.00000
LINC46 -0.589693 0.0532032 -11.0838 0.00000
----------------------------------------------------------------------
-------
MULTICOLLINEARITY CONDITION NUMBER 59.704369
TEST DF VALUE PROB
Jarque-Bera 2 0.5390 0.76376
RANDOM COEFFICIENTS
TEST DF VALUE PROB
SPECIFICATION ROBUST TEST
TEST DF VALUE PROB
White 2 1.6639 0.43519
DIAGNOSTICS FOR SPATIAL DEPENDENCE
FOR WEIGHT MATRIX : nuevoq (row-standardized weights)
Moran's I (error) 0.3141 3.6646 0.00025
Robust LM (lag) 1 2.5955 0.10717
Robust LM (error) 1 1.6193 0.20319
============================== END OF REPORT
================================

GeoDa: OLS regression-Outcomes
With these outputs, you should be able to complete R
& M Table 2
You may find the R²s, AICs (Akaike Information
Criterion), βs, and p-values.
Convergence rate(θ) is calculated using β (See slide 39)
Tests for spatial dependence: Robust LM (lag and
error) and Moran’s I (error).
Breusch-Pagan Test (test for Heteroskedasticity).
AIC: Value for model selection

Reporting OLS outcomes
Table: Unconditional model, OLS estimation
Period    R²      (σ²)      AIC        β        (p-value)   Convergence rate (θ)
1929-94   0.918   (0.007)   -100.562   -0.732   (0.000)     0.020
1929-45   0.832   (0.005)   -112.813   -0.428   (0.000)     0.035
1946-94   0.728   (0.008)   -96.323    -0.590   (0.000)     0.020
Diagnostics for spatial dependence
Period    Robust LM (error) p-value   Robust LM (lag) p-value   Moran’s I (error) MI (p-value)
1929-94   0.425                       0.157                     0.1509 (0.049)
1929-45   0.021                       0.126                     0.3815 (0.000)
1946-94   0.203                       0.107                     0.3141 (0.000)
Diagnostics for heteroskedasticity
Period    Breusch-Pagan test p-value
1929-94   0.972
1929-45   0.166
1946-94   0.213

GeoDa: Diagnostic Tests
Heteroskedasticity: regression errors do not
have a constant variance over all observations.
Breusch-Pagan Test:
H0: homoskedasticity ; H1: heteroskedasticity
Multicollinearity: high correlation between Xs
Condition number > 30 is considered suspect
Condition number = 1 means a lack of multicollinearity
Non-normal errors: most regression models assume
normally distributed errors
Jarque-Bera Test:
H0: normal errors ; H1: non-normal errors
AIC: Calculate AIC for each model with the same
data set, and the “best” model is the one with
minimum AIC value.
If p − value is greater than 0.1, you cannot reject H0
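Model selection by AIC can be sketched as below, using the standard definition AIC = −2 ln L + 2k, where ln L is the log likelihood and k the number of estimated parameters. The function names are assumptions; the log likelihoods and AIC values are the ones GeoDa reports for the 1929-94 sample in this deck.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * log likelihood + 2 * parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


def best_model(aic_by_model):
    """Name of the model with the minimum AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# 1929-94 OLS output: log likelihood 52.281 with 2 estimated coefficients.
print(aic(52.281, 2))

# AIC values reported for the 1929-94 sample.
print(best_model({"OLS": -100.562, "SAR": -102.274, "SEM": -102.447}))
```

The comparison is only meaningful across models estimated on the same data set, as the slide notes.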

GeoDa: OLS vs SAR & SEM
GeoDa reports Moran’s I (error), LM (lag), LM (error),
Robust LM (lag), and, Robust LM (error)
Moran’s I (error) is an extension of Moran’s I statistic
that measures spatial autocorrelation in regression
models. It is useful for detecting spatial dependence, but
it does not discriminate between SAR and
SEM.
H0: OLS ; H1: Spatial dependence
LM (error): H0: OLS ; H1: SEM
LM (lag): H0: OLS ; H1: SAR
If LMs are signiﬁcant (H0 is rejected) , focus on robust
tests.

GeoDa: OLS vs SAR & SEM
Robust LM (error): H0: OLS ; H1: SEM
Robust LM (lag): H0: OLS ; H1: SAR
If both robust measures are significant, stick with the
more significant one.
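The decision rule can be sketched as a function of the four p-values. This is one possible reading of the slides, not GeoDa's own procedure: the branches where only one LM test is significant follow the common convention of choosing that model directly, which is an assumption added here, and the function name is illustrative. The example p-values are from the 1929-94 OLS output.

```python
def choose_model(lm_lag_p, lm_error_p, robust_lag_p, robust_error_p, alpha=0.10):
    """Pick OLS, SAR or SEM from the LM and robust LM p-values."""
    lag_significant = lm_lag_p < alpha
    error_significant = lm_error_p < alpha
    if not lag_significant and not error_significant:
        return "OLS"  # neither LM test rejects H0
    if lag_significant and not error_significant:
        return "SAR"  # assumption: only LM (lag) rejects H0
    if error_significant and not lag_significant:
        return "SEM"  # assumption: only LM (error) rejects H0
    # Both LM tests reject H0: keep the more significant robust test.
    return "SAR" if robust_lag_p < robust_error_p else "SEM"


# p-values from the 1929-94 OLS diagnostics.
print(choose_model(0.05941, 0.13888, 0.15733, 0.42511, alpha=0.10))
```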

Interpretation of OLS outcomes
Results provide much support for β-convergence.
Coefficients are highly significant and negative.
R² is above 0.7 in all three samples
The convergence rate over the entire sample is 2% yearly;
it is 3.5% in the first sub-sample and 2% in the second
Moran’s I statistic (MI) provides very strong evidence
(See p-value) of spatial dependence
Robust tests point to the presence of spatial error
(SEM) rather than the spatial lag (SAR).
The Breusch-Pagan test for heteroskedasticity is not
significant in any of the samples, so spatial
heterogeneity models are not considered
further.

GeoDa : SAR
Click on Regression
Weights ﬁle/
Spatial lag
Three regressions, one per sample period. For 1929-94:
1 dI94I29i = a + ρWdI94I29i + βLINC29i + ε29i
where i = 1, 2, 3, ..., 48

GeoDa: SAR -Outcomes: 1994-29
SUMMARY OF OUTPUT: SPATIAL LAG MODEL - MAXIMUM LIKELIHOOD ESTIMATION
Data set : nuevo
Spatial Weight : nuevoq
Lag coeff. (Rho) : 0.153427
R-squared : 0.924712 Log likelihood : 54.1372
Sq. Correlation : - Akaike info criterion : -102.274
Sigma-square : 0.00610122 Schwarz criterion : -96.6607
S.E of regression : 0.0781103
----------------------------------------------------------------------
-------
Variable Coefficient Std.Error z-value Probability
----------------------------------------------------------------------
-------
W_DIN94IN29 0.153427 0.0776567 1.97571 0.04819
CONSTANT 7.21331 0.560658 12.8658 0.00000
LINC29 -0.655089 0.0491648 -13.3243 0.00000
----------------------------------------------------------------------
-------
RANDOM COEFFICIENTS
TEST DF VALUE PROB
SPATIAL LAG DEPENDENCE FOR WEIGHT MATRIX : nuevoq
TEST DF VALUE PROB
Likelihood Ratio Test 1 3.7124 0.05401
============================== END OF REPORT
================================

GeoDa: SAR-Outcomes: 1945-29
Data set : nuevo
----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------
W_DIN45IN29 0.295355 0.0974027 3.0323 0.00243
CONSTANT 2.64263 0.300699 8.78829 0.00000
LINC29 -0.341486 0.0388813 -8.78278 0.00000
----------------------------------------------------------------------
-------
RANDOM COEFFICIENTS
TEST DF VALUE PROB
TEST DF VALUE PROB
============================== END OF REPORT
================================

GeoDa:SAR-Outcomes: 1994-46
Data set : nuevo
----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------
W_DIN94IN46 0.350822 0.11406 3.07577 0.00210
CONSTANT 5.02565 0.731279 6.87241 0.00000
LINC46 -0.445107 0.0651113 -6.8361 0.00000
----------------------------------------------------------------------
-------
RANDOM COEFFICIENTS
TEST DF VALUE PROB
TEST DF VALUE PROB
============================== END OF REPORT
================================

GeoDa : SEM
Click on Regression
Weights ﬁle/
Spatial error
Three regressions, one per sample period. For 1929-94:
1 dI94I29i = a + βLINC29i + ε29i ; ε29i = λWε29i + µ29i
where i = 1, 2, 3, ..., 48

GeoDa: SEM: 1994-29
SUMMARY OF OUTPUT: SPATIAL ERROR MODEL - MAXIMUM LIKELIHOOD ESTIMATION
Data set : nuevo
Lag coeff. (Lambda) : 0.254318
R-squared : 0.922600 R-squared (BUSE) : -
Sq. Correlation : - Log likelihood : 53.223613
S.E of regression : 0.0791982 Schwarz criterion : -98.7048
----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------
CONSTANT 8.15606 0.231367 35.2516 0.00000
LINC29 -0.716327 0.0363589 -19.7016 0.00000
LAMBDA 0.254318 0.182314 1.39494 0.16303
----------------------------------------------------------------------
-------
RANDOM COEFFICIENTS
TEST DF VALUE PROB
SPATIAL ERROR DEPENDENCE FOR WEIGHT MATRIX : nuevoq
TEST DF VALUE PROB
============================== END OF REPORT
================================

GeoDa: SEM: 1945-29
Data set : nuevo
----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------
CONSTANT 3.21562 0.216223 14.8718 0.00000
LINC29 -0.39988 0.0338626 -11.8089 0.00000
LAMBDA 0.58025 0.131908 4.3989 0.00001
----------------------------------------------------------------------
-------
RANDOM COEFFICIENTS
TEST DF VALUE PROB
TEST DF VALUE PROB
============================== END OF REPORT
================================

GeoDa:SEM: 1994-46
Data set : nuevo
----------------------------------------------------------------------
-------
----------------------------------------------------------------------
-------
CONSTANT 6.64014 0.432628 15.3484 0.00000
LINC46 -0.529052 0.0613691 -8.62082 0.00000
LAMBDA 0.433936 0.157919 2.74785 0.00600
----------------------------------------------------------------------
-------
RANDOM COEFFICIENTS
TEST DF VALUE PROB
TEST DF VALUE PROB
============================== END OF REPORT
================================

GeoDa: SAR & SEM-Outcomes
With these outputs, you should be able to complete R
& M Table 3
You may find the R²s, AICs, βs, λs, ρs, and p-values.
Convergence rate(θ) is calculated using β (See slide 39)

Reporting SAR & SEM outcomes
Table: Spatial Dependence Models
Model specification    AIC        β (p-value)      λ, ρ p-value   LM test p-value
1929-94
Spatial error (ML)     -102.447   -0.716 (0.000)   0.163          0.169
Spatial lag (ML)       -102.274   -0.655 (0.000)   0.048          0.054
1929-45
Spatial error (ML)     -125.534   -0.399 (0.000)   0.000          0.000
Spatial lag (ML)       -102.447   -0.341 (0.000)   0.002          0.002
1946-94
Spatial error (ML)     -103.464   -0.529 (0.000)   0.006          0.008
Spatial lag (ML)       -103.734   -0.445 (0.000)   0.002          0.002
Convergence rate (θ) based on the spatial error (ML) estimates
Period    θ
1929-94   0.019
1929-45   0.032
1946-94   0.016

Interpretation of SEM & SAR outcomes
For SEM, as expected, AIC indicates that the fit of
each of the three spatial models is superior to OLS.
βs are significant and negative but different from OLS
coefficients.
OLS suffers from a misspecification due to omitted
spatial dependence.
The coefficients on error (λ) and lag(ρ) terms are
significant in the sub-samples. For the full sample, just
ρ is significant
The LM test indicates that there is no spatial dependence
remaining in SAR and SEM.
Including spatial dependence reduces the convergence
rates (θs)
Convergence rate over entire sample, 1.9% yearly but
first sub sample, 3.2%, second sub sample 1.6%

Research paper
Choose a country (e.g. Canada, South Korea,
Portugal, Mexico, etc.), ﬁnd the historical data of the
real per capita GDP (personal income, GSP(Gross
State Product)) of its
states/provinces/municipalities/regions, and, do an
income convergence analysis about them (β -
convergence and σ - convergence).
In addition, you have to choose an event that was
important for the economy of that country (e.g. a new
constitution, an improvement in the power system, a
strong devaluation of its currency, a natural disaster, a
war, etc.), so that you may choose “four years” like
Rey and Montouri did.

Research paper
Sections: introduction, literature, methodology (data),
outcomes (explanations of them), and conclusions.
Your scholarly work must report:
Exploring β- convergence (Slide 48).
Exploring Spatial Dependence (Slide 49).
Moran scatterplots (Slide 51).
Exploring data - σ- convergence & Moran’s I (Slide
52).
OLS outcomes, convergence rates, & spatial diagnostic
tests (Table slide 58).
SAR & SEM outcomes. Convergence rates (Table
slide 72).

References
1 Anselin, L. (2005). Exploring Spatial Data with
GeoDa™: A Workbook.
2 LeSage, J. and Pace, R. K. (2009). Introduction to
Spatial Econometrics. Taylor & Francis, Boca Raton.
3 Rey, S. J. and Montouri, B. D. (1999). “US Regional
Income Convergence: A Spatial Econometric
Perspective”, Regional Studies 33, 143-156.
4 Solow, R. M. (1956). “A Contribution to the Theory of
Economic Growth”, The Quarterly Journal of
Economics, 70(1), 65-94.
