f33-ft-computing-lec09-correct.ppt

Oct. 2007 Error Correction Slide 1
Fault-Tolerant Computing
Dealing with
Mid-Level
Impairments

About This Presentation
Edition: First. Released: Oct. 2006. Revised: Oct. 2007.
This presentation has been prepared for the graduate
course ECE 257A (Fault-Tolerant Computing) by
Behrooz Parhami, Professor of Electrical and Computer
Engineering at University of California, Santa Barbara.
The material contained herein can be used freely in
classroom teaching or any other educational setting.
Unauthorized uses are prohibited. © Behrooz Parhami

Error Correction

Multilevel Model
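[Figure: multilevel model of impairments. Starting from the Ideal state, a system can pass through six impaired states, each tied to an abstraction level:
Low-level impaired: Defective (component level), Faulty (logic level)
Mid-level impaired: Erroneous (information level), Malfunctioning (system level)
High-level impaired: Degraded (service level), Failed (result level)
Legend: initial entry, deviation, remedy, tolerance
Markers labeled "Last lecture" and "Today" show where the previous and current lectures fall in the model]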

High-Redundancy Codes
Triplication is a form of error coding:
x represented as xxx (200% redundancy)
Corrects any error in one version
Detects two nonsimultaneous errors
With a larger replication factor, more errors can be tolerated
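[Figure: input x is encoded by triplication, three copies of f(x) are computed, and a voter V decodes the three results into the output y]
If we triplicate the voter to obtain 3 results,
we are essentially performing the operation
f(x) on coded inputs, getting coded outputs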
Our challenge today is to come up with strong correction capabilities,
using much lower redundancy (perhaps an order of magnitude less)
To correct all single-bit errors in an n-bit code, we must have $2^r > n$,
or $2^r > k + r$, which leads to at least about $\log_2 k$ check bits
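The bound above can be evaluated numerically. The following is a minimal sketch, assuming the condition $2^r > k + r$ exactly as stated; the function name `min_check_bits` is made up for this illustration.

```python
def min_check_bits(k: int) -> int:
    """Smallest r satisfying 2**r > k + r for k data bits."""
    r = 1
    while 2 ** r <= k + r:
        r += 1
    return r

# Check bits needed for a few data widths
for k in (4, 11, 26, 57, 64, 120):
    print(k, min_check_bits(k))
```

For k = 64 this gives r = 7, roughly 11% redundancy compared with 200% for triplication.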

Error-Correcting Codes: Idea
A conceptually simple error-correcting code:
Arrange the k data bits into a $k^{1/2} \times k^{1/2}$ square array
Attach an even parity bit to each row and column of the array
Row/Column check bit = XOR of all row/column data bits
Data space: All $2^k$ possible k-bit words
Redundancy: $2k^{1/2} + 1$ check bits for k data bits
Corrects all single-bit errors (lead to distinct noncodewords)
Detects all double-bit errors (some triples go undetected)
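[Figure: encoding maps data words (data space) to codewords (code space); errors turn codewords into noncodewords (error space); decoding maps back to data words]
Example with k = 9 (each row ends in its parity bit; the last row holds the column parity bits):
0 1 1 0
0 1 0 1
1 0 1 0
1 0 0 1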
0
To be avoided
at all cost
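The row/column parity scheme of this slide can be sketched as follows. This is an illustrative sketch, not code from the lecture: the function names `encode_2d` and `correct_2d` are hypothetical, and the corner bit is assumed to be the parity of the parity bits.

```python
def encode_2d(data):
    """Append an even-parity bit to each row, then a row of column parities."""
    rows = [row + [sum(row) % 2] for row in data]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]

def correct_2d(code):
    """Fix at most one flipped bit in place; return its (row, col) or None."""
    bad_rows = [i for i, row in enumerate(code) if sum(row) % 2]
    bad_cols = [j for j, col in enumerate(zip(*code)) if sum(col) % 2]
    if not bad_rows and not bad_cols:
        return None                      # all parities check out
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]  # the failing row and column intersect at the error
        code[i][j] ^= 1
        return (i, j)
    raise ValueError("multiple-bit error detected, not correctable")

code = encode_2d([[0, 1, 1], [0, 1, 0], [1, 0, 1]])
code[1][2] ^= 1                          # inject a single-bit error
print(correct_2d(code))
```

A double-bit error makes two rows or two columns fail, so it is flagged rather than miscorrected.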

Error-Correcting Codes: Evaluation
Redundancy: k data bits encoded in n = k + r bits (r redundant bits)
Encoding: Complexity (cost / time) to form codeword from data word
Decoding: Complexity (cost / time) to obtain data word from codeword
Capability: Classes of error that can be corrected
Greater correction capability generally involves more redundancy
To correct c bit-errors, a minimum code distance of 2c + 1 is required
Combined error correction/detection capability:
To correct c errors and additionally detect d errors (d > c),
a minimum code distance of c + d + 1 is required
Example: Hamming SEC/DED code has a code distance of 4
Examples of code correction capabilities:
Single, double, byte, b-bit burst, unidirectional, . . . errors
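The distance conditions above translate directly into small helper functions. This is a sketch for illustration only; the helper names are invented here, and codewords are assumed to be given as integers.

```python
from itertools import combinations

def min_distance(codewords):
    """Minimum Hamming distance between any two codewords (given as ints)."""
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))

def max_correctable(distance):
    """Largest c with 2c + 1 <= distance."""
    return (distance - 1) // 2

def max_detectable(distance, c):
    """Largest d with c + d + 1 <= distance, for a chosen correction level c."""
    return distance - 1 - c

triplicated_bit = [0b000, 0b111]           # a single bit, triplicated
dist = min_distance(triplicated_bit)
print(dist, max_correctable(dist))         # distance 3 allows c = 1
print(max_detectable(4, 1))                # distance 4 with c = 1 allows d = 2 (SEC/DED)
```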

Hamming Distance for Error Correction
Red dots represent codewords
Yellow dots, noncodewords
within distance 1 of codewords,
represent correctable errors
Blue dot, within distance 2 of
three different codewords
represents a detectable error
Not all “double errors” are
correctable, however, because
there are points within distance
2 of some codewords that are
also within distance 1 of another
The following visualization, though not completely accurate, is still useful

A Hamming SEC Code
d3 d2 d1 d0 p2 p1 p0
Data bits Parity bits
Uses multiple parity bits, each applied to
a different subset of data bits
Encoding: 3 XOR networks to form parity bits
Checking: 3 XOR networks to verify parities
Decoding: Trivial (separable code)
Redundancy: 3 check bits for 4 data bits
Unimpressive, but gets better with more data bits
(7, 4); (15, 11); (31, 26); (63, 57); (127, 120)
Capability: Corrects any single-bit error
s2 = d3 ⊕ d2 ⊕ d1 ⊕ p2
s1 = d3 ⊕ d1 ⊕ d0 ⊕ p1
s0 = d2 ⊕ d1 ⊕ d0 ⊕ p0
Syndrome (s2 s1 s0) and the bit in error:
s2 s1 s0   Error
0  0  0    None
0  0  1    p0
0  1  0    p1
0  1  1    d0
1  0  0    p2
1  0  1    d2
1  1  0    d3
1  1  1    d1
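The three parity equations and the syndrome table of this slide can be exercised with a short sketch. The dictionary-based word representation and the function names are choices made for this illustration only.

```python
# Syndrome value (s2 s1 s0 read as a binary number) -> bit in error, per the slide's table
SYNDROME_TO_BIT = {0b001: "p0", 0b010: "p1", 0b011: "d0", 0b100: "p2",
                   0b101: "d2", 0b110: "d3", 0b111: "d1"}

def encode(d3, d2, d1, d0):
    """Form the parity bits so that all three syndrome bits come out 0."""
    return {"d3": d3, "d2": d2, "d1": d1, "d0": d0,
            "p2": d3 ^ d2 ^ d1, "p1": d3 ^ d1 ^ d0, "p0": d2 ^ d1 ^ d0}

def syndrome(w):
    s2 = w["d3"] ^ w["d2"] ^ w["d1"] ^ w["p2"]
    s1 = w["d3"] ^ w["d1"] ^ w["d0"] ^ w["p1"]
    s0 = w["d2"] ^ w["d1"] ^ w["d0"] ^ w["p0"]
    return (s2 << 2) | (s1 << 1) | s0

def correct(w):
    """Return a copy of w with the single bit named by the syndrome flipped."""
    fixed = dict(w)
    s = syndrome(w)
    if s:
        fixed[SYNDROME_TO_BIT[s]] ^= 1
    return fixed

word = encode(1, 0, 1, 1)
damaged = dict(word, d2=word["d2"] ^ 1)    # flip d2
print(syndrome(damaged), correct(damaged) == word)
```

Because the code is separable, decoding after correction is just reading d3 through d0.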

Matrix Formulation of Hamming SEC Code
Syndrome = Parity check matrix × Received word
Parity check matrix (one column per bit of the received word):
      d3 d2 d1 d0 p2 p1 p0
s2     1  1  1  0  1  0  0
s1     1  0  1  1  0  1  0
s0     0  1  1  1  0  0  1
Received word: column vector (d3 d2 d1 d0 p2 p1 p0)
Syndrome: column vector (s2 s1 s0)
s2 s1 s0   Error
0  0  0    None
0  0  1    p0
0  1  0    p1
0  1  1    d0
1  0  0    p2
1  0  1    d2
1  1  0    d3
1  1  1    d1
Example on the slide: the syndrome matches the p2 column
in the parity check matrix, so p2 is the bit in error
Matrix-vector multiplication is done
with AND/XOR, instead of ×/+

Matrix Rearrangement for Simpler Correction
Syndrome = Parity check matrix × Received word (data and parity bits)
Position number   1  2  3  4  5  6  7
                  p0 p1 d0 p2 d2 d3 d1
s2                0  0  0  1  1  1  1
s1                0  1  1  0  0  1  1
s0                1  0  1  0  1  0  1
Matrix columns are binary representations of column indices
s2 s1 s0   Error
0  0  0    None
0  0  1    p0
0  1  0    p1
0  1  1    d0
1  0  0    p2
1  0  1    d2
1  1  0    d3
1  1  1    d1
Example on the slide: the syndrome indicates an error in position 4
[Figure: a decoder takes s2 s1 s0 and activates output 0 (no error) or one of outputs 1-7; the active output selects which of the 7 received bits is inverted to form the corrected version]

Hamming Generator Matrix
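Codeword = Generator matrix × Data word
Generator matrix (one column per data bit, one row per codeword bit):
      d3 d2 d1 d0
d3     1  0  0  0
d2     0  1  0  0
d1     0  0  1  0
d0     0  0  0  1
p2     1  1  1  0
p1     1  0  1  1
p0     0  1  1  1
Data word: column vector (d3 d2 d1 d0)
Codeword: column vector (d3 d2 d1 d0 p2 p1 p0); the first four rows copy the data bits
Recall that matrix-vector multiplication
is done with AND/XOR, instead of ×/+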
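The generator-matrix and parity-check-matrix formulations can be checked against each other with a few lines. This sketch assumes the bit order d3 d2 d1 d0 p2 p1 p0 used on these slides; the helper name `mat_vec_gf2` is invented for the illustration.

```python
# Generator matrix: rows are codeword bits d3 d2 d1 d0 p2 p1 p0, columns are data bits d3..d0
G = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 1, 1]]

# Parity check matrix: rows are s2 s1 s0, columns are d3 d2 d1 d0 p2 p1 p0
H = [[1, 1, 1, 0, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def mat_vec_gf2(matrix, vector):
    """Matrix-vector product with AND for multiply and XOR (sum mod 2) for add."""
    return [sum(m & v for m, v in zip(row, vector)) % 2 for row in matrix]

codeword = mat_vec_gf2(G, [1, 0, 1, 1])
print(codeword, mat_vec_gf2(H, codeword))   # the syndrome of a valid codeword is all 0s

codeword[4] ^= 1                            # flip p2
print(mat_vec_gf2(H, codeword))             # equals the p2 column of H
```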

Generalization to Wider Hamming SEC Codes
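Syndrome = Parity check matrix × Received word
Position number   1  2  3  4  . . .  $2^r - 1$
                  p0 p1 d0 p2 . . .
$s_{r-1}$         0  0  0  . . .  1  1  1
  :               :  :  :  . . .  :  :  :
s1                0  1  1  . . .  0  1  1
s0                1  0  1  . . .  1  0  1
Matrix columns are binary representations of column indices
[Figure: a decoder takes $s_{r-1}$ . . . s1 s0 and drives $2^r - 1$ correction lines; the active line selects which of the $2^r - 1$ received bits is inverted to form the corrected version]
Condition for general Hamming SEC code: $n = k + r = 2^r - 1$
n      k = n – r
7      4
15     11
31     26
63     57
127    120
255    247
511    502
1023   1013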
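Since the columns of the rearranged parity check matrix are the binary position numbers, the syndrome is simply the XOR of the positions holding a 1. The sketch below builds on that observation for any r; the function names are hypothetical, and parity bits are assumed to sit at the power-of-2 positions, as in the 7-bit example.

```python
def hamming_encode(data, r):
    """Place data bits at non-power-of-2 positions 1..2**r - 1, then set parity bits."""
    n = 2 ** r - 1
    if len(data) != n - r:
        raise ValueError("expected %d data bits" % (n - r))
    word = [0] * (n + 1)                 # index 0 unused, so index = position number
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # nonzero for positions that are not powers of 2
            word[pos] = next(bits)
    for j in range(r):
        p = 1 << j                       # parity bit p_j sits at position 2**j
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p and pos != p) % 2
    return word[1:]

def hamming_correct(word):
    """Return (corrected word, syndrome); the syndrome is the error position or 0."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed, s

cw = hamming_encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0], 4)   # (15, 11) code
bad = list(cw)
bad[9] ^= 1                                                  # error in position 10
print(hamming_correct(bad)[1])
```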

A Hamming SEC/DED Code
Add an extra row of all 1s and a column
with only one 1 to the parity check matrix
Syndrome = Parity check matrix × Received word (data and parity bits)
Position number   1  2  3  4  . . .  $2^r - 1$
                  p0 p1 d0 p2 . . .              $p_r$
$s_r$             1  1  1  . . .  1  1  1        1
$s_{r-1}$         0  0  0  . . .  1  1  1        0
  :               :  :  :  . . .  :  :  :        :
s1                0  1  1  . . .  0  1  1        0
s0                1  0  1  . . .  1  0  1        0
[Figure: a decoder takes $s_{r-1}$ . . . s1 s0 and drives $2^r - 1$ correction lines that invert the selected received bit to form the corrected version; a decoder output q is combined with $s_r$ to produce the "Not single error" signal]
Easy to verify that the appropriate “correction”
is made for all 4 combinations of ($s_r$, q) values
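The four cases can be enumerated in a short sketch. Note the assumptions: the flag used here is "position syndrome is nonzero", which may or may not coincide with the slide's signal q, and the function name `secded_decode` and the status strings are invented for this illustration. The word layout is positions 1 to $2^r - 1$ followed by the overall parity bit $p_r$.

```python
def secded_decode(word):
    """Classify and, if possible, correct a word laid out as [positions 1..n, p_r]."""
    *body, pr = word
    s = 0
    for pos, bit in enumerate(body, start=1):
        if bit:
            s ^= pos                      # position syndrome s_{r-1}..s_0
    sr = (sum(body) + pr) % 2             # overall parity check (row of all 1s)
    fixed = list(word)
    if sr == 0 and s == 0:
        return "no error", fixed
    if sr == 1 and s == 0:
        fixed[-1] ^= 1                    # only the overall parity bit is wrong
        return "single error corrected", fixed
    if sr == 1:
        fixed[s - 1] ^= 1                 # single error at position s
        return "single error corrected", fixed
    return "double error detected", None  # sr == 0 but s != 0

# A valid (7, 4) codeword in position order p0 p1 d0 p2 d2 d3 d1, plus overall parity
codeword = [0, 1, 1, 0, 0, 1, 1, 0]
bad = list(codeword)
bad[2] ^= 1
bad[5] ^= 1
print(secded_decode(bad)[0])
```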

Some Other Useful Codes
BCH codes: Named in honor of Bose, Chaudhuri, Hocquenghem
Reed-Solomon codes: Special case of BCH code
Example: A popular variant is RS(255, 223) with 8-bit symbols
223 bytes of data, 32 check bytes, redundancy ≈ 14%
Can correct errors in up to 16 bytes anywhere in the 255-byte codeword
Used in CD players, digital audio tape, digital television
Hamming codes are examples of linear codes
Linear codes may be defined in many other ways
There are also many other classes of codes
Reed-Muller codes: Have a recursive construction, with smaller codes
used to build larger ones
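Turbo codes: Highly efficient separable
codes, with iterative (soft) decoding
[Figure: turbo encoder. The data feeds Encoder 1 directly and Encoder 2 through an interleaver; the data and the two encoder outputs together form the code]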

Reed-Solomon Codes
With k data symbols, require 2t check symbols, each s bits wide, to correct
up to t symbol errors; hence, RS(k + 2t, k) has distance 2t + 1
The number k of data symbols must satisfy $k \le 2^s - 1 - 2t$ (s grows with k)
[Figure: codeword made up of k data symbols and 2t check symbols]
Example: RS(6, 2) code, with 2 data and 2t = 4 check symbols (7-valued)
⇒ up to t = 2 symbol errors correctable; hence, RS(6, 2) has distance 5
Generator polynomial: $g(x) = (x - b)(x - b^2)(x - b^3)(x - b^4)$;
b is a primitive root mod 7, that is, $b^6 = 1 \bmod 7$, but $b^j \ne 1 \bmod 7$ for $1 \le j < 6$
Pick b = 3 ⇒ $g(x) = (x - 3)(x - 3^2)(x - 3^3)(x - 3^4)$
$= (x - 3)(x - 2)(x - 6)(x - 4) = x^4 + 6x^3 + 3x^2 + 2x + 4$ (coefficients mod 7)
As usual, the codeword is the product of g(x) and the info polynomial;
convertible to matrix-by-vector multiply by deriving a generator matrix G
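The RS(6, 2) example can be reproduced with polynomial arithmetic mod 7. The following is a minimal sketch: polynomials are coefficient lists with the lowest degree first, the helper names are hypothetical, and the info polynomial 5 + 2x is an arbitrary choice for illustration.

```python
P = 7  # symbols are integers mod 7

def poly_mul(a, b):
    """Multiply two polynomials (coefficient lists, lowest degree first) mod P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def poly_eval(poly, x):
    """Evaluate a polynomial at x mod P using Horner's rule."""
    result = 0
    for coeff in reversed(poly):
        result = (result * x + coeff) % P
    return result

def generator(b, num_roots):
    """g(x) = (x - b)(x - b^2)...(x - b^num_roots) mod P."""
    g = [1]
    for i in range(1, num_roots + 1):
        g = poly_mul(g, [(-pow(b, i, P)) % P, 1])
    return g

g = generator(3, 4)
print(g)                                   # lowest degree first: 4 + 2x + 3x^2 + 6x^3 + x^4
codeword = poly_mul(g, [5, 2])             # info polynomial 5 + 2x (arbitrary example)
print(codeword, [poly_eval(codeword, pow(3, i, P)) for i in range(1, 5)])
```

Every codeword formed this way vanishes at the four roots 3, 2, 6, 4 of g(x); a nonzero value at any of them signals an error.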

BCH Codes
BCH(15, 7) code: Capable of correcting any two errors
Correct the deficiency of Reed-Solomon code; have a fixed alphabet
We usually choose the alphabet {0, 1}
Generator polynomial: $g(x) = 1 + x^4 + x^6 + x^7 + x^8$
Syndrome = Received word × Parity check matrix:
[0 1 1 0 0 1 0 1 1 0 0 0 0 1 0] × H = [x x x x x x x x]
Parity check matrix H (15 rows, one per bit of the received word):
1000 1000
0100 0001
0010 0011
0001 0101
1100 1111
0110 1000
0011 0001
1101 0011
1010 0101
0101 1111
1110 1000
0111 0001
1111 0011
1011 0101
1001 1111
BCH(511, 493) used as
DEC code in a video
coding standard for
videophones
BCH(40, 32) used as
SEC/DED code in ATM
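The BCH(15, 7) parity check matrix listed on this slide can be regenerated and used to compute syndromes. This sketch rests on an assumption inferred from the listed rows, not stated in the slides: row i holds the powers $\alpha^i$ and $\alpha^{3i}$ of a primitive element of GF(16) defined by $\alpha^4 = \alpha + 1$, written as 4-bit vectors with the constant term first. All function names are invented here.

```python
def gf16_powers():
    """alpha^0 .. alpha^14 as bit lists [c0, c1, c2, c3], assuming alpha^4 = alpha + 1."""
    powers, cur = [], [1, 0, 0, 0]
    for _ in range(15):
        powers.append(cur)
        c0, c1, c2, c3 = cur
        cur = [c3, c0 ^ c3, c1, c2]       # multiply by alpha, then reduce alpha^4
    return powers

POW = gf16_powers()
H = [POW[i] + POW[(3 * i) % 15] for i in range(15)]   # row i: alpha^i | alpha^(3i)

def syndrome(received):
    """Row vector times H over GF(2): XOR of the rows selected by the 1 bits."""
    s = [0] * 8
    for bit, row in zip(received, H):
        if bit:
            s = [a ^ b for a, b in zip(s, row)]
    return s

# g(x) = 1 + x^4 + x^6 + x^7 + x^8 is itself a codeword, so its syndrome is all 0s
g = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(syndrome(g))

received = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0]   # the slide's received word
print(syndrome(received))
```

Correcting up to two errors then amounts to finding the one or two rows of H whose XOR equals the syndrome.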

Arithmetic Error-Correcting Codes
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Positive    Syndrome           Negative    Syndrome
error       mod 7   mod 15     error       mod 7   mod 15
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
1           1       1          –1          6       14
2           2       2          –2          5       13
4           4       4          –4          3       11
8           1       8          –8          6       7
16          2       1          –16         5       14
32          4       2          –32         3       13
64          1       4          –64         6       11
128         2       8          –128        5       7
256         4       1          –256        3       14
512         1       2          –512        6       13
1024        2       4          –1024       5       11
2048        4       8          –2048       3       7
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
4096        1       1          –4096       6       14
8192        2       2          –8192       5       13
16,384      4       4          –16,384     3       11
32,768      1       8          –32,768     6       7
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Error syndromes for
weight-1 arithmetic
errors in the (7, 15)
biresidue code
Because all 24 syndromes in the upper part of this table
(errors ±1 through ±2048) are different, any weight-1
arithmetic error in a 12-bit value is correctable by the
(mod 7, mod 15) biresidue code; from ±4096 on, the
syndromes repeat
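The syndrome table can be regenerated, and its use for locating an error shown, with a few lines. This is a sketch only; the names `syndrome` and `LOOKUP` are made up, and the 12-position limit reflects where the syndromes start to repeat.

```python
def syndrome(error, mod_a=7, mod_b=15):
    """Pair of residues of an arithmetic error value (Python's % is non-negative)."""
    return (error % mod_a, error % mod_b)

# Map each syndrome back to the weight-1 error (+/- 2**j, j = 0..11) that causes it
LOOKUP = {}
for j in range(12):
    for error in (2 ** j, -(2 ** j)):
        LOOKUP[syndrome(error)] = error

print(len(LOOKUP))                 # 24 distinct syndromes for 24 possible errors
print(LOOKUP[(6, 7)])              # the syndrome (6, 7) points to the error -8
print(syndrome(4096) == syndrome(1))
```

Subtracting the looked-up error value from the erroneous result restores the correct value.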

Properties of Biresidue Codes
Biresidue code with relatively prime low-cost check moduli $A = 2^a - 1$
and $B = 2^b - 1$ supports $a \times b$ bits of data for weight-1 error correction
Representational redundancy = (a + b)/(ab) = 1/a + 1/b
Biresidue codes:
a    b    n = k + a + b   k = ab
3    4    19              12
5    6    41              30
7    8    71              56
11   12   155             132
15   16   271             240
Compare with Hamming SEC code:
n      k
7      4
15     11
31     26
63     57
127    120
255    247
511    502
1023   1013
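The parameters of a biresidue code follow directly from a and b. Here is a small sketch that evaluates k = ab, n = k + a + b and the redundancy formula, and checks that the two moduli are relatively prime; the function name `biresidue_params` is hypothetical.

```python
from math import gcd

def biresidue_params(a, b):
    """Return (n, k, redundancy) for check moduli 2**a - 1 and 2**b - 1."""
    mod_a, mod_b = 2 ** a - 1, 2 ** b - 1
    if gcd(mod_a, mod_b) != 1:
        raise ValueError("check moduli must be relatively prime")
    k = a * b                      # data bits supported for weight-1 correction
    n = k + a + b                  # data bits plus the two residues
    return n, k, (a + b) / k

for a, b in ((3, 4), (5, 6), (7, 8), (15, 16)):
    n, k, redundancy = biresidue_params(a, b)
    print(a, b, n, k, round(redundancy, 3))
```

For comparison, the (127, 120) Hamming SEC code has redundancy 7/120, while a biresidue code with a comparable data width needs a + b check bits.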

Arithmetic on Biresidue-Coded Operands
Similar to residue-checked arithmetic for addition and multiplication,
except that two residues are involved
Divide/square-root: remains difficult
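[Figure: arithmetic processor with biresidue checking. Operands x and y enter the main arithmetic processor, which produces z. Their check parts C(x), D(x) and C(y), D(y) enter the check processor, which produces C(z) and D(z). The result z is reduced mod A and mod B and compared with the check processor's outputs to drive the error indicator]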
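Checked addition and multiplication with two residues can be sketched as below. This is an illustration under stated assumptions: moduli 7 and 15 are taken from the earlier example, operands are modeled as tuples (value, value mod 7, value mod 15), and the `fault` parameter is a made-up hook for injecting an error into the main processor's result.

```python
import operator

MOD_A, MOD_B = 7, 15

def encode(x):
    """Attach the two check residues to a value."""
    return (x, x % MOD_A, x % MOD_B)

def checked_op(xc, yc, op, fault=0):
    """Apply op in the main and check processors; return (result, syndrome)."""
    (x, xa, xb), (y, ya, yb) = xc, yc
    z = op(x, y) + fault                  # main arithmetic processor (possibly faulty)
    za = op(xa, ya) % MOD_A               # check processor, mod-A channel
    zb = op(xb, yb) % MOD_B               # check processor, mod-B channel
    syndrome = ((z - za) % MOD_A, (z - zb) % MOD_B)
    return (z, za, zb), syndrome          # syndrome (0, 0) means the residues agree

x, y = encode(1234), encode(567)
print(checked_op(x, y, operator.add)[1])
print(checked_op(x, y, operator.mul)[1])
print(checked_op(x, y, operator.add, fault=8)[1])   # syndrome of a +8 error
```

The same pattern does not carry over to division or square root, where the residue of the result is not a simple function of the operand residues.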

Higher-Level Error Coding Methods
We have applied coding to data at the bit-string or word level
It is also possible to apply coding at higher levels
Data structure level – Robust data structures
Application level – Algorithm-based error tolerance

Preview of Algorithm-Based Error Tolerance
Matrix M =
2 1 6
5 3 4
3 2 7
Row checksum matrix Mr =
2 1 6 1
5 3 4 4
3 2 7 4
Column checksum matrix Mc =
2 1 6
5 3 4
3 2 7
2 6 1
Full checksum matrix Mf =
2 1 6 1
5 3 4 4
3 2 7 4
2 6 1 1
Error coding applied to data structures, rather than at the level of atomic
data elements
Example: mod-8
checksums used
for matrices
If Z = X × Y then
Zf = Xc × Yr
In Mf, any single
error is correctable
and any 3 errors
are detectable
Four errors may
go undetected
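The checksum matrices and the product property can be tried out with plain nested lists. This is a sketch under the slide's mod-8 assumption; the function names are invented, and the product is left unreduced, with all checks done mod 8.

```python
MOD = 8

def row_checksum(m):
    """Append to each row the sum of its elements mod 8."""
    return [row + [sum(row) % MOD] for row in m]

def col_checksum(m):
    """Append a row holding the sum of each column mod 8."""
    return [list(row) for row in m] + [[sum(col) % MOD for col in zip(*m)]]

def full_checksum(m):
    return col_checksum(row_checksum(m))

def mat_mul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def check(mf):
    """Return the rows and columns of a full checksum matrix whose checksums fail."""
    bad_rows = [i for i, row in enumerate(mf) if (sum(row[:-1]) - row[-1]) % MOD]
    bad_cols = [j for j, col in enumerate(zip(*mf)) if (sum(col[:-1]) - col[-1]) % MOD]
    return bad_rows, bad_cols

M = [[2, 1, 6], [5, 3, 4], [3, 2, 7]]
print(full_checksum(M))

Zf = mat_mul(col_checksum(M), row_checksum(M))   # full checksum matrix of M x M
print(check(Zf))
Zf[1][2] += 3                                    # inject an error into one element
print(check(Zf))                                 # failing row and column locate it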
