Copyright 2015 QuEST Forum. All Rights Reserved.
The action against Soft-errors
to prevent service outages
NTT Network Service Systems Laboratories
Hidenori Iwashita
2015 APAC QuEST Forum APAC Best Practices Conference
April 2015
Agenda
1. Soft error problems
Laboratory non-reproducible errors
Silent errors
2. Soft error mechanisms
Soft errors are caused by cosmic rays
3. The increase of soft errors
With miniaturization of LSI design rules, soft errors are
increasing rapidly
4. Practices
Soft error test using a compact accelerator neutron source
5. Results
6. Conclusion
NTT can reduce service outages and failure recovery costs due
to soft errors.
1. Soft error problems
Laboratory non-reproducible errors
① Error (network system)
② Alarm (network operations center)
③ Return (manufacturer factory)
④ Tests
⑤ Test OK
1. Soft error problems
Silent errors
Network System
Network operations center
① User complaint: "I can’t connect!"
• Not alarmed
• Fault node unknown
Prolonged → Significant failure → Press release (newspaper, TV)
2. Soft error mechanisms
Neutrons generated by cosmic rays
[Figure: cosmic rays (high-energy particles) from the Sun and supernova explosions reach Earth. Nuclear reactions in the atmosphere: a high-energy particle destroys a nucleus (O or N), producing neutrons, protons, muons, and π-mesons.]
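2. Soft error mechanisms
Nuclear reactions in the device
[Figure: neutrons reach the network system. A neutron destroys a silicon nucleus in the device, and the resulting secondary ions cause a soft error (bit error). Labels: neutron, silicon nuclei, proton, secondary ions, destruction.]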
3. The increase of soft errors
Miniaturization of LSI design rules (higher integration) → soft errors increase
Past: only in space or the sky
Current: at ground level
3. The increase of soft errors
How often do soft errors occur?
The FPGA contains large-capacity SRAM.
Since SRAMs have a lower critical charge (are more sensitive), soft errors occur more frequently.
Without soft error mitigation, an FPGA exceeds 10,000 FIT (1 FIT = 1 failure per 10^9 device-hours).
E.g. 1000 units in the network × 6 FPGAs per unit
About 1.5 devices fail per day
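The "about 1.5 per day" figure can be checked with a few lines of arithmetic. The sketch below assumes that "FPGA×6" means six FPGAs in each unit and that every FPGA sits at exactly 10,000 FIT. The function name is illustrative and not from the slides.

```python
# FIT = failures per 10**9 device-hours.
HOURS_PER_FIT_UNIT = 1e9


def failures_per_day(fit_per_device: float, devices: int) -> float:
    """Expected failures per day across a fleet of identical devices."""
    failures_per_hour = fit_per_device * devices / HOURS_PER_FIT_UNIT
    return failures_per_hour * 24


# Slide figures: 10,000 FIT per FPGA, 1000 units.
# Assumption: 6 FPGAs per unit.
fleet_rate = failures_per_day(fit_per_device=10_000, devices=6 * 1000)
print(f"{fleet_rate:.2f} failures per day")
```

Under these assumptions the fleet sees a little under 1.5 failures per day, which agrees with the slide.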
4. Practices
Developing and applying soft error countermeasures
4. Practices
Step 1. Specifying requirements
Planned network scale
E.g. 1000 units on the network
Specify requirements
E.g. 1 failure per month on the network ⇒ about 1300 FIT / unit
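The per-unit requirement follows from dividing the network-level target by the number of units and converting to FIT. This is a minimal sketch. The slides do not say how long a month is taken to be, so both a 30-day and a 31-day month are shown as assumptions. The function name is hypothetical.

```python
def fit_budget_per_unit(failures_per_month: float, units: int,
                        hours_per_month: float = 30 * 24) -> float:
    """Allowed FIT per unit so the whole network meets its monthly target."""
    failures_per_unit_hour = failures_per_month / hours_per_month / units
    return failures_per_unit_hour * 1e9  # 1 FIT = 1 failure per 10**9 hours


# Slide figures: 1 failure per month across 1000 units.
budget_30 = fit_budget_per_unit(1, 1000)            # assumes a 30-day month
budget_31 = fit_budget_per_unit(1, 1000, 31 * 24)   # assumes a 31-day month
print(f"{budget_30:.0f} FIT (30-day month), {budget_31:.0f} FIT (31-day month)")
```

Both results land in the 1300 to 1400 FIT range, consistent with the rounded "about 1300 FIT / unit" on the slide.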
4. Practices
Step 2. Simulating soft errors
Device      Design rule [nm]  Size [Mb]  Soft error rate [FIT]
CPU SRAM    65                2          200
FPGA SRAM   28                100        10000
ASIC SRAM   90                2          150
DRAM ①      40                500        10
DRAM ②      40                500        10
DRAM ③      40                500        10
DRAM ④      40                500        10
SRAM ①      65                10         1000
SRAM ②      65                1          100
SRAM ③      65                10         1000
SRAM ④      65                2          200
SRAM ⑤      65                10         1000
Flash Mem   90                50         50
[Figure: example substrate layout with FPGA, ASIC, CPU, SRAM, DRAM, and flash memory devices. Four devices are marked "High".]
We use simulation to find the devices with high soft error rates.
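The simulation step amounts to adding up the per-device rates and comparing the total with the per-unit requirement. The sketch below uses the FIT values from the example table and the "about 1300 FIT / unit" target from Step 1. The 1000 FIT threshold for calling a device "high" is an assumption chosen for illustration, not a figure from the slides.

```python
# Per-device soft error rates [FIT] from the example table.
device_fit = {
    "CPU SRAM": 200, "FPGA SRAM": 10000, "ASIC SRAM": 150,
    "DRAM-1": 10, "DRAM-2": 10, "DRAM-3": 10, "DRAM-4": 10,
    "SRAM-1": 1000, "SRAM-2": 100, "SRAM-3": 1000,
    "SRAM-4": 200, "SRAM-5": 1000, "Flash Mem": 50,
}

BUDGET_FIT = 1300          # per-unit requirement from Step 1
HIGH_THRESHOLD_FIT = 1000  # assumed cut-off for "high", for illustration only

total_fit = sum(device_fit.values())
high_devices = sorted(
    (name for name, fit in device_fit.items() if fit >= HIGH_THRESHOLD_FIT),
    key=lambda name: device_fit[name],
    reverse=True,
)

print(f"Unit total: {total_fit} FIT (budget {BUDGET_FIT} FIT)")
print("Candidates for countermeasures:", ", ".join(high_devices))
```

With these numbers the unit total is roughly ten times the budget, and most of it comes from a handful of SRAM-based devices, which is where countermeasures pay off first.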
4. Practices
Step 3. Apply soft error countermeasures
Selecting the appropriate soft error countermeasures to suit each function:
(1) Reducing soft errors
Use devices with low soft error rates.
• MRAM: special device, low spec, high cost
(2) Protection from soft errors
Use memory devices with error correction functions such as ECC (Error Correction Code).
• 1-bit correction, 2-bit detection: low cost
• 2-bit correction, 3-bit detection: high cost
(3) Recovery from soft errors
Systems automatically restart or overwrite if a soft error occurs.
• Firmware: low cost
• ASIC: long-term development
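To illustrate what "1 bit correction, 2 bit detection" means, here is a minimal sketch of an extended Hamming (8,4) code, a textbook single-error-correcting, double-error-detecting scheme. It is not the ECC used in the devices described here, which the slides do not specify. Real memory ECC typically protects much wider words.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit. Indices 1..7 are the usual
    Hamming positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the parity of all 8 bits even
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status). Data is None when two flipped bits are detected."""
    word = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    parity_odd = sum(word) % 2 == 1

    if parity_odd:
        # Odd number of flips: treat as a single error and correct it.
        # A syndrome of 0 means the overall parity bit itself flipped.
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even parity but non-zero syndrome: two bits flipped.
        return None, "double-error detected"
    else:
        status = "ok"

    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return nibble, status


stored = encode(0b1011)
stored[6] ^= 1  # simulate a soft error flipping one bit
print(decode(stored))
```

Any single flipped bit is repaired transparently. Two flipped bits cannot be repaired, but they are flagged rather than silently returned as valid data, which is the property that turns a silent error into an alarm.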
4. Practices
Step 4. Soft error tests with real products
We developed soft error testing technology using Hokkaido
University’s compact accelerator-driven neutron source.
Hokkaido University’s compact
accelerator-driven neutron source
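An accelerated neutron test is usually turned into a field error rate by scaling with the ratio of natural flux to test fluence. The slides do not describe the conversion NTT used, so the sketch below is a generic illustration only. The test numbers are hypothetical, and the reference flux of 13 neutrons/cm²/h is the sea-level value commonly quoted from JEDEC JESD89A for energies above 10 MeV, treated here as an assumption. A compact accelerator source has a different energy spectrum from atmospheric neutrons, so a real conversion would need a spectrum correction that this sketch omits.

```python
# Assumed sea-level reference flux [neutrons / cm^2 / hour], E > 10 MeV.
REFERENCE_FLUX = 13.0


def field_fit(errors_observed: int, test_fluence: float,
              natural_flux: float = REFERENCE_FLUX) -> float:
    """Estimate field FIT from an accelerated test.

    errors_observed: soft errors counted during irradiation
    test_fluence:    neutrons per cm^2 delivered to the device under test
    """
    cross_section = errors_observed / test_fluence   # cm^2 per device
    failures_per_hour = cross_section * natural_flux
    return failures_per_hour * 1e9                   # convert to FIT


# Hypothetical run: 50 errors after a fluence of 1e9 neutrons/cm^2.
estimate = field_fit(errors_observed=50, test_fluence=1e9)
print(f"Estimated field rate: {estimate:.0f} FIT")
```

The benefit of acceleration is visible in the arithmetic: a fluence that would take decades to accumulate at ground level can be delivered in a short test, so rare errors become reproducible.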
5. Results
Comparison of neutron soft error rates
[Figure: three panels, each on a 0 to 10 scale: FPGA based device vs. ASIC based device; w/o ECC function vs. w/ ECC function; w/o auto recovery function vs. w/ auto recovery function. Reduction labels: 80%, 90%, 80%.]
We measured the devices with the accelerator neutron source to confirm the reduction in soft error rate.
On the real network, the number of soft errors decreased substantially.
6. Conclusion
We successfully reproduced soft errors using a compact
accelerator-driven neutron source.
We were able to investigate soft error tolerance, and check
the fault detection process and the process of switching to a
backup network system.
We conclude that NTT can reduce service outages and
failure recovery costs due to soft errors.
Message
Have you ever experienced trouble of unknown cause on your network?
It might be caused by soft errors!
Soft errors can be dealt with!
We hope that all carriers and manufacturers around the world will be freed from this problem!
Special thanks:
Fujitsu, Ltd.
Hitachi, Ltd.
NEC corp.
