Computing Without Computers
Ian Page
Business Development Director, Seven Spires Investments
Founder, Celoxica Ltd.
Visiting Professor, Cass Business School

A Personal Story - Background
- Trained as an electronic engineer, but seduced by software
- Worked first in industry, then academia
- Built hardware and software to support fast user interfaces
  - Software: silicon compiler, parallel graphics algorithms
  - Hardware: microcoded, SIMD, MIMD and ASIC processors
- 1990, Oxford academic: a ‘road to Damascus’ experience
  - Saw my first FPGA, and the future!
  - All previous threads came together simultaneously: HLLs, regular architectures, algorithms in hardware, parallelism, real-time, design automation, communications, hardware o/s, program algebra, …

A Personal Story – A pattern emerges
- I had been trying for many years to build complex algorithms (graphics and highly interactive user interfaces) into hardware
- I tried:
  - User micro-coding
  - Massively parallel, SIMD array processing
  - Custom designed silicon
  - MIMD networks of transputers
- All were short-term successes, but long-term failures
- I hadn’t realised that what I was mostly doing was fighting Moore’s Law
- None of the hardware platforms that I built or used stayed around long enough to be a stable platform
- The largest investment, in the software, was written off each time Moore’s Law made yet another architecture redundant

Moore’s Law – just a reminder
- A reminder of what an amazing industry we are embedded in
- A doubling of transistor count every two years
- First published in 1965 and still driving the industry
- It still has many more years to run
- It is completely pervasive. Nothing escapes its influence
- The Opportunity:
  - 4,000 transistors per circuit in 1970
  - 1 billion transistors by 2005
  - $1/transistor in 1968 to $1/50 million transistors today
- The Problems:
  - Rock's Law: foundries double in cost each generation
  - A 300mm foundry costs $3 billion (Intel pushing for 450mm)
  - A 65nm mask set is around $3m
  - Somebody has to design these chips

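The two transistor counts on this slide can be checked against the two-year doubling rule with a little arithmetic. The following is a minimal sketch in Python; the function names are illustrative, and only the figures (4,000 transistors in 1970, 1 billion by 2005) come from the slide.

```python
import math

def projected_count(start_count, start_year, end_year, doubling_years=2.0):
    """Transistor count predicted by a fixed doubling period."""
    doublings = (end_year - start_year) / doubling_years
    return start_count * 2 ** doublings

def implied_doubling_years(start_count, start_year, end_count, end_year):
    """Doubling period that would connect two observed data points."""
    doublings = math.log2(end_count / start_count)
    return (end_year - start_year) / doublings

# Figures from the slide: 4,000 transistors in 1970, 1 billion by 2005.
predicted_2005 = projected_count(4_000, 1970, 2005)
period = implied_doubling_years(4_000, 1970, 1_000_000_000, 2005)

print(f"predicted for 2005: {predicted_2005:,.0f} transistors")
print(f"implied doubling period: {period:.2f} years")
```

A strict two-year doubling from the 1970 figure predicts roughly 740 million transistors for 2005, the same order of magnitude as the slide's 1 billion. Read the other way, the two data points imply a doubling period of just under two years.
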
A Personal Story – What does it all mean?
- Moore’s Law continues to force entry ticket prices up, to drive ever greater integration, and to reduce the number of different chip solutions available
- What are tomorrow’s commodity chips?
  - FPGAs will be around for decades
  - 10^6 LUTs available soon
- I see FPGA fabric as the world’s first truly stable parallel processing substrate (though the ‘grid’ may be some sort of competition)
- 1990: believing that FPGAs change the nature of the game was an act of faith
  - “One day, most hardware designs will be done through programming languages and FPGAs”
- And the research question was: “What do we have to do to make it come true?”

The Design Problem – statistics of failure
- 18% of all projects are cancelled within 5 months*
- 58% are late to market*
- 20% of products are not within 50% of specification*
- 15% of deep sub-micron designs require up to four re-spins
- Of the products that do get to market:
  - On time and 50% over budget: earn only 4% less profit over 5 years†
  - 6 months late and on budget: earn 33% less profit over 5 years†
  - Every 4 weeks of delay in product launch equals a 14% loss in market share‡

* Source: Current and Emerging Embedded Markets and Opportunities
† Source: McKinsey & Co.
‡ Source: John Chambers, CEO Cisco

The Design Problem – The Design Gap
- Moore’s Law: chip complexity grows at over 40% CAGR (Compound Annual Growth Rate)
- Designer productivity has historically grown at 21% CAGR*
- The difference is the Design Gap
- It is the gap between what you can design (with fixed resources) and what you must design (to stay in business)
- The Design Gap increases by around 20% CAGR

* Source: Gartner Group

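Compounding the two growth rates shows how quickly the gap opens. The sketch below is illustrative only: it assumes the gap is measured as the ratio of required complexity to achievable complexity, which the slide does not state, and the function names are hypothetical.

```python
COMPLEXITY_CAGR = 0.40    # chip complexity growth, from the slide
PRODUCTIVITY_CAGR = 0.21  # designer productivity growth, from the slide

def compound(rate, years):
    """Growth factor after compounding an annual rate for some years."""
    return (1.0 + rate) ** years

def design_gap(years):
    """Assumed definition: what must be designed / what a fixed team can design."""
    return compound(COMPLEXITY_CAGR, years) / compound(PRODUCTIVITY_CAGR, years)

for years in (1, 5, 10):
    print(f"after {years:2d} years the gap is {design_gap(years):.2f}x")
```

Under this ratio definition the gap grows by just under 16% a year, roughly doubling in five years and exceeding a factor of four in ten. The slide's "around 20%" matches the simple difference between the two rates (40% minus 21%).
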
The Design Problem - Complexity
- Rapidly increasing complexity is the root of the problem
- The only practical way to handle complexity is to raise the level of design abstraction
- We are guided by previous shifts in design methodology which raised the level of abstraction:
  - from schematics to HDLs
  - from assembler code to HLLs

Handel-C solution: treat hardware like software
- Exploit the massive leverage created by the software industry
- A rapid and simple flow from program to implementation
- Compile/P&R, run, edit: in minutes, just like with software
- Hardware and software development use the same methodology
- Hardware development in less time with a smaller team
- Enables hardware development by system architects and software engineers as well as hardware engineers; these skills all converge
- This might be the only design option for really complex designs

Choosing a Programming Language
- Hardware implementations need to use both time and space (= parallelism) efficiently
- Q: Why not compile ordinary C++/C programs into hardware?
  A: Nobody knows how to write a compiler that efficiently and reliably invents the parallelism that the designer didn’t specify
  Conclusion: We require a language that allows (forces) the designer explicitly to denote the parallelism required in the computation
- Q: Why not use a language such as occam, Java, …?
  A: Nobody knows how to write a compiler that efficiently and reliably invents the timing specifications that the designer didn’t specify
  Conclusion: We require a language that allows (forces) the designer explicitly to denote the time that computations take
- These might appear to be two backwards steps, but NO!

The Handel Solution
- No existing language met the basic requirements, so the Handel model of programming was created
- Handel-C is the embedding of the Handel model in the C language
- Handel-C is a language for programming applications
- Handel-C is not an HDL. Nor is it C used as an HDL
- Handel-C is meaningful to both s/w and h/w engineers
- Handel-C is exceptionally easy to learn and use
- The par command gives control over space
- The single clock assignment rule gives control over time

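The last two points can be made concrete with a small timing model. This is a sketch in Python, not Handel-C, and the class names are hypothetical. It assumes the rule as it is usually described: each assignment takes exactly one clock cycle, sequential statements add their cycles, and a par block finishes when its longest branch finishes.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Assign:
    """One assignment: exactly one clock cycle under the assumed rule."""
    target: str

@dataclass
class Seq:
    """Statements executed one after another."""
    body: List["Stmt"]

@dataclass
class Par:
    """Statements executed side by side in separate hardware."""
    body: List["Stmt"]

Stmt = Union[Assign, Seq, Par]

def cycles(stmt: Stmt) -> int:
    """Clock cycles the statement takes in this simplified model."""
    if isinstance(stmt, Assign):
        return 1
    if isinstance(stmt, Seq):
        return sum(cycles(s) for s in stmt.body)
    if isinstance(stmt, Par):
        return max((cycles(s) for s in stmt.body), default=0)
    raise TypeError(f"unknown statement: {stmt!r}")

three = [Assign("a"), Assign("b"), Assign("c")]
sequential = Seq(three)
parallel = Par(three)
mixed = Par([Seq([Assign("a"), Assign("b")]), Assign("c")])

print(cycles(sequential), cycles(parallel), cycles(mixed))
```

In this model the same three assignments cost three cycles in sequence and one cycle in parallel, at the price of three sets of hardware. That is the trade between time and space which the designer controls explicitly.
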
Handel-C in brief
- Handel-C is based on ANSI-C
- It has well-defined semantics
- Similar to occam in spirit, but adding timing and replacing pseudo-parallelism with true parallelism
- Other additions:
  - channels for communications between parallel processes
  - flexible bit-widths and better logical operators
  - constructs for RAM, ROM, interfacing, etc.

Handel-C Example: A Windowed Display System

    par {
        sync_generator (sx, sy);                 // process 1

        while (1)                                // process 2
            if inside (window1, sx, sy)
                video = contents (window1, sx, sy)
            else if inside (window2, sx, sy)
                video = contents (window2, sx, sy)
            else
                video = background_colour;

        while (1) … mouse; update window1, 2 …   // process 3
    }

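The per-pixel decision made by process 2 can be modelled in ordinary software to show the window priority. The following is a Python sketch, not the slide's Handel-C. The Window fields, the flat-colour contents function and the render loop standing in for the sync generator are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    left: int
    top: int
    width: int
    height: int
    colour: str  # assumption: a window is a flat block of one colour

def inside(window, sx, sy):
    """True when beam position (sx, sy) falls within the window."""
    return (window.left <= sx < window.left + window.width
            and window.top <= sy < window.top + window.height)

def contents(window, sx, sy):
    """Pixel value of the window at (sx, sy); a flat colour in this sketch."""
    return window.colour

def video(window1, window2, sx, sy, background_colour="."):
    """The choice made by process 2: window1 takes priority over window2."""
    if inside(window1, sx, sy):
        return contents(window1, sx, sy)
    if inside(window2, sx, sy):
        return contents(window2, sx, sy)
    return background_colour

def render(window1, window2, width, height):
    """Stand-in for the sync generator: sweep (sx, sy) in raster order."""
    return ["".join(video(window1, window2, sx, sy) for sx in range(width))
            for sy in range(height)]

window1 = Window(left=1, top=1, width=3, height=2, colour="1")
window2 = Window(left=3, top=2, width=4, height=2, colour="2")
for row in render(window1, window2, width=8, height=5):
    print(row)
```

Where the two windows overlap, at position (3, 2), the output shows window1. In the hardware version this choice is made afresh for every pixel as the beam position changes, with no frame buffer implied by the slide's code.
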
Our first FPGA Platform – HARP, 1991
- FPGA + SRAM
- Transputer + DRAM
- Four fast serial links for expansion
- Physically stackable (TRAM) module for arbitrary expansion
- I confidently predicted that Xilinx and Altera would be building things like this as single chips by 1995!

[Figure: SW and HW]

Company ‘E’: Redesign of a Failing Project
- A team of 2 software engineers developed the core component of an IPv6 router in 2 man-months using Handel-C
- A team of 3 hardware engineers failed to produce the design using VHDL in over 36 man-months
- Handel-C design: 33 MHz, 15% of a V1000 FPGA, 20 pages of code
- VHDL design: not completed, >100% of a V1000 FPGA, >400 pages of code

[Chart: IPv6 router code, actual months (axis 0 to 15)]

Company ‘L’: Algorithm Acceleration Trial
- A team of 2 software engineers (with no previous HW experience) transferred an algorithm from a CPU to an FPGA
- Run-time was 21 seconds on a 600MHz Pentium III
- 23 times performance improvement after 42 man-days

[Chart: signal processing algorithm run-time (seconds) against man-days, with the 600 MHz CPU as reference and the company training session marked; labelled run-times: > 700 s, 28 s, 16 s, 0.9 s]

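The quoted 23 times improvement follows directly from the run-times on the slide. The short Python check below uses only those figures; the function name is illustrative, and the order in which the intermediate run-times were reached is not assumed.

```python
CPU_SECONDS = 21.0  # 600 MHz Pentium III run-time, from the slide

def speedup(baseline_seconds, new_seconds):
    """How many times faster the new run-time is than the baseline."""
    return baseline_seconds / new_seconds

# Labelled FPGA run-times from the chart (the "> 700 s" point is a bound, so it is left out).
for fpga_seconds in (28.0, 16.0, 0.9):
    ratio = speedup(CPU_SECONDS, fpga_seconds)
    print(f"{fpga_seconds:5.1f} s -> {ratio:.2f}x the CPU speed")
```

The final 0.9 s run-time gives a ratio of a little over 23. The 28 s point is slower than the CPU baseline (ratio below 1).
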
Customer ‘C’: Internal Design Competition
- Competition to design an MP3 encoder between:
  - a traditional hardware design team using an HDL-based approach, and
  - a small group of software designers using Celoxica technology
- The Handel-C group:
  - converted an existing software implementation of an MP3 encoder to Handel-C
  - delivered optimized, working hardware that beat design specifications in 7 weeks (including training time)
- In the same time, the hardware group had not completed writing the specification!

Xilinx Design Challenge
- A Xilinx-specified “Design Challenge”
- To implement JPEG2000 using conventional HDL and Handel-C approaches
- Comparison made between the Handel-C and HDL approaches
- See the article in Xcell Volume 46, online at www.xilinx.com/publications/xcellonline/xcell_46/xc_celoxica46.htm

JPEG2000 Architecture and Communication Model

[Block diagram, divided into hardware models and software models. Blocks: Original Image, Pre-processing, RGB to YUV conversion, DWT (Wavelet Transform), Quantisation, Tier-1 Encoder, Tier-2 Encoder, Rate Control, Coded Image]

JPEG2000 Project overview
- Xilinx project benchmark to validate FPGA system tools
- Start with a C description of the JPEG2000 algorithm
- Use the Software-Compiled System Design methodology
- Partition and implement the JPEG2000 design
- Compare results against the original VHDL design performance

[Top-level block diagram for JPEG2000 operation. Blocks: Original Image, Pre-processing, RGB to YUV conversion, Wavelet Transform, Quantisation, Tier-1 Encoder, Tier-2 Encoder, Rate Control, Coded Image]

JPEG2000 Case Study results

                        HDL       DK Design Suite (Handel-C)
                                  1st pass    2nd pass    Final
    Slices              800       646         546         758 [b]
    Device utilization  7%        6%          5%          7%
    Speed (MHz) [c]     128       110         130         151
    Lines of code       435       386         395         395
    Design time (days)  20 [a]    6           7 (6+1)     7 (6+1)

    [a] Doesn’t include partitioning spec. development
    [b] Includes IP block insertion
    [c] Lena image used as test-bench throughout; input bit width = 12, max 1K image width

Observations: Comparable; HC faster; HC quicker; Expert vs Novice

- Rapid Handel-C (HC) implementation by an engineer with no prior knowledge of JPEG2000. Primary design focus was area efficiency.
- A common language base made it easy to port the DWT source to hardware, and DSM allowed partitioning, co-verification and easy movement of data between HW & SW
- Optimizations included using signals instead of registers, maximum use of dual-ported memory, and reduction in routing logic by syntax duplication in Handel-C. Place & Route tools were configured to optimize the implementation for area efficiency
- The final implementation integrated an existing HDL IP block into the design flow for maximum design re-use value (black boxing)

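The slide's observations can be put into numbers by dividing the final Handel-C figures by the HDL figures. The Python sketch below uses only values from the results table; the dictionary layout and names are illustrative.

```python
# Figures from the slide: HDL design versus final Handel-C (DK Design Suite) design.
HDL = {"slices": 800, "speed_mhz": 128, "lines_of_code": 435, "design_days": 20}
HANDEL_C_FINAL = {"slices": 758, "speed_mhz": 151, "lines_of_code": 395, "design_days": 7}

def ratio(metric):
    """Final Handel-C value divided by the HDL value for one metric."""
    return HANDEL_C_FINAL[metric] / HDL[metric]

for metric in HDL:
    print(f"{metric:14s} Handel-C / HDL = {ratio(metric):.2f}")
```

On these figures the final Handel-C design uses about 5% fewer slices, runs about 18% faster, and took roughly a third of the design time. Note that the HDL design-time figure excludes partitioning specification development.
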
Does it work? - Demonstrations
- RC100 Board:
  - Single Xilinx XC2S200 FPGA
  - 28 x 42 = 1176 CLBs (2352 LUTs)
  - Flash memory with stored configurations
  - PLD to reload the FPGA from the flash memory
  - Digital/Analogue converter to create video signal
- All demos fit in 1200 CLBs, some in under 500
- A few of them use external memory
- No computer. No software. No operating system
- Cheapest FPGAs: over 340 LUTs/$ (Oct08, one-off price)

Agility Design Solutions
- Solutions for Algorithm Design:
  - Algorithm acceleration
  - Rapid Prototyping
  - SW & FPGA Implementation
- Technologies for Algorithm to Implementation:
  - MATLAB to C
  - C to FPGA
  - System Prototyping Boards
  - IP Libraries
  - Implementation Services
- Over 100 customers worldwide
- Shortening the time to develop and deploy complex image processing systems

Proven Customer Success
- Lockheed: Hubble Telescope
- Canon: PowerShot Digital Camera
- Toyota: Prius Hybrid
- Aeroastro: Vision Recognition
- Harris: Satellite Communications
- Raytheon: Airborne Systems & NLOS

Thank You
Computing Without Computers
Ian Page
Business Development Director, Seven Spires Investments
Founder, Celoxica Ltd.
Visiting Professor, Cass Business School


Computing Without Computers - Oct08
