AN LLVM BACKEND
FOR GHC
David Terei & Manuel Chakravarty
University of New South Wales

Haskell Symposium 2010
Baltimore, USA, 30th Sep
What is ‘An LLVM Backend’?
• Low Level Virtual Machine

• Backend Compiler framework

• Designed to be used by compiler developers, not by software developers

• Open Source

• Heavily sponsored by Apple
Motivation
• Simplify
  • Reduce ongoing work
  • Outsource!


• Performance
  • Improve run-time
Example
  import Data.Word (Word32)

  collat :: Int -> Word32 -> Int
  collat c 1 = c
  collat c n | even n    = collat (c+1) $ n `div` 2
             | otherwise = collat (c+1) $ 3 * n + 1

  pmax x n = x `max` (collat 1 n, n)

  main = print $ foldl pmax (1,1) [2..1000000]

[Bar chart: run-time (ms) of the example under the LLVM, C and NCG backends; y-axis 0 to 2000]
Different, updated run-times compared to paper
Competitors
C Backend (C)
• GNU C dependency
  • Badly supported on platforms such as Windows
• Uses a mangler on assembly code
• Slow compilation speed
  • Takes twice as long as the NCG

Native Code Generator (NCG)
• Huge amount of work
• Very limited portability
• Does very little optimisation work
GHC’s Compilation Pipeline
Core → STG → Cmm → C / NCG / LLVM

Diagram labels: Language, Abstract Machine, Machine Configuration; Heap, Stack, Registers
Compiling to LLVM
Not covered here (see the paper for full details):
• Why from Cmm and not from STG/Core
• LLVM, C-- & Cmm languages
• Dealing with LLVM’s SSA form
• LLVM type system


Will be covering:
• Handling the STG Registers
• Handling GHC’s Tables-Next-To-Code optimisation
Handling the STG Registers
Implement either by:
• In memory
• Pin to hardware registers

STG Register    X86 Register
Base            ebx
Heap Pointer    edi
Stack Pointer   ebp
R1              esi

NCG?
• Register allocator permanently stores STG registers in
  hardware

C Backend?
• Uses GNU C extension (global register variables) to
  also permanently store STG registers in hardware
Handling the STG Registers
The LLVM backend handles this by implementing a new calling convention:
             STG Register    X86 Register
             Base            ebx
             Heap Pointer    edi
             Stack Pointer   ebp
             R1              esi


  define ghc_cc f(Base, Hp, Sp, R1) {      ; STG registers arrive as arguments
    …
    tail call ghc_cc g(Base, Hp’, Sp’, R1’);   ; updated values passed on
    ret void;
  }
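
The pseudocode above can be made a little more concrete with a small Haskell sketch that prints the function header and tail call a backend might emit. It is an illustration only: the helper names (`stgRegs`, `param`, `defineHeader`, `tailCall`) are hypothetical, every register is modelled as an `i32` for brevity, and `cc 10` is the number LLVM assigns to its GHC calling convention. The mapping to ebx, edi, ebp and esi is done by LLVM itself, not by this code.

```haskell
import Data.List (intercalate)

-- STG registers passed as arguments, in the order shown on the slide.
stgRegs :: [String]
stgRegs = ["Base", "Hp", "Sp", "R1"]

-- Render one parameter. Assumption: every register is a plain i32.
param :: String -> String
param r = "i32 %" ++ r

-- Header of a function that uses calling convention 10 (LLVM's GHC convention).
defineHeader :: String -> String
defineHeader name =
  "define cc 10 void @" ++ name
    ++ "(" ++ intercalate ", " (map param stgRegs) ++ ")"

-- Tail call that threads the (possibly updated) register values onwards.
tailCall :: String -> [String] -> String
tailCall callee vals =
  "tail call cc 10 void @" ++ callee
    ++ "(" ++ intercalate ", " (map param vals) ++ ")"
```

Loaded in GHCi, `putStrLn (defineHeader "f")` prints the header line for `f`. Because the registers are ordinary arguments, no global register variables or custom register allocator are needed on the GHC side.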
Handling the STG Registers
• Issue: if implemented naively, all the STG registers have a live range of the entire function.

• Some of the STG registers can never be scratched (e.g. Sp, Hp…) but many can (e.g. R2, R3…).

• We need to tell LLVM when we no longer care about an STG register; otherwise it will spill and reload the register across calls, for example calls into C land.
Handling the STG Registers
• We handle this by storing undef into the STG register
 when it is no longer needed. We manually scratch them.
define ghc_cc f(Base, Hp, Sp, R1, R2, R3, R4) {
  …
  store undef %R2     ; old values of R2, R3, R4 are no longer needed
  store undef %R3
  store undef %R4
  call c_cc sin(double %f22);
  …
  tail call ghc_cc g(Base, Hp’, Sp’, R1’, R2’, R3’, R4’);
  ret void;
}
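
As a rough sketch of the idea above, the following Haskell function decides which registers to scratch before a call into C. The names `scratchable`, `killDead` and `beforeCCall` are hypothetical, the set of scratchable registers is cut down to R2, R3 and R4 to match the slide, and the emitted text assumes each STG register is held in an `i32` stack slot (typed-pointer LLVM syntax).

```haskell
-- STG registers that may be scratched once dead (per the slide: R2, R3, ...).
scratchable :: [String]
scratchable = ["R2", "R3", "R4"]

-- Emit one `store undef` for every scratchable register that is not live.
-- Registers such as Sp and Hp never appear here because they are not
-- in `scratchable`.
killDead :: [String] -> [String]
killDead live =
  [ "store i32 undef, i32* %" ++ r | r <- scratchable, r `notElem` live ]

-- Place the stores immediately before the given call instruction.
beforeCCall :: [String] -> String -> [String]
beforeCCall live call = killDead live ++ [call]
```

For example, `killDead ["R3"]` yields stores for R2 and R4 only, so LLVM remains free to keep R3 in its register across the call while no longer preserving the other two.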
Handling Tables-Next-To-Code
[Diagram: un-optimised layout vs optimised layout]

How to implement in LLVM?
Handling Tables-Next-To-Code
Use the GNU Assembler sub-section feature.
• Allows code/data to be put into numbered sub-sections
• Sub-sections are appended together in numeric order
• Table in <n>, entry code in <n+1>
                    .text 12
                    sJ8_info:
                        movl ...
                        movl ...
                        jmp ...

                    [...]
                    .text 11
                    sJ8_info_itable:
                        .long ...
                        .long 0
                        .long 327712
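
The numbering trick can be modelled in a few lines of Haskell. This is a toy model, not the backend's code: `Chunk`, `closureChunks` and `assemble` are hypothetical names, the choice of n = 2i + 1 for the i-th closure is an assumption made so that closure 5 reproduces the 11/12 pair above, and the assembler is reduced to "sort by sub-section number".

```haskell
import Data.List (sortOn)

-- A piece of assembly placed in a numbered `.text` sub-section.
data Chunk = Chunk { subsection :: Int, label :: String }
  deriving (Eq, Show)

-- Chunks for the i-th closure, in the order a compiler might emit them:
-- entry code first (sub-section n+1), then its info table (sub-section n).
closureChunks :: Int -> String -> [Chunk]
closureChunks i name =
  [ Chunk (n + 1) (name ++ "_info")
  , Chunk n       (name ++ "_info_itable")
  ]
  where
    n = 2 * i + 1

-- Model of the assembler: sub-sections are appended in ascending order,
-- so each table ends up directly in front of its entry code.
assemble :: [Chunk] -> [String]
assemble = map label . sortOn subsection
```

`assemble (closureChunks 5 "sJ8")` gives the table label followed by the entry-code label, even though the code was emitted first.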
Handling Tables-Next-To-Code
LLVM Mangler
• 180 lines of Haskell (half is documentation)
• Needed only for OS X

C Mangler
• 2,000 lines of Perl
• Needed for every platform
Evaluation: Simplicity
[Bar chart: code size in lines for the LLVM, C and NCG backends; y-axis 0 to 25000]

LLVM
• Half of code is representation of LLVM language

C
• Compiler: 1,100 lines
• C Headers: 2,000 lines
• Perl Mangler: 2,000 lines

NCG
• Shared component: 8,000 lines
• Platform specific: 4,000 – 5,000 for X86, SPARC, PowerPC
Evaluation: Performance
Nofib:
• Egalitarian benchmark suite, everything is equal
• Memory bound, little room for optimisation once at Cmm stage

[Bar chart: run-time against LLVM (%) for NCG and C; y-axis -0.5 to 3]
Repa Performance
[Bar chart: runtimes (s) for LLVM and NCG on Matrix Mult, Laplace and FFT; y-axis 0 to 16]
Compile Times, Object Sizes
[Two bar charts: compile times vs LLVM (y-axis -80 to 60) and object sizes vs LLVM (y-axis -14 to 0), each for NCG and C]
Result
• LLVM Backend is simpler.


• LLVM Backend is as fast or faster.


• LLVM developers now work for GHC!
Get It
• LLVM
   • Our calling convention has been accepted upstream!
   • Included in LLVM since version 2.7


   • http://llvm.org
• GHC
  • In HEAD
  • Should be released in GHC 7.0




• Send me any programs that are slower!
Questions?
Why from Cmm?
A lot less work than from STG/Core

But…

Couldn’t you do a better job from STG/Core?

Doubtful…

Easier to fix any deficiencies in Cmm representation and
code generator
Dealing with SSA
The LLVM language is in SSA form:
• Each variable can only be assigned to once
• Immutable


How do we handle converting mutable Cmm variables?
• Allocate a stack slot for each Cmm variable
• Use loads and stores for reads and writes
• Use the ‘mem2reg’ LLVM optimisation pass
 • This converts our stack slots into LLVM variables that properly obey
   the SSA requirement
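
The stack-slot scheme above can be sketched with a toy translator. The `Expr` and `Stmt` types are a hypothetical stand-in for Cmm, not GHC's real data types, and the output uses the typed-pointer textual syntax of LLVM 2.x (newer LLVM releases write loads differently). Only variables that are assigned somewhere get a slot in this sketch.

```haskell
import Data.List (nub)

-- A tiny mutable-variable language standing in for Cmm (hypothetical).
data Expr = Lit Int | Var String | Add Expr Expr
data Stmt = Assign String Expr

-- Compile an expression. Takes the next free temporary number and returns
-- (instructions, operand holding the result, next free temporary number).
expr :: Int -> Expr -> ([String], String, Int)
expr n (Lit k) = ([], show k, n)
expr n (Var v) =
  let t = "%t" ++ show n
  in ([t ++ " = load i32* %" ++ v], t, n + 1)   -- a read becomes a load
expr n (Add a b) =
  let (ia, oa, n1) = expr n a
      (ib, ob, n2) = expr n1 b
      t = "%t" ++ show n2
  in (ia ++ ib ++ [t ++ " = add i32 " ++ oa ++ ", " ++ ob], t, n2 + 1)

-- A write becomes a store into the variable's stack slot.
stmt :: Int -> Stmt -> ([String], Int)
stmt n (Assign v e) =
  let (is, o, n') = expr n e
  in (is ++ ["store i32 " ++ o ++ ", i32* %" ++ v], n')

-- One alloca per assigned variable, followed by the translated statements.
compile :: [Stmt] -> [String]
compile ss = allocas ++ go 0 ss
  where
    allocas = ["%" ++ v ++ " = alloca i32" | v <- nub [v | Assign v _ <- ss]]
    go _ [] = []
    go n (s : rest) = let (is, n') = stmt n s in is ++ go n' rest
```

Every temporary `%tN` is defined exactly once, so the output is already valid SSA even though `x` may be stored to many times. Running `mem2reg` over such code then removes the slots and the loads and stores that touch them.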
Type Systems?
The LLVM language has a fairly high-level type system
• Strings, Arrays, Pointers…


When combined with SSA form, great for development
• 15 bug fixes required after backend finished to get test
  suite to pass
• 10 of those were motivated by type system errors
• Some could have been fairly difficult to track down (e.g. returning
  a pointer instead of a value)
