This is an open-source, online tutorial that provides an end-to-end introduction to MLIR, demonstrating how deep learning models can be lowered from a machine learning framework to executable binaries. Aimed at newcomers and university students, the tutorial focuses on conveying the core concepts of the MLIR compilation flow rather than on performance optimization. Using torch-mlir, upstream MLIR passes, and a custom lowering pass targeting OpenBLAS, the tutorial illustrates how to build and extend a practical ML compilation pipeline targeting both CPU and NVIDIA GPU backends. The tutorial is designed to lower the barrier to entry and broaden participation in the MLIR community. From a newcomer to other newcomers, so to speak.
The repo provides some PyTorch models (small sample models and real-world models) and shows how to:
- Import models from PyTorch and Hugging Face (HF) to MLIR using torch-mlir
- Use existing MLIR passes to lower from entry-level dialects (linalg, arith, ...) to LLVM IR
- Create the corresponding object file
- Call the model via a function call in C++
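To make the last step more concrete, here is a minimal C++ sketch of what calling a compiled model can look like. It assumes the lowered function carries the `llvm.emit_c_interface` attribute, in which case MLIR emits a wrapper named `_mlir_ciface_<function>` that takes pointers to memref descriptors. The function name `forward`, the 2-D `float` shapes, the output-as-argument signature, and the helpers `makeMemRef2D` and `runModel` are assumptions for illustration; the models in this repo may use different names and signatures. The body of `_mlir_ciface_forward` below is only a stand-in (an element-wise ReLU) so that the sketch compiles on its own. In a real build that definition would be removed and the symbol would come from the generated object file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Descriptor layout that MLIR's LLVM lowering uses for a ranked memref.
template <typename T, int Rank>
struct MemRefDescriptor {
  T *allocated;            // pointer returned by the allocator
  T *aligned;              // aligned pointer used for element access
  int64_t offset;          // element offset relative to `aligned`
  int64_t sizes[Rank];     // extent of each dimension
  int64_t strides[Rank];   // stride of each dimension, in elements
};

using MemRef2D = MemRefDescriptor<float, 2>;

// Hypothetical helper: wrap a dense row-major buffer in a descriptor.
MemRef2D makeMemRef2D(float *data, int64_t rows, int64_t cols) {
  return MemRef2D{data, data, 0, {rows, cols}, {cols, 1}};
}

// HYPOTHETICAL stand-in for the compiled model (element-wise ReLU).
// In a real build, keep only the declaration and link the object file.
extern "C" void _mlir_ciface_forward(MemRef2D *input, MemRef2D *output) {
  assert(input->sizes[0] == output->sizes[0]);
  assert(input->sizes[1] == output->sizes[1]);
  for (int64_t i = 0; i < input->sizes[0]; ++i) {
    for (int64_t j = 0; j < input->sizes[1]; ++j) {
      float x = input->aligned[input->offset + i * input->strides[0] +
                               j * input->strides[1]];
      output->aligned[output->offset + i * output->strides[0] +
                      j * output->strides[1]] = x > 0.0f ? x : 0.0f;
    }
  }
}

// Hypothetical driver: build descriptors, call the model, return the result.
std::vector<float> runModel(std::vector<float> input, int64_t rows,
                            int64_t cols) {
  std::vector<float> output(input.size());
  MemRef2D in = makeMemRef2D(input.data(), rows, cols);
  MemRef2D out = makeMemRef2D(output.data(), rows, cols);
  _mlir_ciface_forward(&in, &out);
  return output;
}
```

The key point is that the C++ side never passes raw tensors: every memref argument is described by a pointer, an offset, sizes, and strides, and the caller is responsible for filling these in consistently with the buffer layout.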
Additionally, I created a pass that converts linalg.matmul operations to OpenBLAS matrix-multiplication function calls. I also target an NVIDIA GPU (sm_90) to launch specific kernels on the GPU (this currently works only for the sample models and the MNIST model).
Warning: Some instructions are specific to the RWTH cluster, e.g. file paths.
- Introduction
- Getting started and project setup
- Importing PyTorch models to torch-mlir
- Lowering models to x86 machine code
- Integration of OpenBLAS for Matrix Multiplications
- Targeting an Nvidia GPU
- Appendix: An overview of IREE
See Chapter 2 in the docs.
- Official MLIR website
- MLIR for Beginners by Jeremy Kun
- MLIR tutorial with GPU compilation by Stephen Diehl