The hardest thing
in computer science
Hard things
Docker Caching
Dependency versions
Install dependencies
[ 20 minutes or so ]
Only here copy all sources
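The three steps above can be sketched as a Dockerfile, written here from a small shell script so that it can be inspected without Docker installed. This is a minimal sketch, not the Dockerfile from the talk: the base image, the `requirements.txt` file name and the `/opt/app` path are all assumptions.

```shell
#!/bin/sh
# Sketch only: base image, paths and requirements.txt are assumptions,
# not taken from the original Dockerfile.
sketch_dir="$(mktemp -d)"

cat > "$sketch_dir/Dockerfile" <<'EOF'
FROM python:3.7-slim

# 1. Dependency versions: this layer is invalidated only when the file changes.
COPY requirements.txt /opt/app/requirements.txt

# 2. Install dependencies: the slow step, reused from cache while step 1 is unchanged.
RUN pip install -r /opt/app/requirements.txt

# 3. Only here copy all sources: a source change invalidates only this layer
#    and whatever follows it.
COPY . /opt/app
EOF

cat "$sketch_dir/Dockerfile"
```

The ordering is the point: the slow install step sits above the layer that changes most often.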
Intended behaviour
● No change:
docker is not rebuilt - LIGHTNING FAST!!!!
● Sources change/dependencies not:
only sources are added - QUITE FAST !!!
● Dependencies change:
dependencies installed, sources added - LITTLE SLOWER !!
Actual behaviour
same machine - local checkout
● Local docker registry
● Repeated build: 1:06m
● Only sources: 1:30m
● Dependencies: 11m
● Whole build: ~ 30m
CI case
● Always fresh machine
○ no code
○ no registry
● Git clone/checkout
● Build
● Wipeout
Docker registry to the rescue!
Build cache:
● docker build
● docker push airflow/airflow:latest
Use cache:
● docker pull airflow/airflow:latest
● docker build --cache-from airflow/airflow:latest
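The two halves of this flow might look like the script below. It is a sketch under assumptions: the function names are hypothetical, and by default it only prints the commands (set `DOCKER=docker` to run them). The image name is the one from the slide.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DOCKER=docker in the environment to run them for real.
DOCKER="${DOCKER:-echo docker}"
IMAGE="airflow/airflow:latest"   # image name taken from the slide

# On the machine that produces the cache: build, then publish the image.
publish_cache() {
    $DOCKER build -t "$IMAGE" .
    $DOCKER push "$IMAGE"
}

# On a fresh CI machine: fetch the earlier image and reuse its layers.
build_with_cache() {
    # A failed pull (for example the very first build) should not stop the build.
    $DOCKER pull "$IMAGE" || true
    $DOCKER build --cache-from "$IMAGE" -t "$IMAGE" .
}

build_with_cache
```

This matches the classic builder. With BuildKit, the pushed image also needs inline cache metadata (for example `--build-arg BUILDKIT_INLINE_CACHE=1` at build time) before `--cache-from` can use it.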
Actual behaviour
Docker Hub automated build
● DockerHub docker registry as cache
● Repeated build: 11m
● Only sources: 11m <- Still OK
● Dependencies: ~1h
● Whole build: ~ 2h
Using the cache in Travis CI
● Docker Hub builds are slow
● Travis or Cloud Build use earlier image with --cache-from
● But only sources change most of the time
Caching in Docker - the hardest thing in computer science
Actual BAD behaviour
Travis CI automated build
● Build on Travis with cache from DockerHub
● Repeated build: 11m
● Only sources: 1 h <-
● Dependencies: 1h
● Whole build: ~ 2h
Problem no 1
Git & permissions
● git clone file creation:
○ local user
○ default user’s group
● file/dir permissions (rwxs)
○ effectively preserves user, group and other rx permissions of files & dirs (git itself records only the executable bit)
○ does not store w; by default the umask decides it when cloning
○ core.sharedRepository git-config
■ one of: group(true), all, umask(false), 0xxx
● Umask WTF:
○ file: 644 (DockerHub) vs. 664 (Travis CI)
○ dir: 755 (DockerHub) vs. 775 (Travis CI)
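The umask effect can be reproduced without git or Docker. The sketch below creates a directory and a file under each of the two umasks; 022 and 002 are assumed here because they produce exactly the modes listed above.

```shell
#!/bin/sh
# Create the same directory and file under two different umasks.
workdir="$(mktemp -d)"

for mask in 022 002; do
    (
        # Subshell, so the umask change does not leak into the rest of the script.
        umask "$mask"
        mkdir "$workdir/dir_$mask"
        : > "$workdir/dir_$mask/file"
    )
    ls -ld "$workdir/dir_$mask" "$workdir/dir_$mask/file"
done
# umask 022 gives dir 755 and file 644; umask 002 gives dir 775 and file 664.
```

Docker takes file metadata into account when deciding whether a `COPY` layer can be reused, so identical sources with different modes miss the cache.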
Solution to problem 1
Fix group permissions
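One possible way to fix group permissions before `docker build` is to strip the group/other write bits from the checkout, so the modes match regardless of the umask used when cloning. The helper below is a hypothetical sketch, not the script used in the talk; `fix_group_permissions` is an assumed name.

```shell
#!/bin/sh
# Hypothetical helper, not the talk's actual script: remove group/other write
# bits so file modes no longer depend on the umask of the cloning machine.
fix_group_permissions() {
    # $1: source directory to normalise (defaults to the current directory)
    find "${1:-.}" -path '*/.git' -prune -o \
        \( -type f -o -type d \) -exec chmod go-w {} +
}

# Demo on a throwaway directory created with a Travis-like umask of 002.
src="$(mktemp -d)"
(umask 002; mkdir "$src/pkg"; : > "$src/pkg/module.py")
fix_group_permissions "$src"
ls -l "$src/pkg"
```

In CI this would run right after the checkout and before the build, for example `fix_group_permissions .`.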
Problem no 2
Generated files
● not only .gitignore
● generated files
○ autoapi - documentation
○ build artifacts
○ npm cache
○ .pyc files
○ files created accidentally (wget in source folder anyone?)
● COPY .
● Context calculated based on ALL files
● .dockerignore != .gitignore
● slightly different syntax
Solution to problem 2
Ignore everything (**) in .dockerignore by default, then re-include only what the build needs
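A `.dockerignore` following this approach could look like the file written by the script below. It is a sketch: the re-included paths are hypothetical examples and would need to match the real project layout.

```shell
#!/bin/sh
# Sketch only: the re-included paths are hypothetical examples.
ctx="$(mktemp -d)"

cat > "$ctx/.dockerignore" <<'EOF'
# Ignore everything by default...
**

# ...then re-include only what the image needs.
!airflow
!setup.py
!requirements.txt
EOF

cat "$ctx/.dockerignore"
```

With this layout a stray generated file never reaches the build context unless someone explicitly re-includes it.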
Problem no. 3
● Download & compile ALL dependencies takes time!
Partial solution to problem 3
Find the weakest link
Solution to problem 3
a) build image with wheels
Solution to problem 3
b) Copy directory via multi-stage Docker builds
Solution 3
c) install using wheels
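Steps a) to c) can be sketched in a single multi-stage Dockerfile, written here from a shell script so that it can be inspected without Docker. This is an assumption-laden sketch, not the setup from the talk: the stage name `wheel-builder`, the base images, `requirements.txt` and the paths are all hypothetical. The wheels could equally come from a separately published image.

```shell
#!/bin/sh
# Sketch only: stage name, base images, file names and paths are assumptions.
wheels_dir="$(mktemp -d)"

cat > "$wheels_dir/Dockerfile" <<'EOF'
# a) build an image stage that holds pre-built wheels for all dependencies
FROM python:3.7 AS wheel-builder
COPY requirements.txt /tmp/requirements.txt
RUN pip wheel --wheel-dir /wheels -r /tmp/requirements.txt

FROM python:3.7-slim
COPY requirements.txt /tmp/requirements.txt
# b) copy the wheel directory from the first stage
COPY --from=wheel-builder /wheels /wheels
# c) install from the local wheels only, without downloading or compiling
RUN pip install --no-index --find-links=/wheels -r /tmp/requirements.txt
COPY . /opt/app
EOF

cat "$wheels_dir/Dockerfile"
```

The download and compile cost stays in the first stage; the final image only unpacks wheels.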
Thank You!
You can add some info where to follow you,
or add information about
polidea.com/blog

