www.scling.com
Secure software supply chain
on a shoestring budget
Lars Albertsson, Founder, Scling
Jfokus, 2022-05-04
Losing battles
https://www.carbonbrief.org/unep-1-5c-climate-target-slipping-out-of-reach
https://www.idea.int/gsod-indices/faqs
"I am here to bring you the bad news,
which is that we are not winning. We are
really losing this battle [on security]."
- Vinton Cerf
What do we contribute?
● Internet, digitalisation + many good little things
● Ability to measure and manipulate populations at scale
● Monetising bad security
○ Stolen CPU cycles → money
○ Ransomware
https://spinbackup.com/blog/24-biggest-ransomware-attacks-in-2019/
https://blog.chainalysis.com/reports/2022-crypto-crime-report-preview-ransomware/
https://www.theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election
Security vs productivity
● Risk-management rarely wins
● Employees have conflicting definitions of success
Revenue-generation, Features, Delivery speed
vs
Security reviews, Pentests, Password reauthentication, Phishing campaigns, Firewalls, …
Security AND productivity
A simple recipe for application security:
- While we value items on the right, we value items on the left more.
- Invent alternatives that are aligned with speed
- Give employees aligned definitions of success
Left: SSO, Password managers, Infrastructure as code, Hardware MFA, Ephemeral containers, …
Right: Security reviews, Pentests, Password reauthentication, Phishing campaigns, Firewalls, …
We have been here before
Quality expectations 1995-2002
Quality expectations 2022
https://www.cnet.com/culture/windows-may-crash-after-49-7-days/
Quality and ops
Aligning quality with speed
● TDD
● Continuous delivery
● Agile
● Dev-friendly ops tooling
● Test automation
● XP
● Cross-functional teams
● DevOps
● Trunk-based
● Continuous integration
● Containers
Manual, mechanised, industrialised
Manual:
● Muscle-powered
● Few tools
● Human touch for every step
Mechanised:
● Direct human control
● Machine tools
● Low investment, direct return
Industrialised:
● Scaled processes
● Machine tools
● Challenges: scale, logistics, legal, organisation, faults, ...
IT craft to factory
Craft: Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure
Factory: DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code
Quality, speed - choose two
● Toyota: Low defect rates AND high margins per vehicle
● State of DevOps report: High reliability AND high deployment rate
○ We have industrialised software engineering
1000x span in availability metrics
Quality vs Speed
Quality AND Speed
Themes of good presentations, IMHO
● We have seen lots of X / X from a different angle. Here are some patterns.
● We have context Y. Here is how we work.
● We did a thing Z. Here is what we learnt.
We need to share how we work
in order to make faster progress.
Data factories
Craft: Security, Waterfall, Application delivery, Traditional operations, Traditional QA, Infrastructure, DB-oriented architecture
Factory: DevSecOps, Agile, Containers, DevOps, CI/CD, Infrastructure as code, Data factories / data pipelines / DataOps
Data industrialisation
DW
~10 year capability gap
"data factory engineering"
Enterprise big data failures
"Modern data stack" - traditional workflows, new technology
4GL / UML phase of data engineering
Data engineering education
How data leaders work
Data processed offline
Online
Data factory
Data platform & lake
data
Data innovation & functionality
100+K daily datasets
30% staff BigQuery daily users
Value from data!
Scling - data-factory-as-a-service
Data value through collaboration
Customer
Data factory
Data platform & lake
data
domain expertise
Value from data!
Rapid data innovation
Learning by doing, in collaboration
Efficiency is sacred
● Productivity is our unique selling point
○ Client value from data is unpredictable
○ Clients don't know what they want
○ Quick experiments & pivot
● Minimal operational overhead
○ Pipelines / person
○ Datasets / day / person
● Nothing must undermine our USP
Our security strategy
● Invest where it improves productivity
○ Cloud single sign on
○ Cloud identity management
○ Workload identities over secret tokens
○ Hardware multifactor authentication
○ Infrastructure as code
○ Patch management *
● Homogeneity over autonomy
○ Few technologies
○ Few processes
○ Processes encoded in code *
● Minimal attack surface *
● Strict asset management
○ Digital assets as code
○ Process to align assets with code
○ Explicit manual asset management
● Lean on Google
Minimising attack surfaces
● Few ecosystems
○ Ubuntu
○ Scala + Spark
○ Python
● Few components
○ Reuse over perfect match
● Few versions
○ Single version per third party component
○ Opens gates to dependency hell *
■ Control or autonomous cells
Our supply chain
● Google Cloud
○ Kubernetes, GCS, Cloud SQL, …
● Virtual machine images
○ Ubuntu, Google
● Container base images
○ Ubuntu, phusion, MySQL, …
● Apt packages
● SaaS
○ Google, Atlassian, GitLab
● Scala (+ other JVM)
○ Maven Central
● Python
○ PyPI
● Direct downloads
○ URL + checksum
● Bazel plugins
○ URL + checksum
● Developer devices
○ Ubuntu, macOS, Android, iOS
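The "URL + checksum" entries for direct downloads and Bazel plugins can be illustrated with a minimal sketch. It assumes SHA-256 as the digest; the payload and the pinned value are made up for illustration and are not taken from the talk.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a downloaded artifact."""
    return hashlib.sha256(data).hexdigest()


def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Accept the artifact only if its digest matches the pinned checksum."""
    return sha256_hex(data) == expected_sha256.lower()


# Hypothetical example: in a real build, `payload` would be the bytes fetched
# from the pinned URL and `pinned` would be stored next to that URL.
payload = b"example artifact bytes"
pinned = sha256_hex(payload)
```

A changed upstream file then fails the build instead of silently entering it.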
Which version?
● Version specifications
○ Exact version
■ Good for application stability
○ Range
○ Latest
■ Good for patch latency
● Specification choice tradeoffs
○ Provider trust
○ Patch latency
● Upgrade tradeoffs
○ Vulnerability patching
○ Rogue code
○ Bugs fixed
○ Bugs introduced
○ Necessary work
● Our goal:
○ Exact version
○ Transitive dependencies locked
○ Automatically updated
● Let's pursue!
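The three kinds of version specification (exact, range, latest) can be told apart mechanically. A minimal sketch, assuming pip-style requirement strings; the function name and the sample requirements are hypothetical, and the parsing is deliberately simplistic.

```python
import re

# "name==1.2.3" with no wildcard counts as an exact pin.
EXACT = re.compile(r"^[A-Za-z0-9_.\-]+==[0-9][A-Za-z0-9.+!\-]*$")
RANGE_MARKERS = (">=", "<=", "~=", "!=", ">", "<", "*")


def classify_spec(requirement: str) -> str:
    """Classify a pip-style requirement as 'exact', 'range' or 'latest'."""
    req = requirement.replace(" ", "")
    if EXACT.match(req):
        return "exact"
    if any(marker in req for marker in RANGE_MARKERS):
        return "range"
    return "latest"  # a bare name: whatever the index serves today
```

A check like this could flag any specification that is not an exact pin.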
Levels of up to date
● No new version of A exists
● New A version exists. Application verified ok with upgrade.
● New A version exists. Unclear whether upgrade breaks application.
● New A version exists. Upgrade breaks application.
○ We use a deprecated API.
○ New version has bug.
● New A version exists. Upgrade breaks dependency B.
○ New version of B exists.
○ No new version of B exists.
○ A and B must atomically upgrade
A bot friendly task
● There is some order that moves us forward through hell
● Slow trial and error cycle
○ Compile or test takes minutes
● There are bots
○ Dependabot, Scala Steward
■ Way too complex (100/20 KLOC, 1000s of lines of doc / examples)
○ Do not cover our needs
■ Application correctness
■ Our ecosystems
With a strong process
● we can reason and automate
○ Trial and error forward
● Process strength
○ Faulty change is detected before prod
○ Non-code changes unlikely to affect correctness
○ Self-bootstrapping
Strong process challenges
● Everything not covered by tests
● Test infrastructure / setup defined by code
○ How to test?
○ How to bootstrap?
● Nondeterministic processes / components
○ Mostly deterministic is ok
Extended test suite:
● Test suite bootstrap
● Continuous deployment test suite
● Non-production functionality
○ Dev tooling
○ Web
○ …
Our build process
● Monorepo + trunk-based
○ Platforms + all client code and pipelines
○ Single version of platform
● All tests verified* for every change
○ Tests do not require cloud resources
● Build + test speed challenging
○ Spark → seconds of startup time → slow tests
● Simple recipe for speed:
○ Avoid doing things → caching
○ Do things in parallel
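The recipe for speed (avoid doing things, do things in parallel) can be sketched in a few lines. This is a toy illustration, not how Bazel implements it: the in-memory cache and the function names are assumptions, and results are keyed by a digest of each target's inputs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory cache: digest of a target's inputs -> test result.
_result_cache: Dict[str, bool] = {}


def input_digest(inputs: List[bytes]) -> str:
    """Digest all inputs of a target; length prefixes keep boundaries unambiguous."""
    h = hashlib.sha256()
    for blob in inputs:
        h.update(len(blob).to_bytes(8, "big"))
        h.update(blob)
    return h.hexdigest()


def run_cached(inputs: List[bytes], test: Callable[[], bool]) -> bool:
    """Run the test only if these exact inputs have not been tested before."""
    key = input_digest(inputs)
    if key not in _result_cache:
        _result_cache[key] = test()
    return _result_cache[key]


def run_all(targets: List[Tuple[List[bytes], Callable[[], bool]]]) -> List[bool]:
    """Run independent targets in parallel, preserving input order in the results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_cached, inputs, test) for inputs, test in targets]
        return [future.result() for future in futures]
```

Caching by input digest is only reliable when a test depends on nothing but its declared inputs, which is why isolation matters.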
Bazel
● Designed for monorepos & strong process
○ Lazy tree evaluation
○ Isolated sandboxes
● Unmatched performance features
○ Isolation → reliable caching
○ Test result caching
○ Remote caching
○ Parallelism
○ Remote execution
● Great for stuff used by Google
● Catching up on
○ Docker
○ Scala
○ Third-party dependencies
Dependency version control
● Transitive, locked
○ Python
○ JVM
○ Lock files in version control
● Not transitive, locked
○ Direct downloads
○ Bazel plugins
○ Container base images
○ version.bzl file
■ → bazel, python, bash
● Apt packages
○ Latest*
● Some Google components
○ VM base images, misc
○ Latest
● Employee devices
○ Manual
● Unmanaged leftovers
○ SaaS
○ Otherwise minimal exposure
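One way a single version file can feed Bazel, Python and bash: Starlark shares its literal syntax with Python, so plain constant assignments in a .bzl file can be read with Python's ast module. A minimal sketch under that assumption; the file contents below are invented for illustration and the talk does not show the real file.

```python
import ast
from typing import Dict


def load_bzl_constants(source: str) -> Dict[str, object]:
    """Read top-level NAME = <literal> assignments from Starlark source.

    Only handles the literal subset (strings, numbers, lists, dicts) that
    Starlark shares with Python; anything else is skipped.
    """
    constants: Dict[str, object] = {}
    for node in ast.parse(source).body:
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        ):
            try:
                constants[node.targets[0].id] = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                continue  # not a plain literal, e.g. a function call
    return constants


# Hypothetical file contents; names and values are illustrative only.
EXAMPLE_BZL = '''
# Pinned tool versions.
BAZEL_VERSION = "5.1.0"
EXAMPLE_TOOL = {"version": "1.6", "sha256": "abc123"}
'''
```

A bash consumer could call such a script and read the values from its output, which keeps a single source of truth.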
bazel-deps
dependencies.yaml
workspace.bzl
pip-tools
requirements.in
requirements.txt
BUILD.bazel
bootstrap tooling
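What pip-compile adds on the way from requirements.in to requirements.txt is the pinned transitive closure. A minimal sketch that lists the packages present only in the lock file; the package names alpha and beta are made up, and the parsing covers only simple `name==version` lines.

```python
import re
from typing import Dict, Iterator, List

_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def _normalise(name: str) -> str:
    """Normalise a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def _requirement_lines(text: str) -> Iterator[str]:
    """Yield requirement lines, skipping comments, blanks and pip options."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line and not line.startswith("-"):
            yield line


def direct_names(requirements_in: str) -> List[str]:
    """Names of the dependencies a developer asked for directly."""
    matches = (_NAME.match(line) for line in _requirement_lines(requirements_in))
    return [_normalise(m.group(0)) for m in matches if m]


def locked_versions(requirements_txt: str) -> Dict[str, str]:
    """Map each pinned package in the lock file to its exact version."""
    locked: Dict[str, str] = {}
    for line in _requirement_lines(requirements_txt):
        spec = line.split(";", 1)[0].rstrip("\\").strip()
        match = _NAME.match(spec)
        if match and "==" in spec:
            locked[_normalise(match.group(0))] = spec.split("==", 1)[1].strip()
    return locked


def transitive_only(requirements_in: str, requirements_txt: str) -> List[str]:
    """Packages that appear only because something else depends on them."""
    direct = set(direct_names(requirements_in))
    return sorted(n for n in locked_versions(requirements_txt) if n not in direct)


# Hypothetical inputs: alpha is requested directly, beta is pulled in by alpha.
REQUIREMENTS_IN = "alpha\n"
REQUIREMENTS_TXT = "alpha==1.0\n    # via -r requirements.in\nbeta==2.3\n    # via alpha\n"
```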
Python vs JVM dependency failure
● pip-compile: build time
● bazel-deps: run time
30
Bazel & containers
{scala,py}_binary
base image
files / tars
{scala,py}_image
container_run_and_commit_layer
Weak determinism
Apt, files only
Distroless tools
install_pkgs
Can we make apt install deterministic?
● apt-get typically provides latest
○ Determined by Packages.gz
○ Download during build breaks determinism & caching?
● Distroless bazel package_manager:
○ Exact Packages.gz specification
○ Debian: Versioned Packages.gz
○ Ubuntu: Only latest Packages.gz
● Compromise on determinism
○ Download Packages.gz before build
○ Caching still ok
● Not running apt scripts seemed to work. For a while.
○ Subtle low-level container failures
○ Abandoned
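Packages.gz determines which version apt-get installs because it is a gzip-compressed list of stanzas, one per package, with fields such as Package, Version and SHA256. A minimal sketch that turns such an index into a pin table; the sample stanzas are invented, and if a name occurs more than once the last stanza wins.

```python
import gzip
from typing import Dict, Iterator


def parse_packages(text: str) -> Iterator[Dict[str, str]]:
    """Yield one field dictionary per blank-line-separated stanza."""
    for stanza in text.strip().split("\n\n"):
        fields: Dict[str, str] = {}
        key = None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:
                fields[key] += "\n" + line.strip()  # continuation line
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key] = value.strip()
        if fields:
            yield fields


def pin_table(packages_gz: bytes) -> Dict[str, Dict[str, str]]:
    """Map package name to the exact version and checksum listed in the index."""
    text = gzip.decompress(packages_gz).decode("utf-8")
    return {
        stanza["Package"]: {
            "version": stanza["Version"],
            "sha256": stanza.get("SHA256", ""),
        }
        for stanza in parse_packages(text)
        if "Package" in stanza and "Version" in stanza
    }


# Hypothetical index content; package names, versions and digests are made up.
SAMPLE = (
    b"Package: alpha\nVersion: 1.0-1\nSHA256: aaaa\n"
    b"Description: example tool\n with a continuation line\n\n"
    b"Package: beta\nVersion: 2.3-4\nSHA256: bbbb\n"
)
```

Downloading the index once before the build and deriving pins from it matches the compromise described above: the build itself stays cacheable even though the index is only "latest".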
Scling collaboration models
● Single unified platform
○ Monorepo + trunk-based process
○ Separate instance per client
○ All test suites run on every change
● Factories are adapted to constraints and important properties
○ Ok: Security, risk, quality, availability, compliance
○ No: Preferred technology, work processes
Refinement factory
● Raw data in
● Valuable data out
● Non-technical clients
● "Easy" domain
Joint factory
● Hybrid teams
● Domain experts
● Data apprentices
● Scling runs data platform
Client factory
● Start as joint factory
● Goal: Client independent
Divided, multi-tenant platform
Orion - base data platform
GCP (but portable to other clouds)
Isolated client instance
Isolated client instance
Isolated client instance
Saturn - non-essential operational tooling
ion CLI tool
scli CLI tool
Client exit scenario
Orion - base data platform
Client cloud choice
Isolated client instance
Client monitoring, logging, identity, etc
ion CLI tool
Multiphase build bootstrap
Ubuntu
some python
docker
benderbot
python 3.x.y
JVM
bazel
py deps
ion
gcloud
kubectl
scli
hugo
orion/bin/tool.py
versions.bzl
requirements.txt
● Images cached based on content
● Caches shared
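Content-based caching across bootstrap phases can be sketched by chaining keys: each phase's key covers its own input files plus the key of the phase before it, so a change invalidates only that phase and later ones. The phase names and file contents below are assumptions for illustration, not the actual bootstrap layout.

```python
import hashlib
from typing import Dict, List, Tuple

Phase = Tuple[str, Dict[str, bytes]]  # (phase name, {file name: file content})


def phase_key(parent_key: str, inputs: Dict[str, bytes]) -> str:
    """Derive a cache key from the parent phase's key and this phase's files."""
    h = hashlib.sha256(parent_key.encode())
    for name in sorted(inputs):
        for part in (name.encode(), inputs[name]):
            h.update(len(part).to_bytes(8, "big"))  # length prefix: no ambiguity
            h.update(part)
    return h.hexdigest()[:16]


def chain_keys(phases: List[Phase]) -> Dict[str, str]:
    """Compute one key per phase; each key depends on all earlier phases."""
    keys: Dict[str, str] = {}
    parent = ""
    for name, inputs in phases:
        parent = phase_key(parent, inputs)
        keys[name] = parent
    return keys


# Hypothetical three-phase bootstrap.
PHASES: List[Phase] = [
    ("base", {"Dockerfile": b"FROM ubuntu\n"}),
    ("tools", {"versions.bzl": b'EXAMPLE_VERSION = "1.0"\n'}),
    ("py-deps", {"requirements.txt": b"alpha==1.0\n"}),
]
```

Using the key as an image tag lets any machine with access to the shared cache reuse an image built elsewhere from identical inputs.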
Benderbot
● Lazy bot that takes the easy way out
○ Dumb solutions over smart
● Guess next versions (instead of finding them)
○ 404 not found? Quick failure.
● Mimic developer actions
○ Upgrade source
○ Rerun bazel-deps / pip-compile
○ Run build bootstrap, test suite, dev tooling check
○ Look at logs to classify problem
○ Update checksum if necessary
○ Create merge request on success
● Isolated environment
○ Separate region
○ No internal network access
○ GitLab + logging bucket credentials
○ Cheap spot instance + NVMe
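The "guess next versions, fail quickly on 404" step might look like the sketch below. It assumes MAJOR.MINOR.PATCH version strings and a download URL containing the version; the function names, the URL template and the order in which candidates are tried are all assumptions, not Benderbot's actual code.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Optional


def candidate_versions(current: str) -> List[str]:
    """Guess successors of MAJOR.MINOR.PATCH, smallest bump first."""
    major, minor, patch = (int(part) for part in current.split("."))
    return [
        f"{major}.{minor}.{patch + 1}",
        f"{major}.{minor + 1}.0",
        f"{major + 1}.0.0",
    ]


def http_exists(url: str) -> bool:
    """HEAD the URL; an HTTP error such as 404 means the guess was wrong."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


def first_existing(
    current: str,
    url_template: str,
    exists: Callable[[str], bool] = http_exists,
) -> Optional[str]:
    """Return the first guessed version whose download URL exists, if any."""
    for version in candidate_versions(current):
        if exists(url_template.format(version=version)):
            return version
    return None
```

Guessing avoids writing a release-feed parser per upstream project, which fits the "dumb solutions over smart" rule.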
Benderbot components / efforts
● Months of evening hacking
○ = weeks full time
benderbot.py: < 1000 LOC
tool.py: few LOCs, brittle
Classification data pipeline
Statistics data pipelines
Reporting dashboard
Reevaluation journey:
● dash + plotly
● bokeh + bokeh
● streamlit + bokeh
Benderbot reports
Resolution classifications
● No new version of A exists
● New A version exists. Application verified ok with upgrade.
● New A version exists. Unclear whether upgrade breaks application.
● New A version exists. Upgrade breaks application.
○ We use a deprecated API.
○ New version has bug.
● New A version exists. Upgrade breaks dependency B.
○ New version of B exists.
○ No new version of B exists.
○ A and B must atomically upgrade
Resolution labels on the slide: not found ×1, success ×1, test failure ×4, transient ×4
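Classifying an upgrade attempt into the four labels on this slide (not found, test failure, success, transient) can be done from the exit code and the log. A minimal sketch: the log patterns are hypothetical, and treating every unrecognised failure as transient is an assumption made for illustration.

```python
import re
from enum import Enum


class Resolution(Enum):
    NOT_FOUND = "not found"
    TEST_FAILURE = "test failure"
    SUCCESS = "success"
    TRANSIENT = "transient"


# Hypothetical log signatures; real patterns depend on the tools' actual output.
_NOT_FOUND = re.compile(r"404|not found", re.IGNORECASE)
_TEST_FAILURE = re.compile(r"\bFAILED\b|tests? failed", re.IGNORECASE)


def classify(exit_code: int, log: str) -> Resolution:
    """Map one upgrade attempt to a resolution label."""
    if exit_code == 0:
        return Resolution.SUCCESS
    if _NOT_FOUND.search(log):
        return Resolution.NOT_FOUND
    if _TEST_FAILURE.search(log):
        return Resolution.TEST_FAILURE
    return Resolution.TRANSIENT  # assumption: unknown failures are worth a retry
```

The label only records what the bot observed; deciding why a test failed (deprecated API, upstream bug, broken dependency B) remains a human task.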
Our most productive developer
~500 MRs
Benderbot stats - resolutions
Benderbot stats - resolutions
Chart annotations: More hardware, Process flakiness, Speculative execution
Resolutions by kind
Chart series: Total, Other, JVM, Python
Last resolution by dependency
Chart series: Total, Other, JVM, Python
Time between scans
Google SLSA evaluation
● Supply-chain Levels for Software Artifacts
○ Maturity model
● SLSA 1: yes
● SLSA 2: yes
● SLSA 3: some
○ Prioritising speed over Ephemeral Environment, Isolated, Non-Falsifiable
● SLSA 4: some
○ Parameterless
○ Dependencies complete (except apt)
Concluding remarks
● Challenges?
○ Operational tuning to balance rate vs €
○ Google cloud_sql_proxy patch update took us down
○ Diva dependencies need custom solutions
○ Which test failure to address?
● Future?
○ Upgrade conditional on container scanning?
○ Dead dependency detection?
● Open source? No.
○ Specific to our environment
○ Bot is easy. Just do it.
○ Strong process challenging. But rewarding.
○ Offer: A copy of the code for a C-level lunch date. :-)
Resources
https://trunkbaseddevelopment.com/
https://reproducible-builds.org/
https://www.scling.com/presentations/
