© AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org
AboutCode and beyond:
End-to-end SCA with open
source code and open data
Philippe Ombredanne,
Lead maintainer of AboutCode and CTO of nexB, Inc.
Agenda
● About AboutCode & nexB
● Software Composition
Analysis
○ Vulnerabilities AND licensing
○ Proprietary problems
● The AboutCode stack
● New projects
● Roadmap
● Questions?
About me
● On a mission to enable easier and safer reuse of FOSS code with
best-in-class open source Software Composition Analysis (SCA)
tools, data, and standards for open source discovery, license & security
compliance
● Lead maintainer of AboutCode projects (ScanCode, DejaCode,
VulnerableCode and others)
● Factoids
○ In 2010, I said that Docker technology would never succeed
○ Signed off on the largest deletion of code in the Linux kernel
(but these were only license comments)
● CTO and co-founder of nexB, Inc.
○ pombredanne@nexb.com
○ GitHub: https://github.com/pombredanne
○ LinkedIn: https://www.linkedin.com/in/philippeombredanne
○ Often assisted by Chihuahua Technical Advisor
AboutCode and nexB
● AboutCode's FOSS-first mission: FOSS for FOSS
○ Open source tools and open knowledge base (AboutCode stack)
○ Simple and practical standards (Package-URL)
○ Applications for Legal & Business users (DejaCode) with APIs for everything
● Trusted experts in Software Composition Analysis (SCA) since 2007
○ Creator of Package-URL: https://github.com/package-url
○ Co-founders of SPDX: https://spdx.org
○ Contributors to CycloneDX: https://cyclonedx.org
○ Co-founders of ClearlyDefined: https://clearlydefined.io
● nexB provides professional services and support for SCA
○ 800+ SCA projects completed to date with 100% customer satisfaction
○ Sponsored development for AboutCode projects
○ Technical support and advisory for SCA tools implementations and deployments
Software Composition Analysis
● Identification – Identify distinct “units” of third-party software used in a product or project
and their provenance
● Licensing – Determine the licensing for each software unit
● Security – Identify known security vulnerabilities for each software unit
● Quality – Evaluate the quality of a software unit based on software development data, such
as number of bugs, fixes, etc. - this is the domain of the CHAOSS project
● Read "SCA the FOSS Way" for more information:
https://www.nexb.com/software-composition-analysis/
Software Composition Analysis needs to be a core competency for any
software development organization.
● Embed in the software development workflow from design through release - as it is in
manufacturing
● The choice of SCA tools will depend on your platform, stack and product
Software Composition Analysis (SCA)
● Most SCA tools focus on either vulnerabilities OR licensing
○ Current focus is on security vulnerabilities because of perceived higher risk
● The communities of interest are separate - security vs legal - but converging
● License data may be complex, yet mostly stable over time
○ But very few tools get it right. Accuracy is still a major, unsolved problem
● Dependency graphs are highly dynamic and demand constant care
○ They impact the stability of licensing and vulnerability information
● Vulnerability data is complex, but extremely dynamic - if included directly in
an SBOM, it may be wrong by the time you receive an SBOM
● Most SCA security tools are lightweight with respect to both provenance and
licensing, and focus on the easy things
You need SCA coverage for vulnerabilities AND licensing - plus quality.
SCA: Vulnerabilities AND licensing
SCA: Proprietary tools and data
● Increasingly expensive with the surge of interest in SBOMs and pricing
based on number of developers
● Large companies may be able to “afford” proprietary SCA scanning tools,
but they do not scale across the FOSS supply chain
○ The cost of scan curation is prohibitive with high false positive rates and poor license detection
accuracy
● Most current data about FOSS packages and vulnerabilities is proprietary
○ Vendors may offer some free or open source tools but you must pay for access to their data
○ Barrier to community access and analysis
● Many vendors use some open source for marketing only - “fauxpen source”
○ Complex and restrictive licenses
○ No contributions back to the community
SCA: Open source tools and data
● There are many open source SCA tools and some databases:
○ License compliance focus: ORT, Fossology, SW360
○ Vulnerability SBOM focus: CycloneDX, Dependency Check, Syft/Grype (Anchore), Trivy (Aqua
Security)
● So, why did we develop AboutCode?
● Free and open source software AND free and open data
○ FOSS for FOSS
○ Open knowledge base with open data for licenses, packages and vulnerabilities
● Modular and integrated best-in-class SCA tools for developers
○ Tackling the harder code analysis problems so you do not have to
○ PURL-based for easier integration in/out
● Bespoke pipelines enable true end-to-end automation
○ Working towards management by exception to focus on the complex cases of origin and license
○ Decentralized analysis, close to the developers
● Management web app for centralized policies, curations and compliance
workflows and data
○ Supports engineering, business and legal stakeholders with features tailored for each using
common/shared information
Why AboutCode? [1]
● The state of SCA tooling accuracy is not great
● We recently ran a large-scale comparison of many container scanners
○ Both FOSS and commercial
○ Using SBOMs as a way to compare scans of the same container images
● Commercial tools are making up packages, "hallucinating" PURLs
● Most are only skin deep, looking only at package manifests and package databases
● Beyond package origin, the quality of reported licenses is poor and
misleading
○ In most cases this is a grep on the declared license of package manifests
● Several tools created invalid SBOMs
● We can do better!
Why AboutCode? [2]
Introducing the
AboutCode stack
AboutCode: Who is using it?
Many organizations and most SCA providers use AboutCode tools,
libraries or standards:
○ Most free software and open source foundations
○ Five of the top big tech companies
○ A leading database company and a leading Linux company
○ European and US government agencies
○ All major European car manufacturers and most of their vendors
○ Major US chip and microprocessor providers
○ Four leading European industrial companies
○ All SBOM and VEX standards
○ All open source SCA and SBOM tools
○ Most proprietary SCA, SBOM or code hosting tools
[Diagram: the AboutCode stack in three parts.
SCA Tools (ScanCode): Scan, Match, Analysis pipelines, Binary analysis, Dependency analysis.
Management Apps (DejaCode): Policies, Curations, Software inventory, Workflows, SBOMs, Custom reports.
Open Knowledge Base: Licenses, Packages, Vulnerabilities.]
● Supports safe and compliant use of FOSS, with FOSS
○ Recognized worldwide as best-in-class tools
○ Modular design for adaptation to development team processes, tools and environment
○ Coverage for all languages and frameworks
○ Package URL (PURL) used throughout as the package identifier
○ Code AND data licensed under open source licenses, no gimmicks
● Reduce licensing and vulnerability risks from using FOSS or other
third-party software components
○ Share risk management responsibilities among business, legal, engineering and security teams
○ Provide a comprehensive view of open source and other third-party components used in your
software
● Active community of contributors and users, including many FOSS tools
● Technical support, implementation, advisory services available from nexB
Benefits of the AboutCode stack
● Contribute to an AboutCode project with code,
documentation, use cases, bug reports
■ https://github.com/nexB
● Sponsor AboutCode project maintainers
○ Accelerate development of new features and fund contributors
■ https://github.com/sponsors/nexB
● Buy support, implementation, and advisory services
from nexB to pay the maintainers
■ https://nexb.com
● Join the community:
■ https://www.aboutcode.org/
■ https://gitter.im/aboutcode-org/discuss
AboutCode also needs your help!
"Dependency" by xkcd, used under CC BY-NC 2.5 /
Modified text from original
AboutCode:
New Projects
● Pipeline Input: sources and binaries
● Collect symbols and identifiers from source and binaries
○ Parse Java bytecode, ELF, DWARF, WinPE, Mach-O, JS mapfiles, collect literals,
source symbols
● Map and match these symbols from binaries back to source
● If not mapped, fall back to code matching against the PurlDB
● Report discrepancies
○ Code that is found in binaries and NOT in the source
● Work in progress, BUT code written before the xz incident was able to detect
the xz-utils problems and tagged the problematic, malicious build
script as "require review"! Yeah!
Binary, deployment analysis: back2source
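The "map symbols back to source, then report discrepancies" step can be illustrated with a minimal sketch. This is not the back2source implementation: the function name `map_binary_symbols` and the symbol sets are hypothetical, and real symbol collection from ELF, DWARF or bytecode is assumed to have happened already.

```python
def map_binary_symbols(binary_symbols, source_symbols_by_file):
    """Map each binary symbol to the source files defining it; list the rest."""
    mapped, unmapped = {}, []
    for symbol in sorted(binary_symbols):
        files = sorted(
            path
            for path, symbols in source_symbols_by_file.items()
            if symbol in symbols
        )
        if files:
            mapped[symbol] = files
        else:
            # Found in the binary but NOT in the source: needs review.
            unmapped.append(symbol)
    return mapped, unmapped


# Made-up symbol sets, for illustration only.
source_symbols = {
    "src/codec.c": {"encode", "decode"},
    "src/io.c": {"read_block", "write_block"},
}
binary_symbols = {"encode", "decode", "read_block", "backdoor_hook"}

mapped, unmapped = map_binary_symbols(binary_symbols, source_symbols)
print("require review:", unmapped)
```

In this toy data, the one symbol with no source counterpart is the kind of discrepancy the slide describes as worth reporting.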
New project: CRAVEX [1]
● Goal is to automate application vulnerability management
● ... AND regulatory compliance reporting
● Built for open source projects and small businesses as a free and open
solution to comply with the emerging regulatory mandates (SBOMs,
CRA) with minimal friction and costs
● Package- and software product-centric management of vulnerabilities
● Web-based, database-backed application to collect, track, and triage
FOSS package vulnerabilities and determine their exploitability
○ Rank based on urgency, assess remediation
○ Create VEX reports
New project: CRAVEX [2]
● Import SBOM and scans for one or more apps, products & components
● Schedule vulnerability lookups and store the results in the database
● Web UI to rank and prioritize package vulnerabilities based on
○ Multiple scores
○ Rule-based automation
○ Vulnerable code reachability and exploitability
○ Usage context
● Export the results of the vulnerabilities triage and processing as VEX
documents and attestations
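As an illustration of ranking on several factors at once, here is a minimal sketch. It is not CRAVEX code: the `Finding` fields, the context weights and the 0.3 penalty for unreachable code are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical weights: production software ranks above internal or dev tools.
CONTEXT_WEIGHT = {"production": 1.0, "internal": 0.6, "development": 0.3}


@dataclass
class Finding:
    vuln_id: str
    severity: float  # e.g. a severity score on a 0 to 10 scale
    reachable: bool  # is the vulnerable code reachable in this product?
    context: str     # usage context, a key of CONTEXT_WEIGHT


def priority(finding: Finding) -> float:
    """Combine severity, reachability and usage context into one number."""
    reach_factor = 1.0 if finding.reachable else 0.3
    return finding.severity * reach_factor * CONTEXT_WEIGHT[finding.context]


def rank(findings):
    """Return findings sorted from most to least urgent."""
    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("VULN-A", 9.8, reachable=False, context="development"),
    Finding("VULN-B", 7.5, reachable=True, context="production"),
    Finding("VULN-C", 5.0, reachable=True, context="internal"),
]
ranked = rank(findings)
print([f.vuln_id for f in ranked])
```

With these assumed weights, the highest raw severity ends up last because it is unreachable and only used in development, which is the point of context-aware triage.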
New project: Code reachability
● Upcoming companion to CRAVEX
● Goal: help prioritize vulnerabilities based on actual local exploitability
● Use multiple factors to help better qualify the urgency
● Symbols-based reachability of the vulnerable code
● Call graph-based reachability of the vulnerable code
● Integrate local context to assign exploitability priorities
○ Development or internal tool vs. production software or consumer device
● Integrate existing excellent FOSS efforts in the space
○ Eclipse Steady, Joern, Chen
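Call graph-based reachability reduces to a graph search from the application entry points to the vulnerable function. A minimal sketch, assuming the call graph has already been extracted by some other tool; the graph and function names below are made up:

```python
from collections import deque


def is_reachable(call_graph, entry_points, target):
    """Breadth-first search: can any entry point reach `target`?"""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        function = queue.popleft()
        if function == target:
            return True
        for callee in call_graph.get(function, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse_args", "handle_request"],
    "handle_request": ["render"],
    "legacy_import": ["unsafe_load"],
}

print(is_reachable(call_graph, ["main"], "unsafe_load"))
print(is_reachable(call_graph, ["main", "legacy_import"], "unsafe_load"))
```

Here the vulnerable function is only reachable when the legacy entry point is deployed, which is the kind of local context that can lower or raise a priority.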
Question: How to safely reuse AI-generated code?
● AI-Generated code is a wonderful productivity booster
● Thought experiment
○ Build a small LM from only GPL-licensed code from the GNU project.
○ Add Gen-AI on top. Is the generated code derived from the GPL-licensed code?
● AI-Generated code may violate licenses and copyrights
● AI-Generated code may copy vulnerable code sections
● Some large businesses and open source foundations have defined
policies regarding AI-generated code, in some cases prohibiting its use.
New project: GenAI Code Search [1]
● Fight fire with fire
○ An approach we considered is to use GenAI to regenerate code under analysis and compute
similarity between regenerated and original code
○ Impractical as too expensive and too slow
● Find similar code fragments
○ The focus of this project
● Traditional code fragment matching does not work for AI
○ The code is broken in chunks using a content-defined heuristic
○ Chunks are matched exactly using a checksum
○ BUT, AI-generated code is seldom exactly the same as indexed FOSS code
○ Existing solutions have ever-growing indexes with more fragments to avoid false negatives
○ Furthermore, precision and recall are frozen in the choice of parameters for the index
New project: GenAI Code Search [2]
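The limitation of exact chunk matching can be shown with a minimal sketch. The boundary rule here (cut after any line whose CRC32 is divisible by `divisor`) is an assumption picked for brevity, not the heuristic of any particular tool, and the code samples are made up.

```python
import hashlib
import zlib


def chunk_checksums(code: str, divisor: int = 3) -> set[str]:
    """Split code into content-defined chunks and return their checksums."""
    chunks, current = [], []
    for line in code.splitlines():
        line = line.strip()
        if not line:
            continue
        current.append(line)
        # Content-defined boundary: depends only on the line content.
        if zlib.crc32(line.encode()) % divisor == 0:
            chunks.append("\n".join(current))
            current = []
    if current:
        chunks.append("\n".join(current))
    return {hashlib.sha1(chunk.encode()).hexdigest() for chunk in chunks}


indexed_code = """
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
# Same logic with one variable renamed, as regenerated code often is.
regenerated_code = indexed_code.replace("acc", "result")

index = chunk_checksums(indexed_code)
missed = chunk_checksums(regenerated_code) - index
print(f"{len(missed)} chunk(s) of the regenerated code are not in the index")
```

A trivial rename is enough to change the checksum of every chunk that contains the renamed variable, so those chunks no longer match the index exactly.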
● A new approach to approximate code fragment matching
● Fingerprint-based
○ Helps scale the index, and also the query, since a whole codebase (gigabytes in size) is the query
○ Traditional information retrieval with inverted indexes does not work for queries this large
● Approximate, fuzzy fingerprinting
○ Using a new algorithm that enables matching code that was never indexed
● Furthermore, tunable fingerprint
○ Can be tuned at query time for precision and recall
New project: GenAI Code Search [3]
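To make "approximate matching, tunable at query time" concrete, here is a minimal sketch that swaps in a well-known stand-in: Jaccard similarity over token shingles. This is not the AboutCode fingerprinting algorithm, which the slides do not detail; the function names and threshold values are assumptions.

```python
import re


def shingles(code: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping token sequences of length `size`."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def matches(query: str, indexed: str, threshold: float) -> bool:
    """The threshold is picked at query time, not frozen in the index."""
    return similarity(query, indexed) >= threshold


indexed_code = """
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
regenerated_code = indexed_code.replace("acc", "result")

print(matches(regenerated_code, indexed_code, threshold=0.3))  # favors recall
print(matches(regenerated_code, indexed_code, threshold=0.9))  # favors precision
```

The renamed variant is no longer an exact match, yet it still shares a sizable part of its shingles with the indexed code, so a lower threshold finds it while a strict one rejects it.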
New project idea: Open Containers KB
● Container composition is a mess. Most tools are just plain bad
○ Lack basic tracing of license and package (and therefore vulnerabilities)
● Image builders, OS and distro vendors do not seem to care
○ Official images are sometimes not compliant or not traceable
○ Package volume amplifies vulnerability and license issues
○ Source of binary packages disappears
● We can do better!
● Project idea: create a mini consortium to do a
proper, automated and correct SCA of key
public base images
● Share these as open data
● Work with upstream to clean up their act
AboutCode
Roadmap
Roadmap for AboutCode: ScanCode Toolkit
● Build single-executable standalone apps for ScanCode for easier
deployment in CI/CD
● Improve copyright and license detection speed
● Build smaller single-purpose tools and libraries from
"mono repo"
● Improve data models for Packages and
Dependencies/Requirements
● Parse more package manifests and lock files
● Improve support for license exceptions (WITH)
● Move inconclusive, unknown license detection to clues
● Add post-processing to rematch using SPDX matching
guidelines
● Integrate with CI and other tools
○ Create pre-configured CI/CD integrations with the main CI systems (GitHub, GitLab, Jenkins)
● Extend binary analysis and deployment tracing workflows
○ Support ELF/Native, Go, Ruby, Android in addition to Java and JS
○ Find the exact subset of the code that is deployed and used in production
● Automate analysis review in ScanCode.io
○ End to end automated pipelines for embedded devices, Android and C/C++
○ Multi-stack deployment analysis for Java, JS, C/C++
○ Report TODO items to review only "by exception"
Roadmap for AboutCode: SCA Tools
● Code match smart ranking and disambiguation
○ Avoid false positives
● Accurately match to the correct package version
● Match code snippets approximately
○ Using our new approximate fingerprinting
○ Integrate other code matching schemes from SWH and SCANOSS
● Match source symbols and binary symbols to sources
and binaries
● New matching pipelines
● Decentralized curation and corrections using
in-codebase ABOUT files
Roadmap for AboutCode: Code matching
● Compare scans to focus review work on changes only
(DeltaCode)
● APIs and CLI to query all the things by PURL from the KB
(purl2all)
● More code inspectors
○ Lightweight package dependency resolution
○ Dedicated ecosystem-focused libraries
● New lightweight package-inspector
○ Single executable to find packages and dependencies
● Trace build execution to find the exact subset of source
code that is deployed and used (TraceCode)
Roadmap for AboutCode: Other SCA Tools
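The idea of comparing scans so that review focuses on changes only can be sketched in a few lines. This is not DeltaCode itself: the scans are reduced here to a hypothetical mapping of file path to detected license expression, and real scan output carries much more data.

```python
def diff_scans(old_scan: dict, new_scan: dict) -> dict:
    """Report paths that were added, removed, or whose license changed."""
    old_paths, new_paths = old_scan.keys(), new_scan.keys()
    return {
        "added": sorted(new_paths - old_paths),
        "removed": sorted(old_paths - new_paths),
        "changed": sorted(
            path for path in old_paths & new_paths
            if old_scan[path] != new_scan[path]
        ),
    }


# Made-up scan results: path -> detected license expression.
old_scan = {"lib/a.py": "mit", "lib/b.py": "apache-2.0", "lib/c.py": "mit"}
new_scan = {"lib/a.py": "mit", "lib/b.py": "gpl-2.0", "lib/d.py": "bsd-new"}

delta = diff_scans(old_scan, new_scan)
print(delta)
```

Only the paths listed in the delta need a reviewer's attention; unchanged files keep their earlier conclusions.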
Roadmap for AboutCode: Management Apps
● Add support for CycloneDX 1.5 and 1.6 and SPDX 3.0
● Create new review automation apps:
○ License detection review
○ Code match review
○ Vulnerability review
● Overall goal is to reduce review and curation work
○ Extend license clarity scoring to code matches with origin clarity scoring
○ "Auto conclude" matches that are conclusive
● New app for advanced vulnerability management and
support for CRA (Cyber Resilience Act) compliance
○ Automated triage of vulnerabilities and workflow triggers
○ VEX creation, VEX import and export (Vulnerability Exploitability Exchange)
with CSAF and CycloneDX
● Extend License data with compatibility matrix
● Add new license aliases dataset
● Add more extensive tagging and categorization
● Extend License data with improved exception details
○ To disambiguate license detections of L/GPL with/without exceptions
● Extend License data with improved "or later" details
○ To disambiguate detection of "or later" notices with their primary texts
● Add "key phrases" to all license detection rules
● Add variable text segments to license rules
● Add Fedora alternative SPDX identifiers
● Work with CycloneDX to become their license reference
Roadmap for AboutCode: Licenses
Roadmap for AboutCode: Vulnerabilities
● Extend Non-vulnerable dependency resolution
○ Beyond Python - add Java and JS
● Extend vulnerability data with new upstream data sources
● Add fix commit details and support for vulnerability reachability
● Mine the graph to surface related package fixes
● Mine git logs, issues and forums to enrich vulnerability data
● Surface inconsistencies and conflicts between different advisory
data sources (VulnTotal throughout)
● Add source/binary discrepancy data (from back2source)
● Confirm the true origin of code to avoid ambiguous matches
● Supply chain package verification
○ Map deployed binary packages to their corresponding source code
○ Find suspicious code drift between package versions
● Mine extensive list of "off registry" packages
○ Common native C/C++ code and libraries for embedded
○ Glibc, Busybox, zlib, etc. that are not published on ecosystem package registries
● Collect code symbols from source and binaries (for matching)
● On-demand, just-in-time code mining to build your KB on the fly
● Federated, decentralized shared KB data with Git and ActivityPub
○ Share scans, vulnerabilities, origin facts and curations
○ Scan once, analyze once and collaborate on reviews to clear out the junk!
Roadmap for AboutCode: Packages
The AboutCode Stack
The AboutCode stack for SCA [1]
● Web-based enterprise management application
○ DejaCode for ensuring license and security compliance
● SCA tools for identifying third-party code and determining
code license and origin
○ ScanCode is the leading code scanner for software component, package
and dependency identification, and license detection
○ MatchCode is a new tool for package and file matching
○ container-inspector: analysis tool for Docker & other images
○ nuget-inspector and python-inspector for in-depth dependency resolution
○ many other libraries
○ See https://aboutcode.org for an overview of AboutCode projects
○ See https://github.com/nexB for the code
The AboutCode stack for SCA [2]
● Open knowledge base with open data for licenses, packages
and vulnerabilities
○ LicenseDB - open source and other public licenses at:
https://scancode-licensedb.aboutcode.org/
○ PurlDB - package data at: https://public.purldb.io/api/packages/
○ VulnerableCode - aggregated vulnerability data and comprehensive
vulnerability reporting at: https://public.vulnerablecode.io/
● Standards
○ Package-URL: Specification and tools for identifying packages at:
https://github.com/package-url
○ Univers: Parse and compare package versions and ranges at:
https://github.com/nexB/univers
● Industry-leading scanning engine
● License detection with multiple techniques, rule-based
● Copyright notices with NLP
● Identify packages
○ Normalize all the package metadata
○ Includes dependencies and package license detection
○ Package manifests, system package databases and lockfile parsing
● New summarization and license clarity scoring
○ Identify and focus curation on actual licensing issues
● Accuracy is paramount
○ An incorrect license detection is treated as a bug
● ABOUT files for curations/corrections stored in the codebase
The AboutCode stack: ScanCode Toolkit
● Web-based scanning server using ScanCode
○ Smarter scripted scanning in multiple steps
● Specialized pipelines for customized analysis
○ Tag items that need your review
○ Pipeline for best-in-class container and VM scanning
● Unique deployment analysis using binary analysis
○ Map binaries back to their sources
● Code matching integrated with the knowledge base
○ Starting with exact and approximate file matching
● Integrated enrichment of the knowledge base
○ Collect and pre-scan all the packages that you use
○ Watch and collect new versions continuously
The AboutCode stack: ScanCode.io
● New Web-based code matching server
● Includes mining for custom knowledge base
○ All package ecosystems and Linux distros
● Smarter matching in multiple steps
○ Whole tree, exact file, approximate tree and file
○ Coming up: snippet matching, with a twist for AI-Generated code
● Pipeline for ranking and picking best matches
● A different matching approach
○ Exact matching demands a constantly growing index
○ Approximate matching can match software that is NOT indexed
○ Top down rather than bottom up
The AboutCode stack: MatchCode
● Inspectors: tech-specific tools and dependency resolvers
○ Container and VM images, Debian, ELF and DWARF, NuGet, Python, source
● aboutcode-toolkit: Generate Attribution Notices
○ Using scans or ABOUT files as input
● package-url (PURL): URL string to identify a software package
○ Adopted by CSAF, CycloneDX, SPDX and the whole SCA ecosystem
○ Now part of the CVE specification v5.1
○ Recommended by US CISA and German BSI
● univers: parse and compare package versions and version
ranges
● license-expression: parse and compare License expressions
The AboutCode stack: Other projects
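For readers new to PURL, a Package-URL has the shape `pkg:type/namespace/name@version?qualifiers#subpath`. The following is a simplified sketch of splitting one into its parts. It deliberately ignores several rules of the specification (type-specific normalization, unencoded `@` in namespaces, validation), and `parse_purl` is a hypothetical helper, so real code should rely on a maintained PURL library instead.

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl: str) -> dict:
    """Split a Package-URL into components (simplified, not the full spec)."""
    scheme, separator, rest = purl.partition(":")
    if scheme != "pkg" or not separator:
        raise ValueError(f"not a purl: {purl!r}")
    rest = rest.lstrip("/")
    # Optional trailing parts: '#subpath' comes last, '?qualifiers' before it.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    # The version, when present, follows the last '@'.
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""
    segments = [unquote(segment) for segment in path.split("/") if segment]
    if len(segments) < 2:
        raise ValueError("a purl needs at least a type and a name")
    package_type, *namespace, name = segments
    return {
        "type": package_type.lower(),
        "namespace": "/".join(namespace) or None,
        "name": name,
        "version": unquote(version) or None,
        "qualifiers": dict(parse_qsl(query)),
        "subpath": unquote(subpath) or None,
    }


print(parse_purl("pkg:pypi/django@4.2.1"))
print(parse_purl("pkg:maven/org.apache.commons/commons-lang3@3.12.0?type=jar"))
```

Because the same string identifies the same package in every tool, a PURL can serve as the join key between scans, SBOMs and vulnerability data.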
● Input: buildable codebase
● Run build under "strace" and collect the trace
○ All kernel syscalls that open, close, write to files, spawn processes
● Reconstruct build graph
○ Determine the subset of the sources used in deployment
● Then Scan and Match the source subset
● Useful, but still marginal usage as it requires a lot of tuning
Build tracing: TraceCode
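The trace-then-reconstruct idea can be sketched on a few lines of strace-like output. This is not TraceCode: the sample trace is made up, the regular expression only covers simple `open`/`openat` lines, and real strace output varies with its version and options.

```python
import re

# Matches e.g.: 4242 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3
OPEN_CALL = re.compile(
    r'open(?:at)?\((?:AT_FDCWD, )?"([^"]+)", ([A-Z_|]+)[^)]*\)\s*=\s*(-?\d+)'
)


def sources_used(trace_lines):
    """Return files the build read but never wrote (likely build inputs)."""
    reads, writes = set(), set()
    for line in trace_lines:
        match = OPEN_CALL.search(line)
        if not match:
            continue
        path, flags, result = match.group(1), match.group(2), int(match.group(3))
        if result < 0:
            continue  # failed open, e.g. ENOENT
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            writes.add(path)
        else:
            reads.add(path)
    return sorted(reads - writes)


# Made-up trace, in the style of `strace -f` output.
trace = [
    '4242 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3',
    '4242 openat(AT_FDCWD, "src/util.h", O_RDONLY) = 4',
    '4242 openat(AT_FDCWD, "src/missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '4243 openat(AT_FDCWD, "build/main.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3',
    '4244 openat(AT_FDCWD, "build/main.o", O_RDONLY) = 3',
]
used = sources_used(trace)
print(used)
```

Files that are written and later read, such as object files, are treated as intermediate outputs; what remains is the subset of sources worth scanning and matching.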
● Licenses: 2,000+ licenses and 35,000 rules
○ ScanCode LicenseDB has the basic license data
○ ScanCode Toolkit has the license detection rules
○ DejaCode is synchronized with LicenseDB and adds License Conditions
○ All licenses have SPDX Identifiers with “LicenseRef-scancode” namespace for the
many licenses not included in the SPDX License List (currently 567 licenses)
● No known alternative with comparable depth and breadth
The AboutCode stack: Open Data [1]
● Packages: 21M+ packages and files, and their fingerprints
○ PURL-based
○ Public PurlDB is at: https://public.purldb.io/api/packages/
○ All major ecosystems and distributions - sources AND binaries
○ Built-in mining of all package ecosystems, not half-baked
○ Also just-in-time, on-demand data collection
○ Collect, scan, and index all the package sources, binaries and VCS repos
○ Index with code fingerprints used for code matching
● Other Package databases:
○ Software Heritage, ClearlyDefined, deps.dev (Google)
○ Centralized and too big to share
○ No on-premises option for private operations (too big again)
The AboutCode stack: Open Data [2]
● Vulnerabilities: 760K+ packages and 240K+ vulnerabilities
○ PURL-based
○ Public VulnerableCodeDB is at: https://public.vulnerablecode.io/
○ All major ecosystems and vulnerability DBs aggregated and correlated
○ Discover relations (and inconsistencies) in data from mining the graph
● Other Vulnerability databases:
○ OSV (reuses some AboutCode code too), GitHub, GitLab, NVD
○ Often contain conflicting data for vulnerable ranges, fixed versions or affected
packages
○ Comparison made possible with VulnTotal to query vulnerable version ranges
given a PURL
The AboutCode stack: Open Data [3]
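How two advisory sources can disagree about the same package version is easy to show with a minimal sketch. This is not VulnTotal or univers: the source names and ranges are made up, and the version comparison only handles plain dotted integers, whereas real ecosystems need scheme-aware version handling.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Naive key for plain dotted integer versions such as '1.4.2'."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(version: str, introduced: str, fixed: str) -> bool:
    """True if introduced <= version < fixed."""
    return version_key(introduced) <= version_key(version) < version_key(fixed)


def compare_sources(version: str, ranges_by_source: dict) -> dict:
    """Ask each advisory source whether `version` is affected."""
    return {
        source: is_vulnerable(version, introduced, fixed)
        for source, (introduced, fixed) in ranges_by_source.items()
    }


# Hypothetical ranges reported for one vulnerability of one package:
# source name -> (introduced version, fixed version).
ranges_by_source = {
    "source-a": ("1.0.0", "1.4.2"),
    "source-b": ("1.2.0", "1.4.0"),
}

verdicts = compare_sources("1.4.1", ranges_by_source)
conflict = len(set(verdicts.values())) > 1
print(verdicts, "conflict" if conflict else "consistent")
```

For version 1.4.1 the two hypothetical sources give opposite answers, which is the kind of inconsistency that cross-source comparison is meant to surface.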
● PURLs (Package URLs) are wonderful
The AboutCode stack: DejaCode [1]
Integrate all tools and data in one web-based application for SCA
and compliance management
● Manage product and component Inventories
● Curate code origin and licenses
● Define and apply license policies
● Launch scans and access the Knowledge Base
● Identify package vulnerabilities
● Consume and enrich SBOMs (CycloneDX or SPDX)
● Generate FOSS compliance documents, such as product
Attribution Notices and SBOMs (CycloneDX or SPDX)
The AboutCode stack: DejaCode [2]
Integrate all tools and data in one web-based application for SCA
and compliance management
● Standard and custom reports
● JSON API and webhooks
● Built-in basic workflows
● Integrated with AboutCode SCA Tools and Open Knowledge
Base
Management
Apps
Open
Knowledge
Base
SCA Tools

OpenChain Webinar: AboutCode and Beyond - End-to-End SCA

  • 1. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode and beyond: End-to-end SCA with open source code and open data Philippe Ombredanne, Lead maintainer of AboutCode and CTO of nexB, Inc.
  • 2. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Agenda 2 ● About AboutCode & nexB ● Software Composition Analysis ○ Vulnerabilities AND licensing ○ Proprietary problems ● The AboutCode stack ● New projects ● Roadmap ● Questions?
  • 3. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org About me ● On a mission to enable easier and safer to reuse FOSS code with best-in-class open source Software Composition Analysis (SCA) tools, data, and standards for open source discovery, license & security compliance ● Lead maintainer of AboutCode projects (ScanCode, DejaCode, VulnerableCode and others) ● Factoids ○ In 2010, I said that Docker technology would never succeed ○ Signed off on the largest deletion of code in the Linux kernel (but these were only license comments) ● CTO and co-founder of nexB, Inc. ○ [email protected] ○ GitHub: https://github.com/pombredanne ○ LinkedIn: https://www.linkedin.com/in/philippeombredanne ○ Often assisted by Chihuahua Technical Advisor 3
  • 4. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode and nexB ● AboutCode's FOSS-first mission: FOSS for FOSS ○ Open source tools and open knowledge base (AboutCode stack) ○ Simple and practical standards (Package-URL) ○ Applications for Legal & Business users (DejaCode) with APIs for everything ● Trusted experts in Software Composition Analysis (SCA) since 2007 ○ Creator of Package-URL: https://github.com/package-url ○ Co-founders of SPDX: https://spdx.org ○ Contributors to CycloneDX: https://cyclonedx.org ○ Co-founders of ClearlyDefined: https://clearlydefined.io ● nexB provides professional services and support for SCA ○ 800+ SCA projects completed to-date with 100% customer satisfaction ○ Sponsored development for AboutCode projects ○ Technical support and advisory for SCA tools implementations and deployments 4
  • 5. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Software Composition Analysis
  • 6. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Identification – Identify distinct “units” of third-party software used in a product or project and their provenance ● Licensing – Determine the licensing for each software unit ● Security – Identify known security vulnerabilities for each software unit ● Quality – Evaluate the quality of a software unit based on software development data, such as number of bugs, fixes, etc. - this is the domain of the CHAOSS project ● Read "SCA the FOSS Way" for more information: https://www.nexb.com/software-composition-analysis/ Software Composition Analysis needs to be a core competency for any software development organization. ● Embed in the software development workflow from design through release - as it is in manufacturing ● The choice of SCA tools will depend on your platform, stack and product Software Composition Analysis (SCA) 6
  • 7. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Most SCA tools focus on either vulnerabilities OR licensing ○ Current focus is on security vulnerabilities because of perceived higher risk ● The communities of interest are separate - security vs legal - but converging ● License data may be complex, yet mostly stable over time ○ But very few tools get it right. Accuracy is still a major, unsolved problem ● Dependency graphs are highly dynamic and demand constant care ○ They impact the stability of licensing and vulnerability information ● Vulnerability data is complex, but extremely dynamic - if included directly in an SBOM, it may be wrong by the time you receive an SBOM ● Most SCA security tools are lightweight with respect to both provenance and licensing, and focus on the easy things You need SCA coverage for vulnerabilities AND licensing - plus quality. SCA: Vulnerabilities AND licensing 7
  • 8. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org SCA: Proprietary tools and data 8 ● Increasingly expensive with the surge of interest in SBOMs and pricing based on number of developers ● Large companies may be able to “afford” proprietary SCA scanning tools, but they do not scale across the FOSS supply chain ○ The cost of scan curation is prohibitive with high false positive rates and poor license detection accuracy ● Most current data about FOSS packages and vulnerabilities is proprietary ○ Vendors may offer some free or open source tools but you must pay for access to their data ○ Barrier to community access and analysis ● Many vendors use some open source for marketing only - “fauxpen source” ○ Complex and restrictive licenses ○ No contributions back to the community
  • 9. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org SCA: Open source tools and data 9 ● There are many open source SCA tools and some databases: ○ License compliance focus: ORT, Fossology, SW360 ○ Vulnerability SBOM focus: CycloneDX, Dependency Check, Syft/Grype (Anchore) , Trivy (Aqua Security) ● So, why did we develop AboutCode?
  • 10. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Free and open source software AND free and open data ○ FOSS for FOSS ○ Open knowledge base with open data for licenses, packages and vulnerabilities ● Modular and integrated best-in-class SCA tools for developers ○ Tackling the harder code analysis problems so you do not have to ○ PURL-based for easier integration in/out ● Bespoke pipelines enable true end-to-end automation ○ Working towards management by exception to focus on the complex cases of origin and license ○ Decentralized analysis, close to the developers ● Management web app for centralized policies, curations and compliance workflows and data ○ Supports engineering, business and legal stakeholders with features tailored for each using common/shared information Why AboutCode? [1]
  • 11. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● The state of SCA tooling accuracy is not great ● Recently, made a large scale comparison of many container scanners ○ Both FOSS and commercial ○ Using SBOMs as a way to compare scans of the same container images ● Commercial tools are making up packages, "hallucinating" PURLs ● Most look only skin deep, only looking at package manifests and DB ● Beyond package origin, the quality of report licenses is plain bad and misleading ○ In most case this is a grep on the declared license of package manifests ● Several tools created invalid SBOMs ● We can do better! Why AboutCode? [2]
  • 12. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Introducing the AboutCode stack
  • 13. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode: Who is using it? Many organizations and most SCA providers use AboutCode tools, libraries or standards: ○ Most free software and open source foundations ○ Five of the top big tech companies ○ A leading database company and a leading Linux company ○ European and US government agencies ○ All major European car manufacturers and most of their vendors ○ Major US chip and microprocessor providers ○ Four leading European industrial companies ○ All SBOM and VEX standards ○ All open source SCA and SBOM tools ○ Most proprietary SCA, SBOM or code hosting tools 13 SCA Tools Management Apps Open Knowledge Base
  • 14. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org SCA Tools Management Apps Open Knowledge Base
  • 15. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org SCA Tools Management Apps Open Knowledge Base ScanCode DejaCode Licenses Packages Vulnerabilities Scan Match Analysis pipelines Policies Curations Software inventory Workflows SBOMs Custom reports Binary analysis Dependency analysis
  • 16. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Supports safe and compliant use of FOSS, with FOSS ○ Recognized worldwide as best-in-class tools ○ Modular design for adaptation to development team processes, tools and environment ○ Coverage for all languages and frameworks ○ Package URL (PURL) used throughout as the package identifier ○ Code AND data licensed under open source licenses, no gimmicks ● Reduce licensing and vulnerability risks from using FOSS or other third-party software components ○ Share risk management responsibilities among business, legal, engineering and security teams ○ Provide a comprehensive view of open source and other third-party components used in your software ● Active community of contributors and users, including many FOSS tools ● Technical support, implementation, advisory services available from nexB Benefits of the AboutCode stack 16
  • 17. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Contribute to an AboutCode project with code, documentation, use cases, bug reports ■ https://github.com/nexB ● Sponsor AboutCode project maintainers ○ Accelerate development of new features and fund contributors ■ https://github.com/sponsors/nexB ● Buy support, implementation, and advisory services from nexB to pay the maintainers ■ https://nexb.com ● Join the community: ■ https://www.aboutcode.org/ ■ https://gitter.im/aboutcode-org/discuss AboutCode also needs your help! 17 "Dependency" by xkcd, used under CC BY-NC 2.5 / Modified text from original
  • 18. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode: New Projects
  • 19. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Pipeline Input: sources and binaries ● Collect symbols and identifiers from source and binaries ○ Parse Java bytecode, ELF, DWARF, WinPE, Mach-O, JS mapfiles, collect literals, source symbols ● Map and match these symbols from binaries back to source ● If not mapped, fall back to code matching the PurlDB ● Report discrepancies ○ Code that is found in binaries and NOT in the source ● WIP BUT the code from before xz has been able to detect xz-utils problems and tagged the problematic, malicious build script as "require review"! yeah! Binary, deployment analysis: back2source 19 SCA Tools
  • 20. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: CRAVEX [1] 20 ● Goal is to automate App vulnerabilities management ● .... AND compliance regulatory reporting ● Built for open source projects and small businesses as a free and open solution to comply with the emerging regulatory mandates (SBOMs, CRA) with minimal friction and costs ● Package- and software product-centric management of vulnerabilities ● Web-based, database-backed application to collect, track, and triage FOSS package vulnerabilities and determine their exploitability ○ Rank based on urgency, assess remediation ○ Create VEX reports
  • 21. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: CRAVEX [2] ● Import SBOM and scans for one or more apps, products & components ● Schedule vulnerability lookups and store the results in the database ● Web UI to rank and prioritize package vulnerabilities based on ○ Multiple scores ○ Rule-based automation ○ Vulnerable code reachability and exploitability ○ Usage context ● Export the results of the vulnerabilities triage and processing as VEX documents and attestations 21
  • 22. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project: Code reachability ● Upcoming companion to CRAVEX ● Goal: help prioritize vulnerabilities based on actual local exploitability ● Use multiple factors to help better qualify the urgency ● Symbols-based reachability of the vulnerable code ● Call graph-based reachability of the vulnerable code ● Integrate local context to assign exploitability priorities ○ Development or internal tool vs. production software or consumer device ● Integrate existing excellent FOSS efforts in the space ○ Eclipse steady, JORN, Chen 22
  • 23. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Question: How to reuse safely AI-Generated code? ● AI-Generated code is a wonderful productivity booster ● Thought experiment ○ Build a small LM from only GPL-licensed code from the GNU project. ○ Add Gen-AI on top. Is the generated code derived from the GPL-licensed code? ● AI-Generated code may violate licenses and copyrights ● AI-Generated code may copy vulnerable code sections ● Some large businesses and open source foundations have defined policies wrt. AI-generated code, in some cases prohibiting its use. New project: GenAI Code Search [1] 23
  • 24. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Fight fire with fire ○ An approach we considered is to use GenAI to regenerate code under analysis and compute similarity between regenerated and original code ○ Impractical as too expensive and too slow ● Find similar code fragments ○ The focus of this project ● Traditional code fragments matching does not work for AI ○ The code is broken in chunks using a content-defined heuristic ○ Chunks are matched exactly using a checksum ○ BUT, AI-generated code is seldom exactly the same as indexed FOSS code ○ Existing solutions have ever growing indexes with more fragments to avoid false negative ○ Furthermore, precision and recall are frozen in the choice of parameters for the index New project: GenAI Code Search [2] 24
  • 25. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● A new approach to approximate code fragment matching ● Fingerprint-based ○ Helps scale the index, but also scale the query as a whole codebase (Gigabyte size) is the query ○ Traditional Information retrieval with inverted indexes does not work for queries this large ● Approximate, fuzzy fingerprinting ○ Using new algorithm that enables matching code that was never indexed ● Furthermore, tunable fingerprint ○ Can be tuned at query time for precision and recall New project: GenAI Code Search [2] 25
  • 26. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org New project idea: Open Containers KB 26 ● Containers composition is a mess. Most tools are just plain bad ○ Lack basic tracing of license and package (and therefore vulnerabilities) ● Image builders, OS and distro vendors do not seem to care ○ Official images are sometimes not compliant or not traceable ○ Package volume amplifies vulnerability and license issues ○ Source of binary packages disappears ● We can do better! ● Project idea: create a mini consortium to do a proper, automated and correct SCA of key public base images ● Share these as open data ● Work with upstream to clean their acts
  • 27. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org AboutCode Roadmap
  • 28. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: ScanCode Toolkit ● Build single exe standalone apps for ScanCode for easier deployment in Ci/CD ● Improve copyright and license detection speed ● Build smaller single-purpose tools and libraries from "mono repo" ● Improve data models for Packages and Dependencies/Requirements ● Parse more package manifests and lock files ● Improve support for license exceptions (WITH) ● Move inconclusive, unknown license detection to clues ● Add post-processing to rematch using SPDX matching guidelines 28 28 SCA Tools
  • 29. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Integrate with CI and other tools ○ Create Ci/CD pre-configured integrations with main CI (GitHub, GitLab, Jenkins) ● Extend binary analysis and deployment tracing workflows ○ Support ELF/Native, Go, Ruby, Android in addition to Java and JS ○ Find the exact subset of the code that is deployed and used in production ● Automate analysis review in ScanCode.io ○ End to end automated pipelines for embedded devices, Android and C/C++ ○ Multi-stack deployment analysis for Java, JS, C/C++ ○ Report TODO items to review only "by exception" Roadmap for AboutCode: SCA Tools 29 29 SCA Tools
  • 30. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Code match smart ranking and disambiguation ○ Avoid false positives ● Accurately match to the correct package version ● Match code snippets approximately ○ Using our new approximate fingerprinting ○ Integrate other code matching schemes from SWH and SCANOSS ● Match source symbols and binary symbols to sources and binaries ● New matching pipelines ● Decentralized curation and corrections using in-codebase ABOUT files Roadmap for AboutCode: Code matching 30 30 SCA Tools
  • 31. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Compare scans to focus review work on changes only (DeltaCode) ● APIs and CLI to query all the things by PURL from the KB (purl2all) ● More code inspectors ○ Lightweight package dependency resolution ○ Dedicated ecosystem-focused libraries ● New lightweight package-inspector ○ Single executable to find packages and dependencies ● Trace build execution to find the exact subset of source code that is deployed and used (TraceCode) Roadmap for AboutCode: Other SCA Tools 31 31 SCA Tools
  • 32. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: Management Apps 32 ● Add support for CycloneDX 1.5 and 1.6 and SPDX 3.0 ● Create new review automation apps: ○ License detection review ○ Code match review ○ Vulnerability review ● Overall goal is to reduce review and curation work ○ Extend license clarity scoring to code matches with origin clarity scoring ○ "Auto conclude" matches that are conclusive ● New app for advanced Vulnerability management and support for CRA (Cyber Resiliency Act) compliance ○ Automated triage of vulnerabilities and workflow triggers ○ VEX creation, VEX import and export (Vulnerability Exploitability Exchange) with CSAF and CycloneDX 32 Management Apps
  • 33. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Extend License data with compatibility matrix ● Add new license aliases dataset ● Add more extensive tagging and categorization ● Extend License data with improved exception details ○ To disambiguate license detections of L/GPL with/without exceptions ● Extend License data with improved "or later" details ○ To disambiguate detection of "or later" notices with their primary texts ● Add "key phrases" to all license detection rules ● Add variable text segments to license rules ● Add Fedora alternative SPDX identifiers ● Work with CycloneDX to become their license reference Roadmap for AboutCode: Licenses 33 33 Open Knowledge Base
  • 34. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org Roadmap for AboutCode: Vulnerabilities 34 ● Extend Non-vulnerable dependency resolution ○ Beyond Python - add Java and JS ● Extend vulnerability data with new upstream data sources ● Add fix commit details and support for vulnerability reachability ● Mine the graph to surface related package fixes ● Mine git logs, issues and forums to enrich vulnerability data ● Surface inconsistencies and conflicts between different advisory data sources (VulnTotal throughout) ● Add source/binary discrepancy data (from back2source) 34 Open Knowledge Base
  • 35. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org ● Confirm the true origin of code to avoid ambiguous matches ● Supply chain package verification ○ Map deployed binary packages to their corresponding source code ○ Find suspicious code drift between package versions ● Mine extensive list of "off registry" packages ○ Common native C/C++ code and libraries for embedded ○ Glibc, Busybox, zlib, etc. that are not published on ecosystem package registries ● Collect code symbols from source and binaries (for matching) ● On demand, just in time code mining to build your KB on the fly ● Federated, decentralized shared KB data with Git and ActivityPub ○ Share scans, vulnerabilities, origin facts and curations ○ Scan once, analyze once and collaborate on reviews to clear out the junk! Roadmap for AboutCode: Packages 35 35 Open Knowledge Base
  • 36. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode Stack
  • 37. © AboutCode - License: CC-BY-SA-4.0 - https://www.aboutcode.org The AboutCode stack for SCA [1] 37 SCA Tools Management Apps Open Knowledge Base ● Web-based enterprise management application ○ DejaCode for ensuring license and security compliance ● SCA tools for identifying third-party code and determining code license and origin ○ ScanCode is the leading code scanner for software component, package and dependency identification, and license detection ○ MatchCode is a new tool for package and file matching ○ container-inspector: analysis tool for Docker & other images ○ nuget-inspector and python-inspector for in-depth dependency resolution ○ many other libraries ○ See https://aboutcode.org for an overview of AboutCode projects ○ See https://github.com/nexB for the code
  • 38. The AboutCode stack for SCA [2]
    Stack layers: SCA Tools, Management Apps, Open Knowledge Base
    ● Open knowledge base with open data for licenses, packages and vulnerabilities
      ○ LicenseDB - open source and other public licenses at: https://scancode-licensedb.aboutcode.org/
      ○ PurlDB - package data at: https://public.purldb.io/api/packages/
      ○ VulnerableCode - aggregated vulnerability data and comprehensive vulnerability reporting at: https://public.vulnerablecode.io/
    ● Standards
      ○ Package-URL: Specification and tools for identifying packages at: https://github.com/package-url
      ○ Univers: Parse and compare package versions and ranges at: https://github.com/nexB/univers
  • 39. The AboutCode stack: ScanCode Toolkit
    Stack layer: SCA Tools
    ● Industry-leading scanning engine
    ● License detection with multiple techniques, rule-based
    ● Copyright notices with NLP
    ● Identify packages
      ○ Normalize all the package metadata
      ○ Includes dependencies and package license detection
      ○ Package manifests, system package databases and lockfile parsing
    ● New summarization and license clarity scoring
      ○ Identify and focus curation on actual licensing issues
    ● Accuracy is paramount
      ○ An incorrect license detection is treated as a bug
    ● ABOUT files for curations/corrections stored in the codebase
  • 40. The AboutCode stack: ScanCode.io
    Stack layers: SCA Tools, Open Knowledge Base
    ● Web-based scanning server using ScanCode
      ○ Smarter scripted scanning in multiple steps
    ● Specialized pipelines for customized analysis
      ○ Tag items that need your review
      ○ Pipeline for best-in-class container and VM scanning
    ● Unique deployment analysis using binary analysis
      ○ Map binaries back to their sources
    ● Code matching integrated with the knowledge base
      ○ Starting with exact and approximate file matching
    ● Integrated enrichment of the knowledge base
      ○ Collect and pre-scan all the packages that you use
      ○ Watch and collect new versions continuously
  • 41. The AboutCode stack: MatchCode
    Stack layers: SCA Tools, Open Knowledge Base
    ● New Web-based code matching server
    ● Includes mining for custom knowledge base
      ○ All package ecosystems and Linux distros
    ● Smarter matching in multiple steps
      ○ Whole tree, exact file, approximate tree and file
      ○ Coming up: snippet matching, with a twist for AI-generated code
    ● Pipeline for ranking and picking best matches
    ● A different matching approach
      ○ Exact matching demands a constantly growing index
      ○ Approximate matching can match software that is NOT indexed
      ○ Top down rather than bottom up
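The contrast between exact and approximate matching can be illustrated with a toy matcher: try a checksum lookup first, then fall back to a similarity score over token sets. The slides do not describe MatchCode's actual fingerprints, so this sketch swaps in a generic technique (SHA-1 plus Jaccard similarity). The index content, the threshold and the function names are assumptions.

```python
import hashlib
import re


def exact_key(text):
    """Checksum used for exact matching: any edit changes the key."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def token_set(text):
    """Identifier-like tokens, a crude stand-in for a real code fingerprint."""
    return set(re.findall(r"[A-Za-z_]\w*", text))


def jaccard(left, right):
    """Share of tokens that two sets have in common (1.0 means identical sets)."""
    if not left and not right:
        return 1.0
    return len(left & right) / len(left | right)


# Hypothetical one-file index of known code.
INDEX = {"demo-lib/util.py": "def add(a, b):\n    return a + b\n"}


def match(text, threshold=0.6):
    """Return (kind, path, score): exact first, then best approximate match."""
    key = exact_key(text)
    for path, indexed in INDEX.items():
        if exact_key(indexed) == key:
            return ("exact", path, 1.0)
    tokens = token_set(text)
    best_path, best_score = None, 0.0
    for path, indexed in INDEX.items():
        score = jaccard(tokens, token_set(indexed))
        if score > best_score:
            best_path, best_score = path, score
    if best_score >= threshold:
        return ("approximate", best_path, best_score)
    return ("none", None, best_score)


print(match("def add(a, b):\n    return a + b\n"))
print(match("def add(a, b):\n    # sum\n    return a + b\n"))
print(match("print('hello')"))
```

The second call shows why approximate matching helps: a one-line comment breaks the checksum, yet the file is still recognized as a close variant of the indexed one.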
  • 42. The AboutCode stack: Other projects
    Stack layer: SCA Tools
    ● Inspectors: tech-specific tools and dependency resolvers
      ○ Container and VM images, Debian, ELF and DWARF, NuGet, Python, source
    ● aboutcode-toolkit: Generate Attribution Notices
      ○ Using scans or ABOUT files as input
    ● package-url (PURL): URL string to identify a software package
      ○ Adopted by CSAF, CycloneDX, SPDX and the whole SCA ecosystem
      ○ Now part of the CVE specification v5.1
      ○ Recommended by US CISA and German BSI
    ● univers: parse and compare package versions and version ranges
    ● license-expression: parse and compare license expressions
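To make the "URL string to identify a software package" concrete, here is a deliberately simplified parser for the `pkg:type/namespace/name@version?qualifiers#subpath` shape. It is a sketch for illustration, not the reference implementation, and it skips the per-type normalization rules of the Package-URL specification. The function name `parse_purl` and the returned dictionary layout are assumptions.

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl):
    """Split a Package-URL string into its main components (simplified)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a Package-URL: {purl!r}")
    rest = rest.lstrip("/")
    # Peel off the optional parts from the right-hand side first.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    if "@" in rest:
        path, version = rest.rsplit("@", 1)
    else:
        path, version = rest, None
    package_type, _, remainder = path.partition("/")
    segments = [unquote(segment) for segment in remainder.split("/") if segment]
    if not package_type or not segments:
        raise ValueError(f"missing type or name: {purl!r}")
    return {
        "type": package_type.lower(),
        "namespace": "/".join(segments[:-1]) or None,
        "name": segments[-1],
        "version": unquote(version) if version else None,
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath or None,
    }


print(parse_purl("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie"))
print(parse_purl("pkg:npm/%40angular/core@16.2.0"))
```

For real use, the libraries published under https://github.com/package-url implement the full specification.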
  • 43. Build tracing: TraceCode
    Stack layer: SCA Tools
    ● Input: buildable codebase
    ● Run build under "strace" and collect the trace
      ○ All kernel syscalls that open, close, write to files, spawn processes
    ● Reconstruct build graph
      ○ Determine the subset of the sources used in deployment
    ● Then Scan and Match the source subset
    ● Useful, but still marginal usage as it requires a lot of tuning
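The steps above can be sketched on a tiny, made-up trace: read the `openat` calls per process, link each written file to the files the same process read, then walk back from the deployed artifact to the original sources. This is not TraceCode's implementation. The trace lines, the regular expression (which only covers `openat` relative to `AT_FDCWD`) and the function names are assumptions for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as: 101 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3
OPEN_RE = re.compile(r'^(\d+)\s+openat\(AT_FDCWD, "([^"]+)", ([A-Z_|]+)')


def build_graph(trace_lines):
    """Map every written file to the files read by the process that wrote it."""
    reads, writes = defaultdict(set), defaultdict(set)
    for line in trace_lines:
        found = OPEN_RE.match(line)
        if not found or " = -1 " in line:  # skip other syscalls and failed opens
            continue
        pid, path, flags = found.groups()
        if set(flags.split("|")) & {"O_WRONLY", "O_RDWR"}:
            writes[pid].add(path)
        else:
            reads[pid].add(path)
    graph = defaultdict(set)
    for pid, outputs in writes.items():
        for output in outputs:
            graph[output] |= reads[pid]
    return graph


def sources_for(graph, target):
    """Walk the graph backwards from target and collect files nothing produced."""
    seen, stack, sources = set(), [target], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        inputs = graph.get(node)
        if inputs:
            stack.extend(inputs)
        elif node != target:
            sources.add(node)
    return sources


# Hypothetical excerpt of "strace -f" output for a two-step build plus a test build.
TRACE = [
    '101 openat(AT_FDCWD, "src/main.c", O_RDONLY) = 3',
    '101 openat(AT_FDCWD, "include/util.h", O_RDONLY) = 4',
    '101 openat(AT_FDCWD, "build/main.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5',
    '102 openat(AT_FDCWD, "src/test_main.c", O_RDONLY) = 3',
    '102 openat(AT_FDCWD, "build/test.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4',
    '103 openat(AT_FDCWD, "missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '103 openat(AT_FDCWD, "build/main.o", O_RDONLY) = 3',
    '103 openat(AT_FDCWD, "build/app", O_WRONLY|O_CREAT|O_TRUNC, 0777) = 4',
]

graph = build_graph(TRACE)
deployed_sources = sources_for(graph, "build/app")
print(sorted(deployed_sources))
```

Only the files that feed the deployed binary come back, so the test-only source is left out of the subset that then gets scanned and matched.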
  • 44. The AboutCode stack: Open Data [1]
    Stack layer: Open Knowledge Base
    ● Licenses: 2,000+ licenses and 35,000 rules
      ○ ScanCode LicenseDB has the basic license data
      ○ ScanCode Toolkit has the license detection rules
      ○ DejaCode is synchronized with LicenseDB and adds License Conditions
      ○ All licenses have SPDX Identifiers, using the "LicenseRef-scancode" namespace for the many licenses not included in the SPDX License List (currently 567 licenses)
    ● No known alternative with comparable depth and breadth
  • 45. The AboutCode stack: Open Data [2]
    Stack layer: Open Knowledge Base
    ● Packages: 21M+ packages, plus files and their fingerprints
      ○ PURL-based
      ○ Public PurlDB is at: https://public.purldb.io/api/packages/
      ○ All major ecosystems and distributions - sources AND binaries
      ○ Built-in mining of all package ecosystems, not half-baked
      ○ Also just-in-time, on-demand data collection
      ○ Collect, scan, and index all the package sources, binaries and VCS repos
      ○ Index with code fingerprints used for code matching
    ● Other Package databases:
      ○ Software Heritage, ClearlyDefined, deps.dev (Google)
      ○ Centralized and too big to share
      ○ No on-premises option for private operations (too big again)
  • 46. The AboutCode stack: Open Data [3]
    Stack layer: Open Knowledge Base
    ● Vulnerabilities: 760K+ packages and 240K+ vulnerabilities
      ○ PURL-based
      ○ Public VulnerableCodeDB is at: https://public.vulnerablecode.io/
      ○ All major ecosystems and vulnerability DBs aggregated and correlated
      ○ Discover relations (and inconsistencies) in data from mining the graph
    ● Other Vulnerability databases:
      ○ OSV (reuses some AboutCode code too), GitHub, GitLab, NVD
      ○ Often contain conflicting data for vulnerable ranges, fixed versions or affected packages
      ○ Comparison made possible with VulnTotal to query vulnerable version ranges given a PURL
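The kind of cross-source comparison described above can be pictured as follows: for one PURL, collect the versions each data source reports as affected and list the versions on which the sources disagree. This is a sketch of the idea only and does not reflect how VulnTotal is implemented. The source names and version data are invented placeholders.

```python
def compare_sources(affected_by_source):
    """Return {version: [sources flagging it]} for versions not flagged by all sources."""
    all_versions = set().union(*affected_by_source.values())
    conflicts = {}
    for version in sorted(all_versions):
        flagged = sorted(
            source
            for source, versions in affected_by_source.items()
            if version in versions
        )
        if len(flagged) != len(affected_by_source):
            conflicts[version] = flagged
    return conflicts


# Hypothetical affected versions for a single package, as reported by two sources.
reported = {
    "source_a": {"1.0", "1.1"},
    "source_b": {"1.1", "1.2"},
}

conflicts = compare_sources(reported)
print(conflicts)
```

Versions that every source agrees on drop out of the result, leaving only the entries that need a human look.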
  • 47. ● PURLs (Package URLs) are wonderful
  • 48. The AboutCode stack: DejaCode [1]
    Stack layers: Management Apps, Open Knowledge Base, SCA Tools
    Integrate all tools and data in one web-based application for SCA and compliance management
    ● Manage product and component Inventories
    ● Curate code origin and licenses
    ● Define and apply license policies
    ● Launch scans and access the Knowledge Base
    ● Identify package vulnerabilities
    ● Consume and enrich SBOMs (CycloneDX or SPDX)
    ● Generate FOSS compliance documents, such as product Attribution Notices and SBOMs (CycloneDX or SPDX)
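Consuming an SBOM and applying a license policy, two of the tasks listed above, can be sketched with the standard library alone. The example reads the `components` array of a CycloneDX JSON document and looks up each declared license id in a policy table. This says nothing about DejaCode's internals. The SBOM content, the policy values and the function names are all made up for illustration.

```python
import json

# Hypothetical CycloneDX JSON document with two components.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "example-lib", "version": "1.0.0",
     "purl": "pkg:pypi/example-lib@1.0.0",
     "licenses": [{"license": {"id": "MIT"}}]},
    {"type": "library", "name": "other-lib", "version": "2.3.0",
     "purl": "pkg:npm/other-lib@2.3.0",
     "licenses": [{"license": {"id": "GPL-3.0-only"}}]}
  ]
}
"""

# Hypothetical policy: SPDX license id -> usage policy label.
POLICY = {"MIT": "approved", "GPL-3.0-only": "review"}


def component_license_ids(component):
    """Collect the SPDX ids declared for one component, if any."""
    ids = []
    for entry in component.get("licenses", []):
        license_info = entry.get("license", {})
        if "id" in license_info:
            ids.append(license_info["id"])
    return ids


def review_sbom(sbom):
    """Return (purl, license ids, policy labels) for every component."""
    report = []
    for component in sbom.get("components", []):
        ids = component_license_ids(component)
        labels = {POLICY.get(license_id, "unknown") for license_id in ids} or {"unknown"}
        report.append((component.get("purl"), ids, sorted(labels)))
    return report


report = review_sbom(json.loads(SBOM_JSON))
for row in report:
    print(row)
```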
  • 49. The AboutCode stack: DejaCode [2]
    Stack layers: Management Apps, Open Knowledge Base, SCA Tools
    Integrate all tools and data in one web-based application for SCA and compliance management
    ● Standard and custom reports
    ● JSON API and webhooks
    ● Built-in basic workflows
    ● Integrated with AboutCode SCA Tools and Open Knowledge Base