Scaling Testing in Uyuni
A TestOps Journey
openSUSE Conference 2025 (oSC25)
Óscar Barrios Torrero
QE Architect, SUSE
What is Uyuni?
• Open-source software for managing large sets of Linux servers.
• Provides one central view for keeping your Linux fleet healthy, secure, and consistently configured.
• Automates tedious IT tasks, freeing up your time.
• Upstream of Multi-Linux Manager, backed by SUSE.
Key Capabilities:
💻 Automated Patch & Package Management: Keep systems up-to-date and secure with ease.
⚙️ Configuration Management: Enforce consistent states across all your servers.
🛡️ Security & Compliance: Audit for vulnerabilities (CVEs) and maintain security policies.
🌐 Multi-Distribution Support: Manages openSUSE distros, Ubuntu, Debian, and more.
Navigating Rapid Evolution
Dynamic Development
Uyuni is a fast-moving project with frequent feature additions and enhancements.
The Quality Challenge
Rapid iteration inherently increases the risk of regressions and integration complexities.
Quality Engineering Focus
Emphasizing robust testing: thorough unit and end-to-end tests on pull requests, a continuous testing pipeline, and TestOps.
The Problem Space
🎲 Unreliable (flaky) acceptance tests
🖥️ ≠ 💻 Environment inconsistencies
🐌 Slow quality feedback
🚀 Automation, Visibility, Scalability
Laying the Foundation: Infrastructure as Code
⚙️ Test environment automation: Infrastructure as Code with Sumaform
• Terraform provisions VMs on KVM
• Salt auto-configures the test systems on those VMs
Consistent • Repeatable • On-Demand • Maintainable
Gaining Visibility: Observability in Tests
• Observe test infrastructure metrics
• Monitor test trends & stability
Use cases
• Historical trends for regressions & flakiness
• Compare test suites: Uyuni vs. Multi-Linux Manager
• Alerts if we reach defined limits
Monitoring stack: Prometheus, Grafana, and AlertManager
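Alerting itself is handled by the monitoring stack, but as an illustration of the kind of threshold check behind "alerts if we reach defined limits", here is a hedged Ruby sketch that queries the Prometheus HTTP API; the metric names and the 5% limit are invented for the example.

```ruby
# Illustrative only: query Prometheus for a test-failure ratio and compare it
# against a limit. The metric names and the threshold are hypothetical.
require 'net/http'
require 'json'
require 'uri'

PROMETHEUS_URL = ENV.fetch('PROMETHEUS_URL', 'http://localhost:9090')

def instant_query(promql)
  uri = URI("#{PROMETHEUS_URL}/api/v1/query")
  uri.query = URI.encode_www_form(query: promql)
  body = JSON.parse(Net::HTTP.get(uri))
  raise "Prometheus query failed: #{body['error']}" unless body['status'] == 'success'
  body.dig('data', 'result')
end

# Ratio of failed to total scenarios over the last day (hypothetical metrics).
result = instant_query(
  'sum(increase(cucumber_scenarios_failed_total[1d])) / sum(increase(cucumber_scenarios_total[1d]))'
)
ratio = result.empty? ? 0.0 : result.first.dig('value', 1).to_f

warn "Failure ratio #{(ratio * 100).round(1)}% exceeds the 5% limit!" if ratio > 0.05
```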
Tackling Flakiness: Our Tracking System
The Problem
• Flaky tests (pass/fail inconsistently).
• Rediscovering known issues wastes time.
• This is a challenge for the Round Robin “Test Geeko” role when reviewing CI test reports.
Our Tracking System
• GitHub Project Board (Test Suite Status).
• Columns: New, Debugging, Bug, Test Framework Issue, Flaky Test, Fixed, etc.
• Card title format: Feature: <Feature Name> | Scenario: <Scenario Title>
• Detailed notes & history per card.
Automating Insight: GitHub-Driven Test Tagging
Goal: Bring the Test Suite Status board directly into the test execution context.
Solution: A tool that queries the GitHub GraphQL API
• Fetches cards from our specific project board.
• Parses card titles (Feature/Scenario).
• Automatically adds Cucumber tags (e.g., @flaky, @bug_reported) to scenarios in .feature files before tests run.
Impact for the Test Report Reviewer
• Visual status tags are directly visible during test analysis, as part of the HTML report.
• Immediate context on known issues, reducing redundant effort.
• Improved knowledge sharing & efficiency.
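The sketch below is a simplified approximation of this idea, not the actual tool. It assumes the cards live on an organization ProjectV2 board with a single-select "Status" field, that card titles follow the Feature: <name> | Scenario: <title> format, and that the board number and feature directory in the last line are placeholders.

```ruby
# Simplified sketch (not the real tool): read board cards via the GitHub
# GraphQL API and tag matching Cucumber scenarios before the test run.
require 'net/http'
require 'json'
require 'uri'

QUERY = <<~GRAPHQL
  query($org: String!, $number: Int!) {
    organization(login: $org) {
      projectV2(number: $number) {
        items(first: 100) {
          nodes {
            fieldValueByName(name: "Status") {
              ... on ProjectV2ItemFieldSingleSelectValue { name }
            }
            content { ... on Issue { title } ... on DraftIssue { title } }
          }
        }
      }
    }
  }
GRAPHQL

STATUS_TAGS = {
  'Flaky Test' => '@flaky',
  'Bug'        => '@bug_reported',
  'Debugging'  => '@under_debugging'
}.freeze

def board_cards(org, number)
  uri = URI('https://api.github.com/graphql')
  req = Net::HTTP::Post.new(uri, 'Authorization' => "Bearer #{ENV.fetch('GITHUB_TOKEN')}",
                                 'Content-Type'  => 'application/json')
  req.body = { query: QUERY, variables: { org: org, number: number } }.to_json
  res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body).dig('data', 'organization', 'projectV2', 'items', 'nodes') || []
end

def tag_scenarios(cards, features_dir)
  cards.each do |card|
    tag = STATUS_TAGS[card.dig('fieldValueByName', 'name')]
    next unless tag && card.dig('content', 'title').to_s =~ /Feature: .+ \| Scenario: (.+)/
    scenario = Regexp.last_match(1).strip
    Dir.glob("#{features_dir}/**/*.feature").each do |path|
      lines = File.readlines(path)
      index = lines.index { |line| line.strip == "Scenario: #{scenario}" }
      next if index.nil? || (index > 0 && lines[index - 1].include?(tag))
      lines.insert(index, "#{lines[index][/\A\s*/]}#{tag}\n")  # keep the scenario's indentation
      File.write(path, lines.join)
    end
  end
end

# Board number and feature directory are placeholders for the example.
tag_scenarios(board_cards('SUSE', 1), 'testsuite/features')
```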
Smart Test Selection
The Challenge
• Full test suites on every PR are slow & resource-intensive, delaying feedback.
• How can we run only the tests relevant to a given change?
Our Solution
Code coverage-driven test selection
Smart Test Selection
1. Map tests to code
• Periodically run tests with coverage tools (e.g., JaCoCo for Java).
• Store a map of code (method/class) -> the tests covering it (e.g., in Redis).
2. Analyze the PR
• Identify the exact code methods/classes modified in the pull request.
3. Select & execute
• Query the map to find the tests associated with the changed code.
• Run this smaller, targeted suite for faster, meaningful feedback.
Key Benefits
• Faster feedback loops
• Efficient resource usage
• Increased developer confidence
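As a sketch of the selection step, assume the coverage map is stored as one Redis set per covered method, keyed like coverage:<class>#<method>; the key layout, class name, and diff-analysis step are illustrative, not the project's exact schema.

```ruby
# Sketch of coverage-driven test selection. The key layout and the
# changed-method list below are illustrative assumptions.
require 'redis'  # redis gem

redis = Redis.new(url: ENV.fetch('REDIS_URL', 'redis://localhost:6379'))

# 1. Map tests to code (done periodically from JaCoCo reports), e.g.:
#    redis.sadd('coverage:com.example.Foo#bar', 'features/secondary/foo.feature')

# 2. Methods/classes touched by the PR (output of a hypothetical diff-analysis step).
changed = ['com.example.webui.SystemsController#listSystems']

# 3. Union of every test that covers any of the changed methods.
selected = redis.sunion(*changed.map { |method| "coverage:#{method}" })

if selected.empty?
  puts 'No coverage data for the changed code; falling back to the full suite.'
else
  puts "Running #{selected.size} targeted tests:", selected
end
```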
Continuous Insight: Synthetic Monitoring
Goal: Inject synthetic checks into CI to monitor key product health metrics.
Fitness Functions
• Objective measures summarizing how close a given design solution is to achieving set aims.
• We track these via continuous testing to highlight impacts on customer experience and motivate improvements.
• These metrics are pushed to the Prometheus Pushgateway and visualized in Grafana.
Benefits
• Provides early warnings of product-level health degradation.
• Ensures continuous insight into overall system performance.
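A minimal sketch of the push step, assuming a reachable Pushgateway; the gateway URL and job name are placeholders, while the metric name follows the examples we track (such as system_bootstrap_duration_seconds).

```ruby
# Minimal sketch: push a fitness-function measurement to a Prometheus
# Pushgateway. Gateway URL and job name are placeholders.
require 'net/http'
require 'uri'

def push_metric(name, value, job:, gateway: ENV.fetch('PUSHGATEWAY_URL', 'http://localhost:9091'))
  uri  = URI("#{gateway}/metrics/job/#{job}")
  body = "# TYPE #{name} gauge\n#{name} #{value}\n"  # Prometheus text exposition format
  res = Net::HTTP.start(uri.host, uri.port) do |http|
    req = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'text/plain')
    req.body = body
    http.request(req)
  end
  raise "Pushgateway returned #{res.code}" unless res.is_a?(Net::HTTPSuccess)
end

# Example: record how long a client took to bootstrap during a test run.
started = Time.now
# ... perform the bootstrap step under test ...
push_metric('system_bootstrap_duration_seconds', Time.now - started,
            job: 'uyuni_fitness_functions')
```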
DevContainers Boost Onboarding
Challenge: Slow onboarding & inconsistent developer setups
Our Solution: DevContainers
Pre-configured Docker environments for IDEs, ensuring consistency.
• Uyuni development: installs the necessary build tools (Java, Ant, etc.).
• Test framework development: installs the testing stack (Ruby, browser drivers, etc.).
Key Benefits
• Fast and consistent onboarding
• Identical environments for all
Transparency and Community Engagement
Goal
True transparency: sharing Uyuni's ongoing quality insights openly.
Currently
We ensure Uyuni's quality with a daily AWS test suite run, analyzing reports via our internal Jenkins.
🚧 The Gap
This vital test data isn't visible to our external community.
Our Solution
Export these daily reports to a public HTTP server, making them accessible to everyone.
Exploring the Future: AI in Our TestOps
AI-Driven Test Selection
🎓 Exploring AI to enhance test selection accuracy.
🎯 Aim: even more precise & efficient PR testing.
AI-Powered Test Report Analysis: srbarrios/FailTale
🤖 Real-time evidence collection on errors, through an MCP server.
💡 AI provides root-cause hints directly in HTML reports.
Takeaways
🌱 Our TestOps journey: achieved through small, iterative enhancements, not a big bang.
TestOps is Dual-Fold
🤝 Cultural shift: embracing collaboration, shared ownership of quality, and continuous improvement.
🤖 Technical evolution: implementing automation, observability, and smart solutions step by step.
Your Path to "Quality on Autopilot"
📍 Start where you are; identify pain points.
✨ Introduce incremental changes.
🧩 Adapt TestOps principles to your unique context.
Questions? Happy to discuss!
Chat with me on @srbarrios:matrix.org
Connect with me on https://www.linkedin.com/in/oscarbarrios
Everything else on https://oubiti.com
Editor's Notes
  • #1: Hello everyone, and a big virtual welcome! I'm thrilled you're joining this session at the openSUSE Conference 2025 in Nuremberg. My name is Óscar, and I'm a QE Architect at SUSE. And yes, this is a recording! -- As you're watching this, I'm likely working from Spain, maybe even by a swimming pool, but please know that my heart is definitely there, with you all in Nuremberg today! -- I'm really excited to have this chance to share our journey with Uyuni, a key open-source project we're passionate about building. Over the next 30 minutes, I'll be talking about "Scaling Testing in Uyuni: A TestOps Journey." We'll explore how we're making sure a complex, fast-moving open-source project has rock-solid quality by putting our quality processes on "autopilot" using TestOps ideas. If you're interested in test automation, CI/CD, how to manage quality at scale, or just curious about what goes on behind the scenes in a project like Uyuni, then you've found the right session. So, let's get started and dive into how we're aiming for the stars with quality in Uyuni!
  • #2: Alright, before we dive into how we test Uyuni, let's quickly cover What is Uyuni? for those who might be new to it. In essence, Uyuni is a powerful, open-source software solution made specifically for managing many Linux servers. If you're dealing with a growing number of Linux systems—from a few to hundreds or even thousands—and find yourself spending a lot of time on regular maintenance, struggling with different setups, or constantly working to make sure security rules are met, Uyuni is built to solve these exact problems. -- Think of it as giving you one central view for keeping all your Linux systems healthy, safe, and set up the same way. Its goal is to give you control and a clear picture of everything. -- One of its main aims is to automate those boring IT tasks—like patching, configuring, and checking systems—that can take up so much valuable time. By automating these, Uyuni frees up IT pros and DevOps teams to focus on bigger, more important projects instead of getting stuck in daily operational fires. -- It's also important to know that Uyuni is the upstream, community-driven project for SUSE Multi-Linux Manager. This means it has strong backing from SUSE and a lively open-source community, which helps drive its new ideas and improvements. -- Now, let's talk about some of its Key Capabilities that make this possible: -- Automated Patch and Package Management Uyuni is great to keep your systems up-to-date and secure easily. You can schedule updates, manage software, and make sure bugs are fixed quickly across all your systems. -- Robust Configuration Management Using tools like Salt which is deeply built-in or Ansible that is also supported, Uyuni lets you set and keep consistent setups across all your servers. This means you can say goodbye to different configurations and make sure your systems are set up exactly as they should be. -- Security and Compliance Uyuni offers tools to check your systems for vulnerabilities, using information from CVE databases. It helps you keep security rules and create reports, which is key for meeting compliance needs. -- Multi-Distribution Support While it has strong ties to openSUSE and SUSE2, Uyuni is designed to manage many different Linux versions, including Ubuntu, Debian, Alma Linux, Rocky Linux, and others. This makes it incredibly flexible for environments with various systems. So, in short, if you're looking for a complete, open-source tool to make your Linux setup management easier, automate maintenance, boost security, and ensure consistency at scale, Uyuni is definitely a project worth exploring. It's this complex and important system that our TestOps journey is focused on making sure the quality of.
  • #3: One of the exciting, but challenging, parts of the Uyuni project is how fast it moves. We’re frequently adding new features, and supporting new operating systems. -- But this speed brings a big problem: keeping quality high and consistent. Rapid iteration inherently increases the risk of regressions and integration complexities -- With many changes and frequent updates, making sure new additions work well and don't break existing things takes a lot of careful work. We focus on embedding quality throughout the development cycle with strong unit and end-to-end tests on pull requests, backed by a continuous testing pipeline for our main branch, and for sure, good TestOps practices. This ensures faster feedback and higher confidence in every change.
  • #4: So, before we fully embraced TestOps with Uyuni, we faced some familiar hurdles that directly impacted our speed and confidence. I want to quickly outline this 'problem space' because it's what drove our need for change. -- First, slow quality feedback. Developers would submit their code, but it took too long to get full test results. This meant bugs were found late, leading to constant switching between tasks and, in the end, slower fixes. It really dragged down our ability to move quickly. -- Next, the challenge of flaky tests. Some tests would pass, then fail, then pass again, all without any changes to the code. These were super frustrating because they made us lose trust in our automated testing. When the team didn't believe the test results, real bugs could slip through, and a lot of time was wasted chasing problems that weren't actually there. -- And thirdly, environment inconsistencies. This was the classic "it works on my machine!" problem. Tests acted differently on different developer setups and in our continuous integration system, which ate up a lot of time. We spent too much time fixing environment quirks instead of actual code issues, creating a big bottleneck. -- These weren't just one-off problems; they were patterns that limited our ability to improve quality at scale. It became very clear we urgently needed: More Automation – especially for environments. Better Visibility – to truly understand test data and trends. And robust Scalability – to handle Uyuni's continued growth. This urgent need for automation, visibility, and scalability is what truly started us on our TestOps journey.
  • #5: So, a super important first step in our TestOps journey was to update how we set up our test environments. We couldn't build reliable automated tests on something that was unstable or different every time. Our goal was to stop setting things up by hand and move to something strong and efficient. -- So, we started using Infrastructure as Code. For Uyuni, we rely a lot on Sumaform. Sumaform is our special set of tools, like Terraform modules and Salt formulas, made specifically for SUSE Multi-Linux Manager and Uyuni setups. By writing down how our test systems should be set up in code, we make sure that every environment is built exactly the same way, every single time. This gets rid of a huge reason for problems like "it works on my machine!" -- Looking a bit closer, Terraform is the tool that actually creates the virtual machines. For us, it mostly works with KVM (a way to run virtual machines), but now we also support setting things up on big cloud services like AWS. So, when a set of tests needs certain machines—for example, a server, a proxy, and several client machines—Terraform reads our Sumaform settings and automatically creates those virtual machines with the right basic features like CPU, memory, and network setup. -- Once those basic virtual machines are ready, Salt takes over for the detailed automatic setup. Salt makes sure that each test system within those virtual machines is perfectly set up for its job. This includes installing needed software, setting up Uyuni parts, configuring services, and basically turning a basic operating system into a fully working Uyuni system or client, ready for testing. -- And the results of making this change are that our test environments are now: Consistent: Every environment built from the same code is identical. Repeatable: We can easily take down and rebuild environments, which is key for clean test runs. On-Demand: We can create these complex environments quickly whenever we need them, instead of relying on a small number of fixed setups. Maintainable: Changes and updates to how the environment is set up are tracked in code and rolled out in a structured way. -- This automated, code-driven way of handling our test environments built the essential base for making our entire testing process bigger and better.
  • #6: Once we had our automated environments, the next big step was to truly understand what was happening inside them and with our tests. This is where observability in our testing process comes in. It helps us go beyond just "pass" or "fail" to get much deeper insights. -- To do this, we brought together three powerful open-source tools: Prometheus, Grafana, and AlertManager. Prometheus is our data collector. Using tools like our Jenkins Exporter and Blackbox Exporter, it gathers a lot of information from our test runs. This includes things like how long tests take, how often they pass or fail, what features and situations they cover, and even how much of our test system's resources are being used. Then, Grafana takes all that data and lets us create helpful and visual dashboards. It's how we see these numbers and trends, making it easy to spot patterns and unusual activity. Finally, AlertManager sends out alerts when a specific condition is met in our data, sending notifications via email. -- So, what are we observing? First, we started watching our test infrastructure metrics. --- This helps us understand if problems in the system itself, like overloaded KVM hosts or network issues, are affecting how reliable or fast our tests are. -- After that, we collect and monitor test trends and stability. -- We track how specific tests or groups of tests are behaving over time. Are they getting slower? Are certain tests failing often? -- This ability to observe has some really valuable uses for us: -- One key use is analyzing historical trends to identify regressions and flaky behavior. For instance, if we see a sudden jump in failures or a test that starts failing sometimes after a specific code change, our Grafana dashboards show this right away. This helps us figure out when a problem was introduced or identify a test that needs to be fixed because it's unreliable. -- Another powerful way we use this is by comparing trends between different products or versions. For example, we can directly compare how well tests perform and how stable they are between our core Uyuni project and its related product, Multi-Linux Manager. This is super helpful for understanding how changes affect things as they move from one version to another, making sure everything stays consistent and helping us catch issues early that might only affect one or the other. -- By having this level of visibility, we can make data-driven decisions, react faster to issues, raise alerts and continuously improve both our tests and the product itself.
  • #7: One of the ongoing problems when you have many automated tests is dealing with flaky tests. --- These are tests that sometimes pass and sometimes fail, even when nothing in the main product code has changed. It's also hard to efficiently manage problems we already know about. This is especially important for our "Round Robin Test Geeko" (RRTG) role. This is a changing job where a team member checks our automated test reports. If they see a failure, they need to figure out if it's a new problem, something flaky, or a problem that's already been looked at or even has a bug report. Without a good system, we found that different RRTGs might end up investigating the same known problems again, which wastes everyone's time. --- So, our first important step was to create a strong way to track these issues using a GitHub Project Board. This board, which we call our "Test Suite Status," has different sections (columns) like: 'New': for problems found recently. 'Debugging': for problems currently being investigated. 'Bug': for confirmed problems in the product itself. 'Test Framework issue': if the problem is in the test code. And also, a column to group 'Flaky Test’! --- When the Test Geeko finds a test problem that needs to be tracked, they create a card on this board. These cards follow a standard title format: Feature: <name> | Scenario: <name>. All the details from the investigation, logs, error messages, and any comments or updates are then written directly on that card. This board becomes our main, reliable place to find out the status of these difficult test situations.
  • #8: -- While having this GitHub board was a big improvement for tracking issues, we wanted to do even more. We asked ourselves: how can we make the information from this board even easier to see and use for the Round Robin Test Geeko (RRTG), especially when they're looking at test results or the tests themselves? -- This led us to create a special automated tool—a Ruby script—that closely connects our GitHub project board with our Cucumber test framework. The main goal was to show the status of these tracked problems directly within our test system. Here’s how this script operates: Before any set of tests runs, the script connects to the GitHub GraphQL API. It's set up to ask for information from our special "Test Suite Status" board within the SUSE organization. It gets the cards from the different columns like 'Flaky Tests', 'Debugging', 'Bugs', and so on. Then, it reads the card titles to find the specific Feature and Scenario names. Then, here’s the really clever part: the script automatically adds Cucumber tags right into our Gherkin .feature files. These tags match the column the card is in on the board. So, a test scenario might automatically get tagged with @flaky, or @bug_reported, or @under_debugging. --- The effect of this automation is very important for the person reviewing test reports. When a test fails, they can immediately see these tags. If it’s already tagged as @flaky or @under_debugging, they instantly understand the situation. They know it’s likely a known problem, can quickly find the GitHub card for all the past details, and avoid starting a new investigation from the beginning. --- This has greatly improved how efficient we are, made it easier for the team to share knowledge, and helps us put our effort where it's most needed. ---
  • #9: Now, a very important part of our TestOps journey in Uyuni has been putting in place Smart Test Selection for Pull Requests. I actually had the chance to talk a lot about how this works at SeleniumConf earlier this year in Valencia. I'll quickly explain the main idea to you now, because it's key to how we get feedback faster. The problem we faced, like many others, was that running all our tests for every single Pull Request takes a huge amount of time and uses a lot of computer resources. This often means developers have to wait a very long time for feedback, and much of that feedback might come from tests that aren't even related to their specific changes. We asked ourselves: how can we smartly pick only the tests that truly matter for a given set of code changes? --- Our answer is to use code coverage data to make this choice.
  • #10: It's a three-step process. -- First, we connect our tests to the code. We regularly run our tests using tools that check code coverage, like JaCoCo for the Java parts of Uyuni. This makes detailed reports showing exactly which tests check which specific pieces of code, like methods or classes. We then take this information and save it in a way that lets us search it quickly, for example, in a Redis database. -- Then, when a Pull Request comes in, we look at it carefully to find the exact code files, methods, or classes that have been changed. -- Finally, we choose and run the tests. We use the information from the Pull Request to search our code coverage map. This tells us exactly which tests are connected to the changed code. Only this small, specific group of tests is then run to check the Pull Request. -- The benefits are big: developers get feedback much faster and it's more helpful to their changes. We use our CI (Continuous Integration) resources better, and we are certain that we are testing the right things at the right time. This is the system we've built into our GitHub Actions workflow for Uyuni.
  • #11: But then, beyond testing code changes on Pull Requests, we wanted a way to always see how well the important parts of our product were working. And this is where synthetic monitoring comes into our automated test processes. -- First, we defined important product actions and how well they should perform as 'Fitness Functions'. A fitness function is basically a way to measure how well a specific part of our system meets its design goals or users' expectations. So, how do we watch these? We've made special Ruby code that can, for example, measure how long it takes for a system to bootstrap, or how long it takes for a system to be onboarded. Another very important one for Uyuni is how long it takes to synchronize products. This code gets these times by reading logs or by querying the Uyuni API. Then, our Quality Intelligence handler, using the Metrics Collector Handler, sends these measurements (like system_bootstrap_duration_seconds or product_synch_duration_seconds) to our Prometheus Push Gateway. -- This data then shows up in Grafana dashboards like the one you see here. --- By constantly watching these fitness functions, we get early warnings if a main product feature starts degrading. For instance, if the product synchronization time suddenly jumps up, or bootstrap times slowly increase, it's a clear sign of a possible problem with the overall product's health, even if all individual tests are passing. This method helps us see any bad effects on what could be the customer's experience and encourages the team to investigate and make improvements before problems get big. It's about making sure the whole system is healthy, not just that each small part works correctly.
  • #12: Another important part of our TestOps journey, and something that really helps us work better and together, is how we've made our Development Environment (IDE) setup and the process for new developers much simpler. This helps both experienced team members and new open-source contributors. We've all had this problem: it's a pain to set up a complex development environment on your own computer, dealing with "it works on my machine" issues, or spending days just trying to get a new team member ready to work. --- To fix this, we are using DevContainers. These are like ready-made Docker environments that your IDE can open and use. This makes sure everyone works with the exact same setup. We've made two main DevContainers for the Uyuni project: First, the uyuni-dev-container. This one is for developing the main Uyuni product. Its Dockerfile installs things like the correct Java version, Ant, Ivy, and other tools needed to build Uyuni itself. The devcontainer.json file also sets things like environment variables (for example, JAVA_HOME) and commands to run after the container is created, like running an Ivy build to get necessary files. Second, we have the uyuni-test-container. This one is just for developing our many tests. Its Dockerfile installs Ruby, Cucumber, web drivers like ChromeDriver, and all the other tools needed for writing and running tests. The devcontainer.json for this one sets up different environment variables that point to test systems, registries, and settings for our test framework. So, a developer simply opens the Uyuni project in their IDE (like VS Code). If the IDE sees a DevContainer setup, it offers to build and open it. --- In just minutes, they have a complete, consistent environment with all tools and needed files installed and set up. This is perfectly aligned with everyone else on the team and our automated systems. The benefits are huge: setting things up is much, much easier. We've greatly reduced problems related to different environments. And new contributors can start coding incredibly fast. It really helps everyone focus on developing rather than struggling with their computer setup. It's important to say that this DevContainers work is ongoing; we are always making it better. We are actively encouraging our developers and the wider community to use them, as we believe they make it much easier to contribute and improve the overall experience for working on Uyuni.
  • #13: A main idea for a healthy open-source project is transparency. This also includes how we show the quality and stability of Uyuni. We want to build trust and responsibility with our users and those who help us. -- Right now, we have a complete Uyuni test suite that runs every day in AWS. These tests are started, and the test reports are shown and looked at through a Jenkins system that is inside our own network. This is great for our internal team, but the problem is that these detailed reports are not easy for outside people or the wider community to see. -- So, to make this better, our goal is to make our test reports available to everyone. We believe this is key to building stronger trust and openly showing our commitment to quality. -- This is still a work in progress. We are actively creating a way to securely share these daily test reports. The plan is to then put these reports on a public website. This way, anyone who is interested can see the latest status of our tests, understand what is passing, and get ideas about the ongoing quality work for Uyuni. --- Our goal here is clear: to openly show the quality of Uyuni. We think this will not only encourage more people from the community to get involved, but also strengthen our promise to deliver a strong and dependable open-source solution.
  • #14: Beyond our current TestOps practices, we're actively looking at how Artificial Intelligence can make our quality work even better. This is still new and experimental, but we see exciting possibilities in a couple of areas. -- First, we're excited to explore how AI can improve our Smart Test Selection. This is part of a Google Summer of Code Project I'm mentoring this year. While our current smart selection works well, we think AI could make it even more exact. By teaching a computer model using past data and other information we'll explore, the AI could potentially find with even greater accuracy which specific tests are most important to run for any given Pull Request. This would make our feedback loops even faster and more efficient. -- Secondly, we're looking into AI-powered test report analysis. The idea here is that when a set of tests shows an error, an AI agent can immediately start working. This isn't just passively looking at data. The system is designed to connect through SSH to the specific systems involved in the failed test. Once connected, it gathers special evidence made for that system's role. For example, it might collect specific service logs from a Uyuni server, or client registration details from a client machine. The AI then looks at this collected data and tries to give an early idea about what might be causing the problem, putting this information directly into our HTML test reports. This could greatly speed up how fast our engineers find and fix problems, by giving them very relevant, context-aware information. These are early days for using AI in our Uyuni testing, but it’s an area we're very interested in and experimenting with to push the limits of what's possible in automated quality checks.
  • #15: So, as we wrap up, what are the main things I hope you take away from our Uyuni TestOps journey? --- First, and most important, this was not an overnight transformation. Our progress towards 'Quality on Autopilot' has been a story of small, iterative improvements. We didn't try to boil the ocean; instead, we focused on making incremental changes, learning from each one, and building upon that success. -- Second, it's crucial to understand that TestOps is more than just implementing new tools or scripts. It's truly a dual-fold shift. It's a cultural shift within the team: fostering closer collaboration between development and QE, making quality a shared responsibility, and cultivating a mindset of continuous learning and improvement. And yes, it's also a technical evolution: strategically adopting automation for infrastructure and testing, gaining deep observability into our processes, and implementing smarter solutions like targeted test selection and synthetic monitoring. --- If you're looking to enhance your own quality engineering practices, I encourage you to view TestOps as an adaptable set of principles, not a rigid prescription. Start by identifying your most significant pain points, introduce changes incrementally, and tailor your approach to your project's specific needs and context. You don't have to do everything at once. -- In the end, building a robust, scalable, and efficient quality process – achieving that sense of 'Quality on Autopilot' – is an ongoing journey. It's built step-by-step, through consistent effort, a wish to adapt, and a commitment to both the cultural and technical aspects of TestOps. Thank you.