How AI is changing the on-call experience for startups

Shuo Chen

CTO & Co-founder of Alma

In startups, there's no concept of being on call. At Alma, everyone is on call. Always.

At 2 AM when our document processing pipeline went down, it wasn't the "designated on-call engineer" who fixed it. It was whoever saw the Slack notification first.

But here's what's changing fast - AI is reshaping what "on-call" actually means.

Traditional on-call: Get paged → SSH into server → Debug logs → Apply hotfix → Write postmortem

AI-enhanced on-call: Get intelligent alert → AI suggests root cause → Review proposed fix → Approve deployment → AI writes incident summary

A few startups are already building this. The engineer's role is shifting from "human debugger" to "decision maker."

Btw, if you're an immigrant exploring work visa options like O-1, H-1B, EB-1A, etc., get in touch with us at Alma.
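As a rough illustration of the AI-enhanced flow above (alert, suggested root cause, human review, approved deployment, summary), here is a minimal Python sketch. Every name in it (`Incident`, `handle_incident`, the `suggest`, `approve`, and `deploy` callables) is hypothetical, and no real AI or deployment API is called. It only shows where the human approval gate would sit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Incident:
    alert: str
    suspected_cause: str = ""
    proposed_fix: str = ""
    approved: bool = False
    log: List[str] = field(default_factory=list)  # stand-in for an incident summary


def handle_incident(
    alert: str,
    suggest: Callable[[str], Tuple[str, str]],  # hypothetical AI step: alert -> (cause, fix)
    approve: Callable[[Incident], bool],        # the engineer's decision
    deploy: Callable[[str], None],              # hypothetical deploy hook
) -> Incident:
    incident = Incident(alert=alert)
    incident.suspected_cause, incident.proposed_fix = suggest(alert)
    incident.log.append(f"suggested fix: {incident.proposed_fix}")

    # Nothing is deployed unless a human explicitly approves the proposal.
    incident.approved = approve(incident)
    if incident.approved:
        deploy(incident.proposed_fix)
        incident.log.append("deployed after human approval")
    else:
        incident.log.append("fix rejected; fall back to manual debugging")
    return incident


# Usage with stubbed-in steps (all values are made up for illustration).
deployed: List[str] = []
result = handle_incident(
    "document processing pipeline down",
    suggest=lambda alert: ("queue worker crashed", "restart queue worker"),
    approve=lambda inc: "database" not in inc.proposed_fix,
    deploy=deployed.append,
)
```

The point of the design is that `deploy` is reachable only through `approve`, so the engineer stays the decision maker even when the diagnosis is automated.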

David Cameron🇺🇦

CTO | Co-Founder at payFURL

2mo

This is quite poor advice. If everyone is on call, then nobody ever sleeps. It means that you have an overworked team who will make poor decisions.

Ryan Bright

Engineering at Moonrise Labs

2mo

Introducing AI to synthesize logs and error reports is great, but it's not sensible to have the entire team rallying on this in the middle of the night except in truly extraordinary circumstances. Rotations exist because having your entire team on edge about being awakened at 2:00 AM all the time is unhealthy and counterproductive. One notable time for me was about 10 years ago. AWS had a major component fail, and we needed to redesign a chunk of architecture to bypass it on the fly. Even then, there was a standard on-call. We just literally picked up the phone to bring additional people in as needed. We were also given the following day off, because it wasn't resolved until sunrise.

Mohammad Shaheer Zaman

Software Engineer | Founder and CEO at ibrazain.ai | IIIT-H

2mo

So the engineer's role shifts from "human debugger" to "human who gets blamed when the AI's proposed fix wipes the production database."

Nikola Mušikić

Senior Backend Engineer

2mo

Regardless of the exact on-call process, this approach only works if there is a direct relationship between company success and personal reward. It's easy to always be implicitly available if there is a positive consequence to it.

Phil Schnee

GTM @ DuploCloud | AI + DevOps ☁️

2mo

Any foreseeable risks with hallucinations/errors if the engineer is not effective when they are needed?

Caleb Lawson

Senior Software Engineer

2mo

Thank you for this information!

Shiva Pundir

Data to Decisions — Instantly 🚀 | Co-Founder @ incerto.in

2mo

Interesting! We’re building the same for databases. Cutting down 90% of the time that goes into forming hypotheses and testing them, just to conclude that "X is the problem." Once it's identified, AI agents implement the fix while you approve it like a master. Of course, it’s not that simple. There are plenty of guardrails and agent-level optimisations behind the scenes. But I believe we are moving in the right direction :)

Gorish Aggarwal

CEO@Sybill - (AE+CRO)’s best friend

2mo

#foundermentality

Vitalii Honcharenko

Software Development Services at Yael Acceptic | AI/ML/NLP, Web & Mobile, IoT, E-commerce/learning, Fin/Insur/PropTech, Logistics

2mo

The future of on-call isn’t about waking up faster, it’s about needing to wake up less. AI’s turning alerts into actions, and engineers into approvers.
