In startups, there's no concept of being on call. At Alma, Everyone is on call. Always. At 2 AM when our document processing pipeline went down, it wasn't the "designated on-call engineer" who fixed it. It was whoever saw the Slack notification first. But here's what's changing fast - AI is reshaping what "on-call" actually means. Traditional on-call: Get paged → SSH into server → Debug logs → Apply hotfix → Write postmortem AI-enhanced on-call: Get intelligent alert → AI suggests root cause → Review proposed fix → Approve deployment → AI writes incident summary A few startups are already building this. The engineer's role is shifting from "human debugger" to "decision maker" btw, if you're an immigrant who's exploring work visa options like O-1, H-1B, EB-1A etc. - get in touch with us at Alma.
Introducing AI to synthesize logs and error reports is great, but it's not sensible to have the entire team rallying on this in the middle of the night except in truly extraordinary circumstances. Rotations exist, because having your entire team on edge about being awakened at 2:00 all the time is unhealthy and counterproductive. One notable time for me was about 10 years ago. AWS had a major component fail, and we needed to redesign a chunk of architecture to bypass it on the fly. Even still, there was a standard on-call. We just literally picked up the phone to bring additional people in as-needed. We were also given the following day off, because it wasn't resolved until sunrise.
So the engineer's role shifts from "human debugger" to "human who gets blamed when the AI's proposed fix wipes the production database."
Regardless of the exact oncall process, this approach only works if there is a direct relationship between company success and personal reward. It's easy to always be implicitly available if there is a positive consequence to it.
Any foreseeable risks with hallucinations/errors if the engineer is not effective when they are needed?
Thank you for this information!
Interesting! We’re building the same for databases. Cutting down 90% of time that goes into forming hypotheses and testing them - just to conclude that "'X is the problem." Once identified, then having AI agents implement the fix, while you approve it like a master. Of course, it’s not that simple. There are plenty of guardrails and agent-level optimisations behind the scenes. But I believe we are moving in the right direction :)
#foundermentality
The future of on-call isn’t about waking up faster, it’s about needing to wake up less. AI’s turning alerts into actions, and engineers into approvers.
CTO | Co-Founder at payFURL
2moThis is quite poor advice. If everyone is on call, then nobody ever sleeps. It means that you have an overworked team who will make poor decisions.