Top LinkedIn Content on Advancing AI Development

Head of AIOps @ IBM || Speaker | Lecturer | Advisor

243,745 followers 1y

If you follow the news, you’ve probably seen it all: AI is booming. AI is overhyped. AI will save us. AI will destroy jobs.

The Stanford University AI Index 2025 cuts through all of it. Produced by the Institute for Human-Centered Artificial Intelligence, it’s one of the most respected and data-driven reports on the state of AI today. More than 400 pages of concrete insights, from technical benchmarks and real-world adoption to policy shifts, economic impact, education, and public sentiment.

The 2025 edition dropped last week. Here are 12 key takeaways:

1. Benchmarks are being crushed.
➝ AI performance on complex reasoning and programming tasks surged by up to 67 percentage points in just one year.
2. AI is no longer stuck in the lab.
➝ 223 FDA-approved AI medical devices. Over 150,000 autonomous rides weekly from Waymo. This is mainstream adoption.
3. Business is going all-in.
➝ $109B in U.S. private AI investment. 78% of organizations using AI. Productivity gains are no longer theoretical.
4. The U.S. leads in quantity; China’s catching up on quality.
➝ Chinese models now rival U.S. models on MMLU, HumanEval, and more. Global AI is becoming a multi-polar game.
5. Responsible AI is lagging behind innovation.
➝ Incidents are rising, but standardized RAI benchmarks and audits are still rare. Governments are stepping in faster than vendors.
6. Global optimism is rising, but not evenly.
➝ 83% of people in China are optimistic about AI. In the U.S., that number is just 39%.
7. AI is getting cheaper, smaller, and faster.
➝ The cost of GPT-3.5-level inference dropped 280x in two years. Open-weight models are nearly matching closed ones.
8. Governments are regulating and investing.
➝ From Canada’s $2.4B to Saudi Arabia’s $100B push, states aren’t watching from the sidelines anymore.
9. Education is expanding, but readiness lags.
➝ Access is improving, but infrastructure gaps and lack of teacher training still limit global reach.
10. Industry is dominating model development.
➝ 90% of top AI models now come from companies, not academia. The gap between top players is shrinking fast.
11. AI is shaping science.
➝ AI-driven breakthroughs in physics, chemistry, and biology are earning Nobel Prizes and Turing Awards.
12. Complex reasoning remains the ceiling.
➝ Despite all the progress, models still struggle with logic-heavy tasks. Precision is still a challenge.

You can download the full report FREE here: https://lnkd.in/dzzuE5tN
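To put takeaway 7 in per-year terms, here is a minimal sketch of the arithmetic. It assumes the 280x drop happened at a constant geometric rate over exactly two years, which is a simplification and not something the post states; the function name is illustrative.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Per-year fold change implied by a total fold change.

    Assumes a constant geometric rate, so factor_per_year ** years == total_factor.
    """
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


per_year = annualized_factor(280, 2)   # square root of 280
pct_drop = (1 - 1 / per_year) * 100    # share of the cost removed each year
print(f"{per_year:.1f}x cheaper per year, about {pct_drop:.0f}% cost drop per year")
# prints "16.7x cheaper per year, about 94% cost drop per year"
```

Under that assumption, a workload costing $100 at the start of the period would cost roughly $6 a year later and about $0.36 after two years.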

83 Comments

Jyothish Nair

Doctoral Researcher in AI Strategy & Human-Centred AI | Technical Delivery Manager at Openreach

19,903 followers 2mo

Change is rarely blocked by technology. It is usually blocked by what the technology implies.

Over the past year, I have been observing how different industries respond to AI. The pattern is consistent. People closest to learning and reinvention tend to move first. People closest to reputation and responsibility tend to pause. From my research, I have learned that most hesitation is not a knowledge gap. It is a risk calculation, wrapped in a story.

Reaction sounds like:
“This feels like hype.”
“We are doing fine without it.”
“Why buy a Ferrari if the current car, even though we can afford better, still runs?”

Presence sounds like:
“What problem are we solving?”
“What would make this safe to scale?”
“How do we keep trust while we move faster?”

When you look at the data, the same blockers appear repeatedly: skills gaps, poor data quality, privacy and security concerns, integration challenges, ethics, and unclear regulation.

Here is what shifts adoption from stalled to steady:

Treat AI as a capability, not an experiment.
↳ If it remains a side project, it stays fragile.
Start with one clear use case.
↳ Resistance drops when the value is specific.
Make data readiness unglamorous and non-negotiable.
↳ AI is only as reliable as the information it depends on.
Lower the fear of “getting it wrong”.
↳ People do not experiment when mistakes feel career-limiting.
Name the real worry. In many cases, the unspoken question is, “Where do I fit if this works?”
→ Choose the right model for the job.
→ Sometimes that is a smaller, more controllable model.
→ Sometimes it is a larger one with stronger safeguards.
→ The point is fit, not fashion.
Put governance on the same timeline as delivery. Speed without guardrails creates backlash later.
Invest in AI literacy across the organisation. Not everyone needs to build models, but everyone should understand the limits and responsible use.

The organisations that move fastest are not the most aggressive. They are the calmest. They create clarity, make learning safe, and treat trust as part of the design. That is what composure looks like when the world changes.

Sources I drew on (for the data points and recurring barriers).

➕ Follow (Jyothish Nair) for reflections on AI, change, and human-centred AI. #ArtificialIntelligence #AIAdoption #DigitalTransformation #FutureOfWork

199 Comments

Brij kishore Pandey

AI Architect & Engineer | AI Strategist

724,474 followers 1mo

Claude Code's source code leaked last week. 512,000 lines of TypeScript. Most people focused on the drama. I focused on the memory architecture.

Here's how Claude Code actually remembers things across sessions, and why it's a masterclass in agent design:

The 3-Layer Memory Architecture:

Layer 1: MEMORY.md (Always Loaded)
A lightweight index file. Not storage, just pointers. Each line is under 150 characters. First 200 lines get injected into context at every session start. It points to topic files. It never holds the actual knowledge. Think of it as a table of contents, not the book.

Layer 2: Topic Files (On-Demand)
Detailed knowledge spread across separate markdown files. Architecture decisions. Naming conventions. Test commands. Loaded only when MEMORY.md says they're relevant. Not everything gets loaded. Only what's needed right now.

Layer 3: Raw Transcripts (Grep-Based Search)
Past session transcripts are never fully reloaded. They're searched using grep for specific identifiers. Fast. Deterministic. No embeddings. No vector DB. Just plain text search when the first two layers aren't enough.

But here's the part that blew my mind: Skeptical Memory. The agent treats its own memory as a hint, not a fact.
Memory says a function exists? → Verify against the codebase first.
Memory says a file is at this path? → Check before using it.

And one more design principle hidden in the code: if something can be re-derived from source code, it doesn't get stored. Code patterns, conventions, architecture? Excluded from memory saves entirely. Because if it can be looked up, it shouldn't be remembered.

Why this matters beyond Claude Code:
This 3-layer pattern is model-agnostic. Any team building AI agents can steal this:
→ Keep your always-loaded context tiny
→ Reference everything else via pointers
→ Never persist what can be looked up
→ Treat memory as a hint, not truth

The future of AI agents isn't about how much they remember. It's about how well they forget.
What memory patterns are you using in your agent builds?
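The three layers and the "skeptical memory" check described above can be sketched roughly as follows. This is an illustration of the pattern only, not Claude Code's actual implementation: every function name, the `keywords -> path` index line format, and the `*.txt` transcript layout are assumptions made for the example. Only the 200-line and 150-character limits come from the post.

```python
from pathlib import Path
from typing import Dict, List, Optional

MAX_INDEX_LINES = 200  # from the post: first 200 lines are injected
MAX_LINE_CHARS = 150   # from the post: each index line stays under 150 chars


def load_index(memory_dir: Path) -> List[str]:
    """Layer 1: read the always-loaded pointer file, enforcing both limits."""
    index = memory_dir / "MEMORY.md"
    if not index.is_file():
        return []
    lines = index.read_text(encoding="utf-8").splitlines()[:MAX_INDEX_LINES]
    # Over-long or blank lines are skipped rather than truncated.
    return [ln for ln in lines if ln.strip() and len(ln) < MAX_LINE_CHARS]


def load_topics(memory_dir: Path, index: List[str], query: str) -> Dict[str, str]:
    """Layer 2: load a topic file only if its index keywords match the query.

    Assumes a hypothetical index line format: "<keywords> -> <relative path>".
    """
    words = {w.lower() for w in query.split()}
    loaded: Dict[str, str] = {}
    for line in index:
        keywords, sep, rel_path = line.partition("->")
        if not sep:
            continue
        if words & {k.lower() for k in keywords.split()}:
            path = memory_dir / rel_path.strip()
            if path.is_file():
                loaded[rel_path.strip()] = path.read_text(encoding="utf-8")
    return loaded


def grep_transcripts(transcript_dir: Path, identifier: str) -> List[str]:
    """Layer 3: plain substring search, streaming line by line."""
    hits: List[str] = []
    for path in sorted(transcript_dir.glob("*.txt")):
        with path.open(encoding="utf-8") as fh:
            for number, line in enumerate(fh, 1):
                if identifier in line:
                    hits.append(f"{path.name}:{number}:{line.rstrip()}")
    return hits


def verify_path_hint(repo_root: Path, remembered: str) -> Optional[Path]:
    """Skeptical memory: a remembered path is only a hint until the file is found."""
    candidate = repo_root / remembered
    return candidate if candidate.is_file() else None
```

A caller would run `load_index` at session start, call `load_topics` per request, fall back to `grep_transcripts` only when the first two layers come up empty, and pass anything path-like through `verify_path_hint` before acting on it.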

54 Comments

Montgomery Singman

Managing Partner @ Radiance Strategic Solutions | xSony, xElectronic Arts, xCapcom, xAtari

27,722 followers 5mo

The AI race isn’t just about smarter models anymore. It’s about who controls the silicon and the stack.

Google, NVIDIA, and a shifting center of gravity
Google’s Gemini 3 launch, backed by in-house Tensor ASICs, has forced even Nvidia and OpenAI to publicly tip their hats, an unusual moment of mutual acknowledgement in a fiercely competitive market. At the same time, Google’s stock jumped while Nvidia’s dipped, underscoring how capital markets are already repricing what “AI leadership” might look like when hyperscalers own more of the hardware narrative.

ASICs vs GPUs: control vs versatility
Nvidia and AMD still dominate with GPUs that serve broad, complex workloads and are wrapped in a mature software and data center ecosystem that is very hard to displace. Google’s Tensor chips, as ASICs, trade that general-purpose versatility for efficiency on narrower, highly-optimized AI tasks. That is enough to attract interest from Meta and Anthropic, but not yet enough to unseat Nvidia’s platform-scale advantage.

Ecosystems, not winners, will define value
Gemini 3 now tops many public benchmarks across text and image tasks, but other models outperform it on search and specialized use cases, a reminder that “best model” is becoming context-dependent. The more interesting story is ecosystem interdependence: Google is both a rival and a major Nvidia customer, and enterprises are increasingly assembling multi-model, multi-cloud, multi-chip strategies rather than betting on a single winner.

What this means for leaders
For executives, the real strategic questions are shifting from “Which model is best?” to:
⚫ Where do we need tight vertical integration (data + model + chip) versus flexible, multi-vendor optionality?
⚫ How do we avoid over-dependence on a single GPU vendor while not underestimating the cost of moving away from a mature platform?
⚫ Which workloads justify ASIC-style optimization, and which demand GPU-style breadth and agility?

If your current AI roadmap doesn’t explicitly address hardware strategy, ecosystem risk, and a multi-model future, it’s time to revisit it. Bring your product, infra, and finance leaders into the same room and pressure-test your AI stack assumptions for the next 3–5 years, before the chip layer, not the model layer, becomes your biggest strategic constraint.

Read More 👉 https://lnkd.in/g7C5nzd2 #AI #GenAI #GoogleGemini #Nvidia #AIChips #CloudComputing #Developers #AIInfrastructure #TechStrategy #EnterpriseAI

31 Comments

Sol Rashidi, MBA

115,520 followers 3mo

The best thing I ever did for my AI projects? I invited the biggest critics into the room. Sounds counterintuitive, right? Here's what I've learned across 200+ AI deployments: The skeptics, the curmudgeons, the people who question everything — they're not trying to kill your project. They're showing you exactly how to bulletproof it. But here's the catch: timing is everything. Bring them in too early (during brainstorming or exploration), and they'll suffocate momentum before ideas have room to develop. Bring them in too late (after you've committed resources), and their insights can't save you anymore. The sweet spot? Post-design. When you have a concrete solution that needs stress-testing. When the strategy is formed enough to withstand scrutiny but flexible enough to improve. That's when skeptics deliver maximum value. Here's how to make it work: Set expectations upfront. Tell your team you're deliberately bringing in critics to strengthen the project. Frame it as quality assurance, not a threat. Give them a clear job: Find every weakness. Surface every risk. Reveal organizational realities that need addressing. Document everything they say. Every objection becomes a blind spot you can now account for. The result? → Your plan becomes stronger. → Your strategy accounts for real resistance. → Your risk mitigation addresses actual organizational dynamics. Everything is already accounted for before implementation begins. The critics aren't sabotaging your AI initiative. You're sabotaging it by not leveraging them properly. What's been your experience with managing skeptics in transformation projects?

93 Comments

Lenny Rachitsky

Deeply researched no-nonsense product, growth, and career advice

367,605 followers 6mo

My biggest takeaways from Fei-Fei Li:

1. Just nine years ago, calling yourself an AI company was considered bad for business. Nobody believed the technology would work back in 2016. By 2017, companies started embracing the term. Today, virtually every company calls itself an AI company.

2. The modern AI revolution started with a simple but overlooked insight from Fei-Fei: AI models needed large amounts of labeled data. While researchers focused on sophisticated mathematical models and algorithms, she realized the missing ingredient was data. Her team spent three years working with tens of thousands of people across more than 100 countries to label 15 million images, creating ImageNet. This dataset became the foundation for today’s AI systems.

3. The human brain’s efficiency vastly exceeds current AI systems. Humans operate on about 20 watts of power, less than any lightbulb, yet accomplish tasks that require AI systems to use massive computing resources. Current AI still can’t do things elementary school children find easy.

4. Simply scaling current approaches won’t be enough. While adding more data, computing power, and bigger models will continue advancing AI, fundamental innovations are still needed. Throughout AI history, simpler approaches combined with enormous datasets consistently outperformed sophisticated algorithms with limited data.

5. Breakthrough technologies often start as toys or fun experiments before changing the world. ChatGPT was tweeted by Sam Altman as “Here’s a cool thing we’re playing with” and became the fastest-growing product in history. What seems like play today might transform civilization tomorrow.

6. Spatial intelligence is as crucial as language for real-world applications. In emergency situations like fires or natural disasters, first responders organize rescue efforts through spatial awareness, movement coordination, and understanding physical environments, not primarily through language. This is why world models that understand three-dimensional space represent the next frontier beyond text-based chatbots.

7. Physical robots face much harder challenges than self-driving cars, which took 20 years from prototype to street deployment and still aren’t finished. Self-driving cars are metal boxes moving on flat surfaces, trying not to touch anything. Robots are three-dimensional objects moving in three-dimensional spaces, specifically trying to touch and manipulate things. This makes robotics far harder than creating chatbots.

8. Everyone has a role in AI’s future, regardless of profession. Whether you’re an artist using AI tools to tell unique stories, a farmer participating in community decisions about AI deployment, or a nurse who could benefit from AI assistance in an overworked health-care system, you can and should engage with this technology. AI should augment human dignity and agency, not replace it, which means both using AI as a tool and having a voice in how it’s governed.

49 Comments

Aishwarya Srinivasan

631,099 followers 11mo

If you are building AI agents or learning about them, then you should keep these best practices in mind 👇

Building agentic systems isn’t just about chaining prompts anymore. It’s about designing robust, interpretable, and production-grade systems that interact with tools, humans, and other agents in complex environments.

Here are 10 essential design principles you need to know:

➡️ Modular Architectures
Separate planning, reasoning, perception, and actuation. This makes your agents more interpretable and easier to debug. Think planner-executor separation in LangGraph or CogAgent-style designs.

➡️ Tool-Use APIs via MCP or Open Function Calling
Adopt the Model Context Protocol (MCP) or OpenAI’s Function Calling to interface safely with external tools. These standard interfaces provide strong typing, parameter validation, and consistent execution behavior.

➡️ Long-Term & Working Memory
Memory is non-optional for non-trivial agents. Use hybrid memory stacks: vector search tools like MemGPT or Marqo for retrieval, combined with structured memory systems like LlamaIndex agents for factual consistency.

➡️ Reflection & Self-Critique Loops
Implement agent self-evaluation using ReAct, Reflexion, or emerging techniques like Voyager-style curriculum refinement. Reflection improves reasoning and helps correct hallucinated chains of thought.

➡️ Planning with Hierarchies
Use hierarchical planning: a high-level planner for task decomposition and a low-level executor to interact with tools. This improves reusability and modularity, especially in multi-step or multi-modal workflows.

➡️ Multi-Agent Collaboration
Use frameworks and protocols like AutoGen, A2A, or ChatDev to support agent-to-agent negotiation, subtask allocation, and cooperative planning. This is foundational for open-ended workflows and enterprise-scale orchestration.

➡️ Simulation + Eval Harnesses
Always test in simulation. Use benchmarks and harnesses like ToolBench, SWE-agent, or AgentBoard to validate agent performance before production. This minimizes surprises and surfaces regressions early.

➡️ Safety & Alignment Layers
Don’t ship agents without guardrails. Use tools like Llama Guard v4, Prompt Shield, and role-based access controls. Add structured rate-limiting to prevent overuse or sensitive tool invocation.

➡️ Cost-Aware Agent Execution
Implement token budgeting, step count tracking, and execution metrics. Especially in multi-agent settings, costs can grow exponentially if unbounded.

➡️ Human-in-the-Loop Orchestration
Always have an escalation path. Add override triggers, fallback LLMs, or route to human-in-the-loop for edge cases and critical decision points. This protects quality and trust.

PS: If you are interested in learning more about AI Agents and MCP, join the hands-on workshop I am hosting on 31st May: https://lnkd.in/dWyiN89z

If you found this insightful, share this with your network ♻️ Follow me (Aishwarya Srinivasan) for more AI insights and educational content.
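The last two principles, cost-aware execution and human-in-the-loop escalation, combine naturally. Below is a minimal sketch under stated assumptions: each agent step is modeled as a plain callable returning `(output, tokens_used)`, and `RunBudget`, `run_agent`, and the `escalate` callback are hypothetical names for illustration, not part of any framework mentioned in the post.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Step = Callable[[], Tuple[str, int]]  # returns (output, tokens_used)


@dataclass
class RunBudget:
    """Tracks token and step usage for one agent run."""
    max_tokens: int
    max_steps: int
    tokens_used: int = 0
    steps_taken: int = 0

    @property
    def exhausted(self) -> bool:
        return self.tokens_used >= self.max_tokens or self.steps_taken >= self.max_steps

    def record(self, tokens: int) -> None:
        self.tokens_used += tokens
        self.steps_taken += 1


def run_agent(
    steps: Iterable[Step],
    budget: RunBudget,
    escalate: Callable[[str, List[str]], None],
) -> List[str]:
    """Run steps until done or until the budget is spent, then hand off.

    The budget is checked before each step, so overshoot is bounded by the
    cost of a single step.
    """
    outputs: List[str] = []
    for step in steps:
        if budget.exhausted:
            reason = (
                f"budget spent: {budget.tokens_used}/{budget.max_tokens} tokens, "
                f"{budget.steps_taken}/{budget.max_steps} steps"
            )
            escalate(reason, outputs)  # e.g. page a human reviewer
            break
        output, tokens = step()
        budget.record(tokens)
        outputs.append(output)
    return outputs
```

Checking before each step rather than after keeps the control flow simple; a stricter variant could estimate a step's cost up front and refuse steps that would cross the limit.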

87 Comments

Roman Eisenberg

Head of Technology for Chase Card and Connected Commerce - Consumer and Community Banking. Managing Director.

6,633 followers 1mo

Let skepticism shape your innovation, not stall you.

Most rooms I’m in are brimming with AI-assisted development demos and genuine optimism about how quickly software teams can now move. That energy is real and valuable. AI is no longer just helping developers write a few lines of code faster. It increasingly helps teams refactor across files and repos, produce tests, explain unfamiliar code, and advance work through SDLC workflows.

Yet I sometimes notice the quiet pauses before the tough questions. People worry about sounding negative, or slowing momentum, or being the only one who is uneasy. Those instincts are not only okay; they are just as valuable. Skepticism matters more now, not less, because the question is no longer whether AI can generate code.

For me, bringing the hard questions supports progress:
• What business or engineering outcome is this improving, beyond developer velocity?
• Where can this fail: logic, resiliency, security, privacy, or maintainability?
• What is the smallest production-relevant test that proves value?
• What review, monitoring, and rollback mechanisms need to exist before we scale it?
• How do we preserve human judgment where it matters most?

I invite challenges to my ideas because that is how we build better ones. A few principles I’ve found useful, especially in the context of mission-critical platforms:
• Challenge constructively. Do not just identify the risk and admire the problem; help design the safer path forward.
• Trade “no” for “how.” If this approach is not ready, what is the fastest responsible way to learn?
• Pair excitement with evidence. Instrument outcomes, test rigorously, and keep a clean rollback path.
• Treat trust as a deliverable. In AI-assisted development, control is not friction. It makes speed sustainable.

Our best outcomes happen when excitement fuels ambition while skepticism sharpens it. Because in this new environment, skepticism is not the enemy of innovation but part of the engineering discipline that keeps innovation real and production-worthy.

6 Comments

Sahar Mor

I help researchers and builders make sense of AI | ex-Stripe | aitidbits.ai | Angel Investor

41,981 followers 1y

Voice is the next frontier for AI Agents, but most builders struggle to navigate this rapidly evolving ecosystem. After seeing the challenges firsthand, I've created a comprehensive guide to building voice agents in 2024. Three key developments are accelerating this revolution: -> Speech-native models - OpenAI's 60% price cut on their Realtime API last week and Google's Gemini 2.0 Realtime release mark a shift from clunky cascading architectures to fluid, natural interactions -> Reduced complexity - small teams are now building specialized voice agents reaching substantial ARR - from restaurant order-taking to sales qualification -> Mature infrastructure - new developer platforms handle the hard parts (latency, error handling, conversation management), letting builders focus on unique experiences For the first time, we have god-like AI systems that truly converse like humans. For builders, this moment is huge. Unlike web or mobile development, voice AI is still being defined—offering fertile ground for those who understand both the technical stack and real-world use cases. With voice agents that can be interrupted and can handle emotional context, we’re leaving behind the era of rule-based, rigid experiences and ushering in a future where AI feels truly conversational. This toolkit breaks down: -> Foundation layers (speech-to-text, text-to-speech) -> Voice AI middleware (speech-to-speech models, agent frameworks) -> End-to-end platforms -> Evaluation tools and best practices Plus, a detailed framework for choosing between full-stack platforms vs. custom builds based on your latency, cost, and control requirements. Post with the full list of packages and tools as well as my framework for choosing your voice agent architecture https://lnkd.in/g9ebbfX3 Also available as a NotebookLM-powered podcast episode. Go build. P.S. I plan to publish concrete guides so follow here and subscribe to my newsletter.

10 Comments

Aurimas Griciūnas

Founder @ SwirlAI • Ex-CPO @ neptune.ai (Acquired by OpenAI) • UpSkilling the Next Generation of AI Talent • Author of SwirlAI Newsletter • Public Speaker

184,151 followers 5mo

I have been developing Agentic Systems for the past few years and the same patterns keep emerging. 👇

Evaluation-Driven Development is the most reliable way to be successful in building your Agentic Systems. Here is my template. Let’s zoom in:

1. Define a problem you want to solve: is GenAI even needed?
2. Build a Prototype: figure out if the solution is feasible.
3. Define Performance Metrics: you must have output metrics defined for how you will measure the success of your application.
4. Define Evals: split the above into smaller input metrics that can move the key metrics forward. Decompose them into tasks that could be automated and move the given input metrics. Define Evals for each. Store the Evals in your Observability Platform.

ℹ️ Steps 1-4 are where AI Product Managers can help, but they can also be handled by AI Engineers.

5. Build a PoC: it can be simple (an Excel sheet) or more complex (a user-facing UI). Regardless of what it is, expose it to the users for feedback as soon as possible.
6. Instrument your application: gather traces and human feedback and store them in an Observability Platform next to the previously stored Evals.
7. Run Evals on traced data: traces contain the inputs and outputs of your application, so run Evals on top of them.
8. Analyse failing Evals and negative user feedback: this data is gold, as it pinpoints where the Agentic System needs improvement.
9. Use data from the previous step to improve your application: prompt engineer, improve AI system topology, finetune models, etc. Make sure that the changes move Evals in the right direction.
10. Build and expose the improved application to the users.
11. Monitor the application in production: this comes out of the box. You have implemented evaluations and traces for development purposes, and they can be reused for monitoring. Configure specific alerting thresholds and enjoy the peace of mind.

✅ Continuous Development of your application:
➡️ Run steps 6-10 to continuously improve and evolve your application.
➡️ As you build up in complexity, new requirements can be added to the same application. This includes running steps 1-5 and attaching the new logic as routes to your Agentic System.
➡️ You start off with a simple Chatbot and add a route that can classify user intent to take action (e.g. add items to a shopping cart).

What is your experience in evolving Agentic Systems? Let me know in the comments 👇
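Steps 7, 8, and 11 of the template above can be sketched in a few lines. This is a minimal illustration, assuming traces are simple input/output records held in memory and each Eval is a function returning pass or fail; the `Trace` type, the two example checks, and the 0.9 threshold are all hypothetical and stand in for whatever an observability platform would provide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Trace:
    """One recorded interaction: what went in and what came out."""
    trace_id: str
    user_input: str
    output: str


Eval = Callable[[Trace], bool]  # True means the trace passes


def non_empty_answer(trace: Trace) -> bool:
    return bool(trace.output.strip())


def mentions_cart_on_add(trace: Trace) -> bool:
    """Only applies to 'add ...' requests; other traces pass trivially."""
    if "add" not in trace.user_input.lower():
        return True
    return "cart" in trace.output.lower()


def run_evals(traces: List[Trace], evals: Dict[str, Eval]) -> Dict[str, List[str]]:
    """Step 7: run every Eval on every trace; return failing trace ids per Eval."""
    failures: Dict[str, List[str]] = {name: [] for name in evals}
    for trace in traces:
        for name, check in evals.items():
            if not check(trace):
                failures[name].append(trace.trace_id)
    return failures


def pass_rates(failures: Dict[str, List[str]], total: int) -> Dict[str, float]:
    if total <= 0:
        raise ValueError("need at least one trace")
    return {name: 1 - len(ids) / total for name, ids in failures.items()}


def alerts(rates: Dict[str, float], threshold: float) -> List[str]:
    """Step 11: names of Evals whose pass rate fell below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


traces = [
    Trace("t1", "add milk", "Added milk to your cart."),
    Trace("t2", "add eggs", "Sure!"),
    Trace("t3", "hello", ""),
    Trace("t4", "hi", "Hello!"),
]
evals = {"non_empty_answer": non_empty_answer, "mentions_cart_on_add": mentions_cart_on_add}
failures = run_evals(traces, evals)  # step 8 starts from these failing ids
print(alerts(pass_rates(failures, len(traces)), threshold=0.9))
```

The failing trace ids are the useful output here: they point at the exact interactions to inspect in step 8, and the same functions can run on production traces for step 11.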

34 Comments
