Introducing AI Steerability 360 — a set of lightweight algorithms for controlling LLM behavior: https://ibm.co/6041BzRpT Led by Erik Miehling and his team at IBM Research, AISteer360 can help software developers and researchers steer an LLM down the right path by targeting four key steps in the generative process — the prompt, and the model’s weights, state, and final output. AISteer360 also provides a framework for combining algorithms into "steering pipelines" that can be systematically evaluated for a given task — allowing users to find the best solution for their use case.
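The post doesn't show AISteer360's actual interfaces, so here is a minimal, purely hypothetical Python sketch of the pipeline idea it describes: controls that act at different points in the generative process (here just the prompt and the final output) chained together and applied around a single generate call. All class and method names are invented for illustration and are not AISteer360's API.

```python
# Illustrative sketch only: AISteer360's real API may differ. The point is the
# "steering pipeline" pattern: several controls, each targeting a different
# stage of generation, composed and run as one unit.

class PromptPrefix:
    """Steer by prepending an instruction to the prompt (prompt-stage control)."""
    def __init__(self, prefix):
        self.prefix = prefix

    def apply_to_prompt(self, prompt):
        return f"{self.prefix}\n{prompt}"


class BannedWordsFilter:
    """Steer the final output by redacting banned words (output-stage control)."""
    def __init__(self, banned):
        self.banned = set(banned)

    def apply_to_output(self, text):
        return " ".join("[redacted]" if w in self.banned else w
                        for w in text.split())


class SteeringPipeline:
    """Chain controls around an arbitrary generate() callable."""
    def __init__(self, controls):
        self.controls = controls

    def run(self, prompt, generate):
        # Prompt-stage controls run before generation...
        for c in self.controls:
            if hasattr(c, "apply_to_prompt"):
                prompt = c.apply_to_prompt(prompt)
        text = generate(prompt)
        # ...output-stage controls run after.
        for c in self.controls:
            if hasattr(c, "apply_to_output"):
                text = c.apply_to_output(text)
        return text


# Toy "model" that just echoes its prompt, standing in for an LLM call.
pipeline = SteeringPipeline([
    PromptPrefix("Answer concisely."),
    BannedWordsFilter({"secret"}),
])
result = pipeline.run("Tell me the secret word", generate=lambda p: p)
```

A full version of this pattern would also expose hooks for weight- and state-level controls, plus an evaluation harness for comparing pipelines on a given task, as the post describes.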
IBM Research
Research Services
Yorktown Heights, New York · 92,581 followers
Inventing what's next in science and technology.
About us
IBM Research is a group of researchers, scientists, technologists, designers, and thinkers inventing what’s next in computing. We’re relentlessly curious about all the ways that computing can change the world. We’re obsessed with advancing the state of the art in AI and hybrid cloud, and quantum computing. We’re discovering the new materials for the next generation of computer chips; we’re building bias-free AI that can take the burden out of business decisions; we’re designing a hybrid-cloud platform that essentially operates as the world’s computer. We’re moving quantum computing from a theoretical concept to machines that will redefine industries. The problems the world is facing today require us to work faster than ever before. We want to catalyze scientific progress by scaling the technologies we’re working on and deploying them with partners across every industry and field of study. Our goal is to be the engine of change for IBM, our partners, and the world at large.
- Website
- http://www.research.ibm.com/
- Industry
- Research Services
- Company size
- 10,001+ employees
- Headquarters
- Yorktown Heights, New York
Updates
IBM Research reposted this
IBM 𝗷𝘂𝘀𝘁 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲𝗱 𝗶𝘁𝘀 𝗖𝗼𝗺𝗽𝘂𝘁𝗲𝗿 𝗨𝘀𝗶𝗻𝗴 𝗚𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝘁 𝗔𝗴𝗲𝗻𝘁 (𝗖𝗨𝗚𝗔).

A few months ago, our very talented IBM Research team presented CUGA — a configurable, general-purpose AI agent capable of executing complex tasks across APIs, browsers, and (soon) file systems and command lines. It's a bridge between AI research and enterprise reality. And now it's open source – open for builders, researchers, and innovators to shape what comes next.

𝗪𝗵𝗮𝘁 𝗺𝗮𝗸𝗲𝘀 𝗖𝗨𝗚𝗔 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝘁?
→ Complex task execution – State-of-the-art performance across web and APIs.
→ Multi-tool mastery – Works with REST APIs (via OpenAPI), MCP servers, and custom connectors.
→ Composable agent architecture – CUGA can act as a tool for other agents, enabling nested reasoning and collaboration.
→ Configurable reasoning modes – Switch between fast heuristics and deep planning depending on your latency and cost needs.

Instead of manually coding prompts or chaining tools, developers simply configure MCP tools, define domain knowledge and SOPs, and let CUGA handle the orchestration. This dramatically cuts development time, cost, and operational risk – while embedding enterprise guarantees around trust and governance.

𝗛𝗲𝗿𝗲’𝘀 𝗵𝗼𝘄 𝘆𝗼𝘂 𝗰𝗮𝗻 𝗵𝗲𝗹𝗽:
→ Share use cases – How would you use CUGA in your workflows?
→ Request features – What capabilities do you need next?
→ Report bugs – Help IBM Research improve with clear, reproducible reports.

All contributions are welcome on GitHub.
Read more on the IBM Research Blog: https://lnkd.in/dxfgMsWT
Link to GitHub: https://lnkd.in/dsqbn-FT

𝗣.𝗦. 𝗜 𝗿𝗲𝗰𝗲𝗻𝘁𝗹𝘆 𝗹𝗮𝘂𝗻𝗰𝗵𝗲𝗱 𝗮 𝗻𝗲𝘄𝘀𝗹𝗲𝘁𝘁𝗲𝗿 𝘄𝗵𝗲𝗿𝗲 𝗜 𝘄𝗿𝗶𝘁𝗲 𝗮𝗯𝗼𝘂𝘁 𝗔𝗜 + 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀. 𝗜𝘁’𝘀 𝗳𝗿𝗲𝗲, 𝗮𝗻𝗱 𝗮𝗹𝗿𝗲𝗮𝗱𝘆 𝗿𝗲𝗮𝗱 𝗯𝘆 𝟮𝟱𝗸+ 𝗽𝗲𝗼𝗽𝗹𝗲: https://lnkd.in/dbf74Y9E
IBM Research reposted this
A few months ago, we shared with you our progress on developing novel decoding algorithms for qLDPC codes. That effort resulted in the Relay-BP algorithm (https://lnkd.in/eFbWNFeU), which surpassed prior state-of-the-art qLDPC decoders in terms of logical error rate while simultaneously removing barriers toward real-time implementation. In particular, we showed that a novel variation of the belief propagation (BP) algorithm was sufficient for accurate decoding of our gross code without the need for an expensive second-stage decoder to fix cases where BP failed to converge. I’m excited to tell you about some of the progress we’ve made on taking the first steps towards implementing a real-time decoder in hardware (https://lnkd.in/e8CShTmT). Our initial effort has focused on FPGAs because they are very flexible and allow for very low-latency integration into our quantum control system. FPGAs’ flexibility in supporting custom logic and user-defined numerical formats allowed us to evaluate the performance of Relay-BP across a range of floating-point, fixed-point, and integer precisions. Encouragingly, we observe a high tolerance to reduced precision. Our experiments show that even 6-bit arithmetic is sufficient to maintain decoding performance. We explored the speed limits of an FPGA Relay-BP implementation in a maximally parallel computational architecture. Like traditional BP, the Relay-BP algorithm is a message-passing algorithm where messages are exchanged between nodes on a decoding graph. Our maximally parallel implementation assigns a unique compute resource to every node in this graph, allowing a full BP iteration to be computed on every clock cycle. This decoder architecture is resource-intensive, but we succeeded in building a Relay-BP decoder for the gross code and fit it within a single AMD VU19P FPGA.
Our implementation is limited to split X/Z decoding of the gross code syndrome cycle (we decode windows of 12 cycles), a simpler implementation than we’d need for Starling. That being said, it is extremely fast, an absolute requirement for practical implementation. In fact, we can execute a Relay-BP iteration in 24ns. As physical error rates drop below 1e-3, Relay-BP typically converges in less than 20 iterations. This means we can complete the decoding task in about 480ns. This is significantly faster than what is possible with NVIDIA’s DGX-Quantum solution, which requires a 4000ns start-up cost before decoding begins. The figure below compares the logical error performance versus physical error rate of our FPGA implementation with that of a floating-point software implementation, for memory experiments of the size of Loon and Kookaburra on our Innovation roadmap. This and further data shows that the reduced-precision arithmetic in the FPGA matches the accuracy of a software model, while simultaneously running dramatically faster. Further details are in the pre-print: https://lnkd.in/e8CShTmT
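For readers unfamiliar with BP decoders, the following is a heavily simplified, illustrative sketch of syndrome-based min-sum belief propagation on a toy code. It deliberately omits Relay-BP's actual innovations (disordered memory strengths, ensembling). The structural point it shows is why the FPGA mapping works: each message update depends only on a node's neighbors in the decoding graph, so assigning one compute resource per node lets a full iteration complete every clock cycle.

```python
# Toy syndrome-based min-sum BP decoder (illustrative only, not Relay-BP).
# H is a parity-check matrix; given a syndrome, we estimate the error pattern.
import numpy as np

def min_sum_decode(H, syndrome, prior_llr, max_iters=20):
    m, n = H.shape
    checks = [np.flatnonzero(H[c]) for c in range(m)]       # vars touching check c
    vars_ = [np.flatnonzero(H[:, v]) for v in range(n)]     # checks touching var v
    # Variable-to-check messages, initialized to the channel priors.
    msg_vc = {(v, c): prior_llr[v] for v in range(n) for c in vars_[v]}
    for _ in range(max_iters):
        # Check-to-variable: sign set by the syndrome bit, magnitude by min-sum.
        # Each of these updates is independent, hence fully parallelizable.
        msg_cv = {}
        for c in range(m):
            for v in checks[c]:
                others = [msg_vc[(u, c)] for u in checks[c] if u != v]
                sign = (-1) ** syndrome[c] * np.prod(np.sign(others))
                msg_cv[(c, v)] = sign * min(abs(o) for o in others)
        # Posterior LLRs and tentative error estimate (negative LLR => error).
        post = np.array([prior_llr[v] + sum(msg_cv[(c, v)] for c in vars_[v])
                         for v in range(n)])
        e_hat = (post < 0).astype(int)
        if np.array_equal(H @ e_hat % 2, np.asarray(syndrome)):
            return e_hat  # syndrome reproduced: decoding succeeded
        # Variable-to-check messages for the next iteration.
        for v in range(n):
            for c in vars_[v]:
                msg_vc[(v, c)] = prior_llr[v] + sum(
                    msg_cv[(c2, v)] for c2 in vars_[v] if c2 != c)
    return e_hat

# Length-4 repetition-style code, single error on bit 1.
H = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
true_error = np.array([0, 1, 0, 0])
syndrome = H @ true_error % 2
e_hat = min_sum_decode(H, syndrome, np.full(4, 2.0))
```

On this tree-like toy graph the decoder recovers the error in a single iteration; real qLDPC decoding graphs have cycles, which is exactly where plain BP can fail to converge and where Relay-BP's modifications come in.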
THANK YOU to the developers, engineers, educators, and attendees who visited our booth at PyTorch Conference 2025 this week. We showcased new open-source AI tools that are expanding possibilities in a multi-accelerator world: https://ibm.co/6041BzG07

▶️ A key highlight was our work with AMD and Red Hat to build Triton-based kernels for vLLM—enabling efficient, hardware-agnostic inference across GPU platforms without proprietary libraries. This effort, presented by Thomas Parnell and Aleksandr M., strengthens support for open, extensible systems in the PyTorch and vLLM communities.

▶️ Researcher Linsong Chu and his team shared a major training milestone: Using torchtitan, a PyTorch-native training framework, they successfully trained one of the first Llama 3-70B-derived models from an open repository—achieving comparable quality with just one-third of the original training budget, half the token count, and FP8 low-precision quantization.

▶️ Last but certainly not least, IBM Research distinguished engineer Mudhakar Srivatsa spotlighted our adoption of vLLM and torch.compile to integrate emerging accelerators like the IBM Spyre AI accelerator. IBM Research is working on a Spyre backend compiler and vLLM plugin with paged attention, with a goal of boosting memory efficiency and scalability for LLM inference.

Follow the link above for a full recap of #PyTorchCon and how we’re expanding AI model training and inference for the open-source community.
This week in IBM Research, we’re diving into quantum-centric supercomputing, everything new at PyTorch Conference 2025, and an exciting new dataset named after a colorful bird.

Toucan, a new open-source dataset for AI agents, takes flight. IBM and the University of Washington have open-sourced Toucan, a landmark dataset of 1.5 million AI agent task scenarios designed to accelerate tool-use learning in language models. By capturing real-world API executions across 2,000 web services, Toucan enables agents to reason, act, and collaborate more effectively, transforming chatbots into capable digital assistants.

At this year’s PyTorch Conference 2025, IBM unveiled a new collection of open-source AI tools redefining what’s possible in a multi-accelerator world. Highlights included torchtitan, a PyTorch-native framework for efficient low-precision training, and Triton-powered kernels that boost inference throughput in vLLM across multiple GPU architectures.

IBM Quantum is expanding the frontier of quantum-centric supercomputing with the release of a new C-language API for Qiskit. Designed for developers working in compiled languages like C++ and Fortran, this interface enables hybrid quantum-classical workflows integrated with high-performance computing environments. Included in Qiskit v2.2, a new demo showcases a complete quantum workflow, demonstrating real-world quantum applications in modern HPC systems.

Catch up with this week in Research ⤵️
Don’t miss Dr. Zaira Nazario speaking on the panel “Accuracy and Trust in the Digital Age” at MetLife’s 2025 Triangle Tech X conference this week: https://ibm.co/6049Bz8iW This free, virtual conference brings together leaders across industries to explore agility, authenticity, and inclusion in STEM innovation. Her panel will highlight how organizations can balance fast-paced digital transformation with the integrity and accountability needed to build trust. Register via the link above. #TTX2025
IBM Research reposted this
vLLM V1 is now fully supported on AMD GPUs! New blog from the vLLM teams at IBM Research, Red Hat, and AMD: Learn how the teams at IBM, AMD, and Red Hat built an optimized attention backend for vLLM V1 using Triton kernels. In this deep technical blog, we describe the optimizations we performed to ensure that the Triton backend achieves state-of-the-art performance on AMD. Check it out if you’re interested in learning more about the art of writing lightning-fast Triton kernels for AI applications! 🔗 Read here: https://hubs.la/Q03PC6kh0 #vLLM #PyTorch #OpenSourceAI
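The blog covers the Triton specifics; as rough intuition for what such attention kernels compute, here is a NumPy sketch of the FlashAttention-style online-softmax technique commonly used in Triton attention backends: keys and values are processed block by block while a running max and normalizer are maintained, so the full score matrix is never materialized. This is an illustration of the general technique, not IBM's actual kernel.

```python
# Online-softmax attention over blocks of keys/values (single query, NumPy).
# Real Triton kernels apply the same recurrence per tile, in parallel, on GPU.
import numpy as np

def chunked_attention(q, k, v, block=4):
    """Compute softmax(q·kᵀ/√d) @ v one block of keys/values at a time."""
    d = q.shape[-1]
    m = -np.inf                       # running max of scores seen so far
    l = 0.0                           # running softmax normalizer
    acc = np.zeros(v.shape[-1])       # running weighted sum of values
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = kb @ q / np.sqrt(d)       # scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)     # rescale earlier partial results
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ vb
        m = m_new
    return acc / l

# Check against a naive full-matrix reference.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=(10, 8))
v = rng.normal(size=(10, 8))
scores = k @ q / np.sqrt(8)
w = np.exp(scores - scores.max())
ref = (w / w.sum()) @ v
out = chunked_attention(q, k, v, block=3)
```

The block size here is arbitrary; in a real kernel it is tuned to the GPU's shared-memory and wavefront sizes, which is much of what backend-specific optimization work involves.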
“How can you train better agents? Through diverse, high-quality examples sourced from the real world.” Researchers from IBM and the University of Washington release Toucan-1.5M, the largest tool-calling dataset to date, designed to improve how agents interact with the world: https://ibm.co/6047BK7G3 Comprising over 1.5 million real-life tool-calling task sequences—called trajectories—synthesized from 495 real-world Model Context Protocol (MCP) servers, Toucan is built to teach AI agents how to call APIs by connecting to MCP servers. Models fine-tuned on Toucan data demonstrated impressive performance gains across common agentic benchmarks, including the Berkeley Function-Calling Leaderboard V3, τ-Bench, and MCP-Universe. Now available on Hugging Face, Toucan is currently more than five times larger than the next largest open-source dataset—NVIDIA’s Nemotron, which contains 310,000 trajectories. With over 2,000 tools represented, it’s also likely the most diverse.
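Toucan's actual schema isn't reproduced in the post, but conceptually a tool-calling trajectory pairs a task with the ordered sequence of tool invocations an agent made to complete it. The record below is entirely hypothetical (field names, tools, and values invented for illustration) and only shows the shape of the data an agent is fine-tuned on.

```python
# Hypothetical trajectory record: NOT Toucan's real schema, just the concept.
trajectory = {
    "task": "What's the weather in Yorktown Heights tomorrow?",
    "steps": [
        {"tool": "geocode",
         "arguments": {"query": "Yorktown Heights, NY"}},
        {"tool": "get_forecast",
         "arguments": {"lat": 41.27, "lon": -73.78, "days": 1}},
    ],
    "final_answer": "Tomorrow will be partly cloudy with a high of 12°C.",
}

def tools_used(traj):
    """Distinct tools invoked in a trajectory, in first-call order."""
    seen = []
    for step in traj["steps"]:
        if step["tool"] not in seen:
            seen.append(step["tool"])
    return seen
```

Aggregating a statistic like `tools_used` across 1.5 million trajectories is how claims about diversity (over 2,000 distinct tools, per the post) can be measured.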
IBM Research reposted this
🚀 CUGA is now open source! We’re thrilled to share that the CUGA Agent, which is top-ranked on WebArena and AppWorld, is now available to the community. CUGA is a configurable, general-purpose AI agent designed to accelerate the development of high-performing, domain-specific agents for real-world applications. It offers built-in enterprise-grade guarantees, integrates seamlessly with MCP, and supports interaction with REST APIs, browser-based web applications, and soon, file systems, data stores, and command-line interfaces. Rather than manually crafting prompts or making complex architectural decisions, developers simply configure MCP tools and supply domain expertise, standard operating procedures, and guardrails. As a result, developers using CUGA are likely to see significant improvements in development time and cost. 🔗 Learn more in the official IBM Research blog: https://lnkd.in/d4m5jkJB 🌐 Explore CUGA: www.cuga.dev 📂 Dive into the code and give us a star: https://lnkd.in/d4zCy2W4 We can’t wait to see what the community builds with it. Contributions, feedback, and collaborations are welcome! #OpenSource #AIagents #CUGA #AgentFramework #IBMResearch #WebArena #AppWorld