Advancing AI reliability

Research for reliable AI in critical systems

Thoughtworks AI Research Labs conducts research into how AI models can be evaluated, understood and controlled for use in critical environments at scale.

Our research

We focus on rigorous evaluation, interpretability, robustness and model control: how AI behaves, where it fails and how it can be guided. We also explore model interoperability and AI decision-making.

Read the research. Use the code.

Browse the Labs’ projects to access the research behind each one, with links to papers, code and datasets where available. Access the methods, run the code and build on the results.

p-less sampling: A robust LLM decoding strategy

By Phillip Howard, Parag Mahajani, Runyan Tan and Shuang Wu

Published: December 15, 2025

p-less sampling improves autoregressive text generation by offering a hyperparameter-free alternative to existing LLM sampling techniques.

The next frontiers in AI — according to industry leaders

By Parag Mahajani

Published: June 04, 2025

AI is evolving fast. Keynotes and events reveal the most important emerging trends and narratives shaping generative AI today.

Evaluating LLM-generated summaries using the Lie algebra framework

By Parag Mahajani and Manikandan Ravikiran

Published: October 10, 2025

We model summarization as a geometric flow, using Lie algebra to detect incompleteness via source-text contribution vectors.

Calculating uncertainty in generative AI

By Parag Mahajani and Runyan Tan

Published: May 06, 2025

Dropout estimates model uncertainty cost-effectively using prediction variance or softmax entropy while improving generalization and reducing overfitting.
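As a rough illustration of the idea summarized above, the sketch below keeps dropout active at inference time (often called Monte Carlo dropout), runs several stochastic forward passes and reports prediction variance and softmax entropy. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the two-layer network, the random weights, the function name `mc_dropout_predict` and its parameters are all hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, w1, w2, drop_rate=0.5, passes=100, seed=0):
    """Run `passes` stochastic forward passes with dropout left on.

    Returns the mean class probabilities, their per-class variance
    across passes, and the entropy of the mean distribution.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_rate
    probs = []
    for _ in range(passes):
        h = np.maximum(x @ w1, 0.0)           # hidden layer with ReLU
        mask = rng.random(h.shape) < keep     # fresh dropout mask per pass
        h = h * mask / keep                   # inverted-dropout scaling
        probs.append(softmax(h @ w2))
    probs = np.stack(probs)                   # (passes, batch, classes)
    mean = probs.mean(axis=0)
    variance = probs.var(axis=0)              # spread across passes
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean, variance, entropy

# Hypothetical toy model: 4 inputs, 16 hidden units, 3 classes.
rng = np.random.default_rng(42)
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, 3))
x = rng.normal(size=(2, 4))

mean, variance, entropy = mc_dropout_predict(x, w1, w2)
print("mean probabilities:", mean)
print("per-class variance:", variance)
print("predictive entropy:", entropy)
```

Higher variance across passes or higher entropy of the averaged distribution suggests the model is less certain about that input. The number of passes trades compute for a smoother estimate.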

More research

June 22, 2026

Curveball steering — Geometry-aware non-linear steering to control LLM behavior

Curveball steering is a technique developed by Thoughtworks researchers to control LLM outputs. It improves consistency through polynomial kernel PCA.

Anti-slopping — An innovation for rectifying LLM writing clichés

February 27, 2026

Concept consistency score

December 15, 2025

p-less sampling: A robust hyperparameter-free approach for LLM decoding

December 15, 2025

p-less sampling: A robust LLM decoding strategy

November 29, 2025

Steering smarter

October 10, 2025

Evaluating LLM-generated summaries using the Lie algebra framework

September 07, 2025

Beyond I am sorry, I can’t: dissecting large language model refusal

August 29, 2025

Distribution-aware feature selection for SAEs

August 06, 2025

Towards transparent AI grading: Entropy as a signal for human-AI disagreement

June 04, 2025

The next frontiers in AI — according to industry leaders

May 30, 2025

Beyond linear steering: Unified multi-attribute control for language models

May 06, 2025

Calculating uncertainty in generative AI

March 17, 2025

TinySQL

March 07, 2025

Evaluating LLMs using semantic entropy

October 31, 2024

LLM benchmarks, evals and tests

July 01, 2024

Turning up the heat: Min-p sampling for creative and coherent LLM outputs

October 16, 2023

Decoding LLM uncertainties for better predictability

September 08, 2023

A surprisingly effective way to estimate token importance in LLM prompts

September 02, 2021

Probabilistic machine learning and weak supervision

September 01, 2021

A gentle introduction to machine teaching

Partners and collaborations

Thoughtworks AI Research Labs sits within a wider network of organizations spanning public AI research, semiconductor innovation, cloud platforms, open source and AI engineering.

These relationships strengthen the Labs’ ability to contribute to the methods, tools and technical standards shaping reliable AI.

For partnerships and collaboration inquiries

email ai-labs@thoughtworks.com