The Zero-Shot Crisis: Lessons Learned in the AI/ML Community

In early 2023, my team and I, along with likely many others in the hardworking AI/ML community, experienced what I now refer to as the "zero-shot crisis." So, what exactly is the zero-shot crisis? The Jackie Chan meme in the title picture of this article captures the essence of our collective astonishment. For the last decade, working with AI or machine learning has been a complex endeavor that typically unfolded in the following manner:

  1. Translate a business problem into an algorithmic problem and sanity-check whether machine learning is needed to solve it: The vast majority of these problems translate into supervised machine learning, i.e. we want to label things (classification) or predict a more continuous outcome (regression) based on (often human-created) past data.
  2. Acquire enough high-quality(!) labeled data to train and evaluate the model. This step is often the most critical in the entire workflow and can determine the success or failure of the project.
  3. Set up a proper evaluation workflow that allows you to compare different approaches and to establish a baseline.
  4. Iterate with different approaches, using the workflow from step 3, until the quality agreed upfront is reached.
  5. Deploy the model in operations. Deploying the model into production can be challenging: transitioning from a Jupyter notebook to an organization's IT ecosystem, managing data streams, and ensuring automated re-training and monitoring is a complex task, often referred to as Machine Learning Operations (MLOps).
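
Steps 3 and 4 above can be sketched as a minimal evaluation harness: every approach, including a trivial baseline, is scored on the same held-out labelled data. The tiny test set, the CPC labels, and both "models" below are made up purely for illustration.

```python
from typing import Callable, Sequence

def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of exact label matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate(models: dict,
             test_set: list) -> dict:
    """Score every candidate model on the same held-out labelled data (step 3)."""
    texts = [x for x, _ in test_set]
    labels = [y for _, y in test_set]
    return {name: accuracy(labels, [predict(t) for t in texts])
            for name, predict in models.items()}

# Hypothetical data and models, for illustration only
test_set = [("rotor blade", "F03D"), ("antibody", "C07K"), ("led driver", "H05B")]
baseline = lambda text: "F03D"                      # majority-class baseline
keyword_model = lambda text: {"antibody": "C07K",
                              "led driver": "H05B"}.get(text, "F03D")

scores = evaluate({"baseline": baseline, "keywords": keyword_model}, test_set)
print(scores)
```

Because both approaches are scored on identical data, step 4 reduces to iterating until a candidate clearly beats the baseline by the agreed margin.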

From these steps, it is clear that a successful AI/ML project that generates real business impact requires significant resources. One such project was the European Patent Office's (EPO) Auto-Classification AI implementation, which we rolled out shortly before ChatGPT was released. Naturally, we tested ChatGPT, and later other models, by copy-pasting a published patent and, voilà, we got a CPC classification. Just like that. If you are unfamiliar with zero-shot classification: it means that the model was never specifically trained to classify patents in the Cooperative Patent Classification (CPC) system, yet it managed to perform the task. It was astounding.
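
Mechanically, zero-shot classification is nothing more than asking the model nicely. The sketch below shows the idea; `call_llm` is a hypothetical stand-in for whatever chat-completion API is actually used, and the returned symbol is hard-coded for illustration.

```python
def build_prompt(abstract: str) -> str:
    """Zero-shot prompt: no examples, no fine-tuning, just an instruction."""
    return (
        "You are a patent classifier. Return only the single most likely "
        "CPC subclass (e.g. 'H04L') for the following abstract.\n\n"
        f"Abstract: {abstract}"
    )

def call_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call, for illustration only.
    return " H04L "

def zero_shot_classify(abstract: str) -> str:
    return call_llm(build_prompt(abstract)).strip()

print(zero_shot_classify("A method for routing data packets between nodes..."))
```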

[Image: GPT-4 Turbo doing zero-shot classification]

Jackie Chan a second time, just for effect:

[Image: Zero-Shot Crisis meme]

Data Science strikes back

Unfortunately, or perhaps fortunately, the story has a part two: "The Return of Data Science." Soon after we overcame the initial shock, we realized a few critical issues. For example, some of the symbols (CPC classes) produced didn't exist: they were made up. Today these are called hallucinations, or, as a more recent paper title puts it, "ChatGPT is bullshit."

[Image: Data scientists working hard on solving complex AI/ML problems - we are currently still missing the lightsabers in my team, though.]

We now know that zero-shotting a large language model does not necessarily yield better-quality results for this specific case. It is just more expensive and more difficult to evaluate, as the generative answer can be hard for a machine to parse. What we learned is that we still need an evaluation framework. In reality, all of steps 1-5 are still very much needed; they just look a bit different now. I can only recommend always building a proper evaluation framework on a controlled test data set for every problem you want to solve with generative AI and LLMs. Where genAI can really support is in generating plausible training data - but that is yet another project whose performance you need to understand! It simply shifts steps 1-5 to another task.
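
Such an evaluation framework has to deal with exactly the two problems mentioned: parsing free-text answers and catching made-up symbols. A minimal sketch, assuming a lookup set of valid CPC symbols (the real scheme would be loaded from the official classification files; the three entries and the regex here are illustrative only):

```python
import re

# Tiny excerpt of valid CPC symbols, for illustration; the real scheme
# contains hundreds of thousands of entries.
VALID_CPC = {"G06N3/08", "H04L9/40", "A61K39/00"}

def parse_symbol(answer: str):
    """Pull the first CPC-like token out of a free-text model answer."""
    m = re.search(r"\b[A-HY]\d{2}[A-Z]\s?\d{1,4}/\d{2,}\b", answer)
    return m.group(0).replace(" ", "") if m else None

def score(answers: list, gold: list) -> dict:
    """Accuracy plus the two LLM-specific failure modes."""
    parsed = [parse_symbol(a) for a in answers]
    n = len(gold)
    return {
        "accuracy": sum(p == g for p, g in zip(parsed, gold)) / n,
        "hallucinated": sum(p is not None and p not in VALID_CPC
                            for p in parsed) / n,
        "unparseable": sum(p is None for p in parsed) / n,
    }

answers = ["The class is G06N3/08.", "Clearly Y99X 1/00.", "no idea"]
gold = ["G06N3/08", "H04L9/40", "A61K39/00"]
print(score(answers, gold))
```

Tracking hallucinated and unparseable answers separately from plain accuracy is what makes generative models comparable to the classic supervised pipeline on the same controlled test set.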

The same holds true for anything involving Retrieval-Augmented Generation (RAG): do not trust that random chunks combined with random embeddings will deliver the output you want your users to have. Create or collect real question-answer pairs and evaluate which combination of prompt, model, embedding, and chunk size (to mention just a few of the free parameters) works best, so that you can make informed business decisions. Do not become a genAI zombie who dumps everything together, closes their eyes, and hopes for the best.
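
In practice this means treating the RAG pipeline's free parameters as a grid to be searched against those question-answer pairs. The sketch below stubs out the pipeline itself (`run_rag` would really chunk, embed, retrieve, and call an LLM); the QA pairs, parameter values, and answers are all hypothetical.

```python
from itertools import product

# Real question-answer pairs collected from users, here invented for illustration.
qa_pairs = [("What is Art. 52 EPC about?", "patentable inventions"),
            ("Where is the EPO headquartered?", "munich")]

def run_rag(question, chunk_size, embedding, prompt):
    # Stub: a real implementation would chunk documents, embed them,
    # retrieve the top matches, and call an LLM with the prompt template.
    return "patentable inventions" if chunk_size == 512 else "unclear"

def grid_search(chunk_sizes, embeddings, prompts):
    """Score every parameter combination against the same QA pairs."""
    results = {}
    for cs, emb, pr in product(chunk_sizes, embeddings, prompts):
        hits = sum(gold in run_rag(q, cs, emb, pr).lower()
                   for q, gold in qa_pairs)
        results[(cs, emb, pr)] = hits / len(qa_pairs)
    best = max(results, key=results.get)
    return best, results

best, results = grid_search([256, 512], ["model-a"], ["terse"])
print("best config:", best)
```

The point is not the toy scoring function but that every configuration is judged on the same fixed QA set, so the choice of chunk size or embedding model becomes an informed decision rather than a guess.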

This article was also partly inspired by the fact that I have been personally contacted a few times to advise on "which LLM is best for working with patents." This question is almost impossible to answer in such general terms. More importantly, it shows that many people and colleagues who have recently entered the AI field still think it is just a matter of choosing the correct LLM, after which all problems are solved auto-magically. In our experience, this is not the case at all. LLMs offer fantastic capabilities and have fundamentally changed the way we work, but they have not eliminated the need for a robust methodology to deliver high and consistent quality to our business clients.

Proper testing and evaluation remain crucial. This also means we still need high-quality data to compare against. A purely qualitative analysis is not a replacement for a robust quantitative evaluation.





More articles by Alexander Klenner-Bajaja
