Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

Official implementation of our paper: "Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation", accepted to EMNLP 2025 Findings.

📄 Paper: https://arxiv.org/pdf/2505.16146


Installation

conda create -n ssl python=3.10.14
conda activate ssl
pip install -r requirements.txt

⚠️ Note: The default requirements.txt is configured for LLaVA-NeXT-8B, LLaVA-1.5-7B, and InstructBLIP-7B. If you plan to run inference with Llama-3.2-11B-Vision-Instruct, you must upgrade transformers to version 4.53.0:

pip install --upgrade "transformers==4.53.0"
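
After installing, you can confirm which transformers version is active; only Llama-3.2-11B-Vision-Instruct needs the 4.53.0 upgrade, while the other three models use the version pinned in requirements.txt:

python -c "import transformers; print(transformers.__version__)"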

Dataset and Model Preparation

  1. Datasets. Download MSCOCO and the POPE annotation files, and organize them under ./data as follows (a download sketch is given after this list):

    ├── coco
    │     ├── val2014
    │     └── annotations
    │           ├── captions_val2014.json
    │           └── instances_val2014.json
    └── pope
          └── coco
                ├── coco_pope_popular.json
                ├── coco_pope_random.json
                └── coco_pope_adversarial.json
    
  2. Large Vision-Language Models (LVLMs). Download the LVLMs you intend to evaluate (LLaVA-NeXT-8B, LLaVA-1.5-7B, InstructBLIP-7B, or Llama-3.2-11B-Vision-Instruct) and update model_dir in the .sh scripts if needed.

  3. Sparse Autoencoder (SAE). This work uses the SAE provided by lmms-lab: llama3-llava-next-8b-hf-sae-131k. After downloading it (see the sketch after this list), organize it under:

    ├── data
          └── sae
                └── llama3-llava-next-8b-hf-sae-131k
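
The sketch below shows one plausible way to fetch and arrange everything above. The COCO URLs are the standard cocodataset.org archives; the POPE JSON paths assume the layout of the RUCAIBox/POPE repository, and the Hugging Face repo id for the SAE is assumed to match its name. Verify each source before running.

# Assumed URLs and repo ids; verify against the official sources.
mkdir -p data/coco data/pope/coco data/sae

# MSCOCO val2014 images and annotations
wget http://images.cocodataset.org/zips/val2014.zip -P data/coco
unzip data/coco/val2014.zip -d data/coco
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip -P data/coco
unzip data/coco/annotations_trainval2014.zip -d data/coco

# POPE annotation files (assumed path within https://github.com/RUCAIBox/POPE)
for split in popular random adversarial; do
    wget https://raw.githubusercontent.com/RUCAIBox/POPE/main/output/coco/coco_pope_${split}.json -P data/pope/coco
done

# SAE weights (assumed Hugging Face repo id)
huggingface-cli download lmms-lab/llama3-llava-next-8b-hf-sae-131k \
    --local-dir data/sae/llama3-llava-next-8b-hf-sae-131k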
    

Inference and Evaluation

Run inference using the provided shell scripts:

# General syntax
bash infer_script.sh

# Example runs
bash scripts/infer_chair.sh
bash scripts/infer_pope.sh

👉 Note:

  • You can set the target LVLM and tune the gamma and layer hyperparameters in each .sh script (a sketch of these knobs follows below).
  • num_chunks controls the number of parallel GPU workers (default: 8).
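
For orientation, the tunable section of a script looks roughly like this (a hypothetical sketch; the variable names follow the notes above, but the values shown are illustrative, not the actual defaults in scripts/infer_*.sh):

model_dir=/path/to/llama3-llava-next-8b-hf   # target LVLM checkpoint
gamma=8.0      # steering strength (illustrative value)
layer=24       # layer at which steering is applied (illustrative value)
num_chunks=8   # number of parallel GPU workers (default per the note above)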

Citations

If you find this work useful, please cite:

@inproceedings{hua-etal-2025-steering,
    title = "Steering {LVLM}s via Sparse Autoencoder for Hallucination Mitigation",
    author = "Hua, Zhenglin  and
      He, Jinghan  and
      Yao, Zijun  and
      Han, Tianxu  and
      Guo, Haiyun  and
      Jia, Yuheng  and
      Fang, Junfeng",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.572/",
    pages = "10808--10828",
    ISBN = "979-8-89176-335-7"
}
