Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation

Official implementation of our paper: "Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation", accepted to EMNLP 2025 Findings.

📄 Paper: https://arxiv.org/pdf/2505.16146


Installation

conda create -n ssl python=3.10.14
conda activate ssl
pip install -r requirements.txt

⚠️ Note: The default requirements.txt is configured for LLaVA-NeXT-8B, LLaVA-1.5-7B, and InstructBLIP-7B. If you plan to run inference with Llama-3.2-11B-Vision-Instruct, you must upgrade transformers to version 4.53.0:

pip install --upgrade "transformers==4.53.0"
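
After installing, you can confirm which transformers version is active; only Llama-3.2-11B-Vision-Instruct needs the 4.53.0 upgrade, while the other three models use the version pinned in requirements.txt:

python -c "import transformers; print(transformers.__version__)"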

Dataset and Model Preparation

  1. Datasets. Download MSCOCO and the POPE annotation files, and organize them under ./data as follows (a download sketch is given after this list):

    ├── coco
    │     ├── val2014
    │     └── annotations
    │           ├── captions_val2014.json
    │           └── instances_val2014.json
    └── pope
          └── coco
                ├── coco_pope_popular.json
                ├── coco_pope_random.json
                └── coco_pope_adversarial.json
    
  2. Large Vision-Language Models (LVLMs). Download the LVLMs you intend to evaluate (LLaVA-NeXT-8B, LLaVA-1.5-7B, InstructBLIP-7B, or Llama-3.2-11B-Vision-Instruct) and update model_dir in the .sh scripts if needed.

  3. Sparse Autoencoder (SAE). This work uses the SAE provided by lmms-lab: llama3-llava-next-8b-hf-sae-131k. After downloading it (see the sketch after this list), organize it under:

    ├── data
          └── sae
                └── llama3-llava-next-8b-hf-sae-131k
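
The sketch below shows one plausible way to fetch and arrange everything above. The COCO URLs are the standard cocodataset.org archives; the POPE JSON paths assume the layout of the RUCAIBox/POPE repository, and the Hugging Face repo id for the SAE is assumed to match its name. Verify each source before running.

# Assumed URLs and repo ids; verify against the official sources.
mkdir -p data/coco data/pope/coco data/sae

# MSCOCO val2014 images and annotations
wget http://images.cocodataset.org/zips/val2014.zip -P data/coco
unzip data/coco/val2014.zip -d data/coco
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip -P data/coco
unzip data/coco/annotations_trainval2014.zip -d data/coco

# POPE annotation files (assumed path within https://github.com/RUCAIBox/POPE)
for split in popular random adversarial; do
    wget https://raw.githubusercontent.com/RUCAIBox/POPE/main/output/coco/coco_pope_${split}.json -P data/pope/coco
done

# SAE weights (assumed Hugging Face repo id)
huggingface-cli download lmms-lab/llama3-llava-next-8b-hf-sae-131k \
    --local-dir data/sae/llama3-llava-next-8b-hf-sae-131k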
    

Inference and Evaluation

Run inference using the provided shell scripts:

# General syntax
bash infer_script.sh

# Example runs
bash scripts/infer_chair.sh
bash scripts/infer_pope.sh

👉 Note:

  • You can set the target LVLM and tune the gamma and layer hyperparameters in each .sh script (a sketch of these knobs follows below).
  • num_chunks controls the number of parallel GPU workers (default: 8).
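
For orientation, the tunable section of a script looks roughly like this (a hypothetical sketch; the variable names follow the notes above, but the values shown are illustrative, not the actual defaults in scripts/infer_*.sh):

model_dir=/path/to/llama3-llava-next-8b-hf   # target LVLM checkpoint
gamma=8.0      # steering strength (illustrative value)
layer=24       # layer at which steering is applied (illustrative value)
num_chunks=8   # number of parallel GPU workers (default per the note above)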

Citations

If you find this work useful, please cite:

@inproceedings{hua-etal-2025-steering,
    title = "Steering {LVLM}s via Sparse Autoencoder for Hallucination Mitigation",
    author = "Hua, Zhenglin  and
      He, Jinghan  and
      Yao, Zijun  and
      Han, Tianxu  and
      Guo, Haiyun  and
      Jia, Yuheng  and
      Fang, Junfeng",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.572/",
    pages = "10808--10828",
    ISBN = "979-8-89176-335-7"
}
