Welcome to the official repository for MOS: Modeling Object-Scene Associations in Generalized Category Discovery!
```shell
pip install -r requirements.txt
```
We recommend matching our configuration: Python 3.8, CUDA > 12, and torch 2.3.1.
We use fine-grained benchmarks in this paper, including:
- CUB
- Stanford Cars
- FGVC-Aircraft
- Oxford-Pet
In addition, a mask must be extracted for each image (pixel value 255 represents the object, 0 represents the scene). Please follow IS-Net for this step (using the isnet-general-use model). Alternatively, you can use the pre-processed masks that we have already prepared. The Google Drive link is link.
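As a sketch of the mask convention above (the helper name and mask path are illustrative, not part of this repo):

```python
import numpy as np
from PIL import Image

def split_object_scene(mask_path):
    """Load a binary mask (255 = object, 0 = scene) and return boolean masks."""
    mask = np.array(Image.open(mask_path).convert("L"))
    object_mask = mask == 255  # foreground object pixels
    scene_mask = mask == 0     # background scene pixels
    return object_mask, scene_mask
```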
The placement of the mask folders is as follows:
- For cub: your_path/cub/masks
- For stanford_car: your_path/stanford_car/cars_train_mask and your_path/stanford_car/cars_test_mask
- For aircraft: your_path/fgvc-aircraft-2013b/data/masks
- For oxford-pet: your_path/Oxford-pet/data/masks
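A quick sanity check for the layout above can be sketched as follows (`your_path` is a placeholder for your dataset root, as in the list):

```python
import os

# Expected mask directories, relative to your dataset root (from the README list).
EXPECTED_MASK_DIRS = {
    "cub": ["cub/masks"],
    "stanford_car": ["stanford_car/cars_train_mask", "stanford_car/cars_test_mask"],
    "aircraft": ["fgvc-aircraft-2013b/data/masks"],
    "oxford-pet": ["Oxford-pet/data/masks"],
}

def missing_mask_dirs(your_path):
    """Return the expected mask directories that do not exist under your_path."""
    return [d for dirs in EXPECTED_MASK_DIRS.values() for d in dirs
            if not os.path.isdir(os.path.join(your_path, d))]
```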
Train the model:
```shell
bash scripts/run_${DATASET_NAME}.sh
```
Please note that in the `.sh` file, you need to specify the dataset root directory and the path to the DINO weights.
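A hypothetical excerpt of such a script (the variable names and checkpoint filename are illustrative; check the actual `scripts/run_*.sh` for the exact flags it passes):

```shell
# Illustrative only: point these at your dataset root and DINO checkpoint.
DATASET_ROOT=/your_path/cub
DINO_WEIGHT=/your_path/dino_vitbase16_pretrain.pth
echo "dataset root: ${DATASET_ROOT}"
echo "dino weights: ${DINO_WEIGHT}"
```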
Logs and checkpoints from multiple experiments on any dataset are available on request; contact [email protected]. Feel free to reach out.
Please note that we have commented out the last norm layer in the DINO backbone.
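The repo comments the layer out directly in the backbone code; an equivalent effect (a sketch, not the repo's actual implementation — `TinyBackbone` is a stand-in that only mirrors DINO ViT's `.norm` attribute name) is replacing the final norm with an identity op:

```python
import torch
import torch.nn as nn

def disable_final_norm(backbone):
    """Replace the backbone's last LayerNorm (DINO ViT exposes it as `.norm`)
    with an identity op, mirroring the commented-out norm in this repo."""
    if hasattr(backbone, "norm"):
        backbone.norm = nn.Identity()
    return backbone

class TinyBackbone(nn.Module):
    """Minimal stand-in with the same attribute name as DINO's ViT."""
    def __init__(self, dim=8):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
    def forward(self, x):
        return self.norm(self.proj(x))
```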
If you find this repo useful for your research, please consider citing our paper:
```bibtex
@inproceedings{peng2025mos,
  title={MOS: Modeling Object-Scene Associations in Generalized Category Discovery},
  author={Peng, Zhengyuan and Ma, Jinpeng and Sun, Zhimin and Yi, Ran and Song, Haichuan and Tan, Xin and Ma, Lizhuang},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```
The codebase is largely built on SimGCD.
For inquiries or further information, contact: [email protected]
Happy coding!