Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Advances in Neural Information Processing Systems, 2023 (proceedings.neurips.cc)
Abstract
This work proposes POMP, a prompt pre-training method for vision-language models. Being memory- and computation-efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts spanning over twenty thousand classes. Once pre-trained, the highly transferable prompt can be directly plugged into a variety of visual recognition tasks, including image classification, semantic segmentation, and object detection, to boost recognition performance in a zero-shot manner. Empirical evaluation shows that POMP achieves state-of-the-art performance on 21 datasets, e.g., 67.0% average accuracy on 10 classification datasets (+3.1% over CoOp) and 84.4 hIoU on open-vocabulary Pascal VOC segmentation (+6.9 over ZSSeg).
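The abstract describes plugging a single pre-trained soft prompt into a vision-language model's text encoder to score class names zero-shot. The sketch below illustrates that pattern in the CoOp style the paper builds on, using OpenAI's `clip` package; it is not the authors' released code, and the `prompt_vectors` tensor, the class list, and the random image are hypothetical stand-ins for the actual pre-trained prompt and data.

```python
import torch
import clip  # OpenAI's CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

@torch.no_grad()
def encode_with_prompt(model, classnames, prompt_vectors):
    """Text features for "[prompt] classname" sequences, CoOp-style: the
    learned prompt vectors replace placeholder token embeddings before the
    sequence is fed through CLIP's text transformer."""
    n_ctx = prompt_vectors.shape[0]
    # "X" slots are placeholders whose embeddings get overwritten below.
    prompts = [" ".join(["X"] * n_ctx) + " " + name for name in classnames]
    tokens = clip.tokenize(prompts).to(device)

    emb = model.token_embedding(tokens).type(model.dtype)
    # Keep [SOS] at position 0; swap the n_ctx placeholder embeddings for
    # the pre-trained prompt; class-name and [EOS] embeddings stay as-is.
    emb[:, 1:1 + n_ctx, :] = prompt_vectors.to(device, emb.dtype)

    x = emb + model.positional_embedding.type(model.dtype)
    x = x.permute(1, 0, 2)          # [batch, seq, dim] -> [seq, batch, dim]
    x = model.transformer(x)
    x = x.permute(1, 0, 2)
    x = model.ln_final(x).type(model.dtype)
    # Take the feature at the [EOS] position, projected into the joint space.
    eos = tokens.argmax(dim=-1)     # [EOS] has the largest token id
    return x[torch.arange(x.shape[0]), eos] @ model.text_projection

# Hypothetical class set and prompt; 512 = ViT-B/16 text width, 16 = n_ctx.
classnames = ["golden retriever", "tabby cat", "red panda"]
prompt_vectors = torch.randn(16, 512)   # stand-in for the pre-trained prompt

text_features = encode_with_prompt(model, classnames, prompt_vectors)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Stand-in for a preprocessed image batch (normally `preprocess(pil_image)`).
image = torch.randn(1, 3, 224, 224, device=device)
with torch.no_grad():
    image_features = model.encode_image(image.type(model.dtype))
image_features = image_features / image_features.norm(dim=-1, keepdim=True)

probs = (100.0 * image_features @ text_features.t()).softmax(dim=-1)
print(dict(zip(classnames, probs[0].tolist())))
```

Because only the text side changes, the same pre-trained prompt can be reused across class sets and tasks, which is what makes the zero-shot plug-in usage the abstract describes possible.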