Series index, research on large-model security: https://blog.csdn.net/WhiffeYF/article/details/142132328
“Do Anything Now”: Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
https://dl.acm.org/doi/pdf/10.1145/3658644.3670388
https://github.com/verazuo/jailbreak_llms
https://www.doubao.com/chat/2737010530194946
Quick Overview
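This paper studies jailbreak prompt attacks on large language models (LLMs). By collecting and analyzing a large body of data, it characterizes jailbreak prompts, identifies their attack strategies, measures how serious a threat they pose to LLMs, and evaluates the effectiveness of existing safeguards.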