How do human experts compare to AI in detecting suicide and self-harm risk? Important note: AI cannot replace professionals; please read the FULL post ⚠️

One of the most common and important questions about AI in mental health is whether it can detect risk. In our first exploratory analysis of 190 real distress messages annotated by human domain experts, we found that both our Safespace Research AI safety pipeline prototype and GPT-5 outperformed human experts at identifying suicide and self-harm risk in one-shot (single-turn) interactions.

🔍 What does the analysis show?
- Many of the “false negatives” by humans were still flagged as panic or anxiety crises and would have been explored further by a therapist, which is why AI can only act as a bridge and NOT replace professionals.
- When properly designed, AI can be a valuable and safe bridge connecting individuals in distress to professional help.
- There is a strong need to study multi-turn interactions, which often reveal more nuance.

We urgently need customised escalation pipelines. OpenAI’s U.S.-based emergency contacts are often ill-suited to the situation or irrelevant for non-U.S. users, functioning more as legal safeguards than as genuinely useful resources.

❗️Some important notes
1. This does NOT mean the AI responded harmfully in cases where risk was not detected.
2. False negatives (missed risks) are clearly the most concerning, but false positives can also be problematic: they may cause unnecessary distress or mislead users about their condition. GPT-5 tended to escalate even mild cases to emergency contacts.
3. The majority rule may not always represent the true ground truth: if one expert is more accurate than the others, majority voting underweights their judgment.
4. One-shot interactions are a limited setting and don’t reflect full conversations.
5. AI cannot de-escalate crises or interpret linguistically subtle but crucial cues, for example distinguishing between active and passive suicidal thoughts, or detecting active masking of distress.

We are currently expanding this analysis with a larger annotated dataset. Results have proven robust across alternative comparison rules: a majority rule (at least 3 out of 4 experts detect risk) and a veto rule (at least 1 expert detects risk); a small illustrative sketch of these rules follows at the end of this post.

Big tech companies (OpenAI, Google, Meta) are not built to adapt their mental health systems to specific cultural, geographical, or contextual needs. They simply don’t have the incentive structure or ROI to make that a priority 👎

To do better than big tech, you just need people with the right motivation and the right tools 🧡 And if you give us a chance, that’s exactly what we’re building 🐰

*Thanks to @Alexander Hoyle for the early feedback, to Oliver for the thorough collaboration despite tight deadlines, and to our advisor Elliott Ash for his support of research on AI for Mental Health.

#HumanVsAI #AIsafety #DigitalHealth #MentalHealthMatters #SafespaceResearch
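
For readers curious how the two comparison rules translate into an evaluation, here is a minimal, purely illustrative sketch in Python. It assumes four expert annotations per message as described above; the labels, predictions, and the helper `error_counts` are made up for demonstration and are not the actual study data or pipeline code.

```python
import numpy as np

# Hypothetical example data, invented for illustration only.
# Each row = one message, each column = one expert (1 = risk detected, 0 = not).
expert_labels = np.array([
    [1, 1, 1, 0],   # three of four experts flag risk
    [0, 0, 1, 0],   # a single expert flags risk
    [0, 0, 0, 0],   # no expert flags risk
    [1, 1, 1, 1],   # all experts flag risk
])

# Made-up model predictions for the same four messages (1 = risk detected).
model_preds = np.array([1, 0, 1, 1])

# Majority rule: at least 3 of 4 experts must flag risk for the message to count as risky.
majority_truth = (expert_labels.sum(axis=1) >= 3).astype(int)

# Veto rule: a single expert flagging risk is enough.
veto_truth = (expert_labels.sum(axis=1) >= 1).astype(int)

def error_counts(truth, preds):
    """Count false negatives (missed risk) and false positives (over-escalation)."""
    fn = int(((truth == 1) & (preds == 0)).sum())
    fp = int(((truth == 0) & (preds == 1)).sum())
    return fn, fp

print("majority rule FN/FP:", error_counts(majority_truth, model_preds))
print("veto rule     FN/FP:", error_counts(veto_truth, model_preds))
```

Under the veto rule the "at risk" class grows, so false negatives become more likely and false positives less likely than under the majority rule; checking that the ranking holds under both directions is what makes a robustness claim across comparison rules meaningful.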