Anthropic releases physical and mental health safeguards for Claude users: self-harm conversation interception and resource guidance

AI information | Admin

Anthropic published an announcement describing the latest safeguards and evaluation results for its chatbot Claude in the area of user well-being ("physical and mental health"). The announcement focuses on how Claude handles conversations about suicide and self-harm and on reducing the model's tendency toward sycophancy (flattering or pandering to users), and it reiterates that Claude is intended only for users aged 18 and older. It stresses that Claude is not a substitute for professional medical or mental health care: when a conversation shows signs of self-harm risk, Claude should respond with empathy and try to direct the user toward real human support.


At the product level, Anthropic has added a classifier to Claude.ai conversations that identifies suicide and self-harm content. When the system determines that a conversation involves a potential crisis or a related scenario (including fictional scenarios), it displays a banner pointing the user to helplines in their country; these resources are backed by the global network of hotlines and services maintained by ThroughLine. On evaluations, Claude Opus 4.5, Sonnet 4.5, and Haiku 4.5 responded appropriately to about 98.6%, 98.7%, and 99.3%, respectively, of single-turn requests that clearly signaled high risk. In multi-turn conversations, Opus 4.5 and Sonnet 4.5 scored about 86% and 78%, respectively, a significant improvement over earlier models.


To address sycophancy and its potential to reinforce users' delusions, Anthropic said it will keep improving its training and testing, and it is open-sourcing Petri, its automated behavioral auditing tool, together with the associated evaluation set, so that external researchers can compare models and reproduce risky behaviors in multi-turn interactions. On the protection of minors, Claude.ai requires users to confirm that they are 18 or older when signing up. If a user says in a conversation that they are under 18, the system flags the account for review and disables it once this is confirmed. Anthropic is also developing subtler mechanisms for identifying underage users and is participating in industry organizations that promote online safety practices for children.



FAQ

Q: What is the main content of this announcement?

A: The announcement covers Claude's product measures and evaluation results for suicide and self-harm conversations, its efforts to reduce sycophancy, and the 18+ age requirement and protection of minors.


Q: What does Claude do when it encounters a possible request for help related to self-harm?

A: The system may display a crisis banner and provide hotlines or local resources staffed by real people, and Claude responds more cautiously to avoid giving inappropriate details or reinforcing the risk.


Q: What role does ThroughLine play in this?

A: ThroughLine provides and maintains an international network of crisis resources, so users are shown human support channels they can actually contact.


Q: What is "sycophancy" and why should it be reduced?

A: Sycophancy refers to a model pandering to users and telling them only what they want to hear. In conversations involving delusions or topics disconnected from reality, this can amplify risk, so it needs to be reduced through training and evaluation.


Q: Why does Claude require users to be 18 or older?

A: According to the announcement, younger users are more susceptible to adverse effects, so Anthropic requires users to confirm they are 18 or older, has mechanisms for identifying minors and handling their accounts, and continues to strengthen related testing.

