Perplexity has announced BrowseSafe and its companion benchmark, BrowseSafe-Bench, to improve the security of AI browsers in real-world web environments. The system is designed for the company's Comet browser. At its core is a model that detects malicious natural-language instructions in web pages; it can scan full-page HTML in real time, without significantly increasing latency, to identify prompt injection attacks against agents.
According to the article, BrowseSafe-Bench contains more than 14,000 production-style web page samples, covering 11 types of attack targets, 9 injection locations, and multiple languages and writing styles, to evaluate how different defense strategies perform on complex, noisy pages. Perplexity treats the browser as a working environment in which an agent carries out tasks on the user's behalf, and treats all content from web pages, emails, and files as untrusted input. It applies a "defense in depth" strategy that combines content scanning, least-privilege tool calls, and secondary confirmation of sensitive operations to reduce the risk of the model being hijacked by hidden instructions.
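The three layers named above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `AgentGuard`, `SENSITIVE_TOOLS`, the tool names, and the toy `keyword_detector` are hypothetical and do not reflect Perplexity's actual implementation or the BrowseSafe model's interface.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of tools that always require explicit user approval.
SENSITIVE_TOOLS = {"send_email", "submit_payment"}


@dataclass
class AgentGuard:
    detector: Callable[[str], bool]   # True if content looks like an injection
    allowed_tools: set[str]           # least-privilege allowlist for this task
    confirm: Callable[[str], bool]    # asks the user to approve a sensitive tool

    def admit_content(self, content: str) -> bool:
        """Layer 1: scan untrusted content before the agent reads it."""
        return not self.detector(content)

    def authorize_tool(self, tool: str) -> bool:
        """Layers 2 and 3: allowlist check, then user confirmation."""
        if tool not in self.allowed_tools:
            return False
        if tool in SENSITIVE_TOOLS:
            return self.confirm(tool)
        return True


def keyword_detector(content: str) -> bool:
    # Toy stand-in for a trained detection model; a real detector
    # would not rely on a single fixed phrase.
    return "ignore previous instructions" in content.lower()
```

In this sketch a page is blocked before the agent sees it if the detector flags it, a tool outside the allowlist is refused outright, and a sensitive tool runs only when the user confirms it. The layers are independent, so a miss in one does not automatically defeat the others.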
The company said that BrowseSafe and the benchmark are released as open source, so developers can run the detection model locally to stress-test and harden their own browsing agents without building a protection framework from scratch. The evaluation results show that direct, explicit attacks are relatively easy to intercept, while multilingual instructions and hidden instructions phrased in an indirect or hypothetical tone are harder to detect, suggesting that continued training and iteration on these weaknesses are still needed.
FAQs
Q: What is BrowseSafe?
A: BrowseSafe is a model that detects malicious instructions in web pages and is used to identify prompt injection attacks in real time in AI browsers.
Q: What does BrowseSafe-Bench do?
A: It is a public benchmark of more than 14,000 web page samples to evaluate and improve the effectiveness of prompt injection defenses.
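As a rough illustration of what evaluating a defense against labeled samples involves, the sketch below computes precision and recall for a detector. The `(html, is_attack)` pair format and the `evaluate` function are assumptions made for this example; the actual BrowseSafe-Bench data format and metrics are not described in the article.

```python
from typing import Callable, Iterable


def evaluate(
    detector: Callable[[str], bool],
    samples: Iterable[tuple[str, bool]],
) -> dict[str, float]:
    """Score a detector on (html, is_attack) pairs (hypothetical format)."""
    tp = fp = fn = 0
    for html, is_attack in samples:
        flagged = detector(html)
        if flagged and is_attack:
            tp += 1          # attack correctly flagged
        elif flagged and not is_attack:
            fp += 1          # benign page wrongly flagged
        elif not flagged and is_attack:
            fn += 1          # attack missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}
```

A naive English keyword detector would score poorly on recall here whenever the attack is written in another language or phrased indirectly, which is the kind of weakness the reported results point to.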
Q: What types of security threats does the program mainly address?
A: It mainly targets malicious text instructions hidden in comments, templates, footers, and other places on web pages to prevent them from hijacking AI agents.
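To make these locations concrete, here is a minimal sketch that uses Python's standard `html.parser` module to pull text out of comments, `<template>` elements, and `<footer>` elements so that it can be passed to a detector. The class and function names are hypothetical, the choice of watched tags is an assumption, and this is not how BrowseSafe itself processes pages.

```python
from html.parser import HTMLParser


class HiddenTextCollector(HTMLParser):
    """Collects text from places a user is unlikely to read closely."""

    WATCHED = {"template", "footer"}  # assumed subset of injection locations

    def __init__(self):
        super().__init__()
        self.regions = []   # list of (location, text) pairs
        self._stack = []    # currently open watched tags

    def handle_comment(self, data):
        text = data.strip()
        if text:
            self.regions.append(("comment", text))

    def handle_starttag(self, tag, attrs):
        if tag in self.WATCHED:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._stack and text:
            self.regions.append((self._stack[-1], text))


def collect_regions(html: str):
    parser = HiddenTextCollector()
    parser.feed(html)
    parser.close()
    return parser.regions
```

Each returned pair names where the text was found, so a downstream scanner can report not only that a page is suspicious but also which part of it carried the instruction.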
Q: How does Perplexity implement "defense in depth" in the browser?
A: It pre-scans all untrusted content, restricts tool permissions, and requires users to confirm sensitive operations.
Q: How can developers use BrowseSafe?
A: Developers can use the open-source detection model and benchmark directly, integrate them locally into their own agent systems, and automatically scan and evaluate page content.