LegalPwn: Tricking LLMs by burying badness in lawyerly fine print

2025-09-01 12:09

Trust and believe – AI models trained to see ‘legal’ doc as super legit

Researchers at security firm Pangea have discovered yet another way to trivially trick large language models (LLMs) into ignoring their guardrails. Stick your adversarial instructions somewhere in a legal document to give them an air of unearned legitimacy – a trick familiar to lawyers the world over.…

This article has been indexed from The Register – Security

Read the original article:

LegalPwn: Tricking LLMs by burying badness in lawyerly fine print

← Amazon Stops Russian APT29 Watering Hole Attack Exploiting Microsoft Auth

Salesforce Publishes Forensic Guide After Series of Cyberattacks →

Trust and believe – AI models trained to see ‘legal’ doc as super legit

Read the original article:

Post navigation