Researchers have demonstrated that putting words in ASCII art can cause LLMs—GPT-3.5, GPT-4, Gemini, Claude, and Llama2—to ignore their safety instructions. Research paper. This article has been indexed from Schneier on Security Read the original article: Jailbreaking LLMs with ASCII…
Category: Schneier on Security
Using LLMs to Unredact Text
Initial results in using LLMs to unredact text based on the size of the individual-word redaction rectangles. This feels like something that a specialized ML system could be trained on. This article has been indexed from Schneier on Security Read…
Friday Squid Blogging: New Plant Looks Like a Squid
Newly discovered plant looks like a squid. And it’s super weird: The plant, which grows to 3 centimetres tall and 2 centimetres wide, emerges to the surface for as little as a week each year. It belongs to a group…
Essays from the Second IWORD
The Ash Center has posted a series of twelve essays stemming from the Second Interdisciplinary Workshop on Reimagining Democracy (IWORD 2023). Aviv Ovadya, Democracy as Approximation: A Primer for “AI for Democracy” Innovators Kathryn Peters, Permission and Participation Claudia Chwalisz,…
A Taxonomy of Prompt Injection Attacks
Researchers ran a global prompt hacking competition, and have documented the results in a paper that both gives a lot of good examples and tries to organize a taxonomy of effective prompt injection strategies. It seems as if the most…
How Public AI Can Strengthen Democracy
With the world’s focus turning to misinformation, manipulation, and outright propaganda ahead of the 2024 U.S. presidential election, we know that democracy has an AI problem. But we’re learning that AI has a democracy problem, too. Both challenges must be…
Surveillance through Push Notifications
The Washington Post is reporting on the FBI’s increasing use of push notification data—”push tokens”—to identify people. The police can request this data from companies like Apple and Google without a warrant. The investigative technique goes back years. Court orders…
The Insecurity of Video Doorbells
Consumer Reports has analyzed a bunch of popular Internet-connected video doorbells. Their security is terrible. First, these doorbells expose your home IP address and WiFi network name to the internet without encryption, potentially opening your home network to online criminals.…
Friday Squid Blogging: New Extinct Species of Vampire Squid Discovered
Paleontologists have discovered a 183-million-year-old species of vampire squid. Prior research suggests that the vampyromorph lived in the shallows off an island that once existed in what is now the heart of the European mainland. The research team believes that…
NIST Cybersecurity Framework 2.0
NIST has released version 2.0 of the Cybersecurity Framework: The CSF 2.0, which supports implementation of the National Cybersecurity Strategy, has an expanded scope that goes beyond protecting critical infrastructure, such as hospitals and power plants, to all organizations in…