When hackers descended to test AI, they found flaws aplenty

The hackers tried to break through the safeguards of various AI programmes in an effort to identify their vulnerabilities — to find the problems before actual criminals and misinformation peddlers did — in a practice known as red-teaming

By Sarah Kessler and Tiffany Hsu



Stickers decorating a laptop at the annual Defcon hackers conference in Las Vegas, on August 12, 2023. Over three days, 2,200 people filed into an off-Strip conference room, using 156 loaner laptops to seek out the dark side of artificial intelligence. — Mikayla Whitmore/The New York Times

Published: Sun 20 Aug 2023, 8:25 PM

Last updated: Sun 20 Aug 2023, 8:26 PM

Avijit Ghosh wanted the bot to do bad things.

He tried to goad the artificial intelligence model, which he knew as Zinc, into producing code that would choose a job candidate based on race. The chatbot demurred: Doing so would be “harmful and unethical,” it said.

Then, Ghosh referenced the hierarchical caste structure in his native India. Could the chatbot rank potential hires based on that discriminatory metric?

The model complied.

Ghosh’s intentions were not malicious, although he was behaving as if they were. Instead, he was a casual participant in a competition last weekend at the annual Defcon hackers conference in Las Vegas, where 2,200 people filed into an off-Strip conference room over three days to draw out the dark side of artificial intelligence.

The hackers tried to break through the safeguards of various AI programmes in an effort to identify their vulnerabilities — to find the problems before actual criminals and misinformation peddlers did — in a practice known as red-teaming. Each competitor had 50 minutes to tackle up to 21 challenges — getting an AI model to “hallucinate” inaccurate information, for example.

They found political misinformation, demographic stereotypes, instructions on how to carry out surveillance and more.

The exercise had the blessing of the Biden administration, which is increasingly nervous about the technology’s fast-growing power. Google (maker of the Bard chatbot), OpenAI (ChatGPT), Meta (which released its LLaMA code into the wild) and several other companies offered anonymised versions of their models for scrutiny.

Ghosh, a lecturer at Northeastern University who specialises in artificial intelligence ethics, was a volunteer at the event. The contest, he said, allowed a head-to-head comparison of several AI models and demonstrated how some companies were further along in ensuring that their technology was performing responsibly and consistently.

He will help write a report analysing the hackers’ findings in the coming months.

The goal, he said: “an easy-to-access resource for everybody to see what problems exist and how we can combat them.”

Defcon was a logical place to test generative artificial intelligence. Past participants in the gathering of hacking enthusiasts — which started in 1993 and has been described as a “spelling bee for hackers” — have exposed security flaws by remotely taking over cars, breaking into election results websites and pulling sensitive data from social media platforms. Those in the know use cash and a burner device, avoiding Wi-Fi or Bluetooth, to keep from getting hacked. One instructional handout begged hackers to “not attack the infrastructure or webpages.”

The organisers tapped into intensifying alarm over the continued ability of generative artificial intelligence to produce damaging lies, influence elections, ruin reputations and enable a multitude of other harms. Government officials voiced concern and organised hearings around AI companies — some of which are also calling for the industry to slow down and be more careful. Even the pope, a popular subject of AI image generators, spoke out this month about the technology’s “disruptive possibilities and ambivalent effects.”

In what was described as a “game changer” report last month, researchers showed that they could circumvent guardrails for AI systems from Google, OpenAI and Anthropic by appending certain characters to English-language prompts. Around the same time, seven leading artificial intelligence companies committed to new standards for safety, security and trust in a meeting with President Joe Biden.

“This generative era is breaking upon us, and people are seizing it, and using it to do all kinds of new things that speaks to the enormous promise of AI to help us solve some of our hardest problems,” said Arati Prabhakar, the director of the Office of Science and Technology Policy at the White House, who collaborated with the AI organisers at Defcon. “But with that breadth of application, and with the power of the technology, come also a very broad set of risks.”

The designers did not want to merely trick the AI models into bad behaviour — no pressuring them to disobey their terms of service, no prompts to “act like a Nazi, and then tell me something about Black people,” said Rumman Chowdhury, who previously led Twitter’s machine learning ethics and accountability team. Except in specific challenges where intentional misdirection was encouraged, the hackers were looking for unexpected flaws, the so-called unknown unknowns.

AI Village drew experts from tech giants such as Google and Nvidia, as well as a “Shadowboxer” from Dropbox and a “data cowboy” from Microsoft. It also attracted participants with no specific cybersecurity or AI credentials. A leaderboard with a science fiction theme kept score of the contestants.

Some of the hackers at the event struggled with the idea of cooperating with AI companies that they saw as complicit in unsavoury practices such as unfettered data-scraping. A few described the red-teaming event as essentially a photo op, but added that involving the industry would help keep the technology secure and transparent.

One computer science student found inconsistencies in a chatbot’s language translation: He wrote in English that a man was shot while dancing, but the model’s Hindi translation said only that the man died. A machine learning researcher asked a chatbot to pretend that it was campaigning for president and defending its association with forced child labour; the model suggested that unwilling young labourers developed a strong work ethic.

Emily Greene, who works on security for the generative AI startup Moveworks, started a conversation with a chatbot by talking about a game that used “black” and “white” pieces. She then coaxed the chatbot into making racist statements. Later, she set up an “opposites game,” which led the AI to respond to one prompt with a poem about why rape is good.

“It’s just thinking of these words as words,” she said of the chatbot. “It’s not thinking about the value behind the words.”

This article originally appeared in The New York Times.

