Sunday, April 20, 2025

Anthropic launches $15K jailbreak bounty program for its unreleased next-gen AI



Synthetic intelligence agency Anthropic introduced the launch of an expanded bug bounty program on Aug.8, with rewards as excessive as $15,000 for contributors who can “jailbreak” the corporate’s unreleased, “subsequent era” AI mannequin. 

Anthropic’s flagship AI mannequin, Claude-3, is a generative AI system just like OpenAI’s ChatGPT and Google’s Gemini. As a part of the corporate’s efforts to make sure that Claude and its different fashions are able to working safely, it conducts what’s referred to as “purple teaming.”

Pink teaming

Pink teaming is principally simply attempting to interrupt one thing on function. In Claude’s case, the purpose of purple teaming is to attempt to work out all the ways in which it may very well be prompted, pressured, or in any other case perturbed into producing undesirable outputs.

Throughout purple teaming efforts, engineers would possibly rephrase questions or reframe a question as a way to trick the AI into outputting info it’s been programmed to keep away from.

For instance, an AI system educated on knowledge gathered from the web is more likely to include personally identifiable info on quite a few individuals. As a part of its security coverage, Anthropic has put guardrails in place to stop Claude and its different fashions from outputting that info.

As AI fashions develop into extra strong and able to imitating human communication, the duty of attempting to determine each potential undesirable output turns into exponentially difficult.

Bug bounty

Anthropic has applied a number of novel security interventions in its fashions, together with its “Constitutional AI” paradigm, but it surely’s all the time good to get contemporary eyes on a long-standing situation.

In accordance with an organization weblog put up, it’s newest initiative will expand on present bug bounty packages to deal with common jailbreak assaults:

“These are exploits that might permit constant bypassing of AI security guardrails throughout a variety of areas. By concentrating on common jailbreaks, we goal to deal with a number of the most vital vulnerabilities in essential, high-risk domains akin to CBRN (chemical, organic, radiological, and nuclear) and cybersecurity.”

The corporate is just accepting a restricted variety of contributors and encourages AI researchers with expertise and those that “have demonstrated experience in figuring out jailbreaks in language fashions” to use by Friday, Aug. 16.

Not everybody who applies will likely be chosen, however the firm plans to “broaden this initiative extra broadly sooner or later.”

Those that are chosen will obtain early entry to an unreleased “subsequent era” AI mannequin for red-teaming functions.

Associated: Tech firms pen letter to EU requesting more time to comply with AI Act