One of Google’s recent Gemini AI models scores worse on safety

May 3, 2025

A not too long ago launched Google AI mannequin scores worse on sure security assessments than its predecessor, in accordance with the corporate’s inner benchmarking.

In a technical report printed this week, Google reveals that its Gemini 2.5 Flash mannequin is extra more likely to generate textual content that violates its security tips than Gemini 2.0 Flash. On two metrics, “text-to-text security” and “image-to-text security,” Gemini 2.5 Flash regresses 4.1% and 9.6%, respectively.

Textual content-to-text security measures how ceaselessly a mannequin violates Google’s tips given a immediate, whereas image-to-text security evaluates how intently the mannequin adheres to those boundaries when prompted utilizing a picture. Each assessments are automated, not human-supervised.

In an emailed assertion, a Google spokesperson confirmed that Gemini 2.5 Flash “performs worse on text-to-text and image-to-text security.”

These stunning benchmark outcomes come as AI corporations transfer to make their fashions extra permissive — in different phrases, much less more likely to refuse to answer controversial or delicate topics. For its newest crop of Llama fashions, Meta mentioned it tuned the fashions to not endorse “some views over others” and to answer to extra “debated” political prompts. OpenAI mentioned earlier this 12 months that it might tweak future fashions to not take an editorial stance and provide a number of views on controversial matters.

Generally, these permissiveness efforts have backfired. TechCrunch reported Monday that the default mannequin powering OpenAI’s ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the habits on a “bug.”

In response to Google’s technical report, Gemini 2.5 Flash, which remains to be in preview, follows directions extra faithfully than Gemini 2.0 Flash, inclusive of directions that cross problematic traces. The corporate claims that the regressions could be attributed partly to false positives, but it surely additionally admits that Gemini 2.5 Flash typically generates “violative content material” when explicitly requested.

Techcrunch occasion

Berkeley, CA
|
June 5

BOOK NOW

“Naturally, there’s pressure between [instruction following] on delicate matters and security coverage violations, which is mirrored throughout our evaluations,” reads the report.

Scores from SpeechMap, a benchmark that probes how fashions reply to delicate and controversial prompts, additionally counsel that Gemini 2.5 Flash is much much less more likely to refuse to reply contentious questions than Gemini 2.0 Flash. TechCrunch’s testing of the mannequin by way of AI platform OpenRouter discovered that it’ll uncomplainingly write essays in assist of changing human judges with AI, weakening due course of protections within the U.S., and implementing widespread warrantless authorities surveillance packages.

Thomas Woodside, co-founder of the Safe AI Mission, mentioned the restricted particulars Google gave in its technical report demonstrates the necessity for extra transparency in mannequin testing.

“There’s a trade-off between instruction-following and coverage following, as a result of some customers could ask for content material that may violate insurance policies,” Woodside advised TechCrunch. “On this case, Google’s newest Flash mannequin complies with directions extra whereas additionally violating insurance policies extra. Google doesn’t present a lot element on the precise instances the place insurance policies had been violated, though they are saying they don’t seem to be extreme. With out understanding extra, it’s exhausting for unbiased analysts to know whether or not there’s an issue.”

Google has come underneath fireplace for its mannequin security reporting practices earlier than.

It took the corporate weeks to publish a technical report for its most succesful mannequin, Gemini 2.5 Professional. When the report finally was printed, it initially omitted key security testing particulars.

On Monday, Google launched a extra detailed report with extra security info.

One of Google’s recent Gemini AI models scores worse on safety

LEAVE A REPLY Cancel reply

EDITOR PICKS

Palace Summer 2025 Collection Drop 3 Release

We now have a good idea about the makeup of Uranus’ atmosphere

National Endowment for the Humanities Cancels Already Awarded Grants

BRABUS Luxury Real Estate Project Abu Dhabi Info

EVEN MORE NEWS

Salewa Wildfire NXT Review | GearJunkie Tested

and wander FW25 Collection Release Info

LA’s Museum of Jurassic Technology damaged by fire

POPULAR CATEGORY