Google's latest AI safety report explores AI beyond human control

wildpixel/ iStock/Getty Images Plus via Getty Images

Follow ZDNET: Add us as a preferred source on Google.

ZDNET’s key takeaways

Google latest Frontier Safety Framework explores
It identifies three risk categories for AI.
Despite risks, regulation remains slow.

One of the great ironies of the ongoing AI boom has been that as the technology becomes more technically advanced, it also becomes more unpredictable. AI’s “black box” gets darker as a system’s number of parameters — and the size of its dataset — grows. In the absence of strong federal oversight, the very tech companies that are so aggressively pushing consumer-facing AI tools are also the entities that, by default, are setting the standards for the safe deployment of the rapidly evolving technology.

Also: AI models know when they’re being tested – and change their behavior, research shows

On Monday, Google published the latest iteration of its Frontier Safety Framework (FSF), which seeks to understand and mitigate the dangers posed by industry-leading AI models. It focuses on what Google describes as “Critical Capability Levels,” or CCLs, which can be thought of as thresholds of ability beyond which AI systems could escape human control and therefore endanger individual users or society at large.

Google published its new framework with the intention of setting a new safety standard for both tech developers and regulators, noting they can’t do it alone.

“Our adoption of them would result in effective risk mitigation for society only if all relevant organisations provide similar levels of protection,” the company’s team of researchers wrote.

Also: AI’s not ‘reasoning’ at all – how this team debunked the industry hype

The framework builds upon ongoing research throughout the AI industry to understand models’ capacity to deceive and sometimes even threaten human users when they perceive that their goals are being undermined. This capacity (and its accompanying danger) has grown with the rise of AI agents, or systems that can execute multistep tasks and interact with a plethora of digital tools with minimal human oversight.

Three categories of risk

The new Google framework identifies three categories of CCLs.

The first is “misuse,” in which models provide assistance with the execution of cyber attacks, the manufacture of weapons (chemical, biological, radiological, or nuclear), or the malicious and intentional manipulation of human users.

The second is “machine learning R&D,” which refers to technical breakthroughs in the field that increase the likelihood that new risks will arise in the future. For example, picture a tech company deploying an AI agent whose sole responsibility is to devise ever more efficient means of training new AI systems, with the result being that the inner workings of the new systems being churned out are increasingly difficult for humans to understand.

Also: Will AI think like humans? We’re not even close – and we’re asking the wrong question

Then there are what the company describes as “misalignment” CCLs. These are defined as instances in which models with advanced reasoning capabilities manipulate human users through lies or other kinds of deception. The Google researchers acknowledge that this is a more “exploratory” area compared to the other two, and their suggested means of mitigation — a “monitoring system to detect illicit use of instrumental reasoning capabilities” — is therefore somewhat hazy.

“Once a model is capable of effective instrumental reasoning in ways that cannot be monitored, additional mitigations may be warranted — the development of which is an area of active research,” the researchers said.

At the same time, in the background of Google’s new safety framework is a growing number of accounts of AI psychosis, or instances in which extended use of AI chatbots causes users to slip into delusional or conspiratorial thought patterns as their preexisting worldviews are recursively mirrored back to them by the models.

Also: If your child uses ChatGPT in distress, OpenAI will notify you now

How much of a user’s reaction can be attributed to the chatbot itself, however, is still a matter of legal debate, and fundamentally unclear at this point.

A complex safety landscape

For now, many safety researchers concur that frontier models that are available and in use are unlikely to carry out the worst of these risks today — much safety testing tackles issues future models could exhibit and aims to work backward to prevent them. Still, amidst mounting controversies, tech developers have been locked in an escalating race to build more lifelike and agentic AI chatbots.

Also: Bad vibes: How an AI agent coded its way to disaster

In lieu of federal regulation, those same companies are the primary bodies studying the risks posed by their technology and determining safeguards. OpenAI, for example, recently introduced measures to notify parents when kids or teens are exhibiting signs of distress while using ChatGPT.

In the balance between speed and safety, however, the brute logic of capitalism has tended to prioritize the former.

Some companies have been aggressively pushing out AI companions, virtual avatars powered by large language models and intended to engage in humanlike — and sometimes overtly flirtatious — conversations with human users.

Also: Even OpenAI CEO Sam Altman thinks you shouldn’t trust AI for therapy

Although the second Trump administration has taken a generally lax approach to the AI industry, giving it broad leeway to build and deploy new consumer-facing tools, the Federal Trade Commission (FTC) launched an investigation earlier this month into seven AI developers (including Alphabet, Google’s parent company) to understand how the use of AI companions could be harming kids.

Local legislation is trying to create protections in the meantime. California’s State Bill 243, meanwhile, which would regulate the use of AI companions for children and some other vulnerable users, has passed both the State Assembly and Senate, and needs only to be signed by Governor Gavin Newsom before becoming state law.

What's Hot

My Favorite Airbnb: A Stylish Studio in Amsterdam’s Jordaan District

Oldest Shell Jewelry Workshop in Western Europe Dates Back 42,000 Years

Pottery Barn Outlet End-of-Summer Deals Start at $6

Oldest Shell Jewelry Workshop in Western Europe Dates Back 42,000 Years

Moonflow and Everything Dead & Dying

AI startup Friend spent more than $1M on all those subway ads

Tiny Wi-Fi gadget smashes Kickstarter with $600,000 as thousands rush to back remote PC control innovation

Should you buy a Windows mini PC in 2025? My verdict after a week of testing

Today’s NYT Mini Crossword Answers for Sept. 28

My Favorite Airbnb: A Stylish Studio in Amsterdam’s Jordaan District

Oldest Shell Jewelry Workshop in Western Europe Dates Back 42,000 Years

Pottery Barn Outlet End-of-Summer Deals Start at $6

Moonflow and Everything Dead & Dying

Cuts to ICB nurse leaders ‘risk patient safety’, RCN warns

TechCrunch Mobility: Waymo’s Big Apple score and Nvidia backs Nuro

How to Create Your Own Summer to Fall Transition at Home

News

catrgories

useful link

Subscribe to Updates

What's Hot

Google’s latest AI safety report explores AI beyond human control

ZDNET’s key takeaways

Three categories of risk

A complex safety landscape

Related Posts

News

catrgories

useful link

Subscribe to Updates