In the 19th century, the invention of railways led to a panoply of safety concerns. Some were well-founded—such as the risk of accidents and environmental pollution. Others were not—such as the ideas that railways caused madness or that the human body could not withstand railway speeds, including that “uteruses would fly out of [female] bodies as they were accelerated”.
Current debates about AI safety1 are at a similar stage, mixing science, speculation and hype. A couple of factors add to the confusion. First, like railways, AI is a general-purpose technology, with effects and applications across many sectors. Second, AI has major technical complexity, well beyond that of railways, resulting in a lack of understanding even among some active AI safety practitioners.
Before I dive in further, I’ll make my standard disclaimer that, by talking about AI safety, I am not questioning the benefits of AI.2 We have many beneficial inventions—e.g. aircraft, automobiles, medicines and chainsaws, to name just a few—that are properly subject to detailed safety regimes. I am Founder & CEO of LearnerShape, an AI-driven edtech company that is commercializing PlaylistBuilder, and I believe strongly in a bright and beneficial future for AI, if done responsibly.
Lost in the maze of model testing
I recently attended a couple of events with leaders in the AI safety field3 that brought home to me the true scale of the AI safety mess. There were at least two major problems with the discussions at these events:
an almost complete lack of agreement on which AI harms deserve attention, an issue on which I have written multiple times before; and
an extreme focus on ‘testing’ of large language models against ‘benchmarks’ and ‘standards’, a narrow view that ignores the true nature of the AI safety problem.
It is this second problem on which I want to focus in this blog.
There is currently rapid growth in AI ‘testing’ initiatives, led by a global network of AI safety institutes (AISIs), starting with the UK and US AISIs established in November 2023 and expanding, a year later, to 9 countries plus the EU at the launch of the global network.
Private companies are also making related efforts, such as Anthropic’s Responsible Scaling Policy and OpenAI’s safety program, both of which involve model evaluation.
Such testing appears likely to become even more widespread, not least because Chapter V of the EU AI Act will require testing of ‘general-purpose AI models’ that pose ‘systemic risk’ beginning in 2025.
This growth of testing efforts sounds promising … until one takes the time to consider why it cannot solve the real challenges of safe AI deployment.
What the AISIs and some private entities seem to be doing is akin to testing a computer to determine whether it is ‘safe’. But this would make no sense. It is obvious that the ‘safety’ of a computer depends on how it is used—e.g. in a word processor vs. a nuclear attack warning system. The same is true of AI systems.
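To make the point concrete, here is a toy sketch (in Python, with entirely hypothetical function names; it is not any real benchmark harness or AISI evaluation) of how the same model output can look fine under a context-free benchmark yet be unacceptable in a particular application:

```python
# Toy illustration only: hypothetical names, not a real evaluation framework.
# The point: whether an answer is 'safe' depends on the application it lands in,
# not on the model in isolation.

from dataclasses import dataclass


@dataclass
class EvalResult:
    passed: bool
    detail: str


def run_generic_benchmark(model_answer: str) -> EvalResult:
    # A context-free check, e.g. "is the answer non-empty and fluent?"
    return EvalResult(passed=bool(model_answer.strip()), detail="fluency only")


def run_application_check(model_answer: str, application: str) -> EvalResult:
    # An application-specific check: the acceptance criteria change with the use case.
    if application == "word_processor":
        return EvalResult(passed=True, detail="low-stakes drafting aid")
    if application == "attack_warning_system":
        # A fluent but unverifiable answer is not acceptable here.
        return EvalResult(passed=False, detail="requires verified, deterministic logic")
    return EvalResult(passed=False, detail="unknown application context")


answer = "Launch detected with 97% confidence."
print(run_generic_benchmark(answer))                           # passes
print(run_application_check(answer, "word_processor"))         # passes
print(run_application_check(answer, "attack_warning_system"))  # fails
```

The test that matters is the second kind, and it cannot even be written without knowing the application.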
Others recognize this problem. AI researcher Andrew Critch wrote this summer that “There are no technical advances in AI that are safe per se; the safety or unsafety of an idea is a function of the human environment in which the idea lands.” Likewise, in a presentation on the UK Advanced Research and Invention Agency (ARIA) Safeguarded AI program, ARIA program director David “davidad” Dalrymple said: “To be clear, we don't have any viable pathway to get increasingly precise safety guarantees on fully general purpose AI agents.”
This problem appears likely to get worse before it gets better, for a couple of reasons. First, the emergence of ChatGPT and its imitators has led to a nearly obsessive focus across the AI market on testing large language models (LLMs) and similar generative AI. But progress in LLMs is beginning to slow, and it seems nearly certain that AI will eventually move on to other techniques, which will render work on testing LLMs obsolete and require new methods.
Second, the AI safety community is, at best, moderately technically informed, and is rapidly losing the plot of increasingly complex AI technologies. At one of the recent events I attended, a technical expert complained about this, and was immediately met with objections that policy-makers can address AI harms without deep technical understanding. Well … no. It is a well-known phenomenon that technological progress often outpaces regulators, and AI presents a particularly wicked version of that challenge.
Moving towards application-specific, multi-dimensional AI safety?
The way out of this maze is to recognize that AI safety initiatives (1) must be application-specific and (2) are almost always multi-dimensional.
There are reasons for hope that such an approach will eventually gain traction. Although the very detailed regulatory approach of the EU AI Act appears hard to manage and likely to be bad overall for EU success in the AI economy, the AI Act does take a risk-based approach to individual “AI systems”, which may include both general-purpose systems like LLMs and application-specific systems. Likewise, immediately after the statement quoted above on the intractability of safety guarantees for general-purpose AI agents, davidad explained that ARIA is “hoping to construct a general purpose AI workflow that is used only to produce domain specific AI applications or agents or AI decision support systems for managing cyber, physical, or otherwise well-defined systems … .”
But even if sane application-specific safety approaches replace the current LLM madness, the path forward is not straightforward. The AI safety community currently has an excessive focus on regulation. This is not sensible, both because regulators are inevitably unable to keep up with technical progress (as noted above) and, more importantly, because new regulation is only one of multiple tools to promote AI safety, and in most cases not the most important one.
The main tools for addressing the multi-dimensional challenge of AI safety include the following (illustrated with a brief sketch after the list):
product design — The choice of where and how AI makes sense in a product or system, and of which AI techniques are fit for purpose, is probably the most important AI safety tool. No one would use hallucination-prone LLMs for air traffic control.
technical safety — Technical understanding of AI models and the development of associated safety techniques (e.g. Anthropic’s work on mechanistic interpretability) are essential.
governance — Even a well-designed AI system can be misused. For example, Tesla automobiles are extremely safe, but there have been multiple Tesla fatalities involving drivers who used Autopilot features in unintended ways. Designing governance processes for technical systems is a challenging discipline, involving behavioral, organizational and technical dimensions.
application of existing laws — Even where regulation of AI is needed, existing law is likely to be sufficient in many circumstances. For example, the US Federal Trade Commission has begun using its authority to regulate “unfair or deceptive acts or practices in or affecting commerce” in the AI sector (although the FTC’s enforcement approach is likely to change with the replacement of current FTC Chair Lina Khan by President Trump’s appointee Andrew Ferguson).
new, AI-specific laws — In general, new laws on AI should be necessary only to the extent that existing laws are insufficient and the risks are serious enough to warrant legislative attention.
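As promised above, here is a minimal sketch (again in Python, with hypothetical names throughout; it describes no real product or library) of how several of these dimensions can combine in a single application-specific deployment: product design (the AI only drafts, a human approves), technical safety (a crude output filter standing in for real checks) and governance (an audit log of every decision):

```python
# Illustrative sketch only: hypothetical names, not a real product or library API.
# Combines product design, technical safety and governance in one application wrapper.

import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("ai_audit")


def deploy_drafting_assistant(generate: Callable[[str], str],
                              blocked_terms: set[str]) -> Callable[[str], str]:
    """Wrap a text generator so it can only propose drafts for human review."""

    def assistant(prompt: str) -> str:
        draft = generate(prompt)
        # Technical safety: a simple output filter (a stand-in for real safety checks).
        if any(term in draft.lower() for term in blocked_terms):
            audit_log.warning("Draft withheld for prompt: %r", prompt)
            return "[draft withheld pending human review]"
        # Governance: record what was produced, so use of the system is auditable.
        audit_log.info("Draft produced for prompt: %r", prompt)
        # Product design: the system only ever returns a draft, never a final action.
        return f"DRAFT (requires human approval): {draft}"

    return assistant


# Usage with a stand-in generator in place of a real model call:
assistant = deploy_drafting_assistant(
    generate=lambda p: f"Suggested reply to '{p}'.",
    blocked_terms={"guarantee", "diagnosis"},
)
print(assistant("customer asks about refund policy"))
```

Nothing in this sketch requires new regulation; it is ordinary engineering and organizational discipline applied to a specific application.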
Looking from the 19th century to the 22nd century
Much of the debate about AI safety has been driven by predictions that runaway AI could spell doom for humanity. Although attention to the doomsayers seems to be waning (and it has become old hat to worry about paperclips or gray goo), the doom story continues to be fomented by unfounded hype like Sam Altman’s recent prediction that we may see “artificial general intelligence” (AGI) by 2025 (which appears to be driven as much by the terms of OpenAI’s agreement with Microsoft as by Sam’s beliefs).
While risks of superintelligent AGI should not be discounted, my personal view is that we should be playing the long game, and that technological progress on AI will not be as fast as some suppose (we should not expect another imminent AI “winter”, but current hype is excessive).
Western societies did not stabilize from the Industrial Revolution of the 18th and 19th centuries until perhaps the early 20th century. I believe a reasonable goal for stabilization of societal change driven by AI is the end of this century, just over 75 years away. This is shorter than the period since the end of World War II, when the first general-purpose digital computers were developed; their effects on our society have become pervasive only fairly recently. Of course, I may be wrong … many say that history is accelerating.
But right or wrong, we deserve a program to take AI safety from 19th-century confusion towards a future of hoped-for abundance. I look forward to continuing to tread that path with a sensible view on what evolving AI systems can and can’t, should and shouldn’t, do—controlled through a multi-dimensional approach like that articulated above.
We have developed stable, multi-dimensional approaches to the safety of railways, automobiles and aircraft. There is no obvious reason that the same cannot be achieved with AI.
I use the term “AI safety” to cover the full set of harms that AI can produce, which we’re seeking to identify (with potential solutions) via Saihub. Some others use the term more narrowly.
This disclaimer is prompted by widespread and sometimes misguided criticisms of AI safety work as “impeding technological progress”.
I intend no criticism of well-intentioned, well-organized people and events.