We are moving rapidly and inevitably towards a world of pervasive AI—one where:
humans will often interact with AIs in roles where today we interact with other humans, and
AIs will play fundamentally new roles (most of which we have not yet imagined).
A crucial requirement for this new world to be a desirable (and not dangerously unstable) one for humans and the natural environment is trust in AI systems. The dictionary definition of ‘trust’ is:
firm belief in the reliability, truth, or ability of someone or something1.
It is widely recognized that trust in AI is crucial. However, understandably, disciplined engagement with trust in AI has been blinkered by efforts to define and promote ‘trustworthy AI’2. This has often involved setting out global ‘principles’ for ‘trustworthy AI’, typically with detail on how such principles should be implemented.
While such work on AI principles is valuable, the new world of pervasive AI will require a much more technical focus on trust, applied to a massive variety of specific, local circumstances. This focused approach requires a narrower, functional definition of ‘trust’ in AI, which I adapt from the dictionary definition above:
firm3 and evidence-supported belief in the ability and reliability4 (in place of the dictionary’s ‘reliability, truth5, or ability’) of an AI system to meet defined requirements.
Early work has begun on such technical, functional approaches to AI trust. This blog aims to illustrate this work, and to set out a vision of where it could lead. But first, it’s important to explain the current state of play on ‘trustworthy AI’.
Principles for ‘trustworthy AI’
The earliest major effort to define ‘trustworthy AI’ was the 2019 Ethics Guidelines for Trustworthy Artificial Intelligence, issued by the EU High-Level Expert Group on AI. These guidelines specify that trustworthy AI must be legal, ethical and robust, and set out 7 key requirements to address the latter two criteria: (1) human agency and oversight, (2) technical robustness and safety, (3) privacy and data governance, (4) transparency, (5) diversity, non-discrimination and fairness, (6) societal and environmental well-being and (7) accountability.
These principles ultimately led to adoption in 2024 of the EU AI Act—the aim of which “is to foster trustworthy AI in Europe”. The EU AI Office is now working hard to translate the Act into detailed regulatory obligations, including through the General Purpose AI Code of Practice and developing rules for high-risk AI systems.
There are many similar efforts. For example, Nvidia—the hardware engine of the AI economy—has articulated four ‘guiding principles’ for trustworthy AI, which echo a subset of the EU principles: (1) privacy, (2) safety and security, (3) transparency and (4) non-discrimination.
Such principles-based approaches are understandable: they address a variety of known risks associated with AI, and they are structurally similar to approaches taken in other governance and regulatory contexts involving digital technologies (e.g. content regulation, competition regulation). They are also (somewhat) useful in guiding governance of AI systems.
However, applying the general principles of ‘trustworthy AI’ to specific use cases in the hugely diverse and technically complex spectrum of AI applications is rapidly devolving into a morass of competing efforts and differing approaches.
The challenge of principles-based regulation of AI became apparent to me while building Saihub.info (the Safe and Responsible AI Information Hub). In an effort to bring sense to confusing discussions of AI safety (which I now prefer to call ‘responsible AI’), Saihub.info sought to identify specific AI harms (e.g. misinformation vs. model vulnerability vs. existential risk) and evolving technical and regulatory approaches to each type of harm. However, it became apparent that this approach was bogging down in complexity6, and that attention is better directed towards application-specific, multi-dimensional approaches to responsible AI.7
Likewise, governments on both sides of the Atlantic are recognizing the ineffectiveness of a blanket principles-based approach to trustworthy AI from a regulatory perspective. The Trump Administration has pulled back from any significant regulation of AI. And in the EU—which has been the leader in AI regulation—senior officials are starting to express concern about the negative effects of the highly prescriptive rules under the EU AI Act on EU competitiveness in AI.
Technical requirements for AI trust
Let’s return to the narrower, functional definition of ‘trust’ that I proposed above:
firm and evidence-supported belief in the ability and reliability of an AI system to meet defined requirements.
In this definition, “meet[ing] defined requirements” is also intended to exclude the pursuit of other, unintended objectives. This is a crucial component of AI alignment.
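To make this concrete, here is a minimal sketch (in Python, with requirement names and checks that are purely illustrative assumptions of mine) of what ‘defined requirements’ can look like when expressed as machine-checkable predicates over an AI system’s outputs:

```python
# Minimal sketch: requirements as explicit, machine-checkable predicates.
# The requirement names and checks are illustrative assumptions, not a standard API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Requirement:
    name: str
    check: Callable[[str], bool]  # returns True if the output satisfies the requirement

REQUIREMENTS = [
    Requirement("no_private_data_leaked", lambda out: "SSN:" not in out),
    Requirement("answer_is_non_empty", lambda out: len(out.strip()) > 0),
]

def evaluate(output: str) -> dict[str, bool]:
    """Evaluate one model output against every defined requirement."""
    return {r.name: r.check(output) for r in REQUIREMENTS}

print(evaluate("The meeting is at 3pm."))
# {'no_private_data_leaked': True, 'answer_is_non_empty': True}
```

The point is not the toy checks themselves, but that requirements defined this explicitly can be tested, monitored and evidenced over time.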
I was inspired to write this blog because I have started to see a variety of technical initiatives toward trust in AI along the lines of this definition. Let me provide three examples.
The first is the January 2025 Google DeepMind paper Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography. The paper explains how AI-based ‘Trusted Capable Model Environments’ (TCMEs) could serve as intermediaries to support communications requiring privacy that are infeasible with current techniques—e.g. machine learning researchers collaborating without disclosing unpublished advances, or a landlord checking for absence of damage at a property without viewing tenants’ private activities. The paper posits that a TCME “can provide privacy guarantees under specific scenarios where the model has no explicit way to leak knowledge” if it includes:
information flow control, including tamper-proof restrictions on how data flow in and out of the model, and on how specific data types are processed
statelessness, to ensure that models cannot self-modify by learning from private data
trustworthiness—i.e. alignment with user expectations (this is a rather fuzzy criterion, which should be further developed to avoid the pitfalls of current approaches to ‘trustworthy AI’ discussed above)
verifiability—i.e. the ability of users to verify the state of the model’s hardware and information flow (as well as its software, presumably).
The authors acknowledge that these capabilities do not yet exist, and they do not propose concrete means of achieving them, but the TCME vision is a creative and exciting one.
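The paper does not ship an implementation, but a toy sketch can convey the shape of the idea. The snippet below (Python; the class and field names are my own invention, not from the DeepMind paper) mimics a stateless intermediary that applies a crude information-flow rule in the landlord/tenant scenario: it sees both parties’ private inputs but may emit only a single boolean verdict:

```python
# Toy sketch of a TCME-style stateless intermediary (all names are hypothetical).
# A real TCME would rely on a capable model plus hardware-backed information flow
# control and verifiability; here the "model" is a trivial rule.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the intermediary cannot accumulate state
class DamageCheckEnvironment:
    allowed_output_type: type = bool  # flow control: only a verdict may leave

    def run(self, landlord_checklist: list[str], tenant_photos: dict[str, str]) -> bool:
        # Private inputs are used internally but never echoed back out.
        damaged = [item for item in landlord_checklist
                   if tenant_photos.get(item, "").startswith("damaged")]
        verdict = len(damaged) == 0
        assert isinstance(verdict, self.allowed_output_type)  # enforce the flow rule
        return verdict  # the only information that crosses the boundary

env = DamageCheckEnvironment()
print(env.run(["kitchen sink", "carpet"],
              {"kitchen sink": "ok", "carpet": "ok", "private diary": "..."}))
# True -- the landlord learns "no damage", and nothing about the tenant's other data
```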
The second example is the Safeguarded AI program of the UK Advanced Research + Invention Agency (ARIA). ARIA, a ‘moonshot’ agency inspired by the US Advanced Research Projects Agency (ARPA)8, describes Safeguarded AI like this:
By combining scientific world models and mathematical proofs we will aim to construct a ‘gatekeeper’, an AI system tasked with understanding and reducing the risks of other AI agents. In doing so we’ll develop quantitative safety guarantees for AI in the way we have come to expect for nuclear power and passenger aviation.
The Safeguarded AI program has begun with basics, including developing mathematical methods and syntax for formally modelling the real-world domains in which AI models operate.
Safeguarded AI has a significantly broader mission than TCMEs, and its technical specifications are correspondingly less tightly defined. However, ARIA recognizes the need for technical focus on specific use cases. In a presentation last year on the Safeguarded AI program, program director David “davidad” Dalrymple noted: “To be clear, we don't have any viable pathway to get increasingly precise safety guarantees on fully general purpose AI agents.”
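To give a flavour of the ‘gatekeeper’ idea in miniature (my own toy construction, far simpler than the formal world models and mathematical proofs ARIA is pursuing), one can picture a wrapper that releases an agent’s proposed action only if the modelled outcome satisfies an explicit, domain-specific safety bound:

```python
# Toy "gatekeeper" sketch: a proposed action is released only if it satisfies an
# explicit, domain-specific safety constraint. The bound and predicted outcome are
# assumed stand-ins for what a formally specified world model would provide.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    predicted_peak_temperature_c: float  # from a (hypothetical) world model

MAX_SAFE_TEMPERATURE_C = 80.0  # assumed safety bound for the domain

def gatekeeper(action: ProposedAction) -> bool:
    """Approve the action only if the modelled outcome stays within the bound."""
    return action.predicted_peak_temperature_c <= MAX_SAFE_TEMPERATURE_C

for a in [ProposedAction("increase reactor load", 75.0),
          ProposedAction("bypass cooling loop", 120.0)]:
    print(a.description, "->", "approved" if gatekeeper(a) else "blocked")
# increase reactor load -> approved
# bypass cooling loop -> blocked
```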
The third example addresses a component of AI trust technology—data integrity—rather than specific applications. In a recent essay for the Association for Computing Machinery, Bruce Schneier (everyone’s favorite expert on computer security) and Davi Ottenheimer write:
If you’ve ever taken a computer security class, you’ve probably learned about the three legs of computer security—confidentiality, integrity, and availability—known as the CIA triad. When we talk about a system being secure, that's what we’re referring to. All are important, but to different degrees in different contexts. In a world populated by artificial intelligence (AI) systems and artificial intelligent agents, integrity will be paramount.
What is data integrity? It’s ensuring that no one can modify data—that’s the security angle—but it’s much more than that. It encompasses accuracy, completeness, and quality of data—all over both time and space.
The essay examines key considerations for delivering data integrity for AI and for Web 3.0, and concludes that “it’s time for new integrity-focused standards to enable the trusted AI services of tomorrow”.
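To illustrate the integrity leg of the triad (a sketch only, not a proposal for the standards the essay calls for), one can tag each record an AI system consumes with a keyed hash, so that any later modification is detectable; the key and record fields below are assumptions for the example:

```python
# Minimal data-integrity sketch: records are tagged with an HMAC so that any
# tampering between the producer and the AI system consuming them is detectable.
import hmac, hashlib, json

SECRET_KEY = b"example-key"  # in practice, a properly managed secret or signing key

def tag(record: dict) -> str:
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify(record: dict, mac: str) -> bool:
    return hmac.compare_digest(tag(record), mac)

record = {"sensor": "thermostat-12", "reading_c": 21.5}
mac = tag(record)
print(verify(record, mac))      # True: data intact
record["reading_c"] = 99.9      # tamper with the data
print(verify(record, mac))      # False: integrity violation detected
```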
Local solutions and general frameworks
The choice in the previous section to explain by example is not just an expository tool—it is how we must approach trust in AI (and other aspects of responsible AI). The reason is that trust issues are so complex and technical as to require highly situation-specific implementations.
I have referred to this situation-specific approach as a ‘local’ approach in several recent blogs, including this one:
[R]esponsible AI … requires a detailed focus on local conditions. These local conditions include many features such as specific applications, types of users, training and inference data, AI models and other technical details, locations, and various other factors.
My hope for AI and society is that a thousand flowers will bloom, in the form of local projects that take a disciplined approach to trust in AI.
On the other hand, local implementations do not preclude global frameworks—see my posts on “Think global, act local”—and there appears to be a significant opportunity to build global technical frameworks to further trust in AI.
There are many precedents for global frameworks that support local implementations. A prominent example is the large framework of standards (usually in the form of RFCs) that governs the huge diversity of the Internet, parts of it under the informal coordination of ICANN. In the start-up context, my family has an investment in StackOne, which provides an integration platform that allows AI agent providers and SaaS platforms to integrate easily with hundreds of other software products. Likewise, my own start-up LearnerShape delivers ‘open source learning infrastructure’ that can support a wide variety of edtech applications—especially ones using AI to assess content relevance—using standard software components.
There appears to be a similar opportunity to standardize software components for trust in AI. In particular, companies that provide flexible components of AI trust infrastructure (e.g. TCMEs like those in the first example above) look like a major start-up opportunity. Such ventures would require substantial research and development, but therein lies the opportunity (as with virtually any disruptive technology). Now is the time to start building such components of the trust infrastructure required for a world of pervasive AI.
And if you’re interested in setting up such a venture, please reach out here on Substack. I’d love to be involved!
2. I have no quarrel with the concept of ‘trustworthy AI’, but the efforts to implement it have become too broad and non-specific to be effective.
3. ‘Firm’ belief can be defined probabilistically, using methods such as Bayesian statistics, which formalize approaches for updating belief.
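For instance, a minimal sketch (in Python, with assumed pass/fail counts) of a Beta-Binomial update of the belief that a system meets a defined requirement:

```python
# Sketch: updating belief that "the system meets the requirement" from test
# evidence, using a Beta-Binomial model with an uninformative Beta(1, 1) prior.
alpha, beta = 1.0, 1.0          # prior pseudo-counts of passes and failures
passes, failures = 98, 2        # assumed results of 100 evaluation runs

alpha += passes
beta += failures
posterior_mean = alpha / (alpha + beta)
print(f"Posterior mean pass rate: {posterior_mean:.3f}")  # ~0.971
```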
4. My definition reverses ‘ability’ and ‘reliability’ on the principle (which is not absolute) that the technical methods described in this blog should first identify how a trust-related capability can be delivered, and then investigate how to do so reliably.
5. Truth is often important, but it can be subjective, and it is not always a requirement for an AI system (e.g. one intended to produce creative writing or art). Where relevant, truth can be a defined requirement of an AI system, to a specified degree of certainty (e.g. in terms of bias and variance).
6. The world does urgently need ongoing work to identify AI harms and approaches to address them, and there are much better-resourced projects than Saihub.info working on these challenges. These include the January 2025 International AI Safety Report led by Yoshua Bengio and the MIT AI Risk Repository.
7. Saihub.info remains available for the time being, with occasional updates. If you are interested in carrying forward or discussing this work, please get in touch.
8. ARPA, established in 1958 in response to the Soviet launch of the Sputnik satellite, is now known as the Defense Advanced Research Projects Agency (DARPA), and is primarily focused on technologies relevant to the US military. ARIA, like the original ARPA, has a more dual-purpose civilian/military focus.