Inheriting the Receipts: Securing the AI Your Company Already Adopted

AI Security is Just Security at a New Speed
In December 2020, Timnit Gebru was fired from Google while co-leading the company's Ethical AI team after she declined to retract a paper she had co-authored, "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" Gebru went on to found the Distributed Artificial Intelligence Research (DAIR) institute and continue the work elsewhere.
The paper itself made five predictions:
- These systems would hallucinate fluently without understanding.
- They would amplify the biases baked into their training data.
- Training them would carry an enormous environmental cost.
- Their datasets would grow too large to audit.
- Control over them would concentrate in a handful of well-capitalized firms.
As Darren O'Connor points out in his recent Tumblr post on Gebru, every one of these predictions has materialized in the years since.
O'Connor summarizes one of her structural arguments as follows:
"The technology was being built by a small group of researchers who shared similar backgrounds, worked at similar companies, and were rewarded for shipping products faster than competitors. The incentive structure made it impossible for safety, ethics, and bias concerns to slow anything down."
These predictions also materialized somewhere the paper never looked: inside your environment. Your employees adopted what those teams shipped faster than your policies, your reviews, and your controls could handle. The receipts are already in the building. The work now is figuring out what to do with them.
Here is the part that should be reassuring but isn't yet: AI does not require a new security paradigm. The pillars that have carried every prior wave of technology adoption still apply, and they still do most of the work:
- Identity and Access Management (IAM) still decides who and what can reach a system.
- Data Loss Prevention (DLP) still decides what can leave it.
- The Secure Software Development Lifecycle (SDLC) still defines pipeline controls.
- Monitoring and Alerting still provides insight into anomalous and potentially malicious activity.
- Incident Response (IR) still defines how you react.
None of that has changed. The controls are the controls you have already been buying, staffing, and arguing about for decades.
The one genuinely new variable is velocity. Employees enabled assistants and copilots before procurement noticed they had been licensed. Engineers wired Model Context Protocol (MCP) servers into production systems before anyone had written a tool-use policy, an authorization model, or a rollback plan. Agents now act on behalf of humans, at machine speed, across systems whose controls were designed around the assumption that humans would act on their own behalf. The pillars are intact, but the cadence has been compressed to the point where the old rhythms (quarterly reviews, annual policy refreshes, human-authored pull requests, and peer reviews) no longer keep pace with the attack surface they were meant to cover.
That compression is where the bill comes due. The safety, ethics, and bias debt the model builders externalized has not gone away. It has been distributed, quietly, across the tens of thousands of organizations that have adopted the output, which has resulted in massive additional technical debt.
Readers may notice that Vulnerability Management is absent from our list of pillars. It will likely be the sole subject of a follow-up blog post.
The above pillars map cleanly onto the National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF: Govern, Map, Measure, Manage) and the Open Worldwide Application Security Project (OWASP) Large Language Model (LLM) Top 10. But the work is the work you already know how to do.
Phase 1: Governance and Visibility
The first step most organizations have skipped is the most basic one: you cannot govern what you cannot see, and most organizations cannot see their AI surface. The gap between what leadership believes is in use and what is actually running with corporate data is typically much wider than expected.
A useful AI inventory has three dimensions, and all three must be addressed before policy or controls have anything to bind to.
The first dimension is the AI itself. Sanctioned Software as a Service (SaaS) copilots are the easy case because they should show up in the procurement record. However, browser-extension copilots rarely cross IT's desk. Integrated Development Environment (IDE) plugins land on engineering laptops through package managers and personal accounts. Vendor models are increasingly embedded inside the SaaS platforms you already pay for, enabled by default, and governed by a contract amendment that likely nobody read. In-house agentic systems and MCP servers tend to be built by the teams trying to solve business problems, typically without involving security teams. Each belongs in the inventory, and each tends to be discovered rather than declared and documented.
The second dimension is the identities calling the AI. Named human users are the baseline. Service accounts wired into automation are the next layer. Non-Human Identities (NHIs), meaning agents acting under delegated authority, often with broader scopes than the humans behind them, are the layer most environments find hardest to enumerate, and they are the layer where most authorization decisions are now being made.
The third dimension is the data crossing the boundary: what is leaving, to which providers, under which contracts, and with what retention and training-use terms. The boundary is no longer a network egress point; it is a prompt-time decision made inside applications the IT and Security teams may not have even procured, deployed, or otherwise authorized.
An honest inventory across those three dimensions will not be flattering, and it is the precondition for everything downstream. Once the surface is visible, the question becomes how to govern it with the controls you already own.
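One way to make the three dimensions concrete is to hold them in a single record per AI surface. The sketch below is illustrative only: the class names (`AIAsset`, `Caller`, `DataFlow`), the field names, and the example values are assumptions made for this post, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Caller:
    identity: str
    kind: str                  # "human", "service_account", or "agent" (NHI)
    scopes: tuple[str, ...] = ()


@dataclass(frozen=True)
class DataFlow:
    data_class: str                     # e.g. "public", "internal", "pii"
    provider: str
    contract_id: Optional[str]          # None: no contract covers this flow
    retention_days: Optional[int]       # None: retention terms unknown
    used_for_training: Optional[bool]   # None: training-use terms unknown


@dataclass
class AIAsset:
    name: str
    kind: str                  # e.g. "saas_copilot", "ide_plugin", "mcp_server"
    declared: bool             # False: found by discovery, not by procurement
    callers: list[Caller] = field(default_factory=list)
    data_flows: list[DataFlow] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List what is still unknown or uncovered for this asset."""
        found = []
        if not self.declared:
            found.append("asset was discovered, not declared")
        if not self.callers:
            found.append("no calling identities enumerated")
        for flow in self.data_flows:
            if flow.contract_id is None:
                found.append(f"{flow.data_class} data reaches {flow.provider} without a contract")
            if flow.used_for_training is None:
                found.append(f"training-use terms unknown for {flow.provider}")
        return found


# Hypothetical example: an IDE plugin found on an engineering laptop.
plugin = AIAsset(
    name="example-ide-plugin",
    kind="ide_plugin",
    declared=False,
    data_flows=[DataFlow("internal", "provider.example", None, None, None)],
)
for gap in plugin.gaps():
    print(gap)
```

A record like this is only useful if it is populated from discovery sources as well as procurement, since the assets that matter most are the ones nobody declared.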
Phase 2: Protect, Detect, and Respond
From a controls perspective, each of the five pillars extends to cover AI as just another consumer of identity, data, and code. None of them needs to be reinvented. They just need to be rescoped.
IAM: Least privilege is not a human-only doctrine, and it never was. Agents and the tools they invoke are principals, and they should be treated as such. For example, issue scoped, short-lived tokens for tool calls rather than long-lived credentials wired into a config file. Provision agent identities deliberately, rotate their secrets on a defined cadence, and retire them when the use-case ends. The lifecycle that governs your human identities and their associated role-based access controls applies to the NHIs acting on their behalf, and any gap between the two becomes the path of least resistance for an attacker.
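As a rough illustration of scoped, short-lived tokens, the sketch below signs a small claims payload with an HMAC and refuses it once it expires or when the requested scope is missing. The function names (`issue_token`, `verify_token`), the scope strings, and the hard-coded key are hypothetical; in practice the key would come from a secrets manager, and most teams would use their identity provider's token service rather than rolling their own.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: demo key only. A real deployment would fetch this from a secrets manager.
SIGNING_KEY = b"demo-only-key"


def issue_token(agent_id: str, scopes: list[str], ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Return a signed token naming the agent, its scopes, and an expiry time."""
    issued = time.time() if now is None else now
    claims = {"sub": agent_id, "scopes": sorted(scopes), "exp": issued + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def verify_token(token: str, required_scope: str, now: Optional[float] = None) -> bool:
    """Accept the token only if it is authentic, unexpired, and carries the scope."""
    current = time.time() if now is None else now
    try:
        payload, signature = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload.encode()))
    return current < claims["exp"] and required_scope in claims["scopes"]


token = issue_token("agent:triage", ["tickets:read"], ttl_seconds=60)
print(verify_token(token, "tickets:read"))   # read scope was granted
print(verify_token(token, "tickets:write"))  # write scope was never granted
```

The point of the design is that the credential itself carries its limits: a leaked token is useful only for the named scope and only until it expires.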
DLP: Classify and inspect at the prompt-and-egress boundary, not after sensitive data has already left the building inside a chat completion. The decision point is the moment a prompt is assembled and the moment a response returns, and that is where policy has to bind. Prevent sensitive data from reaching providers whose contracts do not cover it, including the ones embedded in platforms you already license. In short, redact at the boundary. Discovering after the fact that corporate intellectual property or PII was pasted into a model last quarter is a late, reactive finding, not a proactive security posture.
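Redacting at the boundary can be sketched as a filter that runs on the assembled prompt before it is sent. The two patterns below (email addresses and US Social Security numbers) and the `redact` function are illustrative assumptions; a production DLP engine would rely on far richer classifiers than a pair of regular expressions.

```python
import re

# Assumption: two simple detectors stand in for a real classification engine.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt: str) -> tuple[str, dict[str, int]]:
    """Replace sensitive matches with labels and report how many were found."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        prompt, hits = pattern.subn(f"[REDACTED:{label}]", prompt)
        if hits:
            counts[label] = hits
    return prompt, counts


safe_prompt, findings = redact("Contact jane.doe@example.com, SSN 123-45-6789.")
print(safe_prompt)  # Contact [REDACTED:EMAIL], SSN [REDACTED:US_SSN].
print(findings)     # {'EMAIL': 1, 'US_SSN': 1}
```

Returning the counts alongside the cleaned prompt lets the same call feed monitoring, so a spike in redactions becomes a signal rather than a silent fix.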
I was recently introduced to an application development team that had reached the point where more than 90% of their production code was written and submitted for review by AI. Many of those reviews were also performed by AI; humans were only occasionally the reviewers, let alone the authors. The team's governance and controls had not caught up: the security gates intended to catch human-authored issues were built on the assumption that a human stood between the keyboard and the commit. That assumption no longer held.
SDLC: AI-assisted code is still code, and the controls that catch human-authored issues must also run on AI-generated code without exception. Static Application Security Testing (SAST), Dynamic Application Security Testing (DAST), Software Composition Analysis (SCA), secret scanning, dependency review, and human code review apply to anything an AI assistant produces, regardless of how confident the suggestion looks in the IDE. No AI-generated code should route around those controls: not through agent-opened pull requests, not through automated merges, and not through fast-track exceptions. Treat prompts, system instructions, and model versions as build artifacts under change control. A prompt that shapes production behavior is a configuration, and a configuration that is not versioned is not governed.
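Treating a prompt as a versioned build artifact can be as simple as pinning a hash of the prompt and model version at review time and refusing to start if the running values differ. The names below (`fingerprint`, `verify_release`, `PromptDriftError`) and the example model identifier are hypothetical, and the sketch assumes the pinned hash is stored alongside the release under the same change control as the code.

```python
import hashlib
import json


def fingerprint(system_prompt: str, model_version: str) -> str:
    """Hash the prompt and model version together so a change to either shows up."""
    material = json.dumps(
        {"model_version": model_version, "system_prompt": system_prompt},
        sort_keys=True,
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class PromptDriftError(RuntimeError):
    """Raised when the running prompt or model differs from the reviewed release."""


def verify_release(system_prompt: str, model_version: str, pinned: str) -> None:
    """Fail closed if the deployed configuration was not the one that was reviewed."""
    if fingerprint(system_prompt, model_version) != pinned:
        raise PromptDriftError("prompt or model version differs from the pinned release")


# At review time: record the fingerprint with the release.
pinned = fingerprint("You are a support assistant.", "example-model-v1")

# At startup: verify before serving traffic.
verify_release("You are a support assistant.", "example-model-v1", pinned)
print("release verified")
```

Hashing both values together means a silent model upgrade by the vendor is caught the same way as an unreviewed prompt edit.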
Monitoring and Alerting: Log all prompts, responses, and tool invocations wherever policy and contract permit, and route them into the same pipeline that already carries the rest of your system telemetry. Alert on jailbreak attempts, policy-violating outputs, anomalous tool-use patterns, and detected exfiltration traffic to AI providers' APIs. These are the same signals your analysts should already recognize: unusual volumes, off-hours activity, and access to data the principal has no business touching. The only new element is the class of principal.
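The three familiar signals named above (volume, off-hours activity, and out-of-bounds data access) can be sketched as rules over structured tool-call events. The `ToolEvent` record, the `find_anomalies` function, the thresholds, and the business-hours window are all assumptions chosen for illustration; real detections would live in your existing SIEM rules.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ToolEvent:
    timestamp: datetime
    principal: str    # the agent or human identity that made the call
    tool: str
    data_class: str   # classification of the data the call touched


def find_anomalies(events: list[ToolEvent],
                   allowed_classes: dict[str, set[str]],
                   max_calls: int = 100,
                   workday: tuple[int, int] = (8, 18)) -> list[str]:
    """Flag unusual volume, off-hours calls, and access outside a principal's data classes."""
    alerts = []
    volume = Counter(event.principal for event in events)
    for principal, count in volume.items():
        if count > max_calls:
            alerts.append(f"volume: {principal} made {count} tool calls")
    for event in events:
        if not workday[0] <= event.timestamp.hour < workday[1]:
            alerts.append(f"off-hours: {event.principal} called {event.tool}")
        if event.data_class not in allowed_classes.get(event.principal, set()):
            alerts.append(f"access: {event.principal} touched {event.data_class} data")
    return alerts


events = [ToolEvent(datetime(2024, 1, 15, 3, 0), "agent:report-bot", "export_table", "pii")]
for alert in find_anomalies(events, {"agent:report-bot": {"internal"}}):
    print(alert)
```

Nothing in these rules is specific to AI; the only change from a human-centric detection is that the principal field may name an agent.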
IR: Your runbooks should already cover compromised accounts and unauthorized data egress. Extend them to compromised agents, jailbroken models running inside your environment, and prompt-injection-driven exfiltration. Each of these runbooks introduces an accountability question the older playbooks did not have to answer: when the actor is an agent acting on behalf of a human, did it operate inside its intended scope, and can you prove it? IR still owns response and remediation, but the evidence chain that supports those decisions needs to be in place before the incident starts. Write the playbooks assuming the Constrain Phase controls below are providing that chain. Decide what 'in scope' means in advance, in writing, and rehearse it before you need it.
Governance defines the rules of the road. The next question is how those rules are enforced at runtime, when an agent is in the middle of a tool call and there is no time to convene a review.
Phase 3: Constrain
Governance binds policy to identities, data, and code, but enforceability remains an ongoing challenge for many organizations. Enforcing policy at agent speed requires controls that either carried significant overhead or did not need to exist before agents could act autonomously: controls that sit underneath the five pillars and run where the action happens. Privileged actions used to have a human in the loop somewhere before they completed. Agentic systems compress that window to milliseconds and decide on tool use, data movement, and downstream calls, potentially without pausing for a reviewer.
The following controls should be built into the pillar set to effectively manage AI's velocity.
Per-Tool-Call Attribution: Every action an agent takes carries forward the identity of the human, or the upstream agent, that delegated it. The chain of delegation should be recorded on each individual tool call, not just at session start, so a downstream system can answer the first question that matters during an AI-related incident: on whose authority was this action taken? No anonymous agent activity, no shared service-account fog. If the agent cannot prove who asked, the call does not run.
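A minimal sketch of this rule is a wrapper that every tool call must pass through: it checks the delegation chain, writes an audit record, and only then runs the tool. The names (`ToolCall`, `run_tool_call`, `AttributionError`) and the `user:`/`agent:` identity strings are hypothetical, and checking membership in a set of known identities stands in for real cryptographic verification of the chain.

```python
from dataclasses import dataclass
from typing import Any, Callable


class AttributionError(PermissionError):
    """Raised when a tool call cannot show who delegated it."""


@dataclass(frozen=True)
class ToolCall:
    tool: str
    arguments: dict[str, Any]
    # Delegating identities: originating principal first, calling agent last.
    delegation_chain: tuple[str, ...] = ()


def run_tool_call(call: ToolCall,
                  tools: dict[str, Callable[..., Any]],
                  known_identities: set[str],
                  audit_log: list[dict[str, Any]]) -> Any:
    """Run the tool only if every link in the delegation chain is a known identity."""
    chain = call.delegation_chain
    if not chain or any(link not in known_identities for link in chain):
        raise AttributionError(f"refusing {call.tool}: delegation chain missing or unverifiable")
    # Record attribution on this individual call, not just at session start.
    audit_log.append({"tool": call.tool, "on_behalf_of": chain[0], "chain": list(chain)})
    return tools[call.tool](**call.arguments)


tools = {"lookup_ticket": lambda ticket_id: f"ticket {ticket_id}"}
known = {"user:alice", "agent:triage"}
audit_log: list[dict[str, Any]] = []

call = ToolCall("lookup_ticket", {"ticket_id": 42}, ("user:alice", "agent:triage"))
print(run_tool_call(call, tools, known, audit_log))
print(audit_log[0]["on_behalf_of"])
```

Because the audit record is written before the tool executes, the evidence chain exists even when the tool itself fails or is later found to have misbehaved.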
Broker-Mediated Egress: AI traffic should flow through a policy gateway that sits at the prompt boundary and enforces classification, contract scope, and tool-use policy before a request leaves the environment. The gateway is the place where "this data class cannot reach that destination" becomes an enforced decision rather than a policy memo. Inspection happens before the assembled prompt leaves the environment and before the response is returned to the caller, not after data classified as sensitive has already been handed to a third party.
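The gateway decision can be sketched as a lookup of the prompt's data class against what each provider's contract covers, denying by default. Everything named here is an assumption for illustration: the `CONTRACT_SCOPE` table, the provider hostnames, the `authorize_egress` function, and especially `classify`, which is a keyword placeholder for a real classification service.

```python
from dataclasses import dataclass

# Assumption: which data classes each provider's contract covers.
CONTRACT_SCOPE = {
    "provider-a.example": {"public", "internal"},
    "provider-b.example": {"public"},
}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def classify(prompt: str) -> str:
    """Placeholder classifier keyed on document markings; a real one would be far richer."""
    text = prompt.upper()
    if "CONFIDENTIAL" in text:
        return "confidential"
    if "INTERNAL" in text:
        return "internal"
    return "public"


def authorize_egress(prompt: str, destination: str) -> Decision:
    """Allow the request only if the destination's contract covers the prompt's data class."""
    permitted = CONTRACT_SCOPE.get(destination)
    if permitted is None:
        # Default deny: unknown destinations are never in scope.
        return Decision(False, f"{destination} is not an approved provider")
    data_class = classify(prompt)
    if data_class not in permitted:
        return Decision(False, f"{data_class} data is outside the contract scope for {destination}")
    return Decision(True, f"{data_class} data is in scope for {destination}")


print(authorize_egress("INTERNAL roadmap summary", "provider-a.example"))
print(authorize_egress("INTERNAL roadmap summary", "provider-b.example"))
```

Keeping the contract table as data rather than code means a renegotiated contract changes enforcement through a reviewed configuration change, not a redeploy.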
Just-in-Time, Scoped Credentials for Agents: Agents should not hold standing privilege. They should request narrow, time-bound capabilities for the specific task at hand, and those capabilities expire when the task does. Standing tokens wired into a config file are the agentic equivalent of a domain admin account that is allowed to log into user endpoints and leaves its credentials in LSASS memory, and the blast radius scales with the agent's reach.
None of these constraints is exotic. They are runtime equivalents of controls your IAM and DLP teams and subject-matter experts already understand, applied at the speed AI now operates.
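Tying a capability's lifetime to the task can be sketched with a context manager: the grant is created when the task starts and revoked when it ends, with a time limit as a backstop. The `CredentialBroker` class and `task_credential` helper are hypothetical, and the in-memory dictionary stands in for whatever credential service actually issues and revokes grants.

```python
import secrets
import time
from contextlib import contextmanager
from typing import Iterator, Optional


class CredentialBroker:
    """In-memory stand-in for a service that issues and revokes narrow grants."""

    def __init__(self) -> None:
        # handle -> (agent_id, scope, expiry time)
        self._grants: dict[str, tuple[str, str, float]] = {}

    def grant(self, agent_id: str, scope: str, ttl_seconds: float,
              now: Optional[float] = None) -> str:
        issued = time.monotonic() if now is None else now
        handle = secrets.token_urlsafe(16)
        self._grants[handle] = (agent_id, scope, issued + ttl_seconds)
        return handle

    def is_valid(self, handle: str, scope: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        grant = self._grants.get(handle)
        return grant is not None and grant[1] == scope and current < grant[2]

    def revoke(self, handle: str) -> None:
        self._grants.pop(handle, None)


@contextmanager
def task_credential(broker: CredentialBroker, agent_id: str, scope: str,
                    ttl_seconds: float = 60.0) -> Iterator[str]:
    """Grant one scope for the duration of a task and revoke it when the task ends."""
    handle = broker.grant(agent_id, scope, ttl_seconds)
    try:
        yield handle
    finally:
        # Revocation runs even if the task raises, so no grant outlives its task.
        broker.revoke(handle)


broker = CredentialBroker()
with task_credential(broker, "agent:triage", "tickets:read") as handle:
    print(broker.is_valid(handle, "tickets:read"))   # valid while the task runs
print(broker.is_valid(handle, "tickets:read"))       # revoked once the task ends
```

The expiry time is a backstop for tasks that hang or crash the process; the normal path is explicit revocation the moment the work is done.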
Close
The teams that built the models were incentivized to get them to market quickly, and they did. Safety, ethics, and bias reviews were purposely kept off their critical release paths. The debt those decisions deferred did not stay with the builders. It was redistributed, quietly, unevenly, and likely without consent, to the tens of thousands of organizations that adopted the models. Those models were adopted as quickly as they were released, and that genie cannot realistically be put back in the bottle.
Pausing adoption is not on the table. In many cases the productivity gains are massive, many workflows are already rewired, and walking any of it back would likely cost more than most organizations can absorb. But there is no real need to pause adoption, which is why this post was written.
Organizations can put the guardrails around AI that its builders were never incentivized to build in. Analyze your attack surface. Govern it with the pillars you already operate. Constrain it at runtime, where the agents actually act. This should not require a new doctrine, a new team, or a new budget line. The work is not new. The speed is.