Autonomous agent accelerator: 6 ways to secure AI-powered workflows
How to have AI agents run your business without them going rogue.
Agentic AI is going mainstream.
Rabbit’s large action model (LAM), which operates from a wrist-mounted mobile device and appears to leverage OAuth to log in to existing accounts using voice commands, is just one example.
Obviously we’ve seen early versions of this already: hooking up non-deterministic processes to UIs or APIs has been possible for a while now.
But we are going to have to start re-thinking cybersecurity as we once knew it.
That’s because AI agents are going to do the “wrong” thing a nonzero percentage of the time.
How agentic AI will impact cybersecurity
The good news, though, is that we already have a way to describe security for this new paradigm: the Artificial Intelligence Risk Scoring System (AIRSS)! It is a probabilistic model for describing the behavior of AI systems as it relates to data confidentiality, integrity, and availability.
Check out that series for details on the approach, but below I’ll lay out some scenarios involving AI agents and look at how security might be impacted.
Confidentiality
You instruct an OpenAI agent to write a social media post about your recently announced acquisition of a new company.
It searches your computer for data about the acquisition and locates a document listing the individuals slated to be laid off.
The agent summarizes some of the key firings along with other data and posts the result to X.
Integrity
Emails sent by your company are having deliverability issues related to Domain-based Message Authentication, Reporting, and Conformance (DMARC). Your policy is currently `p=quarantine`, meaning messages failing validation are delivered to the recipient’s spam folder.
You tell an agent to fix the problem. It decides to change your policy to `p=none` because the initial error suggested it was too strict.
Even more customer domains begin rejecting emails you send because your policy is now too lax.
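To make this concrete, here is a minimal sketch of a guardrail that reads the domain’s published DMARC policy and refuses any agent-proposed change that weakens enforcement. It assumes the dnspython package, and the domain name is a placeholder.

```python
# Sketch: refuse agent-proposed DMARC changes that weaken enforcement.
# Assumes the dnspython package; the domain is a placeholder.
import dns.resolver

ENFORCEMENT_ORDER = {"none": 0, "quarantine": 1, "reject": 2}

def current_dmarc_policy(domain: str) -> str:
    """Return the p= tag from the domain's _dmarc TXT record."""
    answers = dns.resolver.resolve(f"_dmarc.{domain}", "TXT")
    record = b"".join(answers[0].strings).decode()
    tags = dict(tag.strip().split("=", 1) for tag in record.split(";") if "=" in tag)
    return tags.get("p", "none")

def allow_policy_change(domain: str, proposed: str) -> bool:
    """Allow only changes that keep or strengthen the current policy."""
    return ENFORCEMENT_ORDER[proposed] >= ENFORCEMENT_ORDER[current_dmarc_policy(domain)]

# allow_policy_change("yourdomain.com", "none")  -> False if currently quarantine or reject
```

In practice a check like this would live in whatever layer actually writes DNS records, not in the agent itself.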
Availability
You accidentally lean on the push-to-talk button of your Rabbit while speaking with your engineering colleague about “tearing down some VMs.”
Rabbit obliges and deletes a bunch of virtual machines hosting business critical processes.
Agents are especially vulnerable to certain types of attacks
All of the above situations don’t even address malicious interactions with agentic AI, which could be substantially more damaging than accidental ones. Some challenges that will get even worse are:
Indirect prompt injection
An autonomous agent that hits an AI canary and ingests malicious instructions might not show any immediate outward signs of malfunctioning. But this may have just set off a ticking cyber time bomb whereby the agent later starts following the attacker’s commands.
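Detecting injected instructions reliably is an unsolved problem, but even a crude filter on untrusted content can catch lazy attempts. Below is a minimal sketch; the pattern list and the hold-for-review behavior are illustrative assumptions, not a robust defense.

```python
# Sketch: crude heuristic filter for instruction-like text in content an agent
# ingests from untrusted sources. Patterns are illustrative assumptions only.
import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"disregard .{0,40}(rules|instructions|policy)",
    r"you are now",
    r"send .{0,40}(credentials|api key|password)",
]

def looks_like_injection(text: str) -> bool:
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPICIOUS_PATTERNS)

def ingest(untrusted_text: str) -> str:
    """Route suspicious content to a human instead of the agent's context."""
    if looks_like_injection(untrusted_text):
        return "HOLD: possible indirect prompt injection, needs human review"
    return untrusted_text
```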
Data poisoning
If AI agents are able to train continuously on information they ingest (essentially the current ChatGPT functionality on steroids), this could lead to rapid model collapse. Even a relatively small number of malicious data points could knock the agent off course into a death spiral.
Sponge attacks
If an AI agent is itself consuming cloud resources or has the ability to trigger their use, “denial of wallet” attacks could be economically devastating for the company operating it. By artificially increasing the amount of data an agent needs to process, a savvy attacker could potentially leave the operator with a massive bill, impact the availability of its systems, or both. This would be especially pronounced in the case of autonomous systems because of the fewer opportunities for human intervention.
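One partial safeguard is a hard spending cap enforced outside the agent itself. Here is a minimal sketch; the daily budget figure and the idea of estimating a cost per call are illustrative assumptions.

```python
# Sketch: enforce a hard daily spend cap on an agent, outside the agent itself.
# The budget and cost estimates are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class SpendGuard:
    daily_budget_usd: float = 50.0
    spent_today: float = 0.0
    day: date = field(default_factory=date.today)

    def charge(self, estimated_cost_usd: float) -> bool:
        """Return True if the call may proceed, False if the budget is exhausted."""
        if date.today() != self.day:  # reset the counter each day
            self.day, self.spent_today = date.today(), 0.0
        if self.spent_today + estimated_cost_usd > self.daily_budget_usd:
            return False  # halt the agent and alert a human
        self.spent_today += estimated_cost_usd
        return True
```

Every outbound API or cloud call the agent triggers would first pass through charge(), with a False return halting the agent and alerting a human.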
Compensating controls
If, after weighing the risk and reward of deploying autonomous AI agents, you think it’s still worth it, consider the following additional controls:
1. Action allowlisting
You might just want to make certain things entirely “off limits” to AI agents, such as releasing certain types of data or deleting resources.
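If you control the agent’s own tool-dispatch layer, a default-deny allowlist is easy to sketch. The tool names and implementations below are illustrative assumptions.

```python
# Sketch: default-deny allowlist enforced at the agent's tool-dispatch layer.
# Tool names and implementations are illustrative assumptions.
ALLOWED_ACTIONS = {
    "search_files": lambda params: f"searched for {params.get('query')}",
    "draft_social_post": lambda params: f"drafted: {params.get('text')}",
    # note: no "delete_vm" and no "publish_post" -- those stay human-only
}

def dispatch(action: str, params: dict) -> str:
    """Execute only actions explicitly on the allowlist; deny everything else."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"'{action}' is not allowlisted for agents")
    return ALLOWED_ACTIONS[action](params)

dispatch("search_files", {"query": "acquisition press release"})  # OK
# dispatch("delete_vm", {"name": "prod-db-01"})                   # PermissionError
```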
This is difficult to implement on the application side, however, because it’s not clear how an application would tell whether it’s dealing with an AI agent or a human. A possible solution is:
2. Require CAPTCHA completion or multi-factor authentication before especially sensitive actions
Normal safeguards like “type the name of the resource to delete it” won’t work, because an AI agent will quickly figure out what’s being asked and complete the action if it thinks it’s supposed to.
CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) aren’t perfect, because some research shows AI can complete them faster than humans. And even if it can’t, an agent might convince another human (besides the user) to solve one for it.
Multi-factor authentication (MFA) for certain actions might be a solution here. But like CAPTCHAs, continuously responding to prompts will get annoying quickly and cause some people to ask to disable the additional check…
…or just approve every request and fall victim to prompt bombing from an overly energetic agent.
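With those caveats, here is a minimal sketch of a step-up check in front of a destructive action, using the pyotp library. The demo secret and the deletion function are illustrative assumptions.

```python
# Sketch: require a fresh TOTP code from a human before a destructive action.
# The secret and the delete function are illustrative assumptions (pyotp library).
import pyotp

totp = pyotp.TOTP("JBSWY3DPEHPK3PXP")  # demo secret; load from a vault in practice

def delete_vm(vm_name: str, mfa_code: str) -> str:
    if not totp.verify(mfa_code):  # code typed by a human, not supplied by the agent
        raise PermissionError("MFA verification failed; deletion blocked")
    return f"{vm_name} deleted"
```

This still suffers from the approval-fatigue problem above, so reserve it for genuinely destructive actions.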
3. Require proof of personhood
The Worldcoin cryptocurrency project requires biometric enrollment to validate that a given user is in fact human, which it calls “proof of personhood.” I have privacy concerns with this approach and don’t think it’s foolproof. With that said, perhaps a token from a trusted source demonstrating the user is human could help here.
Applications would need to be designed to forbid the passage of this token in any automated manner. But I am sure entrepreneurial and/or lazy people will find a way around this.
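For illustration, here is a minimal sketch of verifying such a token before a sensitive action, using the PyJWT library. The issuer, audience, claim name, and key handling are all hypothetical.

```python
# Sketch: accept a sensitive request only with a signed "personhood" token.
# Issuer, audience, claim names, and key handling are hypothetical (PyJWT library).
import jwt  # pip install PyJWT

ISSUER_PUBLIC_KEY = "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----"

def require_personhood(token: str) -> dict:
    claims = jwt.decode(
        token,
        ISSUER_PUBLIC_KEY,
        algorithms=["ES256"],
        audience="your-app",
        issuer="https://personhood.example",  # hypothetical trusted issuer
    )
    if not claims.get("human_verified"):
        raise PermissionError("Caller did not prove personhood")
    return claims
```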
4. Rate limiting
While you might not want to outright block AI agents from performing specific actions, you probably do want to apply limits to how many:
requests they can make
rows they can modify
GB they can send
This can limit the damage from an errant agent both directly (by capping the API or cloud service credits expended) and indirectly (by reducing the magnitude of impacts to data confidentiality, integrity, and availability).
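Here is a minimal sketch of a token-bucket limiter applied to agent-issued requests; the capacity and refill rate are illustrative assumptions.

```python
# Sketch: token-bucket rate limiter for agent-issued requests.
# The capacity and refill rate are illustrative assumptions.
import time

class TokenBucket:
    def __init__(self, capacity: int = 100, refill_per_sec: float = 1.0):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the agent may proceed, False if it is over its limit."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

agent_bucket = TokenBucket(capacity=100, refill_per_sec=1.0)  # roughly 1 request/second sustained
```

The same pattern covers rows modified or gigabytes sent by making the cost proportional to the size of the operation.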
Unfortunately you have the same authentication problem here, whereby a sufficiently motivated agent can prevent the system from identifying it as such.
5. Use AI to make you confirm certain types of actions manually
Using AI to supervise other AI isn’t perfect, but there may be a use case here.
That’s because you won’t need to worry much about prompt injection in this specific situation: if a malicious user already has access to the confirmation interface, they’ll just approve whatever action they wanted to happen anyway.
So perhaps an entirely different AI model might be able to detect - and disable - an AI agent that has “gone rogue.”
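Here is a minimal sketch of what that supervision loop might look like. The classify_risk function is a placeholder for a call to a separate, independently hosted model; the keyword heuristic just keeps the example self-contained.

```python
# Sketch: a "supervisor" check forces manual confirmation of high-risk actions.
# classify_risk is a placeholder standing in for a call to a second, separate model.
def classify_risk(proposed_action: str) -> str:
    """Placeholder for a call to a supervising model; returns 'low' or 'high'."""
    high_risk_words = ("delete", "drop", "transfer", "publish", "payment")
    return "high" if any(w in proposed_action.lower() for w in high_risk_words) else "low"

def supervise(proposed_action: str) -> bool:
    """Require a human keystroke before any action the supervisor flags as high risk."""
    if classify_risk(proposed_action) == "high":
        answer = input(f"Agent wants to: {proposed_action}. Approve? [y/N] ")
        return answer.strip().lower() == "y"
    return True
```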
6. Detailed audit logs
This won’t help you prevent problems ahead of time but it will allow you to troubleshoot after the fact. Especially if an AI agent is consistently misbehaving, this may help you get to the root of the problem quickly.
An interesting concept (proposed by none other than ChatGPT itself when I was brainstorming this post) would be to require the agent to log a justification for every action. Understanding the underlying reasoning makes diagnosing problems much faster.
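Here is a minimal sketch of a structured audit record that refuses to run without a justification; the field names and the example action are illustrative assumptions.

```python
# Sketch: structured audit record that requires a justification for every
# agent action. Field names and the example action are illustrative assumptions.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.audit")

def log_action(agent_id: str, action: str, params: dict, justification: str) -> None:
    if not justification.strip():
        raise ValueError("Refusing to execute: agent supplied no justification")
    audit_log.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "action": action,
        "params": params,
        "justification": justification,
    }))

log_action(
    "agent-042", "update_dns_record", {"record": "_dmarc.example.com"},
    "User asked me to fix DMARC deliverability issues",
)
```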
Thinking of deploying AI agents?
StackAware works with AI-powered companies to manage risks related to:
Cybersecurity
Compliance
Privacy
So if you need help setting up an AI governance, risk, and compliance program: