Chatbot checklist: 5 ways to avoid AI-powered fails

Protect your chatbot from becoming a PR nightmare.

Mar 30, 2024

The fintech company Klarna recently claimed to have addressed 2/3 of all customer service requests using its new chatbot in its first month. Which it says is equivalent to 700 full-time agents.

That represents enormous savings when it comes to repetitive tasks like answering basic questions and providing status updates.

So it’s no surprise companies are rolling AI chatbots out at full speed.

As with any new technology, though, there are bound to be hiccups. So I’ll go into three examples and then look at some ways to avoid common pitfalls.

Case Study #1: Amazon

Data attribute impacted: integrity

Peter Gostev did some excellent post-release testing of Amazon’s chatbots which gave some interesting responses, including a potential $500 discount on a MacBook Pro:

The lawyers who weighed in on this post didn’t come to a clear consensus, and I doubt that Peter would actually be able to get this deal sealed. With that said, there is at a minimum some reputation damage from things like this (also thanks to Peter):

Case study #2: Chevrolet of Watsonville

Data attributes impacted: integrity and availability

In late 2023, Chevrolet of Watsonville launched a GPT-powered chatbot without sufficient controls. The results were predictably embarrassing:

X users quickly tore apart this hapless chatbot by:

Allowing users free access to ChatGPT by not restricting response subjects.
Racking up API bills through un-throttled prompts.
Forcing it to recommend competitive car brands.
Tricking it into making ludicrous “deals.”
Committing other mischief.

Refer a friend

While it doesn’t seem like the dealership ended up selling a new car for one dollar, the next example shows it is conceivable it might have had to.

Case Study #3: Air Canada

Data attribute impacted: integrity

The basic facts of the case are:

In November 2022, a man inquired about bereavement fares from Air Canada’s AI chatbot after his grandmother’s death.
The chatbot stated Air Canada offered discount fares with a post-flight refund, contrary to the airline’s actual policy requiring pre-approval.
After flying from Vancouver to Toronto and back for $1200, the man requested the discount. Air Canada refused, telling him the chatbot was wrong.
He sued.
Air Canada claimed in the trial that the chatbot was a “separate legal entity,” absolving them of responsibility for its statements.
In early 2024, a judge disagreed. He ruled that Air Canada committed “negligent misrepresentation” and must honor the chatbot’s discount promise.

What this means from a risk perspective

While the last example is just from a single jurisdiction, test cases are bound to start popping up all over the place. And in many of them, companies are going to be found liable for what their AI applications say.

In addition to hallucinations like this as well as jailbreaking attempts from pranksters, there is also the risk of true prompt injection attacks from cyber criminals.

That means additional opportunities for disaster like:

Loss of regulated data like personally identifiable information.
Exposure of trade secrets or confidential business plans.
Reputation damage.

How to protect your AI chatbots

1. Establish clear business requirements

This sounds obvious, but it happens less frequently than you might imagine. Some key considerations include:

Clearly understanding the purpose of your chatbot. Is it making jokes? Or recommending medical providers? These have hugely different potentials for damage if the bot misspeaks.
Determining what your threshold is for stopping the chatbot interaction and:
- returning an error
- escalating to a live human
- moving to asynchronous communication (e.g. email)
Ensuring responses are grounded using retrieval-augmented generation (RAG). If people wanted a general purpose chatbot they wouldn’t be using yours. So having an accurate database against which you can conduct RAG will help here.

Here is a deeper dive on the business requirement topic.

2. Implement guardrails

OpenAI has an excellent “cookbook” describing how to do this with their models.
Tools like Microsoft Azure OpenAI Service can return a specific error when content filtering fails. Use this to trigger escalation steps.
Understand that different models (even from the same vendor) will have different safety layers, so consider experimenting with different versions to develop the optimal combination.
Consider rules-based filtering to prevent mention of competitors.
Experiment with system prompts to get the best results, but know these are not foolproof!

3. Use a neutral security policy

This article explains this concept in-depth, but basically a neutral security policy means that anyone authorized to interact with the chatbot must also authorized to see the underlying data exposed to a RAG process.

This is a neutral security policy:

Don’t rely on a system message to control access to sensitive data like a customer’s contact or account information.

That creates a tainted trust boundary:

Use a deterministic authorization layer instead.

4. Rate limit

“Denial of wallet” attacks can quickly drain your resources (and funds) if left unchecked. This can happen by attackers:

using long messages with a GPT-powered chatbot, draining API credits.
causing undesired auto-scaling of Infrastructure-as-a-Service (IaaS).
overloading the non-AI message processing logic.

Some things to consider doing:

Implement rate limiting controls at both the application (first line of defense) and infrastructure level (second line of defense) by regulating auto-scaling, if relevant.
On the application side of things, control both the number of tokens and the number of messages allowed. Attackers will probe your defenses methodically, but legitimate customers probably won’t want to engage in an endless discussion.
Adjust rate limits based on usage patterns and threat intelligence.
In early days of release, monitor closely and be prepared to trigger your escalation procedures more quickly to prevent downtime and customer frustration.

5. Red-team

The best way to identify vulnerabilities is by having your own hacker do it for you. Some recommendations:

Use a clear-box method, where the tester has “inside information” including an understanding of business requirements. In our experience from chatbot testing, this is the fastest way to get results and to quickly rule out what might seem like vulnerabilities but actually aren’t.
Ensure your testers are familiar with full range of jailbreaking and safety layer bypass methods. StackAware’s AI reference archive has a comprehensive list of these.
Integrate with traditional penetration testing to ensure comprehensive coverage. Attackers are certain to chain model, application, and infrastructure-level vulnerabilities together to achieve their goals. So get ahead of them.

If you want to see our AI penetration testing guide, subscribe to Deploy Securely and reply to the onboarding email.

About to launch a chatbot?

It’s an exciting time. But deploying a new AI-powered assistant can turn into a major headache if you aren’t careful.

StackAware can help.

We can penetration test your application as part of a broader risk assessment. And set you up for AI-powered success.

Book a call

Deploy Securely