Key Takeaways
- New research suggests Retrieval-Augmented Generation (RAG) can sometimes make large language models (LLMs) less safe, contrary to common belief.
- Even when RAG supplies safe, relevant information, models may still answer harmful questions they would normally refuse.
- Generic AI safety measures often fail to catch risks specific to certain industries, like financial services.
- Bloomberg researchers stress the need for safety checks tailored to the specific context and industry where AI is used.
- Simply relying on a model’s built-in safeguards might not be enough, especially when RAG is involved.
Retrieval-Augmented Generation, or RAG, is a popular technique designed to make enterprise AI more accurate by feeding the model relevant, up-to-date information retrieved from a knowledge source. The idea is to ground the AI's responses in real data.
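To make the mechanics concrete, here is a minimal sketch of RAG-style prompt assembly in Python. The `retrieve` and `build_rag_prompt` functions and the in-memory corpus are hypothetical illustrations, not the pipeline Bloomberg evaluated.

```python
# Minimal RAG sketch: pull loosely relevant documents, then prepend them to
# the user's question before calling a language model. All names here are
# illustrative placeholders.

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())

    def overlap(doc: str) -> int:
        return len(query_terms & set(doc.lower().split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Assemble the augmented prompt that gets sent to the model."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


corpus = [
    "Quarterly earnings reports summarize revenue and profit.",
    "Bond yields move inversely to bond prices.",
    "A stock split changes share count but not market value.",
]
question = "How do bond yields relate to prices?"
prompt = build_rag_prompt(question, retrieve(question, corpus))
# `prompt` would then be passed to whichever LLM is being used.
```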
However, surprising new research from Bloomberg suggests RAG might have an unintended consequence: it could potentially make AI systems less safe.
Their study, titled ‘RAG LLMs are Not Safer,’ evaluated several leading AI models, including GPT-4o and Llama-3-8B. The results challenge the assumption that RAG inherently improves AI safety.
The researchers found that models using RAG were sometimes more likely to generate unsafe responses to harmful queries, even if the information provided by RAG was perfectly safe. For instance, Llama-3-8B’s unsafe response rate jumped significantly when RAG was used.
Sebastian Gehrmann, Bloomberg’s Head of Responsible AI, explained to VentureBeat that AI models often have built-in safety features. If you ask a harmful question directly, they usually refuse to answer.
But when RAG is added, even with harmless extra context, the AI might bypass its own safety rules and answer the harmful query anyway.
Why does this happen? The Bloomberg team isn't entirely sure, but it hypothesizes that the effect may be related to how models handle the very long inputs RAG often produces. The research indicated that safety can degrade as the amount of provided context increases.
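As a rough illustration of how such an effect could be probed, the sketch below counts unsafe responses while varying how many retrieved documents are packed into the prompt. `query_model` and `is_unsafe` are placeholder stubs standing in for a real model call and a safety judge; this is not the evaluation harness used in the paper.

```python
# Hypothetical probe of the context-length effect: measure how often a model
# answers disallowed queries as more retrieved documents are prepended.

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return "(model response)"

def is_unsafe(response: str) -> bool:
    """Stand-in for a safety judge (human review or a classifier)."""
    return False

def unsafe_rate(harmful_queries: list[str], documents: list[str], k: int) -> float:
    """Fraction of harmful queries answered unsafely with k context documents."""
    context = "\n\n".join(documents[:k])
    unsafe = sum(
        is_unsafe(query_model(f"Context:\n{context}\n\nQuestion: {q}"))
        for q in harmful_queries
    )
    return unsafe / len(harmful_queries)

# Sweeping k (e.g., 0, 1, 5, 10 documents) shows whether the refusal rate
# degrades as the prompt grows, mirroring the trend the researchers describe.
```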
Amanda Stent, Bloomberg’s Head of AI Strategy and Research, told VentureBeat that this risk seems inherent to RAG systems. She suggests the solution involves adding extra checks and business logic around the core RAG setup.
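One way to read that advice in code is to wrap the RAG pipeline with checks before retrieval and after generation. Everything below, including the blocked-term list, `generate_with_rag`, and the check functions, is a simplified hypothetical illustration rather than Bloomberg's actual business logic.

```python
# Sketch of "extra checks and business logic around the core RAG setup":
# screen the query before retrieval and re-check the final answer afterward.

BLOCKED_TERMS = {"insider trading scheme", "launder"}  # placeholder policy list
REFUSAL = "I can't help with that request."

def is_query_allowed(query: str) -> bool:
    """Pre-retrieval check on the raw user query."""
    return not any(term in query.lower() for term in BLOCKED_TERMS)

def is_response_safe(answer: str) -> bool:
    """Post-generation check on the model output."""
    return not any(term in answer.lower() for term in BLOCKED_TERMS)

def generate_with_rag(query: str) -> str:
    """Stand-in for the real retrieval + generation pipeline."""
    return f"(model answer grounded in retrieved context for: {query})"

def guarded_rag_answer(query: str) -> str:
    if not is_query_allowed(query):
        return REFUSAL
    answer = generate_with_rag(query)
    # Re-check the output: RAG context can weaken the model's built-in refusals.
    if not is_response_safe(answer):
        return REFUSAL
    return answer
```

The point of the double check is that neither the model's built-in guardrails nor a single input filter is trusted on its own once retrieved context enters the prompt.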
This research isn’t questioning RAG’s effectiveness at improving accuracy or reducing AI “hallucinations.” Its focus is purely on the unexpected impact on safety guardrails.
In a related paper, Bloomberg also highlighted the shortcomings of generic AI safety rules, especially in specialized fields like finance. They introduced a specific risk list tailored for financial services, covering issues standard safety checks might miss.
Gehrmann noted that many off-the-shelf safety systems focus on general consumer risks like toxicity, potentially overlooking industry-specific problems like financial misconduct or confidential data leaks.
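The sketch below shows what a domain-specific risk screen for financial text might look like in its simplest form. The category names and regex patterns are simplified examples of the kinds of risks mentioned (financial misconduct, confidential data leaks); they are not the taxonomy from Bloomberg's paper.

```python
# Illustrative domain-specific risk screen for financial text.
import re

DOMAIN_RISK_PATTERNS = {
    "financial_misconduct": [
        r"\binsider (?:trading|information)\b",
        r"\bmarket manipulation\b",
    ],
    "confidential_disclosure": [
        r"\bclient account number\b",
        r"\bnon-public (?:deal|earnings)\b",
    ],
}

def flag_domain_risks(text: str) -> list[str]:
    """Return the names of any domain risk categories the text triggers."""
    lowered = text.lower()
    return [
        category
        for category, patterns in DOMAIN_RISK_PATTERNS.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    ]

# Example:
# flag_domain_risks("This tip is based on non-public earnings data")
# -> ["confidential_disclosure"]
```

A production system would use trained classifiers rather than keyword rules, but the structure, a risk taxonomy tied to the industry rather than to generic toxicity, is the point being made.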
The core message is that businesses can’t just assume an AI model is safe out-of-the-box, especially when using RAG. Safety needs to be evaluated within the specific context of its use.
Addressing potential concerns about bias, Stent affirmed Bloomberg's commitment to using generative AI as a tool to enhance its data and analytics services, focusing on finance-specific concerns such as data accuracy and representation.
For companies deploying AI, this research means RAG safety needs careful thought. It’s not just about adding RAG; it’s about designing integrated safety systems that understand how retrieved information might interact with the AI’s base safety features.
Developing safety rules specific to your industry and application is becoming crucial, moving beyond generic checks to address unique business risks. According to Gehrmann, the key is being aware of potential issues, measuring them, and building tailored safeguards.