by Lucas Mearian

Senior Reporter

How RAG makes generative AI tools even better


Feb 20, 2024 · 4 mins

Augmented Reality · Generative AI · Natural Language Processing

Retrieval augmented generation, or 'RAG' for short, creates a more customized and accurate generative AI model that can greatly reduce anomalies such as hallucinations.


As more organizations turn to generative artificial intelligence (genAI) tools to transform massive amounts of unstructured data and other assets into usable information, being able to find the most relevant content during the AI generation process is critical.

Retrieval augmented generation, or “RAG” for short, is a technology that can do just that by creating a more customized genAI model that enables more accurate and specific responses to queries.

Large language models (LLMs), a type of deep-learning model, are the basis of genAI technology; they’re pre-trained on vast amounts of unlabeled or unstructured data that, by the time a model is available for use, can be outdated and not specific to a task.

LLMs can consist of a neural network with billions or even a trillion or more parameters. RAG optimizes the output of an LLM by accessing an external knowledge base beyond the data on which the model was trained. In other words, RAG enables genAI to find and use relevant external information, often from an organization’s proprietary data sources or other content to which it’s directed.

RAG not only expands an LLM’s knowledge base “but also significantly improves the accuracy and contextuality of its outputs,” Microsoft explained in a blog post.

RAG is essentially a design pattern that uses search functionality to retrieve pertinent data and add it to the prompt of a genAI model to better ground the generative output with factual and new information.
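To make the pattern concrete, here is a minimal Python sketch of the retrieve-then-augment step. It assumes a small, hypothetical in-memory document list and uses simple keyword overlap in place of a real search engine or vector database. The function names `retrieve` and `build_prompt` are illustrative, not any vendor’s API, and the final call to an LLM is deliberately left out.

```python
import re
from collections import Counter

# Hypothetical in-memory "knowledge base"; in practice this would be an
# organization's documents behind a search index or vector database.
DOCUMENTS = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9 a.m. to 5 p.m. Eastern, Monday through Friday.",
    "Enterprise customers get a dedicated account manager.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents that share the most word tokens with the query.

    Keyword overlap is a stand-in for the search step (an assumption);
    real systems typically use full-text or vector search instead.
    """
    query_terms = Counter(tokenize(query))
    scored = []
    for doc in docs:
        doc_terms = Counter(tokenize(doc))
        # Counter "&" keeps the minimum count of each shared token.
        overlap = sum((query_terms & doc_terms).values())
        scored.append((overlap, doc))
    # Highest score first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents with no overlap at all.
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved passages."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
    )


if __name__ == "__main__":
    question = "What is the return policy for refunds?"
    passages = retrieve(question, DOCUMENTS)
    print(build_prompt(question, passages))
    # The augmented prompt would then be sent to an LLM (not shown here).
```

Because the model only sees what the retrieval step puts into the prompt, the quality of that search largely determines how well the answer is grounded.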

“RAG can be used for both retrieving public internet data as well as for retrieving data from private knowledge bases,” according to Gartner Research.

Patrick Lewis, a natural language processing research scientist with start-up Cohere, originally coined the term RAG in a paper published in 2020. Lewis pointed out that LLMs cannot easily expand or revise their memory, and they can’t straightforwardly provide insight into their predictions, leading to “hallucinations.”

Just last week, Slack unveiled AI-based tools for businesses and cited RAG as one way the company hopes to reduce hallucinations in genAI results.

In addition to Cohere, more than a half dozen vendors provide native or stand-alone solutions for developers to build RAG-based applications on top of LLMs. They include Vectara, OpenAI, Microsoft Azure Search, Google Vertex AI, LangChain, LlamaIndex and Databricks.

“More and more the solutions around RAG — and enabling people to use that more effectively — are going to focus on tying into the right data that has business value as opposed to just the raw productivity improvements,” said Rick Villars, IDC group vice president of worldwide research.

With RAG, organizations can maximize the chances of producing accurate results based on factual inputs, said Avivah Litan, distinguished vice president analyst at Gartner. It also minimizes the chances of hallucinations, since outputs are grounded with retrieved data.

RAG also allows workers to find, summarize, and use the information they’re looking for faster by applying the power of third-party LLMs to an organization’s own data. It also helps protect the organization from liability incurred when copyrighted or other IP-protected materials get incorporated into LLM responses.

“This possibility is greatly reduced, because the prompt responses can be grounded in enterprise data,” Litan said.

One way to get better access to business information with RAG is to use vector databases and graph technologies, which can tap into proprietary data and allow an organization to truly dig into its business value, Villars said.

A vector database efficiently stores, indexes, and manages massive quantities of high-dimensional vector data. As a result, companies are investing in building vector databases or adding vector search capabilities to their existing SQL or NoSQL databases to support genAI use cases and applications.
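As an illustration of what a vector database does at its core, the following toy Python class (`TinyVectorStore`, a hypothetical name) stores vectors under IDs and returns the closest matches by cosine similarity. It is a sketch under simplifying assumptions: everything lives in memory and every query is compared against every stored vector, whereas production systems typically rely on approximate-nearest-neighbor indexes to keep searches fast at scale.

```python
import math


class TinyVectorStore:
    """A toy, in-memory vector store (hypothetical, for illustration only).

    Vectors are kept in a dict and queries are answered by brute-force
    cosine similarity; there is no persistence and no ANN index.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: dict[str, list[float]] = {}

    def _normalize(self, vector: list[float]) -> list[float]:
        """Check the dimension and scale the vector to unit length."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("a zero vector cannot be normalized")
        return [x / norm for x in vector]

    def add(self, item_id: str, vector: list[float]) -> None:
        """Store (or overwrite) a unit-length copy of the vector under item_id."""
        self._vectors[item_id] = self._normalize(vector)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return the k stored ids most similar to the query, best first."""
        unit_query = self._normalize(query)
        # With unit vectors, cosine similarity reduces to a dot product.
        scores = [
            (item_id, sum(a * b for a, b in zip(unit_query, vec)))
            for item_id, vec in self._vectors.items()
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:k]


if __name__ == "__main__":
    store = TinyVectorStore(dim=3)
    store.add("doc-a", [1.0, 0.0, 0.0])
    store.add("doc-b", [0.0, 1.0, 0.0])
    store.add("doc-c", [0.9, 0.1, 0.0])
    for item_id, score in store.search([1.0, 0.0, 0.0], k=2):
        print(item_id, round(score, 3))
```

Normalizing vectors once, when they are stored, means each search only has to compute dot products, a common design choice in similarity search.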

By 2026, more than 30% of enterprises are expected to adopt vector databases to ground their foundation models with relevant business data, according to Gartner Research. Gartner lists vector databases as a “critical enabler” enterprise technology for 2024.

Popular uses for vector databases include product recommendations, similarity search, fraud detection and generative-AI-powered question-and-answer applications, according to Gartner.

Vector databases can and often do serve as the backbone of RAG systems. The databases store and manage data typically derived from text, images, or sounds, which are converted into mathematical vectors.
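The conversion from text to vectors is normally done by a trained embedding model. The sketch below swaps in a much simpler technique, the “hashing trick,” purely to show the idea: each word is hashed into one of a fixed number of buckets, so any text becomes a vector of the same length, and texts that share words end up pointing in similar directions. The `embed` and `cosine` helpers and the 256-dimension default are assumptions for illustration.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 256) -> list[float]:
    """Map text to a fixed-length vector with the hashing trick.

    Each lowercase word is hashed with SHA-256 (deterministic across runs,
    unlike Python's built-in hash for strings) into one of `dim` buckets,
    and that bucket's count is incremented. A toy stand-in for a learned
    embedding model.
    """
    vector = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vector[bucket] += 1.0
    return vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = embed("refund policy for returns")
    related = embed("Our refund policy covers returns within 30 days")
    unrelated = embed("Support hours are Monday through Friday")
    print("related:  ", round(cosine(query, related), 3))
    print("unrelated:", round(cosine(query, unrelated), 3))
```

Unlike this word-counting scheme, learned embeddings can place texts close together even when they share no words, such as “refund” and “money back,” which is what makes them useful for RAG retrieval.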

“The other part of that is back to app modernization,” Villars said. “One of the biggest legacy install bases companies have today are old client-server apps and even early mobile and cloud apps built on Java. We have to modernize those to make them part of this AI story.”


Senior Reporter Lucas Mearian covers AI in the enterprise, Future of Work issues, healthcare IT and FinTech.
