RAG to riches: how to implement better GenAI in business

From a niche area of academic research to a huge business opportunity, generative AI has developed rapidly over the past twelve months. Accenture research has found that 98% of business leaders think that AI foundation models will be essential to their operations over the next three to five years, and that generative AI will affect 40% of all work hours. The economic impact could be huge: Goldman Sachs Research estimated that generative AI could add more than $7 trillion to the global economy.

For now, however, it remains just that: potential. After the initial flush of interest in ChatGPT, companies are now thinking through how they can use generative AI in practice and integrate it with their own systems and data. Without this integration, the potential will go unrealised. Few organisations have really unlocked the power of GenAI yet, and getting it from experimentation to production remains a challenge.

Part of the reason is that generative AI services, powered by large language models (LLMs) trained on vast troves of Internet data, are not yet specific enough for businesses.

On their own, LLMs are only as good as the data used to train them, and they do not learn from their experiences. For example, the dataset used to train ChatGPT cut off in September 2021, so it has no more recent data to draw on in its responses. This means a service built on the LLM alone may not be able to answer questions from your customers or employees, which can get in the way of offering services that generate revenue.

Alongside this, generative AI has a problem with hallucinations. When they respond to requests, generative AI systems rely on mathematical models to produce results. Where a model doesn’t have an exact match, it can produce similar results that, while appearing to be accurate, are not real.

These results, dubbed ‘hallucinations’, can affect users. In one legal case in the US, a lawyer entered evidence in an affidavit that contained six fabricated cases and references. This led to the case being thrown out of court and to the law firm involved being investigated.

Retrieval augmented generation

Hallucinations could undermine the success of projects over time. For example, a retailer that offers products that either don’t exist or are not in stock will quickly find customers shopping elsewhere, no matter how good the customer interaction is. To solve this problem, we must provide more context to the LLM by bringing in more of our own data.

LLMs are stateless: once they are trained, no new information can be added to the data that was used for training. Retraining an LLM is expensive work too (OpenAI estimated the cost for training GPT-4 was $100 million), so it is not an option for regular updates. Instead, we must think smarter about how LLMs use data.

Retrieval augmented generation, or RAG, uses additional sets of data to improve responses. Rather than relying only on the initial training data, RAG retrieves relevant information from additional data sets and supplies it to the LLM alongside the request. This allows companies to increase the accuracy and relevance of the responses provided to customers or users without retraining the LLM.

In practice, RAG allows generative AI systems to search within historical data sets for results that are semantically similar to a user request. It does this by first turning the data into mathematical values, or embeddings, which represent its semantic meaning. This data can be structured (eg database records) or unstructured (eg text, images or audio). These embeddings are then stored in a vector database.
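
To make the embed-and-store step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a production design. The `embed` function is a toy stand-in that simply counts words, so it captures word overlap rather than real semantic meaning; a real system would call a trained embedding model here. `VectorStore` is a hypothetical in-memory class standing in for a vector database, and the sample product texts are made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector.

    Assumption for illustration only. A trained embedding model would
    return a dense vector that reflects meaning, not just shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (original text, embedding) pairs

    def add(self, text):
        # Embed once at write time and keep the vector next to the text.
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        # Embed the query the same way, then rank stored items by similarity.
        query_vector = embed(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = VectorStore()
store.add("Red running shoes are in stock")
store.add("Blue rain jacket is sold out")
store.add("Returns are accepted within 30 days")
print(store.search("Are the running shoes in stock?", k=1))
```

The design point is that documents are embedded once, when they are added, while each incoming request is embedded at query time with the same function so that the two can be compared.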

When a user prompt comes in, it is transformed into a vector and then compared against the company’s own vector data. This vector search looks for entries that are similar to the original request and provides those results to the LLM. The LLM can then combine them into a response to the prompt, which is shared back to the user.
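
The request flow described above (retrieve similar entries, then hand them to the LLM with the prompt) can be sketched as follows. This is a hedged illustration: `similarity` uses simple word overlap as a stand-in for a real vector comparison, the prompt wording in `build_prompt` is just one possible template, and `llm` is a hypothetical callable that takes a prompt string and returns text. No particular vendor API is assumed, and the sample documents are made up.

```python
import re


def tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Toy stand-in for vector similarity: Jaccard overlap of word sets."""
    words_a, words_b = tokens(a), tokens(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    ranked = sorted(
        documents,
        key=lambda doc: similarity(question, doc),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Combine retrieved passages and the user question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question, documents, llm, k=2):
    """Retrieve context, augment the prompt, and pass it to the LLM."""
    passages = retrieve(question, documents, k)
    return llm(build_prompt(question, passages))


documents = [
    "Red running shoes are in stock",
    "Blue rain jacket is sold out",
    "Returns are accepted within 30 days",
]

# Placeholder "LLM" that echoes the prompt, so the augmented input is visible.
print(answer("Is the blue rain jacket available?", documents, llm=lambda p: p, k=1))
```

Swapping the placeholder for a real model call leaves the rest of the flow unchanged, which is why the company data can be updated without touching the LLM itself.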

Effectively, RAG brings back semantically similar results and gives the LLM more context, which it uses to create better responses for users.

Because this is your data, you are providing more specific context, and you can keep adding data to your own vector database over time. This helps to stop the LLM hallucinating and adding wrong information to the response, which should keep customers happy.

Context is king

With so much potential at stake, building generative AI into your applications is at the forefront of enterprise application design decisions. To use this technology effectively, you will have to consider the role of your own data and how to make it part of your GenAI strategy. With RAG, you can provide more context and improve your generative AI responses.

This should help you avoid AI hallucinations and deliver real results. By basing your approach on your own insights and information, you can meet the high expectations customers have of generative AI, turning potential into reality. To do that, you need to use all the data in your organisation to build best-in-class experiences for your customers.
