Jeffrey Erickson | Senior Writer | November 21, 2024
General-purpose large language models, or LLMs, have become popular with the public because they can discuss a wide variety of topics, write term papers and thank-you notes, and handle many other tasks. In business, however, these generic outputs won’t do. An LLM that’s expected to provide tech support for a particular gadget, for example, needs to draw on domain-specific knowledge.
There are currently two ways to help generative AI models deliver responses that reflect that sort of expertise: fine-tuning and retrieval-augmented generation, or RAG. Each comes with benefits and challenges. Let’s take a deeper look at these options to understand how they work and when to use them.
Key Takeaways
RAG, short for retrieval-augmented generation, is an architectural framework developed by researchers at Meta to help general-purpose AI models deliver outputs that are relevant and useful to organizations. RAG does this by giving a large language model, or LLM, access to an internal knowledge base that it can use to augment its original training data. The result is an AI system that combines the language fluency of an LLM with local data to deliver targeted, contextually appropriate responses. This approach, unlike AI model fine-tuning, works without modifying the underlying model itself.
Use RAG when it’s important for generative AI responses to provide up-to-date or organization-specific data that wasn’t part of the LLM’s training. For example, if a company has a large corpus of reliable information about its products or daily operations, a RAG architecture will provide that data to augment the prompts and responses that go through the LLM, making outputs more useful, verifiable, and precise. This can improve help desk automation, product availability checks in retail, or even healthcare, where doctors’ notes can be quickly made available to patients or other clinicians.
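To make the RAG pattern concrete, here is a minimal sketch of its two core steps: retrieve the most relevant local documents, then prepend them to the prompt so the LLM can ground its answer. The document store, function names, and keyword-overlap scoring are all invented for illustration; production systems typically use embedding-based vector search instead.

```python
# Minimal sketch of the RAG pattern: retrieve relevant local documents,
# then augment the prompt with them before it reaches the LLM.
# Plain keyword overlap stands in for real vector search here.

KNOWLEDGE_BASE = [
    "The X200 router supports firmware versions 3.1 through 3.4.",
    "Store hours are 9 a.m. to 6 p.m., Monday through Saturday.",
    "The X200 router resets to factory defaults when the button is held for 10 seconds.",
]

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the query."""
    terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(terms & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user's question with the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How do I reset the X200 router?", KNOWLEDGE_BASE)
print(prompt)
```

Note that the underlying model never changes; only the prompt does, which is why RAG can serve up-to-the-minute data without any retraining.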
Common benefits of RAG across industries include better and more complete data retrieval, enhanced customer support, and the ability to generate personalized content. By supplementing LLMs with current information, organizations can deploy AI agents to provide real-time and contextually relevant answers to user queries, minimizing the need for human intervention. The versatility of RAG allows it to adapt to a wide range of applications.
Fine-tuning a generative AI model means taking a general-purpose model, such as Claude 2 from Anthropic, Command from Cohere, or Llama 2 from Meta; giving it additional rounds of training on a smaller, domain-specific data set; and adjusting the model’s parameters based on this training. This tuning helps the model perform better on specific tasks because it’s been adapted to the nuances and terminology of a particular domain, such as coding or healthcare.
Choose fine-tuning when an LLM needs to be deft in a particular domain. With extra training, an LLM can better understand prompts and deliver outputs that reflect the nuances and terminology of a particular field. You’ll need access to a large data set or storehouse of documents curated for the training process, but fine-tuning is worth the effort because it allows for greater control over the style, tone, and manner of generated content. That can pay off in your marketing materials or customer interactions. Fine-tuning, like RAG, can also be helpful in medicine, coding, and other highly specialized domains.
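At its core, fine-tuning is just extra gradient-descent training that nudges an existing model’s parameters toward a domain data set. The toy below shows that idea with a one-weight model and invented numbers; real LLM fine-tuning does the same thing at vastly larger scale with frameworks such as PyTorch.

```python
# Toy illustration of fine-tuning: start from a "pretrained" weight, then
# run extra rounds of gradient descent on a small domain-specific data set.
# All values are invented for illustration.

# A one-weight linear "model": prediction = w * x.
w = 1.0  # weight learned during general-purpose pretraining

# Small domain data set where the true relationship is y = 2x.
domain_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

learning_rate = 0.01
for epoch in range(200):                    # extra rounds of training
    for x, y in domain_data:
        error = w * x - y                   # prediction error on a domain example
        w -= learning_rate * 2 * error * x  # gradient step on squared error

print(f"fine-tuned weight: {w:.3f}")        # converges toward the domain's value
```

The key point the sketch captures is that fine-tuning permanently changes the model’s parameters, which is why it bakes in domain style and terminology but cannot, by itself, keep responses current.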
Fine-tuning, the process of adapting a general AI model to a specific task or domain, is a powerful technique that can significantly improve results for a range of organizations, especially in cases where personalization and specialization are key.
Both fine-tuning and RAG make general-purpose LLMs more useful, but they do it in different ways. A simple analogy is that fine-tuning an LLM gives it a deeper understanding of a particular domain, such as medicine or education, while pairing the LLM with a RAG architecture gives it access to up-to-date, local data for its responses.
Why not use them together to get responses that are both nuanced and timely? It’s a growing trend and even comes with its own acronym: RAFT, for retrieval-augmented fine-tuning. With this hybrid approach, a model fine-tuned on specialized domain data is then deployed in a RAG architecture, where it uses its domain expertise to retrieve the most relevant information during response generation. The result is highly accurate, relevant, and context-aware outputs.
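The RAFT pipeline described above can be sketched in a few lines: a model that has already been fine-tuned on domain data sits behind a retrieval step, so every prompt carries fresh local context. The fine-tuned model is stubbed out here, and all names and documents are invented; in practice it would be a call to an LLM endpoint.

```python
# Sketch of the RAFT pattern: retrieval feeds local context into a
# domain-tuned model at response time. The model itself is a stub.

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    terms = set(query.lower().split())
    return max(docs, key=lambda d: len(terms & set(d.lower().split())))

def domain_tuned_model(prompt: str) -> str:
    """Stand-in for a fine-tuned LLM: echoes the grounded prompt it received."""
    return f"[domain-tuned answer based on]: {prompt}"

def raft_answer(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)                    # RAG step: fetch local data
    prompt = f"Context: {context}\nQuestion: {query}"  # augment the prompt
    return domain_tuned_model(prompt)                  # fine-tuned model responds

docs = ["Ticket 4812 was resolved by replacing the power supply.",
        "Invoices are issued on the first business day of each month."]
print(raft_answer("What resolved ticket 4812?", docs))
```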
We’ll discuss RAFT further in a bit, but first let’s get a better understanding of the two approaches.
Both RAG and fine-tuning help an LLM move beyond generic responses drawn from its original, generalized training data sets. Fine-tuning involves putting an LLM through extra rounds of training using data sets that are specific to a particular domain or organization.
RAG also alters the responses of LLMs, but it doesn’t change the underlying model. Instead, a RAG system uses a local database or curated collection of documents to inform an LLM’s responses, often with up-to-the-minute details.
The limitations—and benefits—of these two approaches have, quite naturally, led to a growing trend to combine their strengths. The result is the hybrid approach called RAFT.
The choice between using a RAG architecture or a fine-tuning regime comes down to the resources you have and how you’ll use your LLM. As noted in the table below, most use cases will benefit from the effort to combine the two approaches—for most companies, once they’ve put in the effort to fine-tune, RAG is a natural addition. But here are six questions to ask to determine which to prioritize:
| Use case requirement | RAG | Fine-tuning | RAFT |
|---|---|---|---|
| Responses must include local, up-to-date information. | Yes | No | Yes |
| Responses must include a high level of explainability. | Yes | No | Yes |
| Responses must reflect an organization’s deep domain knowledge. | Yes | Yes | Yes |
| The organization has access to a powerful neural network and GPU resources for AI training. | No | Yes | Yes |
| Responses must reflect an organization’s tone and marketing language. | No | Yes | Yes |
| The organization possesses a large, well-organized, up-to-date collection of documents for the AI to draw from and cite in its responses. | Yes | No | Yes |
| The AI system has access to limited runtime resources. | No | Yes | Yes |
| The organization possesses a large, curated data set and document store to train and fine-tune an AI. | No | Yes | Yes |
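The decision table above can be encoded as a small lookup that intersects the approaches satisfying each requirement. The requirement keys and the `recommend` function are invented for illustration, not a real API.

```python
# Toy encoding of the decision table: each requirement maps to the set of
# approaches that satisfy it. Names are illustrative only.

REQUIREMENTS = {
    "up_to_date_local_info":     {"RAG", "RAFT"},
    "high_explainability":       {"RAG", "RAFT"},
    "deep_domain_knowledge":     {"RAG", "Fine-tuning", "RAFT"},
    "gpu_training_resources":    {"Fine-tuning", "RAFT"},
    "brand_tone_and_style":      {"Fine-tuning", "RAFT"},
    "citable_document_store":    {"RAG", "RAFT"},
    "limited_runtime_resources": {"Fine-tuning", "RAFT"},
}

def recommend(needs: list[str]) -> set[str]:
    """Return the approaches that satisfy every stated requirement."""
    options = {"RAG", "Fine-tuning", "RAFT"}
    for need in needs:
        options &= REQUIREMENTS[need]
    return options

print(recommend(["up_to_date_local_info", "brand_tone_and_style"]))  # {'RAFT'}
```

As the example query shows, any use case that needs both fresh local data and a tuned voice lands on RAFT, which mirrors the table’s pattern of RAFT satisfying nearly every row.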
Whether you choose RAG or fine-tuning or both, Oracle specializes in helping organizations like yours make productivity gains with Oracle Cloud Infrastructure (OCI) Generative AI, a fully managed service that includes the power of OCI and a choice of open source or proprietary LLMs.
We make it easy to combine your LLM with RAG so you can get up-to-date responses grounded in your diverse knowledge bases. When it’s time to run your fine-tuning regime, Oracle AI infrastructure is a great choice. You’ll find superclusters that scale up to 65,536 GPUs—more than enough to run your most demanding training and inference workloads, such as LLM responses, computer vision, and predictive analytics.
General-purpose LLMs continue to improve, with a constant flow of new versions arriving from the likes of Anthropic, Cohere, Google, Meta, and many others. But no matter how deftly these AI models handle human language, they will always need a way to connect that skill set to the specific needs of business use cases. Fine-tuning and RAG are currently the two best methods for doing this. Look for them to continue evolving as AI models, hardware, and data architectures advance.
Your AI center of excellence should play a pivotal role in the rollout of RAG. Don’t have a CoE? Here’s how to get one up and running now.
Is RAG better than fine-tuning?
RAG and AI model fine-tuning are different, with their own benefits and costs. Both are popular methods of making generative AI models more useful, and each organization should choose the method that best fits its needs. Another popular option is to combine the two approaches, called RAFT, for retrieval-augmented fine-tuning.
What’s better than RAG?
RAG is simply a technique for helping an LLM deliver better responses by referencing a company’s data and documents. A method called GraphRAG has emerged as a way to further enhance LLM responses beyond what a RAG architecture can do on its own, but it adds architectural complexity and popular use cases have yet to emerge.
Fine-tuning an AI model is another method that can help an LLM offer more targeted or nuanced responses, and it can be combined with RAG to further improve the LLM’s performance.
Can RAG and fine-tuning be used together?
Yes. In the hybrid approach known as RAFT, a model fine-tuned on specialized domain data is deployed in a RAG architecture, so it can offer the latest and most relevant information in its responses.
What’s the difference between RAG and transfer learning?
RAG improves an LLM’s responses by giving it access to a local, up-to-date knowledge base at query time, without changing the model itself. Transfer learning is a training technique in which a model pretrained on one task is further trained so its knowledge carries over to a new, related task; fine-tuning an LLM on domain-specific data is a form of transfer learning.