As the wave of interest in Large Language Models (LLMs) surges, many developers and organisations are busy building applications harnessing their power. However, when the pre-trained LLMs out of the box don’t perform as expected or hoped, the question on how to improve the performance of the LLM application. And eventually we get to the point of where we ask ourselves: Should we use Retrieval-Augmented Generation (RAG) or model finetuning to improve the results?
Before diving deeper, let’s demystify these two methods:
RAG: This approach integrates the power of retrieval (or searching) into LLM text generation. It combines a retriever system, which fetches relevant document snippets from a large corpus, and an LLM, which produces answers using the information from those snippets. In essence, RAG helps the model to “look up” external information to improve its responses.
Finetuning: This is the process of taking a pre-trained LLM and further training it on a smaller, specific dataset to adapt it for a particular task or to improve its performance. By finetuning, we are adjusting the model’s weights based on our data, making it more tailored to our application’s unique needs.
Both RAG and finetuning serve as powerful tools in enhancing the performance of LLM-based applications, but they address different aspects of the optimisation process, and this is crucial when it comes to choosing one over the other.
Previously, I would often suggest to organisations that they experiment with RAG before diving into finetuning. This was based on my perception that both approaches achieved similar results but varied in terms of complexity, cost, and quality. I even used to illustrate this point with…