LlamaIndex: the ultimate LLM framework for indexing and retrieval


An Introduction to LlamaIndex

LlamaIndex, previously known as the GPT Index, is a remarkable data framework aimed at helping you build applications with LLMs by providing essential tools that facilitate data ingestion, structuring, retrieval, and integration with various application frameworks. The capabilities offered by LlamaIndex are numerous and highly valuable:

✅ Ingest from different data sources and data formats using Data connectors (Llama Hub).
✅ Enable document operations such as inserting, deleting, updating, and refreshing the document index.
✅ Support synthesis over heterogeneous data and multiple documents.
✅ Use “Router” to pick between different query engines.
✅ Allow hypothetical document embeddings (HyDE) to enhance output quality.
✅ Offer a wide range of integrations with various vector stores, ChatGPT plugins, tracing tools, and LangChain, among others.
✅ Support the brand new OpenAI function calling API.

These are just a few examples of the extensive capabilities provided by LlamaIndex. In this blog post, we will explore some of the functionalities that I find exceptionally useful with LlamaIndex.

When developing an LLM application, it’s essential to enable the LLM to interact with external data sources effectively, and how you ingest that data is key. Llama Hub offers over 100 data sources and formats, allowing LlamaIndex or LangChain to ingest data in a consistent manner.

LlamaHub. Source: https://llama-hub-ui.vercel.app/.

By default, you can pip install llama-hub and use it as a standalone package. Alternatively, you can use the download_loader method to download an individual data loader for use with LlamaIndex.

Here is an example where we load in a Wikipedia data loader from the llama-hub package. The consistent syntax is very nice.

from llama_hub.wikipedia.base import WikipediaReader

# Load several Wikipedia pages as LlamaIndex documents
loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin', 'Rome', 'Tokyo', 'Canberra', 'Santiago'])

The loader returns a list of Document objects, one per Wikipedia page.
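If you prefer the download_loader route mentioned above, the same loader can be fetched at runtime instead of being imported from the llama-hub package. Here is a minimal sketch of that alternative:

from llama_index import download_loader

# Download the Wikipedia loader on the fly instead of importing it from llama-hub
WikipediaReader = download_loader("WikipediaReader")

loader = WikipediaReader()
documents = loader.load_data(pages=['Berlin', 'Rome', 'Tokyo', 'Canberra', 'Santiago'])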

Llama Hub also supports multimodal documents. For example, the ImageReader loader uses pytesseract or the Donut transformer model to extract text from an image.
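
As a quick sketch (the loader path and the sample file name are assumptions based on the Llama Hub listing; pytesseract or the Donut model needs to be installed separately):

from pathlib import Path
from llama_hub.file.image.base import ImageReader

# OCR the text out of an image file and wrap it as a Document
loader = ImageReader()
documents = loader.load_data(file=Path('./receipt.png'))  # hypothetical sample file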

Index, retriever, and query engine

Index, retriever, and query engine are three basic components for asking questions over your data or documents:

  • Index is a data structure that allows us to quickly retrieve relevant information from external documents for a user query. An index works by parsing documents into text chunks, called “Node” objects, and then building an index over those chunks.
  • Retriever is used for retrieving relevant information given a user query.
  • Query engine is built on top of the index and retriever, providing a generic interface to ask questions about your data.

Here is the simplest way to ask questions about your document. You create an index from the document first, and then use a query engine as the interface for your question:

from llama_index import VectorStoreIndex

# docs is a list of Document objects produced by any data loader
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()
response = query_engine.query("Who is Paul Graham?")
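
Under the hood, as_query_engine() wraps a retriever built from the index. If you want to tune retrieval separately, you can also wire the two pieces together explicitly. A minimal sketch (similarity_top_k=2 is just an illustrative value):

from llama_index.query_engine import RetrieverQueryEngine

# Build a retriever from the index and control how many chunks it fetches
retriever = index.as_retriever(similarity_top_k=2)

# Wrap the retriever in a query engine to synthesize an answer from the retrieved chunks
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("Who is Paul Graham?")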

There are various types of indexes, retriever methods, and query engines that you can read more about in the LlamaIndex docs. In the remainder of this article, I’d like to cover some of the features I find especially useful.

Oftentimes, once we have created an index for our document, we need to update the document periodically. This process can be costly if we recreate the embeddings for the entire document. LlamaIndex’s index structures offer a solution by enabling efficient insertion, deletion, update, and refresh operations. For example, a new document can be inserted as additional nodes (text chunks) without recreating the nodes from previous documents:

# Source: https://gpt-index.readthedocs.io/en/latest/how_to/index/document_management.html
from llama_index import ListIndex, Document

index = ListIndex([])
text_chunks = ['text_chunk_1', 'text_chunk_2', 'text_chunk_3']

doc_chunks = []
for i, text in enumerate(text_chunks):
    doc = Document(text, doc_id=f"doc_id_{i}")
    doc_chunks.append(doc)

# insert
for doc_chunk in doc_chunks:
    index.insert(doc_chunk)
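
The same index also supports the other document operations mentioned above. A hedged sketch, following the document-management docs (method names have shifted across llama_index versions; older releases used index.delete / update / refresh instead of the *_ref_doc variants):

# delete the nodes that came from a given document, by the id it was inserted with
index.delete_ref_doc("doc_id_0")

# update a document in place: its old nodes are removed and new ones inserted
doc_chunks[1].text = "brand new document text"
index.update_ref_doc(doc_chunks[1])

# refresh: re-insert only the documents whose content has changed
refreshed = index.refresh_ref_docs(doc_chunks)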

With LlamaIndex, it’s easy to query multiple documents. This functionality is enabled through the `SubQuestionQueryEngine` class. When given a query, the query engine generates a “query plan” consisting of sub-queries against sub-documents, which are then synthesized to provide the final answer.

# Source: https://gpt-index.readthedocs.io/en/latest/examples/usecases/10q_sub_question.html
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# Load data
march_2022 = SimpleDirectoryReader(input_files=["../data/10q/uber_10q_march_2022.pdf"]).load_data()
june_2022 = SimpleDirectoryReader(input_files=["../data/10q/uber_10q_june_2022.pdf"]).load_data()
sept_2022 = SimpleDirectoryReader(input_files=["../data/10q/uber_10q_sept_2022.pdf"]).load_data()

# Build indices
march_index = VectorStoreIndex.from_documents(march_2022)
june_index = VectorStoreIndex.from_documents(june_2022)
sept_index = VectorStoreIndex.from_documents(sept_2022)

# Build query engines
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)

query_engine_tools = [
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(name='sept_22', description='Provides information about Uber quarterly financials ending September 2022')
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(name='june_22', description='Provides information about Uber quarterly financials ending June 2022')
    ),
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(name='march_22', description='Provides information about Uber quarterly financials ending March 2022')
    ),
]

# Run queries
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
response = s_engine.query('Analyze Uber revenue growth over the latest two quarter filings')

In this example, LlamaIndex decomposed the complex query into two sub-queries and compared the information from multiple documents to arrive at the final answer.

Imagine you are building a bot to retrieve information from both Notion and Slack. How does the language model know which tool to use to search for information? LlamaIndex is like a clever helper that can find things for you, even if they are in different places. Specifically, LlamaIndex’s “Router” is a very simple abstraction that allows “picking” between different query engines.

In this example, we have two document indexes, one from Notion and one from Slack, and we create a query engine for each of them. After that, we put the tools together and create a RouterQueryEngine, which picks which tool to use based on the description we gave to each individual tool. This way, when we ask a question about Notion, the router will automatically look for information in the Notion documents.

# Source: https://gpt-index.readthedocs.io/en/latest/use_cases/queries.html#routing-over-heterogeneous-data
from llama_index import TreeIndex, VectorStoreIndex
from llama_index.tools import QueryEngineTool
from llama_index.query_engine import RouterQueryEngine

# define sub-indices
index1 = VectorStoreIndex.from_documents(notion_docs)
index2 = VectorStoreIndex.from_documents(slack_docs)

# define query engines and tools
tool1 = QueryEngineTool.from_defaults(
    query_engine=index1.as_query_engine(),
    description="Use this query engine to do…",
)
tool2 = QueryEngineTool.from_defaults(
    query_engine=index2.as_query_engine(),
    description="Use this query engine for something else…",
)

# define a router query engine that picks a tool based on its description
query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[tool1, tool2]
)
response = query_engine.query(
    "In Notion, give me a summary of the product roadmap."
)

There are many exciting use cases for this. Here is a complete example that uses the router to pick between SQL and a vector db: https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLRouterQueryEngine.html.

Typically, when we ask a question about an external document, we use text embeddings to create vector representations for both the question and the document. Then we use semantic search to find the text chunks that are most relevant to the question. However, the answer to the question may differ significantly from the question itself. What if we could generate a hypothetical answer to our question first and then find the text chunks most relevant to that hypothetical answer? That’s where hypothetical document embeddings (HyDE) come into play, and they can potentially improve output quality.

# Source: https://gpt-index.readthedocs.io/en/latest/examples/query_transformations/HyDEQueryTransformDemo.html
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.indices.query.query_transform import HyDEQueryTransform
from llama_index.query_engine import TransformQueryEngine
from IPython.display import Markdown, display

# load documents and build an index plus a base query engine
documents = SimpleDirectoryReader('llama_index/examples/paul_graham_essay/data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
query_str = "what did paul graham do after going to RISD"

# Now, we use HyDEQueryTransform to generate a hypothetical document and use it for embedding lookup.
hyde = HyDEQueryTransform(include_original=True)
hyde_query_engine = TransformQueryEngine(query_engine, hyde)
response = hyde_query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

# In this example, HyDE improves output quality significantly by accurately hallucinating what Paul Graham did after RISD, thus improving the embedding quality and the final output.
query_bundle = hyde(query_str)
hyde_doc = query_bundle.embedding_strs[0]

OpenAI recently released function calling to more reliably connect GPT’s capabilities with external tools and APIs. Check out my previous video to see exactly how it works.

LlamaIndex has quickly integrated this functionality and added a brand new OpenAIAgent. Check out this notebook to learn more.
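
Here is a minimal sketch of how that agent can be used (the multiply function is just a made-up example; the class names follow the notebook linked above):

from llama_index.agent import OpenAIAgent
from llama_index.tools import FunctionTool

# a toy function the agent can invoke via the OpenAI function calling API
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the result."""
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

# the agent decides when to call the tool based on the conversation
agent = OpenAIAgent.from_tools([multiply_tool], verbose=True)
response = agent.chat("What is 121 * 3?")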

What if there are too many functions? Use the RetrieverOpenAIAgent! Check out this notebook.

LlamaIndex offers a wide range of integrations with various vector stores, ChatGPT plugins, tracing tools, and LangChain.


How is LlamaIndex different from LangChain?

If you have used LangChain, you may wonder how LlamaIndex differs from it. If you are not familiar with LangChain, check out my previous blog post and video. You will find striking similarities between LlamaIndex and LangChain in their functionality, including indexing, semantic search, retrieval, and vector databases. They both excel at tasks like question answering, document summarization, and building chatbots.

However, each of them has its own areas of focus. LangChain, with its extensive list of features, casts a wider net, concentrating on the use of chains and agents to connect with external APIs. LlamaIndex, on the other hand, has a narrower focus and shines in data indexing and document retrieval.

How to use LlamaIndex with LangChain?

Interestingly, LlamaIndex and LangChain aren’t mutually exclusive. In fact, you can use both in your LLM applications: for example, LlamaIndex’s data loaders and query engines alongside LangChain’s agents. I know a lot of people use both of these tools in their projects.

Here is an example where we use LlamaIndex to keep the chat history for a LangChain agent. When we ask “what’s my name?” in the second round of the conversation, the language model knows “I am Bob” from the first round:

# source: https://github.com/jerryjliu/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
# Using LlamaIndex as a memory module
from langchain import OpenAI
from langchain.llms import OpenAIChat
from langchain.agents import initialize_agent
from llama_index import ListIndex
from llama_index.langchain_helpers.memory_wrapper import GPTIndexChatMemory

index = ListIndex([])
memory = GPTIndexChatMemory(
    index=index,
    memory_key="chat_history",
    query_kwargs={"response_mode": "compact"},
    # return_source returns source nodes instead of querying index
    return_source=True,
    # return_messages returns context in message format
    return_messages=True,
)
llm = OpenAIChat(temperature=0)
# llm = OpenAI(temperature=0)
agent_executor = initialize_agent([], llm, agent="conversational-react-description", memory=memory)
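
To reproduce the two-round conversation described above (following the linked notebook):

agent_executor.run(input="hi, i am bob")
agent_executor.run(input="what's my name?")  # the agent recalls "Bob" from the LlamaIndex-backed memory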

In summary, LlamaIndex is an incredibly powerful tool for enhancing the capabilities of Large Language Models with your own data. Its array of data connectors, advanced query interfaces, and flexible integration make it a vital component in the development of applications with LLMs.

Thank you Jerry Liu for the advice and feedback!


. . .

By Sophia Yang on June 19, 2023

Sophia Yang is a Senior Data Scientist. Connect with me on LinkedIn, Twitter, and YouTube and join the DS/ML Book Club ❤️





