LlamaIndex: An imperative for building context-aware LLM-based apps

Large Language Models (LLMs) are transforming how users approach tasks related to searching, interacting with, and generating new content. These advanced language models have garnered immense popularity, with tools like ChatGPT at the forefront, further accelerating the discovery of their value across industries. With the simplicity of entering a search query into a text box, users can seamlessly engage with a vast reservoir of information sourced from a wide range of data repositories integrated into the LLM’s architecture.

What sets LLMs apart is not merely their capability to provide access to this wealth of information for search and retrieval; rather, they empower users to harness this information for a multitude of creative and analytical tasks.

However, a significant challenge arises when considering how users can effectively apply LLMs to their own specific data. These models are pre-trained on vast quantities of publicly available natural language data, encompassing sources like Wikipedia articles, coding-related queries akin to those found on Stack Overflow, Reddit discussions, and more. Yet, LLMs lack optimization and training on domain- or organization-specific data, which poses a barrier to their seamless integration into more specialized contexts. This is where LlamaIndex, a data framework for LLMs, comes into play.

Introduced after the influential GPT launch in 2022, LlamaIndex is an advanced tool in the AI landscape that offers an approachable interface, with a high-level API for novices and a low-level API for seasoned users, transforming how LLM-based applications are built.

This tool enables users to transform data into actionable insights by effectively connecting their unique data sources to LLMs. The versatility of LlamaIndex is astounding, integrating seamlessly with existing technological platforms such as LangChain, Flask, Docker and more. This allows you to create powerful applications that truly cater to your specific needs.

Imagine being able to ask questions about your documents, whether they are PDFs or images, and getting prompt, accurate responses. Visualize the power of data-augmented chatbots, enabling you to have meaningful conversations with your data. LlamaIndex makes all of that possible! With this data framework, you can also construct knowledge agents that index your knowledge base, creating an automated decision-making machine. Additionally, you can streamline your data warehouse analytics, leveraging natural language queries for easy data retrieval.

In essence, LlamaIndex is not just a tool; it’s a game-changer in the world of AI and data interaction. This article will explore the basics of LlamaIndex and discuss how to create a custom GPT-based chatbot using this tool.

What is LlamaIndex?

LlamaIndex, previously known as GPT Index, is an innovative data framework specially designed to support LLM-based application development. It offers an advanced framework that empowers developers to integrate diverse data sources with large language models. This includes a variety of file formats, such as PDFs and PowerPoints, as well as applications like Notion and Slack and even databases like Postgres and MongoDB. The framework brings an array of connectors that assist in data ingestion, facilitating a seamless interaction with LLMs. Moreover, LlamaIndex boasts an efficient data retrieval and query interface. This feature enables developers to input any LLM prompt and, in return, receive an output that is both context-rich and knowledge-augmented.

It essentially functions as an interface that manages your interactions with an LLM by creating an index from your input data. This index is then utilized to respond to any questions associated with the given data. LlamaIndex has the versatility to craft different kinds of indexes, such as vector, tree, list, or keyword indexes, depending on your specific requirements.

LlamaIndex offers a wide array of tools that facilitate data ingestion, retrieval, structuring, and integration with diverse application frameworks.

Key features

  • Data connectors (LlamaHub) allow ingestion from various data sources and formats.
  • Document operations like inserting, deleting, updating, and refreshing the document index are possible.
  • It can synthesize data from multiple documents or heterogeneous data sources.
  • It includes a “Router” feature to select between different query engines.
  • Hypothetical document embeddings are available to enhance output quality.
  • It provides numerous integrations with vector stores, ChatGPT plugins, tracing tools, LangChain, and more.
  • It supports the latest OpenAI function calling API.

LlamaIndex offers the flexibility to modify the following components:

  • Large Language Model (LLM)
  • Prompt templates
  • Embedding models
  • Documents

Comparing LlamaIndex and LangChain

Primary function

  • LangChain: A Python-based library that enables the development of custom NLP applications using large language models.
  • LlamaIndex: Formerly GPT Index, LlamaIndex is a project consisting of data structures designed to ease the integration of extensive external knowledge bases with large language models.

Key features

  • LangChain: Supports GPT-2, GPT-3, and T5 LLMs; provides tokenization, text generation, and question-answering capabilities; ideal for creating chatbots and summarizing lengthy documents.
  • LlamaIndex: Enables integration with external knowledge bases, including Wikipedia and Stack Overflow; allows topic extraction from unstructured data; supports GPT-2, GPT-3, GPT-4, and T5 LLMs.

Use cases

  • Chatbot construction: Create a chatbot capable of answering specific subject queries using LLMs for accurate and relevant responses.
  • Text summarization: Use LangChain to generate brief summaries of long documents or articles, helping users to quickly grasp the key points.
  • Question-answering system: Build a system that provides answers by connecting to external knowledge bases, using LlamaIndex to compile an index of queries and responses.
  • Topic extraction: Use LlamaIndex to extract topics from unstructured data, linking it to LLMs for deeper analysis and understanding.

Launch your project with LeewayHertz

Our deep knowledge of LlamaIndex and other vital GenAI technologies enables us to create robust and context-aware LLM-based applications perfectly aligned with your domain.

Key components of LlamaIndex

Data connectors (LlamaHub)

For an LLM application, one of the critical components is the ability of the LLM to interact with diverse data sources effectively. Here’s where data ingestion comes into play. LlamaHub serves as a freely accessible repository, filled with data loaders that can be seamlessly integrated into any application utilizing LlamaIndex. LlamaHub, an integral part of LlamaIndex, provides access to over 100 different data sources and formats, which makes it possible for LlamaIndex to absorb data consistently. You have the option to install LlamaHub as an independent package using pip. Alternatively, you can also leverage the download_loader method to download an individual data loader for use with LlamaIndex. Moreover, LlamaHub can handle multimodal documents. For instance, the ImageReader loader employs pytesseract or the Donut transformer model to convert image-based text into an analyzable format.

For this code we have used langchain==0.0.142 and llama_index==0.5.17

Core query functions

Three primary components, the Index, the Retriever, and the Query Engine, form the backbone of the process for soliciting information from your data or documents. A fourth, the Chat Engine, builds on them for multi-turn conversation:

  • The Index is a data structure that quickly fetches relevant information from external documents based on a user’s query. It works by dividing documents into text sections known as “Node” objects and building an index from these pieces. Indices are foundational for use cases involving Retrieval Augmented Generation (RAG). In general, indices are built from documents and then used to create Query Engines and Chat Engines, which sets up a question-answer and chat system over your data. More specifically, indices store data as Node objects, which represent sections of the original documents, and offer a Retriever interface for additional configuration and automation.
  • The Retriever is a tool for extracting and gathering relevant information based on a user’s query. It can be built on top of Indices, but it can also be defined independently. It plays a crucial role in constructing Query Engines (and Chat Engines), enabling the retrieval of pertinent context.
  • The Query Engine, built atop the Index and Retriever, provides a universal interface for querying data. The Index, Retriever, and Query Engine come in various forms to accommodate different needs.
  • The Chat Engine provides an advanced interface for engaging in dialogue with your data, allowing for ongoing conversation instead of a single question and response. Imagine ChatGPT, but enhanced with information from your knowledge base. In principle, it resembles a stateful Query Engine that keeps a record of the conversation history. Consequently, it can respond by considering the context of past interactions.
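To make the relationship between these four components concrete, here is a minimal, self-contained sketch. It is not LlamaIndex's implementation: the class names (ToyIndex, ToyRetriever, ToyQueryEngine, ToyChatEngine), the word-overlap scoring, and the fake_llm stand-in are all assumptions made for illustration.

```python
import re
from typing import Callable, List


def words(text: str) -> set:
    """Lowercased word set; a crude stand-in for real relevance scoring."""
    return set(re.findall(r"\w+", text.lower()))


class ToyIndex:
    """Splits documents into nodes (here: sentences) and stores them."""

    def __init__(self, documents: List[str]):
        self.nodes = [s.strip() for d in documents for s in d.split(".") if s.strip()]

    def as_retriever(self, top_k: int = 2) -> "ToyRetriever":
        return ToyRetriever(self.nodes, top_k)


class ToyRetriever:
    """Returns the top_k nodes sharing the most words with the query."""

    def __init__(self, nodes: List[str], top_k: int):
        self.nodes, self.top_k = nodes, top_k

    def retrieve(self, query: str) -> List[str]:
        q = words(query)
        ranked = sorted(self.nodes, key=lambda n: len(q & words(n)), reverse=True)
        return ranked[: self.top_k]


class ToyQueryEngine:
    """Retrieves context, then asks the LLM a single question."""

    def __init__(self, retriever: ToyRetriever, llm: Callable[[str], str]):
        self.retriever, self.llm = retriever, llm

    def query(self, question: str) -> str:
        context = "\n".join(self.retriever.retrieve(question))
        return self.llm(f"Context:\n{context}\nQuestion: {question}")


class ToyChatEngine:
    """A query engine plus a record of the conversation so far."""

    def __init__(self, query_engine: ToyQueryEngine):
        self.query_engine = query_engine
        self.history: List[str] = []

    def chat(self, message: str) -> str:
        # Earlier user turns are prepended so follow-up questions keep context.
        answer = self.query_engine.query(" ".join(self.history + [message]))
        self.history.append(message)
        return answer


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: returns the first retrieved context line."""
    return prompt.splitlines()[1]


index = ToyIndex(["Apples are red. Bananas are yellow. Cherries are small."])
engine = ToyQueryEngine(index.as_retriever(top_k=2), fake_llm)
chat = ToyChatEngine(engine)
print(engine.query("What color are bananas?"))
print(chat.chat("Tell me about cherries"))
print(chat.chat("What size?"))
```

The follow-up question "What size?" contains no fruit name, yet the chat engine still retrieves the cherry node because the earlier turn is part of the retrieval query. That statefulness is the difference between the two engines described above.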

These explanations cover the primary aspects of LlamaIndex, but the functionalities it offers are even more diverse.

Why do we need LlamaIndex?

The need for LlamaIndex comes from two angles:

Indexing diverse datasets

While commercial ChatGPT may suffice for common use cases, it may not be the most effective tool when it comes to handling extensive corporate documents that can extend beyond 1000 pages. This is primarily due to its token limitations. For instance, GPT-3 can manage approximately 2000 tokens, GPT-3.5 around 4000 tokens, and GPT-4 can go up to 32,000 tokens.
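A rough back-of-the-envelope calculation shows the scale of the mismatch. The figures below are assumptions for illustration only: about 500 words per page and roughly 0.75 English words per token (a common rule of thumb). The context window sizes are the approximate values quoted above.

```python
import math

WORDS_PER_PAGE = 500      # assumption: a dense page of prose
TOKENS_PER_WORD = 4 / 3   # assumption: about 0.75 English words per token


def estimated_tokens(pages: int) -> int:
    """Approximate token count of a document with the given page count."""
    return math.ceil(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)


def prompts_needed(pages: int, context_window: int) -> int:
    """Minimum number of prompts needed just to show the model every token once."""
    return math.ceil(estimated_tokens(pages) / context_window)


for name, window in [("GPT-3", 2_000), ("GPT-3.5", 4_000), ("GPT-4", 32_000)]:
    print(f"{name}: {prompts_needed(1_000, window)} prompts for a 1,000-page document")
```

Even this estimate is optimistic, because it leaves no room in the window for the question or the model's answer.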

When the number of available tokens is limited, larger datasets cannot be incorporated into the prompt, which limits the potential of your model. Although it’s possible to train the model on your data instead, that approach has its own advantages and disadvantages. Here’s where LlamaIndex proves beneficial. It allows you to index diverse datasets like documents, PDFs, and databases, enabling quick and easy queries to locate the required information.

Imagine the convenience of accessing all necessary information with a few clicks! LlamaIndex allows you to pose complex questions directly to your knowledge base, Slack, other communication tools, and virtually any Software-as-a-Service (SaaS) content without specific data preparation. The best part? You will receive answers supported by GPT’s reasoning power within seconds, eliminating the need for copy-pasting anything into prompts. By correctly employing LlamaIndex (formerly GPT Index), you can realize this level of efficiency.

Storage management for LLMs

Building an LLM application has unique challenges, especially related to storage management. Both traditional storage options like SQL and NoSQL databases and emerging storage types like index and vector storage present certain difficulties. Issues with scalability, flexibility, and data integrity are common with traditional storage systems, while emerging storage types, although designed for LLM applications, bring their own set of management and integration challenges.

In conclusion, while traditional storage systems like SQL or NoSQL may not be efficient for searching extensive text chunks with similar meanings, vector storage options like ChromaDB or Pinecone can store your embedding data. Index Storage, however, can be used to store the indexing of those embeddings. Therefore, an innovative solution like LlamaIndex is crucial for the efficient and effective management of LLM applications.

Different index types in LlamaIndex

Now, let’s examine the various index types that LlamaIndex allows you to create, how they function, and the best-suited use cases for each. Fundamentally, all index types in LlamaIndex consist of “nodes,” which are objects signifying a portion of text from a document.

List index

Just as its name implies, a list index represents an index in the form of a list. The initial step involves breaking down the input data into nodes, which are then arranged sequentially. The nodes will be sequentially queried unless additional parameters are specified during querying. In addition to basic sequential querying, nodes can be queried using keywords or embeddings. List indexing provides a means to explore your input sequentially. LlamaIndex facilitates the utilization of your complete input data, even if it exceeds the token limit of the LLM. This is achieved by LlamaIndex querying the text from each node and refining the responses as it navigates through the list.
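The sequential query-and-refine behavior can be sketched in a few lines. This is a toy illustration, not LlamaIndex code: query_list_index and counting_llm are hypothetical names, the prompt wording is only indicative, and the "LLM" is a stub that counts how often it is called.

```python
from typing import Callable, List


def query_list_index(nodes: List[str], query: str, llm: Callable[[str], str]) -> str:
    """Visit every node in order, carrying one running answer forward."""
    answer = ""
    for position, context in enumerate(nodes):
        if position == 0:
            # First node: answer the query from this context alone.
            prompt = f"Given this context: {context}\nAnswer this query: {query}"
        else:
            # Later nodes: refine the answer produced so far.
            prompt = (
                f"Given the response so far: {answer}\n"
                f"and the following context: {context}\n"
                f"refine the response to the query: {query}"
            )
        answer = llm(prompt)
    return answer


calls: List[str] = []


def counting_llm(prompt: str) -> str:
    """Stub model: records each prompt and returns a numbered draft."""
    calls.append(prompt)
    return f"draft {len(calls)}"


final = query_list_index(
    ["node A text", "node B text", "node C text"], "What is covered?", counting_llm
)
print(final, "after", len(calls), "LLM calls")
```

Each node costs one LLM call, which is why this approach can use the complete input regardless of the token limit, at the price of latency and cost that grow with the number of nodes.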

Vector store index

Another index type that LlamaIndex supports is the vector store index. This index type stores nodes as vector embeddings, which can be stored locally or in a purpose-built vector database such as Milvus. When queried, LlamaIndex identifies the most similar nodes and returns these to the response synthesizer. A vector store index introduces semantic similarity into your LLM application, making it the best choice for workflows that compare texts for semantic similarity.
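The retrieval step of a vector store index boils down to ranking nodes by the similarity between their embedding and the query embedding. The sketch below assumes a deliberately simple stand-in embedding (word counts) so that it runs without any model; real embeddings come from a learned model and capture meaning, which word counts do not. ToyVectorIndex, embed, and cosine are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Stand-in embedding: a sparse word-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Stores one embedding per node and retrieves the nearest nodes."""

    def __init__(self, nodes: List[str]):
        self.entries = [(node, embed(node)) for node in nodes]

    def retrieve(self, query: str, top_k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


vector_index = ToyVectorIndex([
    "The cat sat on the mat",
    "Stock prices fell sharply today",
    "A kitten sat on a rug",
])
print(vector_index.retrieve("Where did the cat sit?", top_k=1))
```

Only the retrieved nodes would then be passed on to response synthesis, rather than every node as in the list index.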

Tree index

The tree index offered by LlamaIndex constructs a tree from your input data. The tree is built bottom-up from the leaf nodes, which are the original input data chunks. Each parent node is a summary of its child nodes, generated using GPT. When generating a response to a query, the tree index can traverse from the root node down to the leaf nodes or construct the response directly from chosen leaf nodes. A tree index allows more efficient querying of extensive text and extracting information from various text parts. Unlike the list index, a tree index doesn’t require sequential querying.
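The bottom-up construction and top-down traversal can be sketched as follows. This is an illustrative toy under stated assumptions: the summarizer simply concatenates child texts where a real tree index would call an LLM, child selection uses word overlap, and TreeNode, build_tree, and query_tree are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TreeNode:
    text: str
    children: List["TreeNode"] = field(default_factory=list)


def build_tree(chunks: List[str], summarize: Callable[[List[str]], str],
               branching: int = 2) -> TreeNode:
    """Group nodes level by level until a single root remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if branching < 2:
        raise ValueError("branching must be at least 2")
    level = [TreeNode(chunk) for chunk in chunks]  # leaves = original chunks
    while len(level) > 1:
        parents = []
        for start in range(0, len(level), branching):
            group = level[start:start + branching]
            parents.append(TreeNode(summarize([n.text for n in group]), group))
        level = parents
    return level[0]


def query_tree(root: TreeNode, query: str) -> str:
    """Walk from the root to a leaf, following the best-matching child."""
    wanted = set(query.lower().split())
    node = root
    while node.children:
        node = max(node.children, key=lambda c: len(wanted & set(c.text.lower().split())))
    return node.text


def concat_summary(texts: List[str]) -> str:
    """Stand-in for an LLM summary: just join the child texts."""
    return " ".join(texts)


root = build_tree(
    ["apples are red", "bananas are yellow", "cars have wheels", "trains run on rails"],
    concat_summary,
)
print(query_tree(root, "what do trains run on"))
```

With four leaves and a branching factor of 2, the query touches two levels of the tree instead of scanning all four chunks in sequence.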

Keyword index

Lastly, we have the keyword index. This index is a map connecting keywords to nodes that contain those keywords, forming a many-to-many relationship. Each keyword might point to multiple nodes, and each node might have multiple keywords linked to it. During a query, keywords are extracted from the query, and only the mapped nodes are queried. The keyword index offers a more efficient way to query vast data volumes for specific keywords. This is especially useful when you are certain about the user’s query focus, for instance, when you are sifting through healthcare documents and are only interested in those related to COVID-19.
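The keyword-to-node map is easy to picture as a dictionary of sets. The following sketch assumes a naive keyword extractor (lowercased words longer than two characters); an actual keyword index may extract keywords differently, for example by prompting an LLM. ToyKeywordIndex and extract_keywords are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def extract_keywords(text: str) -> Set[str]:
    """Naive extractor: lowercased words (hyphens kept) longer than two characters."""
    return {w for w in re.findall(r"[a-z0-9-]+", text.lower()) if len(w) > 2}


class ToyKeywordIndex:
    """Maps each keyword to the ids of all nodes that contain it."""

    def __init__(self, nodes: List[str]):
        self.nodes = nodes
        self.table: Dict[str, Set[int]] = defaultdict(set)
        for node_id, text in enumerate(nodes):
            for keyword in extract_keywords(text):
                self.table[keyword].add(node_id)

    def retrieve(self, query: str) -> List[str]:
        """Return only the nodes mapped from keywords found in the query."""
        node_ids: Set[int] = set()
        for keyword in extract_keywords(query):
            node_ids |= self.table.get(keyword, set())
        return [self.nodes[i] for i in sorted(node_ids)]


keyword_index = ToyKeywordIndex([
    "COVID-19 vaccines reduce hospital admissions",
    "Diabetes care guidelines were updated",
    "Hospital staffing during COVID-19 surges",
])
print(keyword_index.retrieve("What do we know about COVID-19?"))
```

Here "covid-19" and "hospital" each map to two nodes, and each of those nodes carries several keywords, which is the many-to-many relationship described above. The diabetes node is never sent to the LLM for this query.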

Knowledge graph index

The knowledge graph index constructs the index by deriving knowledge triples, each comprising a subject, predicate, and object, from a collection of documents. When a query is initiated, it can utilize the knowledge graph solely as context or incorporate the underlying text from each entity as context. By including the underlying text, we can pose more complex queries concerning the document content. To understand the concept better, imagine a graph with all its interconnected edges and vertices.
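As a rough illustration of the two context modes, the sketch below stores hand-written triples together with the text they came from. It makes several assumptions: the triples are supplied manually (a real knowledge graph index derives them from the documents), entity matching is a plain substring check, and ToyKnowledgeGraph is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ToyKnowledgeGraph:
    """Keeps triples per subject, plus the text each triple was derived from."""

    def __init__(self) -> None:
        self.relations: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self.source_text: Dict[str, List[str]] = defaultdict(list)

    def add(self, triple: Triple, text: str) -> None:
        subject, predicate, obj = triple
        self.relations[subject].append((predicate, obj))
        self.source_text[subject].append(text)

    def context_for(self, query: str, include_text: bool = False) -> List[str]:
        """Collect triples (and optionally source text) for entities named in the query."""
        lines: List[str] = []
        lowered = query.lower()
        for subject, pairs in self.relations.items():
            if subject.lower() in lowered:
                lines += [f"{subject} {predicate} {obj}" for predicate, obj in pairs]
                if include_text:
                    lines += self.source_text[subject]
        return lines


graph = ToyKnowledgeGraph()
graph.add(("LlamaIndex", "was previously known as", "GPT Index"),
          "LlamaIndex, previously known as GPT Index, is a data framework.")
graph.add(("LlamaHub", "provides", "data loaders"),
          "LlamaHub is a repository filled with data loaders.")

print(graph.context_for("What is LlamaHub?"))
print(graph.context_for("What is LlamaHub?", include_text=True))
```

The first call returns only the triple, while the second also returns the underlying sentence, giving the LLM more material for complex questions.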

The workflow of LlamaIndex

The workflow of LlamaIndex can be broken down into two primary aspects: data processing and querying.

Data processing

In the data processing phase, LlamaIndex partitions your knowledge base (for example, organizational documents) into chunks stored as ‘node’ objects. These nodes collectively form an ‘index’ or a graph. The process of ‘chunking’ is crucial as LLMs have a limited input token capacity, making it necessary to devise a strategy to process large documents smoothly and continuously.

The index graph could have different configurations, such as a simple list structure, a tree structure, or a keyword table. Moreover, indexes can also be composed of different indexes instead of nodes. For instance, you can create separate list indexes over Confluence, Google Docs, and emails and create an overarching tree index over these list indexes.

When it comes to creating nodes, LlamaIndex uses ‘textSplitter’ classes which break up the input to an LLM to stay within token limitations. However, you can create custom splitters or generate your chunks beforehand for more control over document chunking.
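Generating chunks beforehand can be as simple as the sketch below. It is an assumption-laden toy rather than LlamaIndex's splitter: whitespace-separated words stand in for tokens, and the split_text function with its chunk_size and overlap parameters is hypothetical.

```python
from typing import List


def split_text(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into chunks of at most chunk_size words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(20))
chunks = split_text(sample, chunk_size=8, overlap=2)
for chunk in chunks:
    print(chunk)
```

The small overlap between neighboring chunks is a design choice that helps keep a sentence that straddles a boundary visible in at least one chunk; set overlap to 0 to disable it.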


Querying

Querying an index graph involves two primary tasks. First, a collection of nodes relevant to the query is fetched. Then, a response_synthesis module uses these nodes and the original query to generate a logical response. The relevance of a node is determined based on the index type. Let’s review how these relevant nodes are procured in different setups:

  • List index querying: A list index sequentially employs all the nodes in the list to generate a response. The query, accompanied by information from the first node, is dispatched to the LLM as a prompt. The prompt might be structured like “given this {context}, answer this {query},” where the node supplies the context and the query is the original query. The LLM’s returned response is refined as we progress through the nodes. The current response, query, and the next node are embedded in a prompt resembling “given the response so far {current_response}, and the following context: {context}, refine the response to the query {query} in line with the context.” This process continues until all nodes have been traversed. By default, this index retrieves and transfers all nodes in an index to the response synthesis module. However, when the query_mode parameter is set to “embedding,” only the nodes with the highest similarity (measured by vector similarities) are fetched for response_synthesis.
  • Vector index querying: A vector index calculates embeddings for each document node and stores them in a vector database like Pinecone or Vertex AI Matching Engine. The key difference in retrieval compared to the list index is that only nodes surpassing a specific relevance threshold to the query are fetched and delivered to the response_synthesis module.
  • Response synthesis: This module offers several methods to create the response:
      • Create and refine: This is the standard mode for a list index. The list of nodes is sequentially traversed, and at each step, the query, the response so far, and the current node’s context are embedded in a prompt template that prompts the LLM to refine the query response following the new information in the current node.
      • Tree summarize: This is similar to the tree index in that a tree is created from the chosen candidate nodes. However, the summarization prompt used to derive the parent nodes is seeded with the query. Moreover, the tree construction continues until a single root node is reached, which contains the query’s answer, composed of the information in all the selected nodes.
      • Compact: This method is designed to be cost-effective. The response synthesizer is instructed to cram as many nodes as possible into the prompt before reaching the LLM’s token limitation. If too many nodes exist to fit into one prompt, the synthesizer will perform this in stages, inserting the maximum possible number of nodes into the prompt at each step and refining the answer in subsequent steps. It’s worth noting that the prompts used to interact with the LLMs can be customized. For example, you can seed tree construction with your own custom summary prompt.
  • Composability: A useful feature of LlamaIndex is its ability to compose an index from other indexes rather than nodes. Suppose you need to search or summarize multiple heterogeneous data sources. You can create a separate index over each data source and a list index over these indexes. A list index is suitable because it generates and refines an answer (whether it’s a summary or query answer) iteratively by stepping through each index. You need to register a summary for each lower-level index. This is because, like other modules and classes, this feature relies on prompting LLMs to, for instance, identify the relevant sub-indexes. In a tree index with a branching factor of 1, this summary is used to identify the correct document to direct the query.
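The cost saving of the compact mode comes from packing several nodes into each prompt. The sketch below shows only that packing step, under stated assumptions: whitespace-separated words stand in for tokens, the limit covers node text alone, and pack_nodes is a hypothetical helper rather than a LlamaIndex function.

```python
from typing import Callable, List


def pack_nodes(nodes: List[str], token_limit: int,
               count_tokens: Callable[[str], int] = lambda s: len(s.split())) -> List[str]:
    """Greedily fill each prompt with as many consecutive nodes as fit."""
    batches: List[str] = []
    current: List[str] = []
    used = 0
    for node in nodes:
        size = count_tokens(node)
        if current and used + size > token_limit:
            # The next node would overflow: close this batch and start a new one.
            batches.append("\n".join(current))
            current, used = [], 0
        # A node larger than the limit still gets a batch of its own.
        current.append(node)
        used += size
    if current:
        batches.append("\n".join(current))
    return batches


nodes = ["a b c", "d e", "f g h i", "j"]
batches = pack_nodes(nodes, token_limit=5)
print(f"{len(nodes)} nodes packed into {len(batches)} prompts")
```

In this example, create and refine would need four LLM calls (one per node), whereas the compact mode needs two: one to answer from the first batch and one to refine with the second.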


Storage

Storage is a critical component of this library for developers, necessitating space for vectors (representing document embeddings), nodes (representing chunks of documents), and the index itself. By default, nearly everything is stored in memory, except for vector store services such as Pinecone, which house your vectors in their databases. To ensure the persistence of the information, these in-memory storage objects can be saved to disk for future reloading.

Looking into the available options for storage, let’s discuss them one by one:

  • Document stores: MongoDB is the sole external alternative to in-memory storage. Specifically, two classes, MongoDocumentStore and SimpleDocumentStore, manage the storage of your document nodes either in a MongoDB server or in memory.
  • Index stores: Similar to document stores, two classes, MongoIndexStore and SimpleIndexStore, handle the storage of index metadata in either MongoDB or memory.
  • Vector stores: Besides the SimpleVectorStore class that keeps your vectors in memory, LlamaIndex supports a wide range of vector databases akin to LangChain. It’s crucial to note that while some vector databases house both documents and vectors, others, like Pinecone, exclusively store vectors. Nonetheless, hosted databases like Pinecone allow for highly efficient complex computations on these vectors compared to in-memory databases like Chroma.
  • Storage context: Once you have configured your storage objects to fit your needs or left them as default settings, a storage_context object can be created from them. This allows your indexes to account for everything comprehensively.
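The idea of bundling the three stores and saving them to disk can be sketched with plain dictionaries and JSON. This is not the LlamaIndex storage API: ToyStorageContext, its field names, and the persist and load methods are assumptions made for illustration.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ToyStorageContext:
    """Bundles in-memory stand-ins for the document, index, and vector stores."""

    docstore: Dict[str, str] = field(default_factory=dict)              # node id -> chunk text
    index_store: Dict[str, List[str]] = field(default_factory=dict)     # index id -> node ids
    vector_store: Dict[str, List[float]] = field(default_factory=dict)  # node id -> embedding

    def persist(self, path: str) -> None:
        """Write all three stores to one JSON file."""
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(asdict(self), handle)

    @classmethod
    def load(cls, path: str) -> "ToyStorageContext":
        """Rebuild the stores from a file written by persist()."""
        with open(path, encoding="utf-8") as handle:
            return cls(**json.load(handle))


context = ToyStorageContext(
    docstore={"n1": "first chunk", "n2": "second chunk"},
    index_store={"list_index": ["n1", "n2"]},
    vector_store={"n1": [0.1, 0.2], "n2": [0.3, 0.4]},
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "storage.json")
    context.persist(path)
    restored = ToyStorageContext.load(path)

print(restored == context)
```

Swapping one of the dictionaries for a client of an external service (for example, a hosted vector database) would leave the other two stores untouched, which is the flexibility the separate store classes are meant to provide.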

How to build a custom GPT-based chatbot with LlamaIndex?

LlamaIndex is a tool that facilitates the generation of responses to queries by linking LLMs to the user-provided data.

According to the steps outlined in its documentation, the use of LlamaIndex involves the following:

  • Importing the documents
  • Breaking down the documents into nodes (optional)
  • Creating the index
  • Building Indices on the already created index (optional)
  • Querying the index

At its core, LlamaIndex ingests your data, transforms it into a document object, and subsequently converts it into an index. When a query is directed at the index, it’s sent to a GPT prompt to synthesize a response. By default, this task is performed by OpenAI’s text-davinci-003 model.

While the process might sound complex, it can be executed with minimal code, as you will discover.

To keep things concise, the previously mentioned optional steps (namely, steps 2 and 4) will be left out of this process.

Let’s start by addressing the prerequisites.

The necessary tools, LlamaIndex and OpenAI, can be installed from pip with the use of these commands:

pip install llama-index
pip install openai

Users will also require an API key provided by OpenAI:

import os
os.environ['OPENAI_API_KEY'] = 'API_KEY'

Also, you need to import the following:

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from llama_index import download_loader

Ingesting the data

Data can be ingested either manually or via a data loader. In this instance, we will show three types of data loading:

The first index will be constructed using the local .txt file, stored in a folder labeled ‘data’. This data will be manually ingested.

from llama_index import SimpleDirectoryReader
# load the .txt files in the 'data' folder as document objects
documents_txt = SimpleDirectoryReader('data').load_data()

The second index will be formed using data derived from the Wikipedia page about strawberries. This data can be imported using one of the data loaders provided by LlamaIndex.

from llama_index import download_loader
# create a wikipedia download loader object
WikipediaReader = download_loader("WikipediaReader")
# load the wikipedia reader object
loader = WikipediaReader()
documents = loader.load_data(pages=['Strawberry'])

The third index will be established utilizing the YouTube video that demonstrates a vanilla cake recipe. This data will also be imported via a data loader.

from llama_index import download_loader
# create a youtube download loader object
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
# load the youtube_transcript reader
loader = YoutubeTranscriptReader()
# load the transcript of the youtube video as document objects
documents_youtube = loader.load_data(ytlinks=[''])

Constructing indices

Once all the data has been imported and transformed into document objects, an index can be built for each chatbot.

Creating an index from the document object can be achieved with a single line of code.

from llama_index import GPTSimpleVectorIndex
# construct the index with the txt document
index_txt = GPTSimpleVectorIndex.from_documents(documents_txt)
# construct the index with the Wikipedia document
index_wiki = GPTSimpleVectorIndex.from_documents(documents)
# construct the index with the Youtube document
index_youtube = GPTSimpleVectorIndex.from_documents(documents_youtube)

Querying the Index

The built indices are now ready to produce responses to any given query. This step, too, can be executed with a single line of code.

Querying the index constructed with the .txt file

# query the .txt index
index_txt.query("Which fruit is the best?").response

Querying the index that was created using the Wikipedia page on the topic of strawberries

# query the Wikipedia index
index_wiki.query('Which countries produce strawberries?').response

Querying the index that was established using the YouTube video on the subject of vanilla cake recipe

# query the Youtube index
index_youtube.query('how should I measure the flour?').response

Launching the Chatbots via a Web Application

Ultimately, a web application can be developed to present the built indices to the end users.

For this, the indices should first be stored using the save_to_disk method.

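# save files to disk (file names match those loaded by the application below)
index_txt.save_to_disk('index_txt.json')
index_youtube.save_to_disk('index_video.json')
index_wiki.save_to_disk('index_wikepedia.json')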

These indices will be incorporated into a Streamlit application. The source code for the entire application is as follows:

import streamlit as st
from llama_index import GPTSimpleVectorIndex
import os
def load_indexes():
    """load the pipeline object for preprocessing and the ml model"""
    # load index files
    index_document = GPTSimpleVectorIndex.load_from_disk('index_txt.json')
    index_video = GPTSimpleVectorIndex.load_from_disk('index_video.json')
    index_wikepedia = GPTSimpleVectorIndex.load_from_disk('index_wikepedia.json')
    return index_document, index_video, index_wikepedia
def main():
    # api key
    os.environ['OPENAI_API_KEY'] = 'API_KEY'
    # load indices
    index_document, index_video, index_wikepedia = load_indexes()
    st.header('Custom-Made Chatbots')
    # select the data to write queries for
    st.write("Select the data that your chatbot should be trained with:")
    data = st.selectbox('Data', ('.txt file (My favorite fruits)', 'Youtube Video (Vanilla Cake Recipe)', 'Wikipedia Article (Apple)'))
    # use the index based on the selected data
    if data == '.txt file (My favorite fruits)':
        index = index_document
    elif data == 'Youtube Video (Vanilla Cake Recipe)':
        index = index_video
    elif data == 'Wikipedia Article (Apple)':
        index = index_wikepedia
    # query the selected index
    query = st.text_input('Enter Your Query')
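    button = st.button('Response')
    if button:
        # send the query to the selected index and display the answer
        st.write(index.query(query).response)
if __name__ == '__main__':
    main()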

Within the application, users have the ability to choose the data source they want to pose their questions to, and type their query into the designated box.

The performance of the indices can be evaluated after launching the application:

streamlit run

The complete code with data is available in this Github location –


LlamaIndex is a powerful and flexible tool specifically designed to enhance the capabilities of large language models. It provides an innovative approach to linking varied data sources to LLMs, enabling developers to create more informed and context-rich AI applications.

From ingesting data from numerous formats and databases to offering an effective data retrieval and query interface, LlamaIndex delivers a comprehensive solution to address the complexity of creating high-performing AI models. Moreover, its ability to customize components and the integration of data connectors and loaders makes it a highly robust framework for both novices and seasoned developers.

Whether it’s about generating insightful responses for a chatbot, executing data-augmented operations, or conducting effective data warehouse analytics, LlamaIndex offers endless possibilities. As the AI landscape evolves, tools like LlamaIndex will be pivotal in bridging the gap between extensive external knowledge bases and LLMs, ushering in a new era of AI-driven applications.

Supercharge your internal workflows and customer facing systems with GenAI capabilities. Partner with LeewayHertz’s experts for their profound expertise in developing LLM-powered applications using LlamaIndex.

Author’s Bio


Akash Takyar

CEO LeewayHertz
Akash Takyar is the founder and CEO of LeewayHertz. With a proven track record of conceptualizing and architecting 100+ user-centric and scalable solutions for startups and enterprises, he brings a deep understanding of both technical and user experience aspects.
Akash's ability to build enterprise-grade technology solutions has garnered the trust of over 30 Fortune 500 companies, including Siemens, 3M, P&G, and Hershey's. Akash is an early adopter of new technology, a passionate technology enthusiast, and an investor in AI and IoT startups.
