Fundamentals of RAG (Retrieval-Augmented Generation)

What is RAG?

RAG = Retrieval + Generation

Instead of just generating answers from a fixed model, RAG:

  1. Retrieves relevant information (e.g., from documents, PDFs, or databases)
  2. Augments the user prompt with the retrieved content
  3. Generates a final response using a language model (LLM)
In short: RAG = Retrieve(knowledge) + Generate(answer)

Main Idea:
Instead of relying only on the model's memory (which is limited and can become outdated), we augment it by retrieving current, external information relevant to the query, then generate a response using that information.
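
Here is that flow as a minimal sketch. The vector_store and llm objects are placeholders assuming a LangChain-style interface; the concrete versions are built in the steps below.

def rag_answer(query, vector_store, llm):
    # 1. Retrieve: find the stored chunks most similar to the query
    docs = vector_store.similarity_search(query, k=5)
    # 2. Augment: prepend the retrieved text to the prompt
    context = "\n".join(doc.page_content for doc in docs)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # 3. Generate: let the LLM answer from the augmented prompt
    return llm.invoke(prompt)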

Why is RAG Used?

1. LLMs hallucinate
2. LLMs can't access private data
3. LLMs have context limits
4. The model's built-in knowledge goes out of date

RAG offers a solution to each:

1. Adds factual grounding
2. Retrieves from your company's knowledge base
3. Fetches only the top relevant chunks
4. Uses the latest docs as the source during retrieval

Step by step implementation of RAG:

Step 1. Import libraries:

import os
import pdfplumber
from langchain_community.vectorstores import FAISS  # requires the faiss-cpu package to be installed
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_sambanova import SambaNovaCloudEmbeddings
# from langchain_huggingface import HuggingFaceEmbeddings  # local alternative, see Step 6
from dotenv import load_dotenv, find_dotenv
import openai

load_dotenv(find_dotenv())  # load SAMBANOVA_API_KEY from a .env file, if present

Step 2. Locate the documents:

directory = os.getcwd()
pdf_files = [f for f in os.listdir(directory) if f.endswith(".pdf")]
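
You can quickly verify which PDFs were picked up:

print(pdf_files)  # e.g. ['report.pdf', 'notes.pdf'] — whatever sits in the current directory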

Step 3. List the vector stores already saved on disk:

def list_available_files():
    # Must match the folder used in Steps 6 and 7
    vectorstore_dir = os.path.join(os.getcwd(), "vectorstore")
    if not os.path.exists(vectorstore_dir):
        os.makedirs(vectorstore_dir)  # create the directory if it doesn't exist
    # Each saved vector store is a subdirectory named after its source file
    available_files = [
        name for name in os.listdir(vectorstore_dir)
        if os.path.isdir(os.path.join(vectorstore_dir, name))
    ]
    return available_files
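
For example, on a first run the folder is empty, and after Step 6 it contains one entry per saved store:

print(list_available_files())  # e.g. [] on a fresh run, ['RAG'] after Step 6 has run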

Step 4. Extract the text from a PDF file:

def get_pdf_text(pdf_file):
    entire_text = ""
    try:
        with pdfplumber.open(pdf_file) as pdf:
            for page in pdf.pages:
                # extract_text() can return None on image-only pages
                entire_text += (page.extract_text() or "") + "\n"
    except Exception as e:
        print(f"Error: {e}")
    return entire_text
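
Since Step 2 collected a list of files, a caller can concatenate the text of every PDF found (a small usage sketch):

# Combine the text of all PDFs found in Step 2 into one corpus
entire_text = "".join(get_pdf_text(f) for f in pdf_files)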

Step 5. Split the extracted text (the corpus) into chunks:

def get_chunks(entire_corpus):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,   # maximum characters per chunk
        chunk_overlap=50  # characters shared between adjacent chunks
    )
    chunks = text_splitter.split_text(entire_corpus)
    return chunks
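
With chunk_size=500 and chunk_overlap=50, adjacent chunks share 50 characters, so a sentence cut at a boundary still appears intact in one of them. A quick sanity check on a toy string (the printed lengths are illustrative):

sample_text = "RAG retrieves relevant chunks before generating an answer. " * 20
for i, chunk in enumerate(get_chunks(sample_text)):
    print(i, len(chunk))  # every chunk is at most 500 characters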

Step 6. Embed the chunks and save a FAISS vector store:

def generate_embeddings(chunks, file_name):
    # `embeddings` is the global embedding model defined in the main code below
    vector_store = FAISS.from_texts(
        texts=chunks, embedding=embeddings
    )
    vector_store.save_local(os.path.join(os.getcwd(), "vectorstore", file_name))

    print("Vector store created with", len(chunks), "chunks")
    return vector_store
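
If you would rather embed locally, the HuggingFaceEmbeddings import commented out in Step 1 can be swapped in for the embeddings object. A sketch, assuming the langchain-huggingface and sentence-transformers packages are installed (the model name is one common choice, not a requirement):

from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")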

Step 7. Retrieve the most relevant chunks:

def retriever(user_query, file_name):
    vector_store = FAISS.load_local(
        folder_path=os.path.join(os.getcwd(), "vectorstore", file_name),
        embeddings=embeddings,
        allow_dangerous_deserialization=True,  # required to load pickled FAISS indexes you trust
    )
    # Return the 5 chunks most similar to the query
    results = vector_store.similarity_search(user_query, k=5, fetch_k=25)
    return results
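
Each result is a LangChain Document whose page_content holds the chunk text, so once the store exists you can inspect what was retrieved before it is sent to the LLM:

for doc in retriever("What is RAG?", "RAG"):
    print(doc.page_content[:80])  # first 80 characters of each retrieved chunk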

Step 8. Generate the response through the SambaNova API:

def generate_sn_response(user_query, retrieved_results):
    sys_prompt = '''You will be provided with a question for which the user is seeking an answer. Provide the answer based on the information you are given.'''
    prompt = f'''FEW THINGS TO KEEP IN MIND:
        1. You will be provided with the user query and relevant documents which you can refer to for the answer.
        2. You must strictly use the information from the relevant documents to provide the answer.
        3. You may phrase the answer in your own words, but it must be based on the information in the relevant documents.
        Format :
        Question : {user_query}
        Relevant Documents : {retrieved_results}
        Answer :
        '''
    response = client.chat.completions.create(
        model='Meta-Llama-3.3-70B-Instruct',
        messages=[
            {"role": "system", "content": sys_prompt},
            {"role": "user", "content": prompt},
        ],
        temperature=0.1,  # low temperature keeps the answer close to the retrieved text
    )
    return response.choices[0].message.content
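
Note that interpolating retrieved_results directly puts Document object reprs into the prompt. A cleaner variant (a sketch) joins only the chunk text before building the prompt:

def format_docs(docs):
    # Keep only the chunk text, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

# ...then interpolate format_docs(retrieved_results) instead of retrieved_results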

Main code:

file_name = "RAG"  # name under which the vector store is saved
embeddings = SambaNovaCloudEmbeddings(model="E5-Mistral-7B-Instruct")

# Get the current directory and list all PDF files in it (Step 2)
directory = os.getcwd()
pdf_files = [f for f in os.listdir(directory) if f.endswith(".pdf")]
entire_text = get_pdf_text(pdf_files[0])
# Alternatively, read a plain-text file instead of a PDF:
# with open('/Users/Desktop/RAG/xyz.txt', 'r') as f:
#     entire_text = f.read()

chunks = get_chunks(entire_text)
available_files = list_available_files()
if file_name in available_files:
    print("Vector store already exists.")
else:
    print("Generating vector store")
    generate_embeddings(chunks, file_name)

client = openai.OpenAI(
    api_key=os.environ.get("SAMBANOVA_API_KEY"),
    base_url="https://api.sambanova.ai/v1",
)

while True:
    user_query = input("Enter query (or 'exit' to quit): ")
    if user_query.strip().lower() in ("exit", "quit"):
        break
    relevant_documents = retriever(user_query, file_name)
    final_ans = generate_sn_response(user_query, relevant_documents)
    print(final_ans)

The above steps show how RAG works behind the scenes, step by step.

Thank you, and keep learning! :smiley: