Generative AI Introduction 2024 — RAG Deep Dive
Previously, we discussed why we need to fine-tune models. Today, let’s delve into Retrieval-Augmented Generation (RAG).
📌 What is RAG?
Retrieval-Augmented Generation (RAG) primarily aims to address issues like outdated knowledge and hallucinations in purely generative models. RAG pairs a language model with a knowledge base, allowing the model to “look up information” before generating an answer, rather than relying solely on what it memorized during training.
✅ RAG vs. Fine-tuning: Decision-Making Questionnaire (10 Questions)
Imagine a project you’d like to work on, then read the following questions and tally your score.
- Will your knowledge data be updated frequently? A. Yes (1 point) B. No (0 points)
- Do you require a source for the answers (traceable basis)? A. Yes, I want users to see the original document for the answer (1 point) B. No, displaying the source is not necessary (0 points)
- Is your budget limited, preventing long-term training or cloud fine-tuning? A. Yes (1 point) B. No, I have the budget and resources for fine-tuning (0 points)
- Do you plan to apply the model in multiple domains (e.g., customer service, legal, technical)? A. Yes, flexible combination of various databases is needed (1 point) B. No, single task (0 points)
- Will users potentially ask about knowledge the model hasn’t encountered during training? A. Yes (1 point) B. No, all questions and answers are included in the training set (0 points)
- Is your data structured or well-formatted documentation (e.g., manuals, FAQs, SOPs)? A. Yes (1 point) B. No, it’s unstructured text or other formats (0 points)
- Do you want to avoid retraining the model when adding new content in the future? A. Yes (1 point) B. No, it’s acceptable to retrain with every update (0 points)
- Does the model’s response style, tone, and format require strong customization? A. No, flexible response format is acceptable (1 point) B. Yes, it must conform to brand tone, format, etc. (0 points)
- Do you aim to quickly deploy an MVP (Minimum Viable Product)? A. Yes, I want to validate quickly (1 point) B. No, a longer development cycle is acceptable (0 points)
- Does your application scenario involve data leakage risks (e.g., medical, financial)? A. Yes, it needs to run in a closed environment (0 points) B. No, querying external knowledge bases or public vector stores is acceptable (1 point)
Score Interpretation and Recommendations
Once you’ve tallied your score, compare it against the ranges below:
- 8–10 points: ✅ RAG. For dynamic knowledge updates, traceability, low cost, and rapid deployment, RAG is the preferred choice.
- 5–7 points: ⚖️ Situation-dependent. Consider a hybrid approach: start with RAG, then fine-tune for high-frequency questions.
- 0–4 points: ✅ Fine-tuning. When knowledge is static, a highly specific style is required, or offline processing is needed, fine-tuning is more suitable.
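If you want to make the tallying concrete, here is a small, hypothetical scoring helper; the point values and thresholds simply mirror the questionnaire and ranges above (note that question 10 is reverse-scored: answer B earns the point):

```python
def interpret(score: int) -> str:
    """Map a 0-10 questionnaire score to the recommendation ranges above."""
    if score >= 8:
        return "RAG"
    if score >= 5:
        return "Hybrid: start with RAG, fine-tune for high-frequency questions"
    return "Fine-tuning"

# Points earned per question (0 or 1), following the answer key above.
# Example tally: frequently updated data, traceable sources needed, limited
# budget, single task, unseen questions, structured docs, no retraining, ...
example_answers = [1, 1, 1, 0, 1, 1, 1, 0, 1, 0]
print(interpret(sum(example_answers)))  # 7 points -> hybrid recommendation
```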
RAG Architecture Components
- Retriever: Takes the user’s question or prompt and finds the most relevant information from a knowledge base (e.g., documents, web pages, FAQs). Common approaches: BM25 (sparse retrieval), Dense Passage Retrieval (DPR), and vector indexes such as FAISS for similarity search.
- Generator: Uses the retrieved documents as context to generate the final answer. Common models: BART, T5, GPT, etc.
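To make these two components concrete, here is a minimal, illustrative sketch of a dense retriever plus prompt assembly, using the same all-MiniLM-L6-v2 sentence-transformer the implementation below relies on; the generator call itself is left to whatever LLM API you use:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('all-MiniLM-L6-v2')

def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k documents most similar to the query by cosine similarity."""
    doc_vecs = encoder.encode(documents, normalize_embeddings=True)
    query_vec = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec  # dot product == cosine similarity for normalized vectors
    return [documents[i] for i in np.argsort(scores)[::-1][:top_k]]

def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble the retrieved passages and the question into a grounded prompt."""
    context = "\n\n".join(passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```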
Operational Flow
- User inputs question: “What is quantum entanglement?”
- Retriever finds relevant passages from a large document corpus (e.g., Wikipedia, company’s internal knowledge base).
- Generator uses the question and these passages as context to generate a natural language answer.
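Continuing the sketch above, the three steps wire together like this (the final generation call is left as a comment, since it depends on your LLM of choice):

```python
corpus = [
    "Quantum entanglement is a phenomenon in which two particles share a state, "
    "so measuring one instantly constrains the outcome of measuring the other.",
    "BM25 is a bag-of-words ranking function widely used in text retrieval.",
]

question = "What is quantum entanglement?"
passages = retrieve(question, corpus, top_k=1)  # step 2: retrieval
prompt = build_prompt(question, passages)       # step 3: build grounded context
# answer = llm.generate(prompt)                 # step 3: generate with your LLM
```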
Features
- Updatability: Knowledge comes from external databases and can be updated regularly, without being limited by model training time.
- Controllability: You can specify that the model only queries a certain database (e.g., internal company documents).
- Reduced Hallucination: Generated content supported by data is more trustworthy.
Implementation
1. Preparing Data for Embedding: You’ll need one or more PDF or TXT files with a distinct theme (I used some personal introductions and accomplishments from LinkedIn); these will be used to create the embeddings.
```python
save_directory = '...\\genAI_HW'                       # Specify output location
file_paths = ['...\\genAI_HW\\dan_intro.txt']          # List of file names for embedding
embedding_manager = EmbeddingManager(save_directory)   # Specify where outputs are stored
embedding_manager.gemini_model_init()
embedding_manager.create_embedding(file_paths)         # Generate embeddings
# SentenceTransformer('all-MiniLM-L6-v2') is used here
```
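EmbeddingManager is the author’s own helper, and its internals aren’t shown in this post. Under the assumption that create_embedding chunks the input files and encodes them with all-MiniLM-L6-v2, a minimal reconstruction might look like this (the file names match what the answering flow loads later; gemini_model_init is omitted):

```python
import json
import os

import numpy as np
from sentence_transformers import SentenceTransformer

class EmbeddingManager:
    """Hypothetical reconstruction of the helper used above; details are assumptions."""

    def __init__(self, save_directory: str):
        self.save_directory = save_directory
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')

    def create_embedding(self, file_paths: list[str]) -> None:
        # Read each file and split it into paragraph-sized chunks.
        chunks = []
        for path in file_paths:
            with open(path, encoding='utf-8') as f:
                chunks.extend(p.strip() for p in f.read().split('\n\n') if p.strip())
        # Encode the chunks and persist both the text and the vectors,
        # using the file names the answering flow loads later.
        embeddings = self.encoder.encode(chunks, normalize_embeddings=True)
        np.save(os.path.join(self.save_directory, 'embeddings.npy'), embeddings)
        with open(os.path.join(self.save_directory, 'processed_rag_text.json'),
                  'w', encoding='utf-8') as f:
            json.dump(chunks, f, ensure_ascii=False)
```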
2. Calling the Gemini API with Retrieved Context: Next, call the previously used Gemini API, using the stored embeddings to retrieve the most relevant passage and passing it to the model as context for the response.
```python
def flow_for_answering(self, query):
    # Load the processed text chunks
    documents = self.load_documents(
        os.path.join(self.save_directory, 'processed_rag_text.json'))
    # Load the precomputed embeddings
    embeddings = np.load(os.path.join(self.save_directory, 'embeddings.npy'))
    # Retrieve relevant documents based on the query
    best_document = self.retrieve_documents(query, embeddings, documents)
    # Fine-tune the prompt
    query_prompt = self.add_prompt(query)
    # Generate a response using the retrieved documents as context
    response = self.generate_response(self.chat_model, query_prompt, best_document)
    response = self.extract_content(response)
    self.historical_conversations.append({"user": query, "model": response})
    return response
```
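Similarly, retrieve_documents isn’t shown; assuming the stored vectors were produced by the same normalized sentence-transformer encoder, a plausible implementation is a simple cosine-similarity lookup:

```python
def retrieve_documents(self, query, embeddings, documents):
    """Return the document most similar to the query (assumed implementation)."""
    # Assumes self.encoder is the same SentenceTransformer used for indexing.
    query_vec = self.encoder.encode([query], normalize_embeddings=True)[0]
    scores = embeddings @ query_vec  # cosine similarity for normalized vectors
    return documents[int(np.argmax(scores))]
```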
Example Conversation
Here’s the final Q&A:
User: Who is Dan? Model: Dan is a highly skilled data analyst with a strong online presence showcasing his expertise. For detailed information on his experience and skills, check his LinkedIn and blog.
User: Is he suitable for a data analyst role? Model: Based on his LinkedIn profile and blog, Dan demonstrates strong potential for a data analyst role. His skills and experience appear relevant.
As you can see, RAG gave Gemini access to information about Dan from the embedded documents, and prompt optimization further shaped the response. Notice also that the model answers more cautiously (e.g., “appear relevant”). That completes a simple RAG implementation.