Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation for Newbies

Clients frequently approach us with AI projects. This typically leads to discussions about the idiosyncrasies of Large Language Models (LLMs), prompt engineering, parameter-efficient fine-tuning, and reinforcement learning from human feedback. Usually, the conversation turns to Retrieval-Augmented Generation.

What is a Prompt-based Model?

Prompt-based models are currently the most common form of generative AI and power some of the most popular AI tools. When you interact with a model like ChatGPT or Bing Chat powered by GPT-4, you start by issuing a prompt to elicit a response. To demonstrate, we’ll start with a prompt that a client of ours, Let’s Play Soccer (LPS), a chain of indoor soccer facilities, receives daily. Typically, one of their customers wants to rent a field.

“We are wondering how much it would cost to rent a field for an hour weekly/bi-weekly during a weekday early in the morning (around 6am) Facility: Timpanogos Indoor Soccer”

Even if the grammar isn’t perfect, the prompt still offers enough metadata, including a facility name, to generate a response. If we feed this prompt into a model like GPT-4, it produces the following:

“The cost of reserving a field at Let’s Play Soccer depends on the facility, the time, and the frequency of your reservation. According to the Let’s Play Soccer website (https://www.letsplaysoccer.com), the prices vary from $50 to $150 per hour for different facilities. You can also get discounts for booking multiple hours or sessions in advance (https://www.letsplaysoccer.com), etc…”

The completion is decent, considering the model only knows what it can pull from the LPS website. However, the response shown here is admittedly just the opening paraphrase; GPT-4 starts talking in circles afterward. Alternatively, the model could wander into topics beyond the scope of the original prompt. The LLM isn’t suited to answer the question succinctly because there’s nothing to prevent its logic from breaking down like this. This type of error is commonly referred to as a “hallucination.”

Let’s Play still wants to implement AI to answer queries like field rental costs, but that raises the dilemma of designing the model to answer succinctly without devolving into a bizarre and clearly inhuman conversation. The solution is either to build a multi-billion-parameter model from scratch, which requires millions of GPU compute hours, or to leverage Retrieval-Augmented Generation (RAG).

What is Retrieval-Augmented Generation?

A Retrieval-Augmented Generation model draws its responses from a dedicated data source. These external data sources can contain content related to the queries the model is expected to receive and can consist of various data types, whether imagery, text documents, or spreadsheets. By limiting the scope of what the model can respond with, RAG curbs hallucinations and improves on the base LLM without the expense of building a new model.

Data Preparation

If your proprietary data is housed in a database, PDF, or Word document, you must extract it from these sources. Next, you’ll store this data in a vector database.

A vector database stores the records of your database table or document as rows. Instead of being indexed by a scalar key, the records are indexed by vector embeddings that attempt to capture the meaning of the data in each record. When a prompt comes in, it is converted into its own embedding, and a similarity measure such as the dot product (or cosine similarity) is computed between the prompt embedding and each record’s embedding. The records whose embeddings score high enough are returned.
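To make that comparison concrete, here is a minimal Python sketch of the similarity scoring. It assumes the prompt and record embeddings are already available as NumPy arrays; the function names are ours for illustration, not part of any particular vector database.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Score in [-1, 1]; higher means the embeddings point in more similar directions.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(prompt_vec: np.ndarray, record_vecs: list, k: int = 3) -> list:
    # Rank every record embedding against the prompt embedding and keep the best k indices.
    scores = [cosine_similarity(prompt_vec, v) for v in record_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Purely illustrative 4-dimensional embeddings.
records = [np.array([1.0, 0.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0, 0.0]),
           np.array([0.9, 0.1, 0.0, 0.0])]
print(top_k(np.array([1.0, 0.0, 0.0, 0.0]), records, k=2))  # -> [0, 2]
```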

Returning to the Let’s Play Soccer example, they have a database table that contains rates for fields on specific days of the week by facility. These records are put into the vector database with a vector embedding that would be the equivalent of “renting a field at Timpanogos.”
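What might that preparation look like in practice? Below is a hedged sketch in Python. The record fields and the embed() helper are illustrative stand-ins, not LPS’s actual schema or embedding model; a real system would call a sentence-embedding model or API here.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model call (e.g. a sentence-embedding API).
    # This stub just produces a deterministic random vector so the sketch runs end to end.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# Illustrative rate records; the field names are assumptions, not LPS's real schema.
rate_records = [
    {"facility": "Timpanogos Indoor Soccer", "field": "Field 1",
     "days": "Monday-Friday", "time": "6 am MST", "price_per_hour": 50},
    {"facility": "Timpanogos Indoor Soccer", "field": "Field 2",
     "days": "Saturday-Sunday", "time": "6 am MST", "price_per_hour": 100},
]

# Each row is indexed by an embedding of a short natural-language description,
# roughly "renting Field 1 at Timpanogos Indoor Soccer on Monday-Friday at 6 am MST".
index = [
    {"embedding": embed(f"renting {r['field']} at {r['facility']} on {r['days']} at {r['time']}"),
     "record": r}
    for r in rate_records
]
```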

Application Integration

Figure: RAG integration

When the prompt to rent a field comes in from the mobile app or website, we create a vector representation of the prompt before we communicate with the LLM. The system queries the vector database with that vector to extract semantically similar information from the table that contains the field prices. The results from the vector database may not be ideal. They come back with a similarity score relative to the prompt, so we may need to do some additional re-ranking based on facility ID, field, day of week, and time of day.
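As one illustration, the re-ranking might look like the sketch below. The candidate results, metadata fields, and boost weights are assumptions made for the example rather than LPS’s actual data or scoring rules.

```python
# A hedged sketch of metadata re-ranking, assuming the vector database has
# already returned candidate rate records with similarity scores.
candidates = [
    {"score": 0.81, "facility_id": "timpanogos", "field": "Field 2",
     "days": "Saturday-Sunday", "time": "6 am", "price_per_hour": 100},
    {"score": 0.78, "facility_id": "timpanogos", "field": "Field 1",
     "days": "Monday-Friday", "time": "6 am", "price_per_hour": 50},
]

def rerank(results, facility_id, weekday=True, hour="6 am"):
    """Boost results whose metadata matches the prompt's facility, day type, and time."""
    def boosted(r):
        bonus = 0.0
        bonus += 0.10 if r["facility_id"] == facility_id else 0.0
        bonus += 0.05 if (weekday and "Monday" in r["days"]) else 0.0
        bonus += 0.05 if r["time"] == hour else 0.0
        return r["score"] + bonus
    return sorted(results, key=boosted, reverse=True)

# The weekday 6 am query now ranks Field 1 (the weekday rate) first.
ranked = rerank(candidates, facility_id="timpanogos", weekday=True, hour="6 am")
print(ranked[0]["field"])  # -> Field 1
```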

Next, we take the top-ranked results from the vector database and append them to our original prompt. This augmented prompt now contains both the original question and the information from Let’s Play Soccer’s internal field-price table. We then pass the augmented prompt to GPT-4:

“We are wondering how much it would cost to rent a field for an hour weekly/bi-weekly during a weekday early in the morning (around 6am) Facility: Timpanogos Indoor Soccer
Field 1: Monday - Friday at 6 am mst is $50 per hour
Field 2: Saturday - Sunday at 6 am mst is $100 per hour”

The last two lines of the prompt above, beginning with “Field 1,” are the additional content we’ve pulled from the vector database. The model can now use the LPS information to generate a response that better answers the original question. With this supplementary data, GPT-4 produces a customer-worthy completion:

“Hi, thank you for your interest in renting a field at Timpanogos Indoor Soccer. Based on the information you provided, the cost of renting a field for an hour weekly/bi-weekly during a weekday early in the morning (around 6am) is as follows:

• If you rent Field 1 on a weekday, the cost is $50 per hour. If you rent it weekly, the total cost is $200 per month. If you rent it bi-weekly, the total cost is $100 per month.

• If you rent Field 2 on a weekend, the cost is $100 per hour. If you rent it weekly, the total cost is $400 per month. If you rent it bi-weekly, the total cost is $200 per month.

You can book your field online by visiting the [Timpanogos Indoor Soccer website] and filling out the reservation form. You can also call or text the facility manager at (801) 224-2500 to check the availability and confirm your booking. I hope this helps you find the best option for your team. If you have any other questions, feel free to ask me.”
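Tying the pieces together, here is a minimal sketch of the augmentation-and-generation step in Python. It assumes the OpenAI Python SDK (openai >= 1.0) purely for illustration; any chat-completion API would be wired up the same way, and the retrieved lines are hard-coded here in place of a live vector-database query.

```python
# A minimal sketch of prompt augmentation, assuming the OpenAI Python SDK.
from openai import OpenAI

original_prompt = (
    "We are wondering how much it would cost to rent a field for an hour "
    "weekly/bi-weekly during a weekday early in the morning (around 6am) "
    "Facility: Timpanogos Indoor Soccer"
)

# Top-ranked rows pulled from the vector database in the previous step.
retrieved_context = (
    "Field 1: Monday - Friday at 6 am mst is $50 per hour\n"
    "Field 2: Saturday - Sunday at 6 am mst is $100 per hour"
)

# Append the retrieved pricing data to the customer's original question.
augmented_prompt = f"{original_prompt}\n\nRelevant pricing data:\n{retrieved_context}"

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": augmented_prompt}],
)
print(response.choices[0].message.content)
```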

The Challenge of Implementing Retrieval-Augmented Generation

Most startups and SMBs don’t want to spend large sums of money training and serving a new LLM model based on their custom data. Instead, they want to leverage their existing content to build a context-aware reasoning application that can answer customer inquiries using their data.

Using Retrieval-Augmented Generation (RAG), AI can leverage your existing data to help customers find answers to their queries quickly and cost-effectively. If you are interested in deploying a RAG solution and need help with implementation, learn more about our services here.

Get Started Today!

At AllCode, our mission is to leverage our unique skillset and expertise to deliver innovative, top-tier software solutions that empower businesses to thrive in our world’s rapidly-evolving technological landscape.

Work with an expert. Work with AllCode

Schedule an expert call
