What is a Prompt-based Model?
Prompt-based models are currently the most common form of generative AI model and power some of the most popular AI tools. When you interact with a model like ChatGPT or Bing Chat (powered by GPT-4), you start by issuing a prompt to elicit a response. To demonstrate, we’ll start with a request that a client of ours, Let’s Play Soccer (LPS), a chain of indoor soccer facilities, receives daily: a customer wants to rent a field.
“We are wondering how much it would cost to rent a field for an hour weekly/bi-weekly during a weekday early in the morning (around 6am) Facility: Timpanogos Indoor Soccer”
Even if the grammar isn’t perfect, the prompt still offers enough metadata, including a facility name, to generate a response. If we feed this prompt into a model like GPT-4, it spits out the following:
“The cost of reserving a field at Let’s Play Soccer depends on the facility, the time, and the frequency of your reservation. According to the Let’s Play Soccer website (https://www.letsplaysoccer.com), the prices vary from $50 to $150 per hour for different facilities. You can also get discounts for booking multiple hours or sessions in advance (https://www.letsplaysoccer.com), etc…”
The completion is decent, considering the model only knows what it can glean from the LPS website. However, the response above is merely a paraphrase, and GPT-4 starts talking in circles immediately afterward. Alternatively, the model might wander into topics beyond the scope of the original prompt. The LLM isn’t suited to answer the question succinctly because nothing constrains its logic from breaking down like this. This type of error is commonly referred to as a “hallucination.”
Let’s Play still wants to implement AI to answer queries like field rental costs, but we then run into the dilemma of getting the model to answer succinctly without devolving into a bizarre and clearly inhuman conversation. The solution is either to build a multi-billion-parameter model from scratch, which requires millions of GPU compute hours, or to leverage Retrieval-Augmented Generation (RAG).
What is Retrieval-Augmented Generation?
A Retrieval-Augmented Generation model draws responses from a dedicated data source. These external data sources hold content related to the queries the model is expected to receive and can consist of various data types, whether imagery, text documents, or spreadsheets. By limiting the scope of what the model can respond with, RAG prevents hallucinations and improves upon the base LLM without the expense of building a new model.
Data Preparation
If your proprietary data is housed in a database, a PDF, or a Word document, you must first extract it from those sources. Next, you’ll store the extracted data in a vector database.
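To make the extraction step concrete, here’s a minimal sketch using the pypdf library; the filename and chunk size are purely illustrative:

```python
# One way to pull text out of a PDF before embedding it. The file
# "field_rates.pdf" is a hypothetical example, not an actual LPS asset.
from pypdf import PdfReader

reader = PdfReader("field_rates.pdf")
raw_text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Split the text into small chunks so each one can be embedded and
# stored as its own row in the vector database.
chunks = [raw_text[i:i + 500] for i in range(0, len(raw_text), 500)]
```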
A vector database stores the records of your database table or document as rows. Instead of being indexed by a scalar key, each record is indexed by a vector embedding that attempts to capture the meaning of the data in that record. At query time, the database embeds the prompt the same way, then computes a similarity measure, typically the dot product (or cosine similarity) between the prompt’s embedding and each record’s embedding, and returns the records that score highest.
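Here’s a minimal sketch of that similarity search. The embed() function here is a toy hashing stand-in for a real embedding model, and the record strings are illustrative:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a real embedding model: hash each word into a
    fixed-size vector. A production system would call an embedding
    model here instead."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Each record from the rates table is stored alongside its embedding.
records = [
    "Timpanogos Indoor Soccer, Field 1, Mon-Fri 6am, $50/hr",
    "Timpanogos Indoor Soccer, Field 2, Sat-Sun 6am, $100/hr",
]
index = np.stack([embed(r) for r in records])  # shape: (n_records, dim)

def search(prompt: str, top_k: int = 2) -> list[str]:
    query = embed(prompt)
    # With unit-normalized embeddings, the dot product of the query
    # against each stored row is the cosine similarity.
    scores = index @ query
    best = np.argsort(scores)[::-1][:top_k]
    return [records[i] for i in best]

print(search("renting a field at Timpanogos on a weekday morning"))
```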
Returning to the Let’s Play Soccer example: LPS has a database table that contains rates for fields on specific days of the week, by facility. These records are put into the vector database, each with a vector embedding roughly equivalent to “renting a field at Timpanogos.”
Application Integration
When the prompt to rent a field comes in from the mobile app or website, we create a vector representation of the prompt before we communicate with the LLM. The system queries the vector database with that vector to extract semantically similar rows from the table that contains the field prices. The raw results may not be ideal: each comes back with a similarity score relative to the prompt, so we may need to do some additional re-ranking based on facility ID, field, day of week, and time of day, as sketched below.
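A hedged sketch of that re-ranking step; the field names and boost weights are illustrative, not Let’s Play Soccer’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    score: float      # similarity score returned by the vector database
    facility_id: str
    day_of_week: str
    hour: int

def rerank(hits: list[Hit], facility_id: str, day: str, hour: int) -> list[Hit]:
    """Re-order vector-search hits by boosting those whose structured
    fields match the details parsed from the customer's prompt."""
    def boosted(h: Hit) -> float:
        score = h.score
        if h.facility_id == facility_id:
            score += 0.5    # exact facility match matters most
        if h.day_of_week == day:
            score += 0.25
        if abs(h.hour - hour) <= 1:
            score += 0.25   # within an hour of the requested time
        return score
    return sorted(hits, key=boosted, reverse=True)
```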
Next, we take the top-ranked rows from the vector database and append them to the original prompt. This augmented prompt now contains both the original question and the information from Let’s Play Soccer’s internal field-price table. We now pass this augmented prompt to GPT-4:
“We are wondering how much it would cost to rent a field for an hour weekly/bi-weekly during a weekday early in the morning (around 6am)
Facility: Timpanogos Indoor Soccer
Field 1: Monday - Friday at 6 am mst is $50 per hour
Field 2: Saturday - Sunday at 6 am mst is $100 per hour”
The two lines of field pricing at the end are the additional content we’ve pulled from the vector database. The model can now use the LPS information to generate a response that actually answers the original question. Using this supplementary data, GPT-4 spits out a customer-worthy completion:
“Hi, thank you for your interest in renting a field at Timpanogos Indoor Soccer. Based on the information you provided, the cost of renting a field for an hour weekly/bi-weekly during a weekday early in the morning (around 6am) is as follows:
• If you rent Field 1 on a weekday, the cost is $50 per hour. If you rent it weekly, the total cost is $200 per month. If you rent it bi-weekly, the total cost is $100 per month.
• If you rent Field 2 on a weekend, the cost is $100 per hour. If you rent it weekly, the total cost is $400 per month. If you rent it bi-weekly, the total cost is $200 per month.
You can book your field online by visiting the [Timpanogos Indoor Soccer website] and filling out the reservation form. You can also call or text the facility manager at (801) 224-2500 to check the availability and confirm your booking. I hope this helps you find the best option for your team. If you have any other questions, feel free to ask me.”
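Putting it together, here’s a rough sketch of the augmentation step, assuming the openai Python client; the model name, system message, and function names are illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(customer_prompt: str, retrieved_rows: list[str]) -> str:
    # Append the top-ranked rows from the vector database to the
    # customer's original question to form the augmented prompt.
    augmented = customer_prompt + "\n\n" + "\n".join(retrieved_rows)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer the customer's question using only the "
                        "facility pricing data included in the prompt."},
            {"role": "user", "content": augmented},
        ],
    )
    return response.choices[0].message.content
```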
The Challenge of Implementing Retrieval-Augmented Generation
Most startups and SMBs don’t want to spend large sums of money training and serving a new LLM based on their custom data. Instead, they want to leverage their existing content to build a context-aware reasoning application that can answer customer inquiries using their own data. The challenge lies in wiring up the pipeline described above, from extraction and embedding through retrieval, re-ranking, and prompt assembly, reliably and at scale.
Get Started Today!
At AllCode, our mission is to leverage our unique skillset and expertise to deliver innovative, top-tier software solutions that empower businesses to thrive in our world’s rapidly evolving technological landscape.
Work with an expert. Work with AllCode