Deploying a Vector Database with Elasticsearch
Now that our LLM is up and running, we need to deploy a vector database to enable the more complex Retrieval Augmented Generation (RAG) workflow.
In a RAG architecture, documents are pre-populated into a vector database. When a user asks a question, a tool such as LangChain is used to query the vector database for documents that are related to that question, and they are provided to the LLM as context to help it answer the question.
RAG can greatly enhance the responses from an LLM by providing relevant information about the topic the user is asking about, rather than relying solely on the pool of knowledge the LLM was trained on.
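To make that flow concrete, here is a minimal sketch of the query-time retrieval step using LangChain. This is illustrative only and is not the workshop's backend code; the Elasticsearch URL, index name, and embedding model are placeholder assumptions.

```python
# Illustrative RAG retrieval sketch -- not the workshop's actual backend code.
# The URL, index name, and embedding model are placeholders/assumptions.
from langchain_elasticsearch import ElasticsearchStore
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

store = ElasticsearchStore(
    es_url="http://elasticsearch:9200",   # placeholder URL
    index_name="docs",                    # placeholder index name
    embedding=embeddings,
)

question = "How do I create a data connection in OpenShift AI?"

# Retrieve the document chunks most similar to the question...
docs = store.similarity_search(question, k=4)

# ...and hand them to the LLM as context alongside the original question.
context = "\n\n".join(d.page_content for d in docs)
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

The important point is the shape of the flow: embed the question, find the nearest document chunks in the vector database, and prepend them to the prompt sent to the LLM.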
In this workshop we will be using Elasticsearch as our vector database.
Elastic is a Red Hat partner, and Red Hat has announced future integrations with OpenShift AI.
Manually Scaling GPU node
For the pipeline we will run at the end of this section, we will need an additional GPU. While this cluster is set to autoscale its GPU nodes, it takes approximately 20 minutes to fully provision a GPU node and set up the GPU drivers on it. To avoid waiting on the autoscaler, we are going to manually scale the GPU nodes now, so the node is ready by the time we execute our pipeline.
- From the Administrator perspective of the OpenShift Web Console, navigate to Compute > MachineSets. Select the MachineSet with the `g5.2xlarge` instance type.
- Click on the edit icon for Desired count, update the value to 2, and click Save.
A new GPU node will begin the provisioning process. Continue with the rest of the instructions; having this node ready ahead of time should reduce how long the pipeline takes to execute in the last part of this section.
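If you prefer the CLI over the console, a rough equivalent is shown below; the MachineSet name is a placeholder, so substitute the name of the `g5.2xlarge` MachineSet in your cluster:

```bash
# Find the MachineSet that uses the g5.2xlarge instance type
oc get machinesets -n openshift-machine-api

# Scale it to 2 replicas (replace <machineset-name> with the name from the previous command)
oc scale machineset <machineset-name> -n openshift-machine-api --replicas=2
```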
Creating the Elasticsearch Instance
The Elasticsearch (ECK) Operator has already been installed on the cluster for you, so we will just need to create an Elasticsearch Cluster instance.
- With the `composer-ai-apps` namespace selected in the OpenShift Web Console, navigate to Operators > Installed Operators and select the Elasticsearch (ECK) Operator.
- Click the +Create instance option for an Elasticsearch Cluster.
- Update the object name to `elasticsearch`. In addition to this, add the following YAML to the top-level `spec:` of the `Elasticsearch` custom resource, and then click Create (a complete example resource is shown after this list):

  ```yaml
  spec:
    http:
      tls:
        selfSignedCertificate:
          disabled: true
  ```
- Finally, we will need to restart our backend application pod so it picks up the correct secret for the Elasticsearch instance. The pod was previously started using a temporary secret; now that the Elasticsearch instance has been created, it must be restarted to load the valid secret. To restart the pod, find the `quarkus-llm-router-*` pod in the OpenShift Web Console and delete it. A new pod will automatically be created with the correct secret value. (A CLI alternative is sketched below, after the note about the Elasticsearch pods.)
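For reference, the resulting `Elasticsearch` custom resource might look roughly like the sketch below. The version and nodeSets values here are assumptions based on the operator's default sample; keep whatever defaults the Create form generates and only add the `http.tls` section from the step above.

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: composer-ai-apps
spec:
  version: 8.14.0              # illustrative -- keep the version the form suggests
  http:
    tls:
      selfSignedCertificate:
        disabled: true         # the change added in the step above
  nodeSets:
    - name: default            # nodeSets values are assumptions; keep the form defaults
      count: 3
      config:
        node.store.allow_mmap: false
```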
Three new Elasticsearch pods will be created in the `composer-ai-apps` namespace. Once they are all Ready, you can move on to the next step.
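If you prefer the CLI, a rough equivalent of waiting for the pods and restarting the backend is shown below. The backend's label selector is an assumption; adjust it, or simply delete the `quarkus-llm-router-*` pod by name instead.

```bash
# Watch the Elasticsearch pods until all three report Ready
oc get pods -n composer-ai-apps \
  -l elasticsearch.k8s.elastic.co/cluster-name=elasticsearch -w

# Restart the backend by deleting its pod; the replacement picks up the new secret
# (label selector is an assumption -- you can also delete the pod by name)
oc delete pod -n composer-ai-apps -l app.kubernetes.io/name=quarkus-llm-router
```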
Populating Vector Database
Now that we have the Elasticsearch cluster up and running, we must ingest documents into it.
For demonstration purposes we will be ingesting documentation for various Red Hat products. However, it is important to keep in mind that nearly any data source can be used in the context of RAG, including the live web. If you have ever used the chatbot Perplexity AI, that is what it is doing behind the scenes.
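Conceptually, each ingestion step does something like the sketch below. This is not the pipeline's actual code; the source URL, splitter settings, index name, and embedding model are assumptions.

```python
# Conceptual ingestion sketch -- the workshop's pipeline (next section) does this for you.
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_elasticsearch import ElasticsearchStore
from langchain_huggingface import HuggingFaceEmbeddings

# Load a documentation page and split it into overlapping chunks
docs = WebBaseLoader("https://docs.redhat.com/...").load()   # placeholder URL
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed each chunk and index it into Elasticsearch
ElasticsearchStore.from_documents(
    chunks,
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    es_url="http://elasticsearch-es-http.composer-ai-apps.svc:9200",  # in-cluster service (assumption)
    index_name="docs",                                                # placeholder index name
)
```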
Executing the ingestion pipeline
- From the OpenShift Web Console, with the `composer-ai-apps` namespace selected, navigate to the Pipelines > Pipelines section and locate the `ingestion-pipeline` pipeline. Click on the … on the right and click Start.
- Update the `GIT_REVISION` field to `lab` and leave all of the other options at their defaults, except for the `source` workspace. Set `source` to VolumeClaimTemplate and click Start. (A CLI alternative is sketched after this list.)

  The `lab` branch of the data ingestion pipeline is simply a reduced list of documents to ingest, which speeds up the process. If you wish to try this pipeline on your own or test some of the other product assistants, feel free to leverage `main`.
- A new pipeline run will begin that builds an image containing our ingestion pipeline and then starts that pipeline in Data Science Pipelines. Wait for the `execute-kubeflow-pipeline` task to complete.
- Once the `execute-kubeflow-pipeline` task has completed, a new pipeline run will begin in Data Science Pipelines. To view it, navigate to the OpenShift AI Dashboard and select Experiments > Experiments and runs. Click on the `document_ingestion` experiment. Be sure that you have the `composer-ai-apps` project selected.
- You should see a single run listed. Click on the run to view its execution.
The data ingestion process may take several minutes to complete. You can follow the progress of the various steps by checking the Logs for each step in the OpenShift AI Dashboard. You can also find the pods for the steps in the OpenShift Web Console. Some of the steps require a GPU, which will trigger a new GPU node to be auto-scaled in your cluster; this can take several minutes.
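If you prefer the CLI, roughly the same run can be started with the Tekton CLI. The workspace file below is a hypothetical VolumeClaimTemplate; check `tkn pipeline describe ingestion-pipeline` for the exact parameter and workspace names before relying on this sketch.

```bash
# Start the ingestion pipeline with the lab revision and a VolumeClaimTemplate workspace
# (pvc.yaml is a hypothetical file containing the volumeClaimTemplate spec)
tkn pipeline start ingestion-pipeline -n composer-ai-apps \
  --param GIT_REVISION=lab \
  --use-param-defaults \
  --workspace name=source,volumeClaimTemplateFile=pvc.yaml
```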
Testing the vector database from Composer AI UI
Now that our vector database is up and running and we have ingested some documents, we can test it from the Composer AI UI.
We have already created several Assistants that are ready to use the documents we have ingested.
- Navigate to the Composer AI UI and click the Compare button in the top right-hand corner.

  The Compare interface provides the ability to query multiple assistants at the same time so a user can compare the results. This can be a useful feature for testing various assistants to see how the responses differ.
- Select the Default Red Hat OpenShift AI Self Managed Assistant for the left chat, and the Default Assistant we used before to test our LLM for the right chat. Ask a question about OpenShift AI such as "How do I create a data connection?" and see how the responses change with our RAG workflow.
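If you also want to confirm that documents landed in Elasticsearch directly, you can query the cluster outside the UI. This is optional; the secret and service names below follow the ECK operator's naming convention for a cluster named `elasticsearch`, and the exact index names depend on what the ingestion pipeline created.

```bash
# Fetch the password for the built-in "elastic" user from the ECK-generated secret
PASSWORD=$(oc get secret elasticsearch-es-elastic-user -n composer-ai-apps \
  -o go-template='{{.data.elastic | base64decode}}')

# Port-forward the Elasticsearch HTTP service and list the indices and their document counts
oc port-forward service/elasticsearch-es-http -n composer-ai-apps 9200:9200 &
curl -u "elastic:$PASSWORD" "http://localhost:9200/_cat/indices?v"
```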