Deploying a Vector Database with Elasticsearch

Now that our LLM is up and running, we need to deploy a vector database to enable the more complex Retrieval Augmented Generation (RAG) workflow.

In a RAG architecture, documents are pre-populated into a vector database. When a user asks a question, a tool such as LangChain queries the vector database for documents related to that question and provides them to the LLM as context to help it answer.

RAG can greatly enhance the responses from an LLM by providing relevant information about the topic the user is asking about, rather than relying solely on the pool of knowledge the LLM was trained on.

In this workshop we will be using Elasticsearch as our vector database.

Elastic is a Red Hat partner, and Red Hat has announced future Elasticsearch integrations within OpenShift AI.

Manually Scaling a GPU Node

For the pipeline we will run at the end of this section, we will need an additional GPU. While this cluster is set to autoscale our GPU nodes, it takes approximately 20 minutes to fully provision a GPU node and set up the GPU drivers on that node. To avoid waiting for it to autoscale, we are going to manually scale the GPU nodes now, so a node is ready by the time we execute our pipeline.

  1. From the Administrator perspective of the OpenShift Web Console, navigate to Compute > MachineSets. Select the MachineSet with the g5.2xlarge Instance type.

    MachineSets
  2. Click the edit icon for Desired count, update the value to 2, and click Save.

    MachineSet Desired Count

A new GPU node will begin the provisioning process. Continue with the rest of the instructions; scaling ahead of time should reduce how long the pipeline takes to execute in the last part of this section.
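
If you prefer working with manifests, the same change can be made by editing the MachineSet directly; the only field that changes is spec.replicas. This is a minimal sketch, and the MachineSet name is a placeholder for whatever your cluster's g5.2xlarge MachineSet is called:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      name: <cluster-id>-gpu-worker   # placeholder: use your g5.2xlarge MachineSet's name
      namespace: openshift-machine-api
    spec:
      replicas: 2                     # raised from 1; the machine API provisions the new node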

Creating the Elasticsearch Instance

The Elasticsearch (ECK) Operator has already been installed on the cluster for you, so we will just need to create an Elasticsearch Cluster instance.

  1. With the composer-ai-apps namespace selected in the OpenShift Web Console, navigate to Operators > Installed Operators and select the Elasticsearch (ECK) Operator.

    Elasticsearch Operator
  2. Click the +Create instance option for an Elasticsearch Cluster.

    Create Elasticsearch Cluster
  3. Update the object name to elasticsearch. Then add the following YAML to the top-level spec: of the Elasticsearch custom resource:

    spec:
      http:
        tls:
          selfSignedCertificate:
            disabled: true

    and then click Create. A sketch of the fully assembled resource is shown after this list.

    Elasticsearch YAML
  4. Finally, we will need to restart the backend application pod so that it picks up the correct secret for the Elasticsearch instance. The pod was originally started with a temporary secret; now that the Elasticsearch instance has been created, the pod must be restarted to load the valid one.

    To restart the pod, find the quarkus-llm-router-* pod in the OpenShift Web Console and delete it. A new one will automatically be created with the correct secret value.
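
For reference, the assembled custom resource should look roughly like the sketch below. Only the http.tls block is the edit from step 3; the version and nodeSets values are assumptions standing in for whatever the console pre-fills:

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: elasticsearch
      namespace: composer-ai-apps
    spec:
      version: 8.14.0                  # assumption: keep the version the console pre-fills
      http:
        tls:
          selfSignedCertificate:
            disabled: true             # the edit from step 3: serve plain HTTP in-cluster
      nodeSets:
        - name: default
          count: 3                     # assumption, matching the three pods noted below
          config:
            node.store.allow_mmap: false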

Three new Elasticsearch pods will be created in the composer-ai-apps namespace. Once they are all Ready, you can move on to the next step.

Populating the Vector Database

Now that we have the Elasticsearch cluster up and running, we must ingest documents into it.

For demonstration purposes we will be ingesting documentation for various Red Hat products. However, it is important to keep in mind that nearly any data source can be used in the context of RAG, including the live web. If you have ever used the Perplexity AI chatbot, that is essentially what it does behind the scenes.

Executing the Ingestion Pipeline

  1. From the OpenShift Web Console, while you have the composer-ai-apps namespace selected, navigate to the Pipelines > Pipelines section and locate the ingestion-pipeline pipeline. Click the … menu on the right and click Start.

    Start Pipeline
  2. Update the GIT_REVISION field to lab and leave all of the other options at their defaults, except for source: set source to VolumeClaimTemplate and click Start. A sketch of the PipelineRun this creates is shown after this list.

    Volume Claim Template

    The lab branch of the data ingestion pipeline simply contains a reduced list of documents to ingest, in order to speed up the process. If you wish to try this pipeline on your own or test some of the other product assistants, feel free to use the main branch.

  3. A new pipeline run will begin that builds an image containing our ingestion pipeline and then starts that pipeline in Data Science Pipelines. Wait for the execute-kubeflow-pipeline task to complete.

    Pipeline Run
  4. Once the execute-kubeflow-pipeline task has completed, a new pipeline run will begin in Data Science Pipelines. To view it, navigate to the OpenShift AI Dashboard and select Experiments > Experiments and runs, then click on the document_ingestion experiment. Be sure that you have the composer-ai-apps Project selected.

    View experiments
  5. You should see a single run listed. Click on it to view its execution.

    View run
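
Behind the scenes, clicking Start in step 2 creates a Tekton PipelineRun. The sketch below shows roughly what the console generates; GIT_REVISION and the volumeClaimTemplate come from the values we entered, while the workspace name and storage size are assumptions:

    apiVersion: tekton.dev/v1
    kind: PipelineRun
    metadata:
      generateName: ingestion-pipeline-
      namespace: composer-ai-apps
    spec:
      pipelineRef:
        name: ingestion-pipeline
      params:
        - name: GIT_REVISION
          value: lab                   # the reduced document list for this lab
      workspaces:
        - name: source                 # assumption: the "source" field from step 2
          volumeClaimTemplate:
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi         # assumption: size left at the console default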

The data ingestion process may take several minutes to complete. You can follow the progress of the various steps by checking the Logs for each step in the OpenShift AI Dashboard, and you can also find the pods for the steps in the OpenShift Web Console. Some of the steps require a GPU; they will run on the node we scaled up earlier, or trigger a new GPU node to be autoscaled, which can take several minutes.

Testing the Vector Database from the Composer AI UI

Now that our vector database is up and running and we have ingested some documents, we can test the RAG workflow end to end.

We have already created several Assistants that are ready to use the documents we have ingested.

  1. Navigate to the Composer AI UI and click the Compare button in the top right-hand corner.

    The Compare interface lets you query multiple assistants at the same time and compare the results. This can be useful for testing various assistants to see how their responses differ.

  2. Select the Default Red Hat OpenShift AI Self Managed Assistant for the left chat and, for the right chat, the Default Assistant we used earlier to test our LLM. Ask a question about OpenShift AI, such as "How do I create a data connection?", and see how the responses change with our RAG workflow.

    Compare Assistant