Create a dynamic corpus

You can use the dynamic corpus functionality to retrieve live data from dynamic sources, such as databases, APIs or other real-time data streams. This feature is particularly useful for scenarios where the AI agent needs to access the most current information. The AI agent can use this live data to answer user queries, providing up-to-date information and contextually accurate responses about your app objects and state.

Use case

You are developing an AI agent for a cloud service platform and you want it to provide real-time information about resources such as VMs, storage buckets and various services. By creating a dynamic corpus, Alan AI can pull this data from a Google Firebase database, enabling the AI agent to:

  • Present the current status of VMs, including their configuration and usage

  • Access the latest information about storage buckets

  • Provide real-time data on the availability and usage of cloud services

Prerequisites

Before you define a dynamic corpus for your AI agent, make sure you have signed up for Alan AI Studio and created a project for the AI agent. For details, see Sign up for Alan AI Studio.

Step 1. Retrieve data

The sample app for this tutorial stores its cloud infrastructure data in the Google Firebase Realtime database.

../../_images/firebase.png

To start building a dynamic corpus, you must first retrieve data on cloud infrastructure objects from the database.

  1. To the project in Alan AI Studio, add a new dialog script: Dynamic_corpus.

  2. To the dialog script, add the following code to fetch the data from the database using the built-in axios library:

    Dialog script
    // The Firebase Realtime Database URL
    const firebaseUrl = 'https://alan-ai-quickstart-default-rtdb.europe-west1.firebasedatabase.app/';
    
    // Retrieve data from the database
    async function getData(path) {
       try {
           // Construct the full URL for the request
           const url = `${firebaseUrl}.json`;
    
           // Make the GET request
           const response = await api.axios.get(url);
           console.log(response.data);
    
           // Return the data
           return response.data;
       } catch (error) {
           console.error('Error fetching data from Firebase Realtime Database:', error);
           throw error;
       }
    }
    
  3. In the code editor, to the left of the getData() function, click the Run icon; in the displayed window, click Call to check the function execution results:

    ../../_images/run-function.png

Step 2. Add the getInfrastructure() function with JSDoc comments

To enable the AI agent to access information about infrastructure objects, you need to add a function with JSDoc comments to the dialog script.

  • The function is used to get all the infrastructure data required for the AI agent to answer user queries.

  • The JSDoc comments explain how the function operates and provide the necessary details for the Alan AI Platform.

To add the function:

  1. To the dialog script, add the following code to get all infrastructure data:

    Dialog script
    /**
    * @tool: Returns an array containing virtual machines, storage buckets, and cloud services
    * @param:
    * @return: An array with a short description of objects with the following fields:
    * [
    *     {
    *         "cpuUsage": "Percentage of CPU usage",
    *         "diskSpace": "Used space/total space in GB",
    *         "id": "Unique identifier for the virtual machine",
    *         "lastBootTime": "ISO formatted date and time of the last boot",
    *         "memoryUsage": "Percentage of memory usage",
    *         "name": "Name of the virtual machine",
    *         "status": "Current status of the virtual machine (e.g., running, stopped)"
    *     },
    *     {
    *         "id": "Unique identifier for the storage bucket",
    *         "lastModified": "ISO formatted date and time of the last modification",
    *         "name": "Name of the storage bucket",
    *         "objectCount": "Number of objects stored in the bucket",
    *         "status": "Current status of the storage bucket (e.g., active, inactive)",
    *         "totalSize": "Total storage capacity of the bucket in GB or TB",
    *         "usedSize": "Amount of used storage in the bucket in GB or TB"
    *     },
    *     {
    *         "errorRate": "Percentage representing the error rate",
    *         "id": "Unique identifier for the cloud service",
    *         "lastCheck": "ISO formatted date and time of the last status check",
    *         "name": "Name of the cloud service",
    *         "responseTime": "Average response time in milliseconds",
    *         "status": "Current status of the cloud service (e.g., operational, degraded)",
    *         "uptime": "Percentage representing the service uptime"
    *     },
    *     ...
    * ]
    */
    async function getInfrastructureData() {
        const data = await getData();
        console.log(data);
        return data;
    }
    
  2. In the code editor, to the left of the getInfrastructureData() function, click the Run icon and in the displayed window, click Call to check the execution results:

    ../../_images/run-function2.png

Step 3. Add a query transform

To let the AI agent answer user queries about the cloud infrastructure, you need to teach it how to perform AI reasoning using the query transform.

  1. In the AI agent project, under Transforms, click Add.

  2. Create the infrastructure_queries transform with the following data:

    1. In the Instruction field, import getInfrastructureData function and provide general instructions on how to process VMs data. Then save the transform.

      Transform instruction
      #import getInfrastructureData
      
      When a question is asked, make a decision if the question relates to VMs, buckets, services not.
      If question does not relate to VMs, buckets, services or is too generic, generate null.
      If question relates to VMs, buckets, services, write an async function getRequestedData() that takes no parameters. getRequestedData() must call the provided functions to construct a JSON that will have all the necessary information to answer the question.
      
      ../../_images/query-transforms-general.png
    2. In the Examples section, add an example to answer the Show all infrastructure objects question. At the bottom of the view, click Add Row and create a transform example:

      Note

      To open an example in preview mode, in the top left corner of any cell, click the Magnifying glass icon.

      • At the top of the Input field, select the data format: json. In the field below, enter the JSON object obtained with the getInfrastructureData() function. You can copy it when running the function in Alan AI Studio or using the Alan AI Studio logs.

      • At the top of the Query field, select the data format: text. In the field below, enter the user query: Show all infrastructure objects.

      • At the top of the Result field, select the data format: javascript. In the field below, add steps in natural language to retrieve all objects data wrapped with <thinking></thinking> tags:

        Transform example
        <thinking>
        To return all objects info:
        1. Use getInfrastructureData() to get all objects data.
        2. Return the result.
        </thinking>
        
      • At the top of the Result field, click the Generate result button to automatically generate code for the instructions specified in the <thinking></thinking> block:

        ../../_images/dynamic-query-example.png
      • To test if the generated function works correctly, in the top right corner of the Result field, click the Run script button:

        ../../_images/dynamic-query-result.png
    3. In a similar way, add another example to ask a question: Show all buckets: add steps to retrieve buckets data wrapped with <thinking></thinking> tags and click the Generate result button to automatically generate code for the instructions specified in the <thinking></thinking> block.

      ../../_images/dynamic-query-example2.png

Step 4. Add the corpus() function

Finally, you need to add the corpus() function that will bring together all the elements to provide answers based on dynamic data.

  1. To the dialog script, add the following code:

    Dialog script
    corpus({
        title: `Infrastructure requests`,
        query: transforms.infrastructure_queries,
        priority: 2
    });
    
  2. Save the dialog script.

Validation

To make sure the dynamic corpus is correctly set up and that the AI agent can retrieve data from the database to answer user queries, use the Debugging Chat on the right to ask questions about cloud infrastructure objects. For example:

  • Show all cloud infrastructure objects

  • Show all VMs data

  • Show all buckets data

../../_images/dynamic-corpus-results.png