Dynamic corpus¶
You can define dynamic data corpuses for your AI agent.
Dynamic data corpuses allow you to retrieve semi-structured data (JSON) from different data sources, process it using transforms and utilize the retrieved data to answer user queries in natural language. Dynamic data corpuses can be connected with various external systems, such as corporate platforms, databases, API services and other specialized data sources, to provide users with accurate, timely and contextually relevant information.
Corpus parameters¶
To define a dynamic corpus, use the corpus()
function:
corpus({
title: `Infrastructure requests`,
description: `Corpus to answer user queries about infrastructure objects`,
input: project.objects,
query: transforms.vms_queries,
output: project.cleanObjects,
transforms: transforms.vms_answer,
priority: 1
});
Name |
Type |
Required/Optional |
Description |
---|---|---|---|
|
string |
Optional |
Corpus title. |
|
string |
Optional |
Corpus description. |
|
function |
Optional |
Function used to populate the |
|
function |
Required |
Transforms function used to process user queries and generate code to retrieve necessary data. |
|
function |
Optional |
Function used to process obtained data, before it is passed to the |
|
function |
Optional |
Transforms function used to process and format data obtained with the |
|
integer |
Optional |
Priority level assigned to the corpus. Corpuses with higher priority are considered more relevant when user requests are processed. For details, see Corpus priority. |
How the dynamic corpus works¶
The implementation of a dynamic corpus can vary depending on the specific business use case and scenario. Typically, the data flows through the following stages:
The user makes a request to the dynamic corpus.
Alan AI retrieves JSON data from an external system and applies the
query
function to it to handle the user’s request and generate the code needed to retrieve the relevant information. This process can involve:The input data passed to the query transform
Functions with JSDoc comments
[Optional] Alan AI may perform additional processing on the data using the
output
function.The data is passed to the
transforms
function. Alan AI applies thetransforms
function instructions to process and format the output for the user.The response data is presented to the user.
Example of use¶
Assume you have a JSON object that lists virtual machines (VMs) in a cloud environment. You want to use this data as a dynamic source so users can ask questions about the VMs, and the AI agent can provide a formatted response in natural language.
To do this, perform the following steps:
Retrieve VM data: add a function that retrieves the VM data from the data source.
Add a query transform: instruct the AI agent on how to generate code that will get the necessary data to answer user queries.
Add a data formatting transform: instruct the AI agent on how to format the output of the corpus data.
Add a dynamic corpus: define a dynamic corpus.
[Advanced] Clean up the input data: process the input data before it is passed to the formatting transform.
Step 1. Retrieve data¶
To retrieve JSON data from the data source, you will typically make an API call to the data provider. To keep things simple, we will add a JSON object defining VMs data directly to the dialog script.
In the dialog script, create the project.infrastructure
variable:
project.infrastructure = {
"vms": [
{
"name": "prod-web-server",
"cpu": 4,
"memoryGB": 8,
"diskGB": 256,
"location": "US-West",
"status": "Running",
"lastUpdated": "2024-08-26T12:00:00Z",
"createdBy": "AdminUser",
"notes": "No issues reported."
},
{
"name": "prod-db-server",
"cpu": 2,
"memoryGB": 4,
"diskGB": 128,
"location": "EU-Central",
"status": "Stopped",
"lastUpdated": "2024-08-26T12:00:00Z",
"createdBy": "AdminUser",
"notes": "No issues reported."
},
{
"name": "stage-app-server",
"cpu": 8,
"memoryGB": 16,
"diskGB": 512,
"location": "Asia-East",
"status": "Running",
"lastUpdated": "2024-08-26T12:00:00Z",
"createdBy": "AdminUser",
"notes": "No issues reported."
}
]
}
Step 2. Add a query transform¶
With a query transform, you can instruct the AI agent on how to process the input data and generate code that returns the information needed to answer user queries.
In this example, we will instruct the AI agent using functions added to transforms.
Note
Each function used in transforms must have an explanation formatted as a JSDoc comment preceding the function code.
To the dialog script, add the
getAllVMs()
function with the function description:/** * @tool: Get virtual machines for a group. * @param: None * @return: Array with virtual machines descriptions. * [ * { * "name": "stage-app-server", * "cpu": 8, * "memoryGB": 16, * "diskGB": 512, * "location": "Asia-East", * "status": "Running/Stopped", * "lastUpdated": "2024-08-26T12:00:00Z", * "createdBy": "AdminUser", * "notes": "No issues reported." * }, * ... * ] */ function getAllVms() { const data = project.infrastructure; const objects = []; if (data.vms && Array.isArray(data.vms)) { objects.push(...data.vms); } return objects; }
In the AI agent project, under Transforms, create the
vms_queries
transform with the following data:In the Instruction field, import
getAllVms
function and provide general instructions on how to process VMs data. Then save the transform.#import getAllVms When a question is asked, make a decision if the question relates to VMs or not. If question does not relate to VMs or is too generic, generate null. If question relates to VMs, write an async function getRequestedData() that takes no parameters. getRequestedData() must call the provided functions to construct a JSON that will have all the necessary information to answer the question.
In the Examples section, add an example to answer the
Show all VMs
question. At the bottom of the view, click Add Row and create a transform example:Note
To open an example in preview mode, in the top left corner of any cell, click the magnifying glass icon.
At the top of the Input field, select the data format: json.
At the top of the Query field, select the data format: text. In the field below, enter the user query:
Show all VMs
.At the top of the Result field, select the data format: javascript. In the field below, add steps in natural language to retrieve all VMs data wrapped with
<thinking></thinking>
tags:<thinking> To return all VMs info: 1. Use getAllVms() to get all VMs data. 2. Return the result. </thinking>
At the top of the Result field, click the Generate result button to automatically generate code for the instructions specified in the
<thinking></thinking>
block:To test if the generated function works correctly, in the top right corner of the Result field, click the Run script button:
In a similar way, add another example to ask a question:
Show VMs with the Stopped status
: add steps to retrieve stopped VMs data wrapped with<thinking></thinking>
tags and click the Generate result button to automatically generate code for the instructions specified in the<thinking></thinking>
block.
Step 3. Add a data formatting transform¶
With a data formatting transform, you can define the output format for the AI agent response.
In the AI agent project, under Transforms, create the vms_answers
transform with the following data:
In the Instruction field, provide general instructions on how to format the VMs data. Then save the transform.
The input contains sample JSON with VM data, the query contains a set of sample user questions, the result field contains the formatted answer to be provided.
In the Examples section, add an example:
At the top of the Input field, select the data format: json.
At the top of the Query field, select the data format: text. In the field below, enter the user query:
Show all VMs
.At the top of the Result field, select the data format: markdown. In the field below, add the VM description formatted in Markdown:
Here is a list of all VMs: | Name | CPU | Memory (GB) | Disk (GB) | Location | Status | Last Updated | Created By | Notes | |----------------------|-----|-------------|-----------|-------------|---------|-----------------------|------------|----------------------| | **prod-web-server** | 4 | 8 | 256 | US-West | Running | 2024-08-26T12:00:00Z | AdminUser | No issues reported. | | **prod-db-server** | 2 | 4 | 128 | EU-Central | Stopped | 2024-08-26T12:00:00Z | AdminUser | No issues reported. | | **stage-app-server** | 8 | 16 | 512 | Asia-East | Running | 2024-08-26T12:00:00Z | AdminUser | No issues reported. |
In a similar way, add another example for a query:
Show all running VMs
:
Step 4. Add a dynamic corpus¶
To define a dynamic corpus, add the corpus()
function with the following parameters to the dialog script:
corpus({
title: `Infrastructure requests`,
description: `Corpus to answer user queries about infrastructure objects`,
query: transforms.vms_queries,
transforms: transforms.vms_answers,
priority: 1
});
Now, you can ask the AI agent questions like:
Show all VMs
Show all stopped VMs
Show all running VMs
and so on.
Note
To adjust the generated code and output data for new queries, open the necessary transform, in the top right corner, click History and click the add icon to the right of the necessary query row. The query will be added to transform examples. Here, you can edit it as described above.
Step 5. Clean up the input data¶
Note
This step is required if you want to process the data retrieved with the query
function before sending it to formatting transform.
Assume we only want to remove auxiliary VM fields from the answer: lastUpdated
, createdBy
and notes
.
To do this:
In the dialog script, create a set of fields you want to exclude:
const excludeFields = new Set([ "lastUpdated", "createdBy", "notes" ]);
Add the
cleanObjects()
function that will return the VM data without excluded fields and save it toproject.cleanObjects
:function cleanObjects(obj) { if (typeof obj !== 'object' || obj === null) { return obj; } const result = {}; for (const key in obj) { if (obj.hasOwnProperty(key) && !excludeFields.has(key)) { result[key] = cleanObjects(obj[key]); } } console.log(result); return result; } project.cleanObjects = cleanObjects;
Update the dynamic corpus to include the
output
parameter with theproject.cleanObjects
function:corpus({ title: `Infrastructure requests`, query: transforms.vms_queries, output: project.cleanObjects, transforms: transforms.vms_answers, priority: 1 });
Now, you can ask the AI agent questions like:
Show all VMs
Show all stopped VMs
Show all running VMs
The AI agent will use the cleaned data to provide a response.