Create a static corpus
With Alan AI, you can create a static corpus for your AI agent. To build such a data corpus, Alan AI automatically crawls static data sources such as web pages, PDF documents, text, CSV files, and Markdown files, and creates a knowledge memory for the AI agent. The AI agent then uses this memory to answer user queries, providing clear, well-formatted responses that may include:
Formatted text
Lists
Images
Diagrams
Formulas
Code snippets
Links to the original source
Use case
You are developing an AI agent for a cloud service platform. Your goal is to enable the AI agent to assist users with questions about managing resources like VMs and buckets. To achieve this, you need to build a static corpus by crawling the platform’s documentation that covers these topics.
Prerequisites
Before you define a static corpus for your AI agent, make sure you have signed up for Alan AI Studio and created a project for the AI agent. For details, see Sign up for Alan AI Studio.
Defining a static corpus
To define a static corpus:
To the dialog script in Alan AI Studio, add the corpus() function:

    corpus({
        title: `Cloud documentation`,
        urls: [
            `https://cloud.google.com/compute/docs/overview`,
            `https://cloud.google.com/compute/docs/images`,
            `https://cloud.google.com/compute/docs/disks`,
            `https://cloud.google.com/storage/docs/buckets`
        ],
        depth: 1,
        maxPages: 30,
        priority: 1,
    });
Here, the data corpus uses the following parameters:
title: corpus name
urls: URLs of the web pages you want to crawl
depth: crawling depth that determines how many levels deep the crawler should go to retrieve content
maxPages: maximum number of pages the crawler will retrieve
priority: corpus priority level
Save the dialog script.
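To build intuition for how the depth and maxPages parameters limit a crawl, the following is a minimal sketch of a breadth-first crawl over a small in-memory link graph. It is not Alan AI's crawler: the links graph, the crawl() helper, and the page paths are hypothetical. The sketch also assumes that depth 0 means only the listed URLs and that each additional level follows one more hop of links. Alan AI may count levels differently, so check the Crawler Tasks view for the pages that were actually retrieved.

```javascript
// Toy model of a depth- and page-limited crawl.
// NOT Alan AI's implementation: the graph and crawl() are hypothetical.

// Hypothetical site: each page maps to the pages it links to.
const links = {
  "/docs/overview": ["/docs/images", "/docs/disks"],
  "/docs/images": ["/docs/images/freebsd"],
  "/docs/disks": ["/docs/disks/regional"],
  "/docs/images/freebsd": [],
  "/docs/disks/regional": [],
};

// Breadth-first traversal from the start URLs.
// Assumption: depth 0 = start URLs only; depth 1 = start URLs plus one hop.
function crawl(startUrls, { depth, maxPages }) {
  const visited = new Set();
  let frontier = [...startUrls];
  for (let level = 0; level <= depth && frontier.length > 0; level++) {
    const next = [];
    for (const url of frontier) {
      if (visited.has(url)) continue;                     // skip duplicates
      if (visited.size >= maxPages) return [...visited];  // page budget used up
      visited.add(url);
      next.push(...(links[url] ?? []));                   // queue outgoing links
    }
    frontier = next;
  }
  return [...visited];
}

// Only the start page is retrieved.
console.log(crawl(["/docs/overview"], { depth: 0, maxPages: 30 }));
// The start page plus the pages it links to.
console.log(crawl(["/docs/overview"], { depth: 1, maxPages: 30 }));
// The page budget stops the crawl before the depth limit is reached.
console.log(crawl(["/docs/overview"], { depth: 2, maxPages: 4 }));
```

In this model, raising depth widens the crawl quickly, while maxPages acts as a hard cap regardless of depth, which is why the two are usually set together.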
Validation
To make sure the pages have been crawled and the data corpus is successfully created:
At the bottom of Alan AI Studio, open logs and make sure the corpus task is marked as ready.
At the top of Alan AI Studio, click Crawler Tasks and make sure the crawler task status is complete.
In the code editor, to the left of the corpus() function, click the Magnifying glass icon. Use the Corpus Explorer to examine what content has been added to the knowledge memory.
In the Debugging Chat on the right, ask questions about VMs and buckets, for example:
What is a regional persistent disk, and when should it be used?
How do I list all FreeBSD images?
What are the limitations in naming buckets?