Data corpuses
With Alan AI, you can easily create an AI agent that integrates both static and dynamic data sources to respond to user queries. Alan AI uses advanced techniques like semantic search, question answering and AI-driven code generation to provide accurate and relevant responses while maintaining a natural, multi-turn conversation flow.
You can set up data corpuses for your AI agent if you already have a pool of information that you want to use for the following purposes:
Handle user requests
Offer 24/7 automated support
Onboard new customers and employees
Provide instructions, training and so on
When building a Q&A AI agent, you can combine diverse types of data sources. The Q&A service accepts the following data formats:
Web pages
Plain text
PDF
CSV
JSON data
Alan AI provides several tools to define and work with data corpuses:
Learn how to crawl static data sources such as websites, web pages and PDF files.
Learn how to crawl dynamic data sources: APIs, databases and so on.
Learn how to crawl dynamically loaded data and specific page sections with the Puppeteer crawler.
Understand how Alan AI crawls documents and what the depth parameter means in different types of corpuses.
Learn how to prioritize one data corpus above the others.
Learn how to filter data corpuses based on criteria like user roles, product versions or individual preferences.
Learn how to include and exclude specific documents from data corpuses.
Find out how to crawl websites and web pages that require basic authentication.
Discover how to manage data crawling tasks in Alan AI Studio.
Learn how to review which data sources and content your Q&A AI agent uses when conversing with users.
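
The exact meaning of the depth parameter in each corpus type is covered in the crawling topic listed above. As a rough intuition only, the sketch below shows one common convention for a depth limit in a crawler: depth 0 returns just the start page, and each extra level follows one more hop of links. It is not Alan AI's crawler or API. The `LINKS` graph, the page names and the `crawl` function are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the pages it links to.
LINKS = {
    "home": ["docs", "blog"],
    "docs": ["api", "faq"],
    "blog": ["post-1"],
    "api": ["api-auth"],
    "faq": [],
    "post-1": [],
    "api-auth": [],
}


def crawl(start, depth, links=LINKS):
    """Return pages reachable from `start` by following at most `depth` links.

    Assumed convention (not taken from Alan AI): depth 0 means the start
    page only. Pages are returned in breadth-first discovery order.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue  # page is at the depth limit, so its links are not followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, level + 1))
    return order


if __name__ == "__main__":
    for d in range(3):
        print(d, crawl("home", d))
```

Under this convention, raising the limit from 1 to 2 adds the pages linked from "docs" and "blog" but still leaves out "api-auth", which is three hops from the start page.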