Data corpuses
With Alan AI, you can easily create an Agentic Interface that integrates both static and dynamic data sources to respond to user queries. Alan AI uses advanced techniques such as semantic search, question answering and code generation to provide accurate, relevant responses while maintaining a natural, multi-turn conversation flow.
You can set up data corpuses for your Agentic Interface if you already have a pool of information that you want to use for the following purposes:
- Handle user requests 
- Offer 24/7 automated support 
- Onboard new customers and employees 
- Provide instructions, training and so on 
When building a Q&A Agentic Interface, you can combine diverse types of data sources. The Q&A service accepts the following data formats:
- Web pages 
- Plain text 
- PDF 
- CSV 
- JSON data 
Alan AI provides several tools to define and work with data corpuses:
- Learn how to crawl static data sources such as websites, web pages and PDF files.
- Learn how to crawl dynamic data sources: APIs, databases and so on.
- Learn how to crawl dynamically loaded data and specific page sections with the Puppeteer crawler.
- Understand how Alan AI crawls documents and what the depth parameter means in different types of corpuses.
- Learn how to prioritize one data corpus above the others.
- Learn how to filter data corpuses based on criteria like user roles, product versions or individual preferences.
- Learn how to include and exclude specific documents from data corpuses.
- Find out how to crawl websites and web pages that require basic authentication.
- Discover how to manage data crawling tasks in Alan AI Studio.
- Learn how to review and examine what data sources and content your Q&A Agentic Interface utilizes to converse with users.