Train Your AI — NexaDesk Docs

Training grounds your chatbot in your own content. NexaDesk processes that content, splits it into semantic chunks, creates vector embeddings, and uses retrieval-augmented generation (RAG) to answer visitor questions from the indexed material.

Training Sources

NexaDesk supports four types of training data: websites, PDF files, custom text, and question-answer pairs.

Enter a URL and NexaDesk will crawl the page (and optionally follow links) to index the content.

Go to Knowledge Base > Add Source > Website
Enter the URL (e.g., https://yoursite.com/products)
Choose crawl depth:
- Single page — Only the specified URL
- Crawl subpages — Follow links within the same domain (up to 50 pages)
Click Start Training
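
To make the two crawl-depth options concrete, here is a minimal sketch of a same-domain crawl with a page cap. It is not NexaDesk's crawler: the breadth-first order, the `crawl` function, and the `fetch_links` callable (a stand-in for fetching a page and extracting its links) are assumptions for illustration, as is counting the start page toward the 50-page limit.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

MAX_PAGES = 50  # cap stated in the docs for "Crawl subpages"


def crawl(start_url, fetch_links, crawl_subpages=True, max_pages=MAX_PAGES):
    """Return the URLs that would be indexed, in visit order.

    fetch_links is a hypothetical callable (url -> list of href strings)
    standing in for a real HTTP fetch plus HTML link extraction.
    """
    start_url = urldefrag(start_url).url
    if not crawl_subpages:
        return [start_url]  # "Single page": only the specified URL

    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments before comparing
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory site used instead of real network requests
SITE = {
    "https://yoursite.com/products": ["/products/a", "/products/b#specs", "https://other.com/x"],
    "https://yoursite.com/products/a": ["/products"],
    "https://yoursite.com/products/b": [],
}
pages = crawl("https://yoursite.com/products", lambda u: SITE.get(u, []))
print(pages)
```

In this sketch the link to `other.com` is skipped because it is on a different domain, and the link back to `/products` is skipped because it was already seen.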

Upload PDF files containing product catalogs, manuals, or policy documents.

Go to Knowledge Base > Add Source > File Upload
Drag and drop or browse for PDF files (max 10MB each)
NexaDesk extracts the text, splits it into chunks, and indexes the content
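
If you prepare many files, it can help to check them against the upload limit before dragging them in. The sketch below is a local pre-check only, not part of NexaDesk. The `check_upload` helper is hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes, which the docs do not specify.

```python
from pathlib import Path

# Assumption: "10MB" is read as 10 MiB; the docs do not say which unit is meant
MAX_BYTES = 10 * 1024 * 1024


def check_upload(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks uploadable.

    size_bytes can be passed explicitly (useful for testing); otherwise the
    size is read from disk.
    """
    p = Path(path)
    problems = []
    if p.suffix.lower() != ".pdf":
        problems.append("not a PDF file")
    size = p.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        problems.append(f"{size / 1_048_576:.1f} MB exceeds the 10 MB limit")
    return problems


print(check_upload("catalog.pdf", size_bytes=2_000_000))
print(check_upload("manual.pdf", size_bytes=11 * 1024 * 1024))
```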

Paste or type custom content directly — useful for FAQs, policies, or anything not available as a URL or file.

Go to Knowledge Base > Add Source > Text
Give the entry a title
Paste or write your content
Save

Add specific question-answer pairs for precise control over responses.

Go to Knowledge Base > Add Source > Q&A
Enter the question and the desired answer
The chatbot will match similar visitor questions to your answer
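
The matching step can be pictured as a nearest-question lookup with a cutoff. The docs do not say how NexaDesk measures similarity, so this sketch substitutes simple word overlap (Jaccard similarity). The sample Q&A pairs, the `match_answer` function, and the 0.5 threshold are all assumptions for illustration.

```python
import re

# Hypothetical Q&A entries, as they might be entered under Add Source > Q&A
QA_PAIRS = [
    ("What is your return policy?", "You can return items within 30 days."),
    ("Do you ship internationally?", "Yes, we ship to most countries."),
]


def words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def match_answer(question, pairs=QA_PAIRS, threshold=0.5):
    """Return the stored answer whose question is most similar, or None
    when nothing is similar enough (threshold is an arbitrary assumption)."""
    best_q, best_a = max(pairs, key=lambda qa: jaccard(question, qa[0]))
    return best_a if jaccard(question, best_q) >= threshold else None


print(match_answer("what's your return policy"))
print(match_answer("Where is my order?"))
```

A cutoff like this matters in practice: without it, every visitor question would be forced onto some stored answer, even an unrelated one.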

Managing Training Data

In the Knowledge Base section, you can:

View all sources — See the status of each training source (active, processing, failed)
Re-train — Update a source after the original content has changed
Delete — Remove a source and its associated embeddings
View chunks — Inspect how NexaDesk split your content into training segments

Training Tips

Be specific — Product pages with detailed descriptions produce better answers than generic landing pages
Cover edge cases — Add Q&A pairs for questions the AI gets wrong
Update regularly — Re-train sources when your content changes (pricing, features, policies)
Check quality — Test your chatbot after training to verify answer accuracy

How Training Works Internally

Content is fetched and cleaned (HTML stripped, boilerplate removed)
Text is split into overlapping chunks (~500 tokens each)
An embedding model converts each chunk into a vector
Embeddings are stored in a vector index
At query time, the visitor's question is embedded and the most relevant chunks are retrieved
The AI generates an answer using the retrieved chunks as context
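
The steps above can be sketched end to end in a few functions. This is a toy illustration, not NexaDesk's implementation: whitespace-separated words stand in for tokens, a word-count vector stands in for a real embedding model, a plain list stands in for the vector index, and the 50-token overlap and the prompt wording are assumptions (the docs state only the ~500-token chunk size).

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=500, overlap=50):
    """Split text into overlapping chunks, treating whitespace words as tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: sparse word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents, chunk_size=500, overlap=50):
    """Chunk each document, embed each chunk, and store (vector, chunk) pairs."""
    return [(embed(c), c) for doc in documents
            for c in split_into_chunks(doc, chunk_size, overlap)]


def retrieve(index, question, top_k=3):
    """Embed the question and return the top_k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


def build_prompt(question, chunks):
    """Assemble retrieved chunks as context for the answering model (hypothetical wording)."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical content; tiny chunk sizes so the overlap is visible
docs = [
    "Our return policy allows returns within 30 days of purchase.",
    "Standard shipping takes three to five business days.",
]
index = build_index(docs, chunk_size=8, overlap=2)
top = retrieve(index, "How long does shipping take?", top_k=1)
print(build_prompt("How long does shipping take?", top))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is the usual reason for overlapping rather than cutting cleanly.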