Import documents / files

Written By Stanislas

Last updated 9 days ago

Upload files from your computer to build your knowledge base. Swiftask automatically extracts text from documents using OCR technology, processes them into searchable chunks, and makes them available to AI agents and teammates across your workspace.

Whether you need to upload PDFs, Word documents, spreadsheets, or code files, file import makes it easy to centralize your organization's knowledge in one place. Upload once, use everywhereβ€”across Chat, Agents, and Projects.


Overview

The import documents feature lets you upload files directly from your computer to your knowledge base. Swiftask automatically processes your filesβ€”extracting text from images and PDFs using Mistral OCR technology, parsing structured data, and indexing everything for instant AI-powered search and retrieval.

You can upload multiple file types including documents (PDF, DOCX, TXT), spreadsheets (CSV, XLSX), code files (JSON, MD, etc.), and more. Each file is processed, chunked, and stored as a data source that agents and team members can access.


Prerequisites

To import documents to your knowledge base, you need:

  • A Swiftask account (sign up at swiftask.ai)

  • Access to the Knowledge section

  • Files on your computer (supported formats: PDF, DOCX, XLSX, CSV, TXT, JSON, code files, etc.)

  • Permission to create data sources in your workspace

File import is available to all Swiftask users.


Supported file types

Swiftask accepts a wide range of file formats:

Documents: PDF, DOCX, DOC, TXT

Spreadsheets: XLSX, XLS, CSV

Data formats: JSON, XML, MD

Code files: Python, JavaScript, and other programming language files

File size: Check your plan limits for maximum file size per upload


Step-by-step guide

1. Navigate to Knowledge

Click Knowledge in the left sidebar. You'll see the Knowledge interface with all your data sources and folders.

2. Click the Import button

In the Knowledge section, locate and click the Import button. A dropdown menu appears with multiple import options.

As shown in the image above, you'll see several import options:

  • Import from my computer – Upload files directly from your device

  • Import from Google Drive – Sync files from Google Drive

  • Import from SharePoint – Connect to SharePoint documents

  • Import from Dropbox – Sync files from Dropbox

  • Import a website – Scrape content from a website

  • Import a web page – Import a single web page

  • Import a sitemap – Import multiple pages from a sitemap

For this guide, select Import from my computer.

3. Configure your file import

The file import configuration screen opens. This is where you'll upload your files and configure how Swiftask processes them.

As shown in the image above, the screen is divided into two sections:

Left section: File upload

  • Name – Enter a descriptive name for your data source (e.g., "My Files for knowledge base")

  • File(s) – Drop your files here or click to browse. You can select one or multiple files (PDF, DOC, DOCX, PPT, PPTX, XLS, XLSX, CSV, TXT, MD, JSON, code files)

Right section: Optional fields

  • Select embedding model – Choose the AI model used to create vector embeddings of your content (default: OpenAI Text Embedding 3 Small). This determines how your content is indexed and searched

  • Chunk size – Set the number of tokens or characters in each chunk. The default is 512, but you can modify it if needed. Smaller chunks provide more precise retrieval; larger chunks provide more context

  • Metadata – Add custom metadata as JSON to help the AI understand your data. This field is optional but useful for adding context

  • Text splitter – Choose the method for splitting text content. The default is markdown, but you can select other options from the dropdown

  • Warning! – A cost indicator shows the credit consumption: "For File(s), the cost is 500 tokens per page"

The screen also mentions that Swiftask uses Mistral OCR technology to convert scanned documents and images into markdown text.

4. Upload your files

Option 1: Drag and drop

Drag files from your computer directly into the Drop your file here area.

Option 2: Browse and select

Click Click to browse or drag files here, then select files from your file system.

You can upload multiple files at once. Each file appears in the upload area with its name and a delete icon (trash) if you need to remove it.

5. Configure optional settings (if needed)

Embedding model: Leave the default (OpenAI Text Embedding 3 Small) unless you have specific requirements.

Chunk size: Keep the default (512) for most use cases. Adjust only if you need more granular or broader context retrieval.

Metadata: Add custom metadata if you want to provide additional context to the AI. For example:

{ "department": "HR", "year": "2025", "document_type": "policy" } 

Text splitter: Keep the default (markdown) unless your document requires a specific splitting method.

6. Save and process

Once your files are uploaded and configured, click the Save button at the bottom of the screen.

Swiftask processes your files in the background:

  • Text is extracted from PDFs and images using OCR

  • Content is split into chunks based on your chunk size setting

  • Chunks are embedded using the selected embedding model

  • The data source is indexed and made searchable

You'll see a progress indicator. Once complete, your data source appears in the Knowledge section.

7. View your imported documents

After processing, navigate back to the Knowledge section. Your new data source appears in the list.

As shown in the image above, the imported data source displays:

Left panel: Content list

  • A list of all content imported from this data source

  • Search bar to find specific items by name

  • Each item shows a checkbox and an eye icon for viewing details

Right panel: Details

  • Create agent – Create a custom AI agent powered by this knowledge source

  • Chat with this datasource – Instantly chat with your data using an AI agent

  • Shared with – Shows who has access (e.g., "Karen Lee" as OWNER)

  • Used by agents – Lists agents using this data source (or "This data source is not used by any agent yet")

  • Indexation status – Shows "Up to date" when indexing is complete

  • Embedding model – Displays the model used (e.g., "OpenAI - Text Embedding 3 Small")

  • Name – The data source name (e.g., "test")

  • Category – Shows the type (e.g., "File(s)")

  • Url – Displays the source URL if applicable

  • Text Splitter – Shows the splitting method used (e.g., "recursive")

  • Creation date and Last updated – Timestamps for tracking changes

  • Indexing status – Expandable section showing indexing progress (e.g., "Indexing 1 data source(s): success")

8. Preview document content (optional)

To preview the extracted content from your document, click the eye icon next to any item in the content list.

As shown in the image above, a Details modal opens displaying:

Left section: Extracted content preview

  • A preview of the content extracted and indexed in the vector database

  • Technical note: "This is a preview of the content extracted and indexed in the vector database. For technical users, it can be used to preview the chunk structure."

  • The actual text content from your document, showing how it was chunked (e.g., "----------CHUNK 1----------" followed by the extracted text)

Right section: Metadata

  • Length – Number of characters in the chunk (e.g., "3456")

  • Creation date – When the chunk was created (e.g., "25 Jan 2026")

  • Last updated – When the chunk was last modified (e.g., "25 Jan 2026 22:28")

  • Link – A clickable link to the original file (e.g., "layout-parser-paper-with-table.pdf")

This preview helps you verify that Swiftask correctly extracted and processed your document content.


How file processing works

Automatic OCR and text extraction

When you upload a PDF or image, Swiftask automatically extracts text using Mistral OCR technology. This means:

  • PDFs: All text content is extracted, whether the PDF is text-based or scanned

  • Images: Text embedded in screenshots, photos, or scanned documents is automatically extracted

  • No manual steps needed: Extraction happens automatically in the background

Document parsing

Swiftask parses your files to understand their structure and content:

  • Documents (DOCX, TXT, PDF): Text is extracted and analyzed

  • Spreadsheets (CSV, XLSX): Rows, columns, and data relationships are recognized

  • Data files (JSON, XML): Structured data is parsed and made available for queries

  • Code files: Code structure and syntax are preserved

Chunking and embedding

Your documents are split into chunks and converted into vector embeddings:

  • Chunking: Content is divided into smaller pieces based on your chunk size setting (default: 1024 tokens)

  • Embedding: Each chunk is converted into a vector representation using the selected embedding model

  • Indexing: Vectors are stored in a searchable database that agents can query


Practical use cases

Build an HR knowledge base

Upload employee handbooks, policies, and benefits documentation. Create an HR agent that answers employee questions using your actual policies.

Centralize product documentation

Upload product manuals, troubleshooting guides, and FAQs. Build a technical support agent that provides accurate answers based on your documentation.

Organize research and reports

Upload market research, competitor analysis, and internal reports. Create a research agent that analyzes data and generates insights.

Store legal and compliance documents

Upload contracts, compliance guidelines, and legal policies. Build an agent that references specific clauses and regulations.


Tips & best practices

Use clear, descriptive names: Name your data sources clearly so you can identify them later. Instead of "Document 1," use "HR Hiring Standards 2025" or "Product Manual v3.2."

Organize with folders: Group related documents in folders to keep your knowledge base organized.

Keep chunk size at default (1024): The default chunk size works well for most use cases. Adjust only if you have specific retrieval needs.

Add metadata for context: Use the metadata field to add context that helps AI understand your documents better.

Update sources regularly: If your documentation changes, re-upload or replace the data source. Outdated information leads to incorrect agent responses.

Test after importing: After importing, test your agents by asking questions that should reference the new content. Verify accuracy.


Troubleshooting

File didn't upload

Cause: File format not supported or file size exceeds limit

Solution:

  • Check that your file is one of the supported types (PDF, DOCX, XLSX, CSV, TXT, JSON, etc.)

  • Verify the file is within your plan's size limit

  • Try again with a different file

Text wasn't extracted correctly

Cause: Poor image quality or corrupted file

Solution:

  • Ensure scanned documents are clear and high-resolution

  • Try re-saving the file in a different format

  • Contact support if the issue persists

Indexing failed

Cause: Processing error or insufficient credits

Solution:

  • Check your credit balance in Settings β†’ Usage & Billing

  • Retry the upload

  • Contact support if the error continues

Agent can't find information

Cause: Content not properly indexed or chunk size too large

Solution:

  • Verify the indexation status shows "Up to date"

  • Try reducing chunk size for more granular retrieval

  • Check that the agent is connected to the correct data source


What happens next

Once you've imported your documents, you're ready to:

  • Create an agent – Build a custom AI agent powered by your knowledge source

  • Chat with your data – Use the "Chat with this datasource" feature to instantly query your documents

  • Share with teammates – Give team members access to your knowledge base

  • Connect to agents – Link your data source to existing agents to enhance their capabilities

Your documents are now indexed, searchable, and ready to power AI-driven workflows across your workspace.


Additional resources

  • Knowledge base – Introduction – Learn what knowledge base is and why it matters

  • Import website and web pages – Add web content to your knowledge base

  • Import from Google Drive – Sync files from cloud storage

  • Permissions and access – Control who can see and use your knowledge base

  • Creating an agent – Build an agent and connect it to your knowledge base


Ready to import your first documents? Click Knowledge in the sidebar, then Import β†’ Import from my computer. Select your files, configure your settings, and let Swiftask handle the rest.