Webpage

Analyzes a specific web page to gather textual information.

Detailed Explanation

  1. Name:

    • This field allows you to assign a specific name to your webpage data source, helping you identify it easily within your project.

    • Example: You might name it "E-commerce Product Page" if the data pertains to products listed on an online store.

  2. URL(s):

    • This field is for specifying one or more URLs from which you want to scrape data. You can enter multiple URLs separated by commas.

    • Example: If you want to scrape data from multiple URLs, you would enter:

      https://example.com, https://another-example.com
  3. Fetch Content from Page Sub-URLs?:

    • This option allows you to specify whether you want to scrape the main page as well as its subpages (nested links).

    • Example: If you check this option, the scraper will also gather data from any pages linked from the specified URLs, as in the sketch below.
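
    • For illustration only, here is a minimal Python sketch of how sub-URL fetching could work, using the common requests and BeautifulSoup libraries; the function name and the same-host rule are assumptions, not Swiftask's actual crawler:

      from urllib.parse import urljoin, urlparse

      import requests
      from bs4 import BeautifulSoup

      def fetch_page_and_sub_urls(url: str) -> dict[str, str]:
          """Return {url: text} for a page plus the same-site pages it links to."""
          html = requests.get(url, timeout=10).text
          soup = BeautifulSoup(html, "html.parser")
          pages = {url: soup.get_text(" ", strip=True)}
          for link in soup.find_all("a", href=True):
              sub_url = urljoin(url, link["href"])  # resolve relative links
              # Hypothetical policy: only follow links that stay on the same host.
              if urlparse(sub_url).netloc == urlparse(url).netloc and sub_url not in pages:
                  sub_html = requests.get(sub_url, timeout=10).text
                  pages[sub_url] = BeautifulSoup(sub_html, "html.parser").get_text(" ", strip=True)
          return pages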

  4. Enable Pagination:

    • This option enables pagination for the scraping process, which is useful for websites that display data across multiple pages.

    • Example: If a webpage lists products across several pages, enabling pagination will allow the scraper to collect data from all those pages.

  5. Page Key:

    • This parameter specifies the query-string key the site uses for pagination, which the scraper needs in order to build the URL of each page (the sketch after Max Pages below shows how this key and the page limit work together).

    • Example: If your pagination uses a query parameter like ?page=1, you would enter:

      page
  6. Max Pages:

    • This field specifies the maximum number of pages to scrape if pagination is enabled. Setting this helps control the volume of data collected.

    • Example: If you want to scrape a maximum of 5 pages, you would enter:

      5
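
    • To make the interplay between Page Key and Max Pages concrete, here is a minimal Python sketch of how a scraper could walk paginated URLs; the function name and the early-stop rule are illustrative assumptions, not Swiftask's internal logic:

      import requests

      def fetch_paginated(base_url: str, page_key: str = "page", max_pages: int = 5) -> list[str]:
          """Fetch up to max_pages pages by appending ?<page_key>=N to the URL."""
          pages = []
          for page_number in range(1, max_pages + 1):
              # e.g. https://example.com/products?page=3
              response = requests.get(base_url, params={page_key: page_number}, timeout=10)
              if response.status_code != 200:
                  break  # stop early once the site runs out of pages
              pages.append(response.text)
          return pages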
  7. Chunk Size:

    • This field specifies the number of tokens or characters in each chunk of data processed. The default value is set to 1024, but you can modify it as needed.

    • Example: If you expect a large volume of data, you might lower the chunk size to 512 so that each chunk stays small and easier to process.
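
    • As a rough illustration, chunking can be pictured as fixed-size splitting; the character-based split below is a simplifying assumption (the connector may count tokens rather than characters):

      def chunk_text(text: str, chunk_size: int = 1024) -> list[str]:
          """Split imported text into fixed-size chunks (default 1024)."""
          return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

      # A 5,000-character page at the default size yields 5 chunks.
      print(len(chunk_text("x" * 5000)))  # -> 5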

  8. Cost Information:

    • This section provides details about the cost associated with importing words from the specified web pages.

    • Example: If it states "Cost per words: 0.035 tokens" and "Remaining words: 1279900685 Words", this means 0.035 tokens are charged for each word processed, and 1,279,900,685 words remain available for processing.
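
    • As a worked example using the rate quoted above (an illustration only; your workspace may show different figures), importing a 2,000-word page at 0.035 tokens per word would deduct 2,000 × 0.035 = 70 tokens from your balance:

      cost_per_word = 0.035      # illustrative rate from the example above
      words_imported = 2000      # hypothetical size of the imported page
      print(words_imported * cost_per_word)  # -> 70.0 tokens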