Webpage

Analyzes one or more web pages to gather their textual content.

Detailed Explanation

  1. Name:

    • Assigns a name to this webpage data source so you can identify it easily within your project.

    • Example: You might name it "E-commerce Product Page" if the data pertains to products listed on an online store.

  2. URL(s):

    • Specifies one or more URLs to scrape. Separate multiple URLs with commas.

    • Example: If you want to scrape data from multiple URLs, you would enter:

      https://example.com, https://another-example.com

  3. Fetch Content from Page Sub-URLs?:

    • Specifies whether to scrape only the main page or also its sub-pages (the nested links it contains).

    • Example: If you check this option, the scraper also gathers data from the pages linked within the specified URLs.

  4. Enable Pagination:

    • Turns on pagination for the scraping process, which is useful for websites that spread their data across multiple pages.

    • Example: If a webpage lists products across several pages, enabling pagination will allow the scraper to collect data from all those pages.

  5. Page Key:

    • Specifies the name of the URL query parameter that holds the page number; the scraper needs it to move from one page to the next.

    • Example: If the site's page URLs use a query parameter such as ?page=1, you would enter:

      page

  6. Max Pages:

    • This field specifies the maximum number of pages to scrape if pagination is enabled. Setting this helps control the volume of data collected.

    • Example: If you want to scrape a maximum of 5 pages, you would enter:

      5

  7. Chunk Size:

    • Specifies the size of each chunk the scraped text is split into, measured in tokens or characters. The default is 1024; you can change it as needed.

    • Example: If you expect a large volume of data, you might lower the chunk size to 512 so that each chunk is smaller and easier to process.

  8. Cost Information:

    • This section shows the cost of importing words from the specified web pages.

    • Example: If it shows "Cost per words: 0.035 tokens" and "Remaining words: 1279900685 Words", each word processed is charged 0.035 tokens, and 1,279,900,685 words remain available for processing.
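The URL(s) field takes a comma-separated list. As a minimal sketch, assuming surrounding spaces are ignored and empty entries (for example from a trailing comma) are dropped, the field value could be split like this in Python; the function name `parse_url_field` is hypothetical and not part of the product:

```python
def parse_url_field(value: str) -> list[str]:
    """Split a comma-separated URL(s) field value into individual URLs.

    Hypothetical helper. Assumes whitespace around each URL is ignored
    and empty entries (e.g. from a trailing comma) are dropped.
    """
    return [part.strip() for part in value.split(",") if part.strip()]


print(parse_url_field("https://example.com, https://another-example.com"))
# ['https://example.com', 'https://another-example.com']
```

Because commas act as separators, a URL that itself contains a literal comma would be split in two under this sketch; writing the comma as %2C avoids that.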
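When "Fetch Content from Page Sub-URLs?" is checked, the scraper also visits nested links. The sketch below, using only Python's standard library, shows one way to collect such links from a page's HTML. It assumes a "sub-URL" is an http(s) link on the same host whose path sits under the base URL's path; how the product actually decides which links to follow is not documented here, and `find_sub_urls` is a hypothetical name:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_sub_urls(base_url: str, html: str) -> list[str]:
    """Return unique links in `html` that fall under `base_url`.

    Hypothetical helper. Assumes a sub-URL is an http(s) link on the same
    host whose path starts with the base path; the base page itself,
    fragments (#...), and duplicates are excluded.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    base = urlparse(base_url)
    # Treat the base path as a directory so "/docs" also matches "/docs/intro".
    prefix = base.path if base.path.endswith("/") else base.path + "/"
    found: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links against the base URL, then drop any fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        is_sub_url = (
            parsed.scheme in ("http", "https")
            and parsed.netloc == base.netloc
            and parsed.path.startswith(prefix)
            and parsed.path != prefix
        )
        if is_sub_url and absolute not in found:
            found.append(absolute)
    return found


page = '<a href="intro">Intro</a> <a href="/docs/api#auth">API</a> <a href="/blog">Blog</a>'
print(find_sub_urls("https://example.com/docs/", page))
# ['https://example.com/docs/intro', 'https://example.com/docs/api']
```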
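Enable Pagination, Page Key, and Max Pages work together. Assuming pages are addressed by a numbered query parameter starting at 1 (as in the ?page=1 example), the page URLs to visit could be generated as follows; the function name `paginated_urls` and the starting page number are assumptions, not documented product behavior:

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def paginated_urls(url: str, page_key: str, max_pages: int, start: int = 1) -> list[str]:
    """Build up to `max_pages` URLs by setting the `page_key` query parameter.

    Hypothetical helper. Assumes pages are numbered consecutively from
    `start` and overwrites any existing `page_key` value in the URL.
    """
    parts = urlparse(url)
    # Keep the URL's other query parameters (e.g. sorting) on every page.
    query = parse_qs(parts.query, keep_blank_values=True)
    urls = []
    for page in range(start, start + max_pages):
        query[page_key] = [str(page)]
        urls.append(urlunparse(parts._replace(query=urlencode(query, doseq=True))))
    return urls


for page_url in paginated_urls("https://example.com/products?sort=price", "page", 3):
    print(page_url)
# https://example.com/products?sort=price&page=1
# https://example.com/products?sort=price&page=2
# https://example.com/products?sort=price&page=3
```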
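Chunk Size controls how the collected text is split before processing. This minimal sketch assumes the size is counted in characters (the field may instead count tokens, which would require a tokenizer); `chunk_text` is a hypothetical helper:

```python
def chunk_text(text: str, chunk_size: int = 1024) -> list[str]:
    """Split text into consecutive chunks of at most `chunk_size` characters.

    Hypothetical helper. Assumes the size is counted in characters;
    a token-based splitter would count tokenizer tokens instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


text = "x" * 2500
print([len(c) for c in chunk_text(text)])       # [1024, 1024, 452]
print([len(c) for c in chunk_text(text, 512)])  # [512, 512, 512, 512, 452]
```

With this approach, lowering the size from 1024 to 512 roughly doubles the number of chunks, each about half as long.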
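The cost figures can be used for a quick estimate. Assuming the charge is simply the number of words imported multiplied by the cost per word, and that an import fits as long as it does not exceed the remaining words, the arithmetic looks like this (the 10,000-word page is an illustrative input, and `estimate_import` is hypothetical):

```python
from decimal import Decimal


def estimate_import(word_count: int, cost_per_word: Decimal, remaining_words: int) -> tuple[Decimal, bool]:
    """Return (tokens charged, whether the import fits the remaining words).

    Hypothetical helper. Assumes the charge is linear in the word count
    and that an import fits if it does not exceed the remaining words.
    """
    tokens = word_count * cost_per_word
    return tokens, word_count <= remaining_words


# Illustrative input: a 10,000-word page with the figures from the Cost Information example.
tokens, fits = estimate_import(10_000, Decimal("0.035"), 1_279_900_685)
print(tokens, fits)  # 350.000 True
```

Decimal is used instead of float so that a rate like 0.035 is represented exactly.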

Last updated