Webpage
Analyzes a specific web page to gather textual information.
Name:
This field allows you to assign a specific name to your webpage data source, helping you identify it easily within your project.
Example: You might name it "E-commerce Product Page" if the data pertains to products listed on an online store.
URL(s):
This field is for specifying one or more URLs from which you want to scrape data. You can enter multiple URLs separated by commas.
Example: To scrape several pages in one run, enter all of their URLs in this field, separated by commas.
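As an illustration of the comma-separated format, the sketch below shows one way such a field could be split into individual URLs. The function name `split_urls` and the example.com addresses are hypothetical and not part of the product; how the tool itself parses the field is not documented here.

```python
def split_urls(raw: str) -> list[str]:
    """Split a comma-separated URL field into a clean list (hypothetical helper)."""
    # Strip whitespace around each entry and drop empty ones,
    # such as the entry left behind by a trailing comma.
    return [part.strip() for part in raw.split(",") if part.strip()]


# Placeholder URLs, used only for illustration.
urls = split_urls("https://example.com/a, https://example.com/b,")
print(urls)
```

Note that a URL that itself contains a comma would be split incorrectly by this simple approach.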
Fetch Content from Page Sub-URLs?:
This option controls whether the scraper collects data from the main page only, or from the main page and its subpages (nested links).
Example: If you check this option, the scraper will also gather data from any linked pages within the specified URLs.
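To make the idea of following nested links concrete, here is a minimal sketch that collects the links found in a page's HTML. It assumes that only links on the same host as the starting URL count as sub-URLs; that rule, the class `LinkCollector`, and the function `same_site_links` are assumptions for illustration, not the tool's documented behavior.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_site_links(html: str, base_url: str) -> list[str]:
    """Return unique links that stay on the same host as base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    unique: list[str] = []
    for link in collector.links:
        # Assumption: a sub-URL is any link on the same host.
        if urlparse(link).netloc == host and link not in unique:
            unique.append(link)
    return unique
```

Calling `same_site_links(page_html, "https://example.com/")` on a fetched page would return the candidate subpages in the order they appear, skipping links to other sites.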
Enable Pagination:
This option allows you to enable pagination for the scraping process, which is useful for websites that display data across multiple pages.
Example: If a webpage lists products across several pages, enabling pagination will allow the scraper to collect data from all those pages.
Page Key:
This parameter is used to specify the key for pagination in the URL, which is crucial for navigating through pages.
Example: If your pagination uses a query parameter like ?page=1, enter the name of that parameter as the page key.
Max Pages:
This field specifies the maximum number of pages to scrape if pagination is enabled. Setting this helps control the volume of data collected.
Example: To scrape at most 5 pages, enter 5.
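The way a page key and a page limit combine can be sketched as follows. This is only an illustration under stated assumptions: that the page key is a query parameter name (for example `page`), that numbering starts at 1, and that pages increase by 1. The function `paginated_urls` and the example.com address are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def paginated_urls(base_url: str, page_key: str, max_pages: int) -> list[str]:
    """Build one URL per page by setting page_key to 1..max_pages."""
    parts = urlparse(base_url)
    # Keep existing query parameters, but drop any old value of the page key.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != page_key
    ]
    urls = []
    for page in range(1, max_pages + 1):  # assumption: pages start at 1
        new_query = urlencode(query + [(page_key, str(page))])
        urls.append(urlunparse(parts._replace(query=new_query)))
    return urls


for url in paginated_urls("https://example.com/products?sort=asc", "page", 3):
    print(url)
```

With a page key of `page` and a limit of 3, the sketch yields three URLs ending in `page=1`, `page=2`, and `page=3`, which is the set of pages a scraper bounded by Max Pages would visit.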
Chunk Size:
This field specifies the number of tokens or characters in each chunk of data processed. The default value is set to 1024, but you can modify it as needed.
Example: If you expect a large volume of data, you might set the chunk size to 512 for easier processing.
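The effect of the chunk size can be illustrated with a short sketch. It assumes chunks are measured in characters and do not overlap; the tool may instead count tokens or split on different boundaries, so treat `chunk_text` as a hypothetical helper rather than the actual implementation.

```python
def chunk_text(text: str, chunk_size: int = 1024) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in chunk_size strides; the last chunk may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


sample = "a" * 2500
print([len(chunk) for chunk in chunk_text(sample)])        # default size of 1024
print([len(chunk) for chunk in chunk_text(sample, 512)])   # smaller chunks
```

A smaller chunk size produces more, shorter chunks from the same text: 2,500 characters become three chunks at 1024 and five chunks at 512.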
Cost Information:
This section provides details about the cost associated with importing words from the specified web pages.
Example: If it states "Cost per words: 0.035 tokens" and "Remaining words: 1279900685 Words," this indicates how many tokens will be charged for each word processed and the total number of words remaining for processing.
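As a rough worked example of the figures quoted above, the sketch below multiplies a word count by the per-word rate. It assumes the charge scales linearly with the number of words imported, which the source does not state explicitly; `import_cost` is a hypothetical helper, and `Decimal` is used only to avoid floating-point rounding.

```python
from decimal import Decimal


def import_cost(word_count: int, cost_per_word: Decimal = Decimal("0.035")) -> Decimal:
    """Estimate the tokens charged for importing word_count words."""
    # Assumption: cost grows linearly with the number of words.
    return cost_per_word * word_count


print(import_cost(1000))    # tokens for a 1,000-word page
print(import_cost(20000))   # tokens for a 20,000-word import
```

At a rate of 0.035 tokens per word, a 1,000-word page would cost 35 tokens under this assumption.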