ProfoundBot

ProfoundBot is a user-initiated HTTP agent. When you supply a URL through the Profound platform or API, it fetches that public web page and returns its content as Markdown and/or HTML.

Summary


Operator	Profound (tryprofound.com)
User-Agent	`ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)`
Trigger	User-initiated
Schedule	None
Resources fetched	The single URL the customer supplies, plus that page’s own subresources when it is rendered with a headless browser
Network origin	AWS by default; enrolled domains are fetched from a fixed set of dedicated egress IPs (see Network origin and dedicated IPs below)

What it does

When you supply the URL of a public web page, ProfoundBot makes a bounded set of HTTP requests to that URL’s origin to fetch the page, then returns the content as Markdown, HTML, or both. When a page is rendered with a headless browser, that page’s own subresources (scripts, stylesheets, images) load as they would in any browser. ProfoundBot doesn’t follow links to other pages.

What it doesn’t do

Crawl. Each customer action fetches exactly one URL. The bot does not recursively follow links or build an index of your site.
Submit forms, follow login flows, or access authenticated areas.
Persist a long-running index of your site. Fetched content is delivered to you and is not republished.
Reach private or internal network targets.

Request behavior

Each user action triggers a small, bounded interaction scoped to the single requested URL.


Redirects	Followed
Caching	None: each customer invocation issues fresh requests
Headers	`User-Agent: ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)`

Network origin and dedicated IPs

By default, ProfoundBot egresses from a dynamic range of AWS IP addresses, so identify it by its User-Agent rather than by IP. You can enroll a domain (a per-domain setting, off by default) so that ProfoundBot fetches it from a fixed set of dedicated egress IPs. This is useful if you prefer to allowlist Profound by IP rather than by User-Agent. When a domain is enrolled, requests egress from one of these IPs:

54.71.251.60
54.185.59.110
100.22.234.65

These dedicated IPs are also available in a machine-readable JSON format.

To enroll a domain for dedicated-IP fetching, go to your Profound account Settings → Web Scrape and toggle the Primary Domain Scraping via Static IPs setting to On.
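
If you allowlist by IP, a check along the following lines may help. This is a minimal Python sketch that hardcodes the three addresses listed above. The names `PROFOUND_DEDICATED_IPS` and `is_dedicated_profound_ip` are illustrative and are not part of any Profound API, and the check only makes sense for enrolled domains.

```python
import ipaddress

# Dedicated egress IPs copied from the list in this section.
# They apply only to domains enrolled for static-IP fetching.
PROFOUND_DEDICATED_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in ("54.71.251.60", "54.185.59.110", "100.22.234.65")
)


def is_dedicated_profound_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the documented dedicated egress IPs."""
    try:
        return ipaddress.ip_address(remote_addr) in PROFOUND_DEDICATED_IPS
    except ValueError:
        # Malformed address strings never match.
        return False
```

If your server sits behind a proxy or CDN, pass the real client address rather than the proxy's address.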

`robots.txt` handling

ProfoundBot is user-initiated and fetches a single page per user action rather than crawling. Support for honoring Disallow directives in robots.txt is planned for a future release. To prevent ProfoundBot from accessing your site in the meantime, see the How to block section.

How to identify the bot

Use the exact User-Agent header:

`User-Agent: ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)`

All Profound bots use User-Agent strings that start with Profound, so a prefix match is a good way to identify all Profound traffic, current and future. For enrolled domains, you can also match on the dedicated egress IPs listed in the Network origin and dedicated IPs section.
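
The exact match and the prefix match described above could be sketched in Python as follows. The function name `classify_user_agent` and its return labels are hypothetical, and treating the prefix match as case-sensitive is an assumption.

```python
from typing import Optional

# Exact User-Agent string documented for ProfoundBot.
PROFOUNDBOT_UA = "ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)"


def classify_user_agent(user_agent: Optional[str]) -> str:
    """Classify a User-Agent header value.

    Returns "profoundbot" for the exact documented string,
    "profound-other" for any other value starting with "Profound",
    and "other" for everything else (including a missing header).
    """
    if not user_agent:
        return "other"
    ua = user_agent.strip()
    if ua == PROFOUNDBOT_UA:
        return "profoundbot"
    if ua.startswith("Profound"):  # assumed case-sensitive prefix match
        return "profound-other"
    return "other"
```

Keep in mind that a User-Agent header can be set by any client, so a match identifies claimed Profound traffic rather than verified Profound traffic.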

How to block

Block ProfoundBot using any of the following methods:

A WAF or CDN rule matching User-Agent containing ProfoundBot
A WAF or CDN rule matching User-Agent starting with the prefix Profound (this also covers other Profound bots)
For enrolled domains, a firewall or WAF rule blocking the dedicated egress IPs (54.71.251.60, 54.185.59.110, 100.22.234.65)
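
If you have no WAF or CDN in front of your application, the same User-Agent rule can be approximated in application code. The following is a minimal sketch for a Python WSGI application; `block_profound` is an illustrative name, and the choice of a 403 response is an assumption rather than something Profound prescribes.

```python
def block_profound(app):
    """Wrap a WSGI app so requests whose User-Agent starts with "Profound" get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if user_agent.startswith("Profound"):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # All other traffic passes through unchanged.
        return app(environ, start_response)

    return middleware
```

To use it, wrap your existing WSGI callable, for example `application = block_profound(application)`.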

Reporting abuse

If you observe behavior that doesn’t match this documentation, such as recursive crawling, request rates inconsistent with user-initiated single-page fetches, or activity from an unrecognized Profound* User-Agent, report it to security@tryprofound.com with example log lines. Profound treats these reports as security issues and responds to all of them.
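
To collect example log lines for a report, a small filter like the one below may be enough. It is a sketch that assumes your access log uses a combined-style format in which the User-Agent is the last double-quoted field on each line. The function name `profound_log_lines` is hypothetical, so adjust the pattern to your own log format.

```python
import re

# Assumption: the User-Agent is the final double-quoted field on the line,
# as in the common "combined" access log format.
_LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def profound_log_lines(lines):
    """Return the log lines whose User-Agent field starts with "Profound"."""
    matches = []
    for line in lines:
        stripped = line.rstrip("\n")
        found = _LAST_QUOTED_FIELD.search(stripped)
        if found and found.group(1).startswith("Profound"):
            matches.append(stripped)
    return matches
```

You can pass it an open file object, for example `profound_log_lines(open("access.log"))`, and attach the returned lines to your report.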
