ProfoundBot is a user-initiated HTTP agent. When you supply a URL through the Profound platform or API, it fetches that public web page and returns its content as Markdown and/or HTML.

Summary

Operator: Profound (tryprofound.com)
User-Agent: ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)
Trigger: User-initiated
Schedule: None
Resources fetched: The single URL the customer supplies
Network origin: AWS by default; enrolled domains are fetched from a fixed set of dedicated egress IPs (see Network origin and dedicated IPs below)

What it does

When you supply the URL of a public web page, ProfoundBot makes a bounded set of HTTP requests to that URL’s origin to fetch the page, then returns the content as Markdown, HTML, or both. When a page is rendered with a headless browser, that page’s own subresources (scripts, stylesheets, images) load as they would in any browser. ProfoundBot doesn’t follow links to other pages.

What it doesn’t do

  • Crawl. Each customer action fetches exactly one URL. The bot does not recursively follow links or build an index of your site
  • Submit forms, follow login flows, or access authenticated areas
  • Persist a long-running index of your site. Fetched content is delivered to you and is not republished
  • Reach private or internal network targets

Request behavior

Each user action triggers a small, bounded interaction scoped to the single requested URL.
Redirects: Followed
Caching: None; each customer invocation issues fresh requests
Headers: User-Agent: ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)

Network origin and dedicated IPs

By default, ProfoundBot egresses from a dynamic range of AWS IP addresses, so identify it by its User-Agent rather than by IP. You can enroll a domain (a per-domain setting, off by default) so that ProfoundBot fetches it from a fixed set of dedicated egress IPs. This is useful if you prefer to allowlist Profound by IP rather than by User-Agent. When a domain is enrolled, requests egress from one of these IPs:
  • 54.71.251.60
  • 54.185.59.110
  • 100.22.234.65
To enroll a domain for dedicated-IP fetching, go to Settings → Web Scrape in your Profound account and turn on the Primary Domain Scraping via Static IPs setting.
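
For an enrolled domain, checking whether a request came from one of the dedicated egress IPs can be sketched as follows. This is a minimal illustration, not an official Profound snippet: the function name is hypothetical, and the IP list is copied from this page and would need updating if the published list changes.

```python
import ipaddress

# Dedicated egress IPs as listed on this page (apply to enrolled domains only).
PROFOUND_DEDICATED_IPS = frozenset(
    ipaddress.ip_address(ip)
    for ip in ("54.71.251.60", "54.185.59.110", "100.22.234.65")
)


def is_dedicated_profound_ip(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the documented dedicated egress IPs.

    Hypothetical helper: malformed addresses are treated as non-matching.
    """
    try:
        return ipaddress.ip_address(remote_addr) in PROFOUND_DEDICATED_IPS
    except ValueError:
        return False
```

If your server sits behind a proxy or load balancer, pass the client address your infrastructure trusts rather than the proxy's own address.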

robots.txt handling

ProfoundBot is user-initiated and fetches a single page per user action rather than crawling. Support for honoring Disallow directives in robots.txt is planned for a future release. To prevent ProfoundBot from accessing your site in the meantime, see the How to block section.

How to identify the bot

Use the exact User-Agent header:
User-Agent: ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)
All Profound bots use User-Agent strings that start with Profound, so a prefix match is a good way to identify all Profound traffic, current and future. For enrolled domains, you can also match on the dedicated egress IPs listed in the Network origin and dedicated IPs section.
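
The exact match and the prefix match described above can be sketched as a small classifier. The function name and the returned labels are hypothetical, chosen only for illustration; the User-Agent string and the Profound prefix come from this page.

```python
from typing import Optional

# Exact User-Agent string documented on this page.
PROFOUNDBOT_UA = "ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)"
# Prefix shared by all Profound bots, per this page.
PROFOUND_PREFIX = "Profound"


def classify_user_agent(user_agent: Optional[str]) -> str:
    """Classify a User-Agent header value (hypothetical helper and labels).

    Returns "profoundbot" for the exact documented string, "profound-other"
    for any other value starting with the Profound prefix, else "other".
    """
    ua = (user_agent or "").strip()
    if ua == PROFOUNDBOT_UA:
        return "profoundbot"
    if ua.startswith(PROFOUND_PREFIX):
        return "profound-other"
    return "other"
```

Keep in mind that a User-Agent header can be set by any client, so a match indicates claimed identity only; for enrolled domains, combining it with the dedicated egress IPs gives a stronger signal.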

How to block

Block ProfoundBot using any of the following methods:
  • A WAF or CDN rule matching User-Agent containing ProfoundBot
  • A WAF or CDN rule matching any User-Agent that starts with Profound (covers all Profound bots)
  • For enrolled domains, a firewall or WAF rule blocking the dedicated egress IPs (54.71.251.60, 54.185.59.110, 100.22.234.65)
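
If you have no WAF or CDN in front of your site, the same prefix match could be approximated inside the application. The sketch below is one possible approach for a Python WSGI application and is an assumption on our part, not a method Profound prescribes; the class name BlockProfound is hypothetical.

```python
class BlockProfound:
    """WSGI middleware that rejects requests whose User-Agent starts with a prefix.

    Hypothetical sketch: wraps any WSGI app and answers 403 for matching requests.
    """

    def __init__(self, app, prefix="Profound"):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if user_agent.startswith(self.prefix):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not a Profound User-Agent: pass the request through unchanged.
        return self.app(environ, start_response)
```

Wrap your existing application object, for example `application = BlockProfound(application)`. Blocking at the WAF, CDN, or firewall is still preferable where available, because the request is rejected before it reaches your application.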

Reporting abuse

If you observe behavior that doesn’t match this documentation, such as recursive crawling, request rates inconsistent with user-initiated single-page fetches, or activity from an unrecognized Profound* User-Agent, report it to security@tryprofound.com with example log lines. Profound treats these reports as security issues and responds to all of them.
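
To gather example log lines for a report, a scan like the following may help. It is a rough sketch that assumes access logs in the common "combined" format, where the User-Agent is the last double-quoted field on each line; the function name is hypothetical, and the pattern would need adjusting for other log formats.

```python
import re

# The one User-Agent string documented on this page.
DOCUMENTED_UA = "ProfoundBot/1.0 (+https://docs.tryprofound.com/bots)"

# Assumption: the User-Agent is the final double-quoted field of the log line.
_LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def unrecognized_profound_lines(lines):
    """Return log lines whose User-Agent starts with Profound but is not the documented one."""
    hits = []
    for line in lines:
        match = _LAST_QUOTED_FIELD.search(line)
        if match is None:
            continue  # Line does not end with a quoted field; skip it.
        user_agent = match.group(1)
        if user_agent.startswith("Profound") and user_agent != DOCUMENTED_UA:
            hits.append(line.rstrip("\n"))
    return hits
```

You could call it with an open file, for example `unrecognized_profound_lines(open("access.log"))`, where the file name is a placeholder for your own log path.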