The Profound Sheets Sitemap Importer is a user-initiated HTTP agent that fetches a website’s sitemap so that the URLs it lists can be imported into a Profound Sheet. The bot runs only when an authenticated Profound user clicks + Create new → Sitemap in the Sheets section of the Profound platform and supplies a site URL.
Summary
| | |
|---|---|
| User-Agent string | ProfoundSheetsSitemapImporter/1.0 |
| Trigger | User-initiated |
| Schedule | None |
| Resources fetched | sitemap.xml, sitemap-index variants, robots.txt, and any sitemap files those documents reference |
| Network origin | AWS, no fixed IP allow-list available |
What it does
When initiated with a website origin URL, the Sitemap Importer makes a small, bounded set of HTTP GET requests to locate and read a site’s sitemap. It tries the following URLs in sequence:
- https://{origin}/sitemap.xml
- https://{origin}/sitemap_index.xml
- https://{origin}/sitemap-index.xml
- https://{origin}/robots.txt (parsed only for Sitemap: directives, which are added to the candidate list)
- any child sitemaps referenced by a sitemap index
If you enter a URL ending in .xml, the importer also tries that URL directly.
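The candidate list described above can be sketched in a few lines of Python. This is a minimal illustration, not Profound's implementation: the function name `candidate_urls` is hypothetical, and the position of a user-supplied .xml URL at the front of the list is an assumption, since the documentation only says the importer "also" tries it.

```python
from urllib.parse import urlsplit

# Well-known locations, in the order this page lists them.
WELL_KNOWN_PATHS = (
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/sitemap-index.xml",
    "/robots.txt",
)


def candidate_urls(user_input: str) -> list[str]:
    """Return the URLs an importer of this kind might try, in order."""
    # Assumption: input without a scheme is treated as https.
    text = user_input if "://" in user_input else "https://" + user_input
    parts = urlsplit(text)
    origin = f"https://{parts.netloc}"

    candidates: list[str] = []
    # Assumption: a direct .xml URL is tried before the well-known paths.
    if parts.path.lower().endswith(".xml"):
        candidates.append(parts.geturl())
    for path in WELL_KNOWN_PATHS:
        url = origin + path
        if url not in candidates:  # avoid requesting the same URL twice
            candidates.append(url)
    return candidates
```

Sitemap URLs discovered in robots.txt or in a sitemap index would be appended to this list as they are found.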
The first response that parses as a valid XML sitemap is used. The URLs it contains are returned to you for import. The importer does not store, cache, redistribute, or republish sitemap content.
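As a rough illustration of what "parses as a valid XML sitemap" could mean, the sketch below accepts a document only if its root element is `urlset` or `sitemapindex` in the sitemaps.org namespace, and returns the `loc` values it contains. The function name `parse_sitemap` and the exact validity rule are assumptions made for this example; the importer's real checks are not documented here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return (kind, locs) where kind is "urlset" or "sitemapindex".

    Raises ValueError if the text is not a recognizable sitemap.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError("not well-formed XML") from exc

    if root.tag == SITEMAP_NS + "urlset":
        kind, child = "urlset", "url"
    elif root.tag == SITEMAP_NS + "sitemapindex":
        kind, child = "sitemapindex", "sitemap"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")

    # Collect every non-empty <loc> under the expected child element.
    locs = [
        loc.text.strip()
        for loc in root.iterfind(f"{SITEMAP_NS}{child}/{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    ]
    return kind, locs
```

For a `sitemapindex` result, the returned locations are child sitemaps to fetch one level down; for a `urlset` result, they are the page URLs offered for import.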
What it doesn’t do
- Fetch HTML pages, images, scripts, stylesheets, or any other assets
- Submit forms, follow login flows, or access authenticated areas
- Run JavaScript or use a headless browser
- Crawl recursively beyond sitemap-index to child-sitemap links
- Persist a long-running index of your site
Request behavior
Each import action triggers a small burst of requests scoped to a single origin.
| | |
|---|---|
| Per request timeout | 10 seconds |
| Redirects | Followed manually, up to 5 hops |
| Concurrency | Sequential (not in parallel) |
| Caching | None. Each user invocation issues fresh requests |
Each request is sent with the following headers:
User-Agent: ProfoundSheetsSitemapImporter/1.0
Accept: application/xml,text/xml,application/xhtml+xml,text/plain;q=0.9,*/*;q=0.8
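The request parameters above (10-second timeout, manual redirects capped at 5 hops, one request at a time) can be sketched as follows. To keep the example self-contained and testable, the HTTP call is passed in as a hypothetical `fetch` callable that performs a single GET without following redirects; that interface, and the names `get_with_manual_redirects`, `REDIRECT_STATUSES`, and `Fetch`, are assumptions for illustration only.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

REQUEST_HEADERS = {
    "User-Agent": "ProfoundSheetsSitemapImporter/1.0",
    "Accept": "application/xml,text/xml,application/xhtml+xml,"
              "text/plain;q=0.9,*/*;q=0.8",
}
TIMEOUT_SECONDS = 10
MAX_REDIRECT_HOPS = 5
REDIRECT_STATUSES = {301, 302, 303, 307, 308}  # assumed set of followed codes

# Hypothetical transport: fetch(url, headers, timeout) performs ONE GET and
# returns (status, Location header or None, body) without following redirects.
Response = Tuple[int, Optional[str], bytes]
Fetch = Callable[[str, dict, float], Response]


def get_with_manual_redirects(url: str, fetch: Fetch) -> Tuple[int, str, bytes]:
    """Issue sequential GETs, following at most MAX_REDIRECT_HOPS redirects."""
    # One initial request plus up to MAX_REDIRECT_HOPS follow-up requests.
    for _ in range(MAX_REDIRECT_HOPS + 1):
        status, location, body = fetch(url, REQUEST_HEADERS, TIMEOUT_SECONDS)
        if status in REDIRECT_STATUSES and location:
            # Resolve relative Location values against the current URL.
            url = urljoin(url, location)
            continue
        return status, url, body
    raise RuntimeError(f"more than {MAX_REDIRECT_HOPS} redirects")
```

Following redirects by hand, rather than letting an HTTP library do it, is what makes a hard hop limit straightforward to enforce.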
Volume and frequency
Traffic from the Importer is driven entirely by customer activity. There’s no continuous crawl. A typical interaction involves a handful of requests, after which the Importer doesn’t contact an origin again until another import action targets the same site.
robots.txt handling
The importer reads robots.txt to discover sitemap URLs declared via Sitemap: directives. It doesn’t evaluate Disallow rules against sitemap files, because the sitemap protocol treats those files as publicly discoverable.
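A minimal sketch of this kind of Sitemap: discovery is shown below. It reads only `Sitemap:` lines and ignores every other directive, including `Disallow`. The function name `sitemap_directives` and the comment-stripping behavior are assumptions for the example, not a description of Profound's parser.

```python
def sitemap_directives(robots_txt: str) -> list[str]:
    """Return the URLs declared via Sitemap: lines, in file order."""
    urls: list[str] = []
    for raw_line in robots_txt.splitlines():
        # Assumption: anything after "#" is a comment and is dropped.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are matched case-insensitively; other fields are skipped.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Because the value is split on the first colon only, the `https://` in the sitemap URL is preserved intact.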
To prevent the importer from accessing your site, see the How to block section.
How to identify the bot
Use the exact User-Agent header:
User-Agent: ProfoundSheetsSitemapImporter/1.0
All Profound bots use User-Agent strings that start with Profound, so a prefix match on Profound is a good way to identify all Profound traffic, current and future.
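For log analysis, the two matching rules above (exact match for this bot, prefix match for all Profound traffic) might look like the sketch below. The function name `classify_user_agent` and its return labels are hypothetical.

```python
from typing import Optional

IMPORTER_UA = "ProfoundSheetsSitemapImporter/1.0"
PROFOUND_PREFIX = "Profound"


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Label a User-Agent value, or return None if it is not Profound traffic."""
    if user_agent == IMPORTER_UA:
        return "sitemap-importer"   # exact match for this bot
    if user_agent.startswith(PROFOUND_PREFIX):
        return "profound-other"     # some other Profound agent
    return None
```

The match is case-sensitive and anchored at the start of the string, so a User-Agent that merely contains "Profound" somewhere in the middle is not counted.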
How to block
Block the importer using any of the following methods:
- A WAF or CDN rule matching User-Agent equal to ProfoundSheetsSitemapImporter/1.0
- A WAF or CDN rule matching User-Agent containing the prefix Profound
- Returning a 4xx response to the bot’s User-Agent on /sitemap.xml, /sitemap_index.xml, /sitemap-index.xml, and /robots.txt
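If you don't have a WAF or CDN in front of your site, the third method can be applied in application code. The sketch below is a small WSGI middleware that returns 403 to the importer's User-Agent on the four paths listed above and passes every other request through. The name `block_sitemap_importer` is hypothetical, and the choice of 403 among the 4xx codes is an assumption.

```python
IMPORTER_UA = "ProfoundSheetsSitemapImporter/1.0"
BLOCKED_PATHS = {
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/sitemap-index.xml",
    "/robots.txt",
}


def block_sitemap_importer(app):
    """Wrap a WSGI app so the importer receives 403 on sitemap-related paths."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "")
        if user_agent == IMPORTER_UA and path in BLOCKED_PATHS:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Everything else is handled by the wrapped application.
        return app(environ, start_response)
    return middleware
```

To block every path for this bot, or all Profound agents, drop the path check or replace the equality test with `user_agent.startswith("Profound")`.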
Reporting abuse
If you observe behavior that doesn’t match this documentation, such as requests for resources other than sitemap and robots.txt files, request rates that look like a crawl, or activity from an unrecognized Profound* User-Agent, report it to security@tryprofound.com with example log lines.
Profound treats these reports as security issues and responds to all of them.