Crawler Verification Process
Understanding who’s accessing your website is crucial for security and analytics accuracy. The Profound Agent Analytics platform employs robust verification methods to confirm that crawlers claiming to be from major AI and search platforms are genuine.
Why Verification Matters
Accurate crawler identification is essential for:
- Protecting your website from malicious actors
- Ensuring data accuracy in your analytics
- Managing resource allocation effectively
- Maintaining security compliance
- Optimizing content delivery for legitimate AI platforms
Verification Methods
Primary Verification Techniques
Our platform employs multiple verification methods to ensure accuracy:
- Reverse DNS Lookup
  - Verifies the crawler’s hostname matches the claimed organization
  - Provides an additional layer of authenticity checking
  - Used by established platforms like Google and Apple
- IP Range Validation
  - Confirms the crawler originates from the organization’s known IP ranges
  - Particularly effective for platforms like OpenAI and You.com
  - Updated regularly to maintain accuracy
- Heuristic Detection
  - Analyzes crawler behavior patterns
  - Identifies characteristic signatures
  - Helps verify crawlers without published verification methods
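A reverse DNS check is commonly done as a forward-confirmed lookup: resolve the IP to a hostname, check that the hostname belongs to the claimed organization, then resolve the hostname back and confirm it returns the same IP. The following is a minimal Python sketch of that idea, not the platform's actual implementation. The suffix table, the function name `verify_by_reverse_dns`, and the injectable `reverse` and `forward` resolvers are assumptions introduced for illustration.

```python
import socket

# Hostname suffixes are assumptions for illustration only; confirm them
# against each platform's own documentation before relying on them.
ALLOWED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "Applebot": (".applebot.apple.com",),
}


def _reverse(ip):
    """Return the PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    """Return the set of addresses (IPv4 and IPv6) a hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_by_reverse_dns(ip, crawler, reverse=_reverse, forward=_forward):
    """Forward-confirmed reverse DNS check for a claimed crawler."""
    suffixes = ALLOWED_SUFFIXES.get(crawler)
    if not suffixes:
        return False  # no reverse DNS rule known for this crawler
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record or lookup failure
    # The leading dot in each suffix rejects look-alikes such as
    # "evilgooglebot.com".
    if not hostname.endswith(suffixes):
        return False
    try:
        # The hostname must resolve back to the original IP; otherwise the
        # PTR record could have been set by anyone controlling that IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without network access, for example `verify_by_reverse_dns(ip, "Googlebot", reverse=fake_reverse, forward=fake_forward)`. In production the defaults would perform real DNS queries, which are slow enough that results are usually cached.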
Platform-Specific Verification
Fully Verified Platforms
- Google
  - Googlebot: Reverse DNS verification
  - Storebot-Google: Reverse DNS verification
  - Google-Extended: Reverse DNS verification
- Microsoft Bing
  - BingSearch: Reverse DNS verification
- Apple
  - Applebot: Reverse DNS verification
  - Applebot-Extended: Reverse DNS verification
- OpenAI
  - OAI-SearchBot: IP range verification
  - ChatGPT-User: IP range verification
  - GPTBot: IP range verification
- You.com
  - YouBot: IP range verification
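For crawlers verified by IP range, the check reduces to testing whether the request's source address falls inside one of the organization's published CIDR blocks. The sketch below shows one way to do that with Python's standard `ipaddress` module. The ranges in `PUBLISHED_RANGES` are placeholders taken from the reserved documentation blocks, not real crawler ranges, and the function name `verify_by_ip_range` is hypothetical; a real deployment would load the lists each platform publishes and refresh them regularly.

```python
import ipaddress

# PLACEHOLDER ranges from the reserved documentation blocks (RFC 5737 and
# RFC 3849). These are NOT the real ranges of any crawler.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "YouBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def verify_by_ip_range(ip, crawler, ranges=PUBLISHED_RANGES):
    """Return True if `ip` lies inside a known range for `crawler`."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IPv4 or IPv6 address
    for cidr in ranges.get(crawler, ()):
        network = ipaddress.ip_network(cidr)
        # Only compare addresses and networks of the same IP version.
        if address.version == network.version and address in network:
            return True
    return False
```

With the placeholder table, `verify_by_ip_range("192.0.2.17", "GPTBot")` returns `True` and `verify_by_ip_range("203.0.113.5", "GPTBot")` returns `False`.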
Partially Verified Platforms
Some platforms use common cloud provider IPs or don’t publish verification methods, making complete verification challenging:
- Anthropic
  - ClaudeBot: Primarily Amazon AWS IPs
  - Verification method: Heuristic detection
- ByteDance
  - Bytespider: Mixed cloud provider IPs
  - Verification method: Behavioral analysis
- Perplexity
  - PerplexityBot: No published verification method
  - Verification method: Pattern matching and heuristics
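The platform lists in this section can be summarized as a lookup table from crawler name to verification method. The sketch below is only an illustration of that mapping: the `Method` enum, the `verification_level` helper, and the decision to group behavioral analysis and pattern matching under a single `HEURISTIC` value are assumptions, not part of the platform's API.

```python
from enum import Enum


class Method(Enum):
    REVERSE_DNS = "reverse DNS"
    IP_RANGE = "IP range"
    # Assumption: behavioral analysis and pattern matching are grouped
    # together with heuristic detection for the purposes of this sketch.
    HEURISTIC = "heuristic"


# Crawler names and methods as listed in this section.
VERIFICATION_METHODS = {
    "Googlebot": Method.REVERSE_DNS,
    "Storebot-Google": Method.REVERSE_DNS,
    "Google-Extended": Method.REVERSE_DNS,
    "BingSearch": Method.REVERSE_DNS,
    "Applebot": Method.REVERSE_DNS,
    "Applebot-Extended": Method.REVERSE_DNS,
    "OAI-SearchBot": Method.IP_RANGE,
    "ChatGPT-User": Method.IP_RANGE,
    "GPTBot": Method.IP_RANGE,
    "YouBot": Method.IP_RANGE,
    "ClaudeBot": Method.HEURISTIC,
    "Bytespider": Method.HEURISTIC,
    "PerplexityBot": Method.HEURISTIC,
}


def verification_level(crawler):
    """Classify a crawler as 'full', 'partial', or 'unknown'."""
    method = VERIFICATION_METHODS.get(crawler)
    if method is None:
        return "unknown"
    return "partial" if method is Method.HEURISTIC else "full"
```

A table like this makes it easy to route each incoming request to the appropriate check and to report how much confidence a "verified" label carries.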
Stay Updated
Our platform continuously updates verification methods as:
- New AI platforms emerge
- Existing platforms modify their crawler infrastructure
- Additional verification methods become available
- Security requirements evolve