Crawler Verification Process

Understanding who’s accessing your website is crucial for security and analytics accuracy. The Profound Agent Analytics platform verifies that crawlers claiming to be from major AI and search platforms are genuine, rather than trusting the name a crawler reports for itself.

Why Verification Matters

Accurate crawler identification is essential for:
  • Protecting your website from malicious actors
  • Ensuring data accuracy in your analytics
  • Managing resource allocation effectively
  • Maintaining security compliance
  • Optimizing content delivery for legitimate AI platforms

Verification Methods

Primary Verification Techniques

Our platform employs multiple verification methods to ensure accuracy:
  1. Reverse DNS Lookup
    • Resolves the requesting IP address to a hostname and checks that the hostname belongs to the claimed organization’s domain
    • Adds a check that is harder to spoof than the User-Agent header, which any client can set
    • Used by established platforms like Google and Apple
  2. IP Range Validation
    • Confirms the crawler originates from the organization’s known IP ranges
    • Particularly effective for platforms like OpenAI and You.com
    • Updated regularly to maintain accuracy
  3. Heuristic Detection
    • Analyzes crawler behavior patterns
    • Identifies characteristic signatures
    • Helps verify crawlers without published verification methods
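
As a rough illustration of the first technique, a reverse DNS check with forward confirmation could look like the Python sketch below. It is not Profound's implementation. The function names, the injectable `reverse` and `forward` resolvers, and the hostname suffixes in `TRUSTED_SUFFIXES` are assumptions made for the example; each platform's own documentation is the authority on which domains its crawlers use.

```python
import ipaddress
import socket

# Assumed hostname suffixes, for illustration only. Check each
# platform's documentation for the values it currently publishes.
TRUSTED_SUFFIXES = {
    "Googlebot": (".googlebot.com", ".google.com"),
    "Applebot": (".applebot.apple.com",),
}


def _reverse(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> set[str]:
    """A/AAAA lookup: hostname -> set of IP address strings."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_by_reverse_dns(ip, crawler, reverse=_reverse, forward=_forward) -> bool:
    """Return True only if the PTR hostname is in a trusted domain
    AND that hostname resolves back to the same IP address."""
    suffixes = TRUSTED_SUFFIXES.get(crawler)
    if not suffixes:
        return False
    try:
        addr = ipaddress.ip_address(ip)
        hostname = reverse(ip).rstrip(".").lower()
    except (OSError, ValueError):
        return False
    # Step 1: the hostname must end with one of the trusted suffixes.
    if not hostname.endswith(suffixes):
        return False
    # Step 2: forward-confirm, because whoever controls an IP block
    # also controls its PTR records and can make them say anything.
    try:
        resolved = {ipaddress.ip_address(a) for a in forward(hostname)}
    except (OSError, ValueError):
        return False
    return addr in resolved
```

The forward lookup in step 2 is what makes the check meaningful: a PTR record alone can be set to any hostname by the owner of the IP block, so the hostname is only trusted if it resolves back to the original address. The resolvers are passed in as parameters so the logic can be exercised without live DNS.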

Platform-Specific Verification

Fully Verified Platforms

  • Google
    • Googlebot: Reverse DNS verification
    • Storebot-Google: Reverse DNS verification
    • Google-Extended: Reverse DNS verification
  • Microsoft Bing
    • BingSearch: Reverse DNS verification
  • Apple
    • Applebot: Reverse DNS verification
    • Applebot-Extended: Reverse DNS verification
  • OpenAI
    • OAI-SearchBot: IP range verification
    • ChatGPT-User: IP range verification
    • GPTBot: IP range verification
  • You.com
    • YouBot: IP range verification
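
The IP range checks listed above for OpenAI and You.com come down to testing whether the requesting address falls inside a published CIDR block. A minimal Python sketch, assuming the ranges have already been fetched, might look like this. The CIDR blocks in `PUBLISHED_RANGES` are reserved documentation ranges used as placeholders, not real crawler ranges, and the function name is hypothetical.

```python
import ipaddress

# Placeholder CIDR blocks taken from reserved documentation ranges
# (RFC 5737 / RFC 3849). A real deployment would load each
# platform's published list and refresh it regularly.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "YouBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def verify_by_ip_range(ip: str, crawler: str, ranges=PUBLISHED_RANGES) -> bool:
    """Return True if `ip` lies inside any range listed for `crawler`."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed address
    for cidr in ranges.get(crawler, []):
        network = ipaddress.ip_network(cidr)
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False


print(verify_by_ip_range("192.0.2.17", "GPTBot"))    # inside the placeholder range
print(verify_by_ip_range("198.51.100.5", "GPTBot"))  # outside it
```

Because the result is only as good as the range list, a stale list will reject legitimate crawlers or accept addresses the platform no longer uses, which is why the ranges need regular updates.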

Partially Verified Platforms

Some platforms use common cloud provider IPs or don’t publish verification methods, making complete verification challenging:
  • Anthropic
    • ClaudeBot: Primarily Amazon AWS IPs
    • Verification method: Heuristic detection
  • Bytedance
    • Bytespider: Mixed cloud provider IPs
    • Verification method: Behavioral analysis
  • Perplexity
    • PerplexityBot: No published verification method
    • Verification method: Pattern matching and heuristics
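
To show why shared cloud IPs allow only partial verification, the Python sketch below combines two weak signals: the User-Agent token and membership in a cloud provider's address space. It is a deliberately simplified, hypothetical example and does not describe the heuristics Profound actually uses. `CLOUD_RANGES` holds a reserved documentation range as a stand-in for a provider's real ranges, and the `Status` values are invented for the illustration.

```python
import ipaddress
from enum import Enum


class Status(Enum):
    PARTIAL = "partial"        # consistent with the claim, but not proven
    UNVERIFIED = "unverified"  # nothing supports the claim


# Placeholder (RFC 5737 documentation range) standing in for a
# cloud provider's published address space.
CLOUD_RANGES = [ipaddress.ip_network("203.0.113.0/24")]

# Crawlers the text above describes as running on cloud provider IPs.
CLOUD_HOSTED_TOKENS = ("ClaudeBot", "Bytespider")


def classify(user_agent: str, ip: str) -> Status:
    """Combine a User-Agent match with a cloud-range match."""
    addr = ipaddress.ip_address(ip)
    ua = user_agent.lower()
    claims_crawler = any(tok.lower() in ua for tok in CLOUD_HOSTED_TOKENS)
    in_cloud = any(
        addr.version == net.version and addr in net for net in CLOUD_RANGES
    )
    # Any cloud customer can send traffic from these ranges, so a
    # match is supporting evidence only, never full verification.
    if claims_crawler and in_cloud:
        return Status.PARTIAL
    return Status.UNVERIFIED
```

Neither signal identifies the operator on its own: the User-Agent can be forged, and the address space is shared with every other customer of the same provider. That is why behavioral signals are needed on top of checks like this one.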

Stay Updated

Our platform continuously updates verification methods as:
  • New AI platforms emerge
  • Existing platforms modify their crawler infrastructure
  • Additional verification methods become available
  • Security requirements evolve

Important Note About Data Updates

We continuously monitor and improve our verification processes as the AI crawler landscape evolves. When we enhance our detection methods, traffic may be reclassified, so you may notice changes in both your historical and current analytics data. If you observe significant changes in your data, they are most likely the result of improvements to our verification system. We recommend reviewing your analytics dashboard regularly to stay informed about AI crawler behavior on your site.