# robots.txt for https://www.c.technischeunie.nl/
# Optimized: Allows search engines and LLMs, blocks sensitive/internal paths

# --------------------------------------------------
# Known Search Engine Crawlers
# --------------------------------------------------
User-agent: Googlebot
User-agent: Bingbot
User-agent: Applebot
User-agent: DuckDuckBot
User-agent: DotBot
User-agent: AhrefsBot
User-agent: TwitterBot
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
User-agent: bitlybot
User-agent: Startpagina-Linkchecker
User-agent: Googlebot-Image
User-agent: Pinterestbot
User-agent: Semrushbot
User-agent: SiteAuditBot
# --------------------------------------------------
# Known LLM Crawlers
# --------------------------------------------------
User-agent: GPTBot # OpenAI / ChatGPT
User-agent: ChatGPT-User # OpenAI / ChatGPT user-initiated browsing
User-agent: ClaudeBot # Anthropic
User-agent: PerplexityBot # Perplexity.ai
User-agent: Google-Extended # Google's LLM use
User-agent: CCBot # Common Crawl
# --------------------------------------------------
# Disallowed Paths (Sensitive/Internal)
# --------------------------------------------------
Disallow: /admin/
Disallow: /login/
Disallow: /private/
Disallow: /tmp/
Disallow: /cgi-bin/
# --------------------------------------------------
# Disallowed: Dynamic URLs with Parameters
# --------------------------------------------------
Disallow: /*?*
Disallow: /*&*
Disallow: /*sort=*
Disallow: /*filter=*

# --------------------------------------------------
# Sitemap
# --------------------------------------------------
Sitemap: https://www.c.technischeunie.nl/sitemap.xml

# --------------------------------------------------
# Default Fallback for Other Bots
# --------------------------------------------------
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /private/
Disallow: /tmp/
Disallow: /cgi-bin/
Disallow: /*?*
Disallow: /*&*
Disallow: /*sort=*
Disallow: /*filter=*
Allow: /