# Product Registry — LLMs & Crawler Guidance
# Location: /llms.txt
# Version: 1.0
# Last-Updated: 2025-09-16
# Base-URL: https://productregistry.org/

# --- Identification & Contact ---
Agent-Policy: welcome
Contact: support@productregistry.org
Documentation: https://productregistry.org/  # human overview
Robots-Txt: https://productregistry.org/robots.txt  # must also be respected

# --- Discovery ---
Sitemap-Index: https://productregistry.org/sitemap.xml
Sitemap: https://productregistry.org/sitemaps/products.xml
Sitemap: https://productregistry.org/sitemaps/merchants.xml
WellKnown: https://productregistry.org/.well-known/

# --- Canonical URL Patterns ---
# Public, long-lived pages with stable links:
Pattern: /product/{product_id} ; Type=ProductPage ; Canonical=true
Pattern: /merchant/{merchant_domain} ; Type=MerchantListing ; Canonical=true
Pattern: / ; Type=Index

# --- Access Rules ---
Allow: /$
Allow: /product/
Allow: /merchant/
Allow: /sitemap.xml
Allow: /sitemaps/
Allow: /.well-known/
Disallow: /api/

# --- Rate Limits & Concurrency ---
# Follow stricter of robots.txt and these hints:
Max-Requests-Per-Minute: 60
Max-Parallel-Requests: 4
Retry-After-429: 120  # seconds; exponential backoff advised
Prefetch: disallowed  # do not prefetch linked pages aggressively

# --- Freshness & Caching ---
Honor-Headers: ETag, Last-Modified, Cache-Control
Default-Cache-TTL: 86400  # seconds (24h) if headers missing
Change-Frequency:
  /product/*: daily
  /merchant/*: weekly
  /: monthly

# --- Content Semantics ---
Identifiers:
  GTIN: supported
  UPC: supported
  EAN: supported
  SKU: supported
  ModelNumber: supported
Product-Identity:
  Key: product_id  # stable, opaque registry ID
  Canonical-Param: product_id
Merchant-Identity:
  Key: merchant_domain  # FQDN, case-insensitive

# --- Structured Data ---
# Pages embed JSON-LD via

# 3) Iterate merchants with keyset pagination:
#    GET https://productregistry.org/merchant
#    If "Link: <.../merchant?after={cursor}>; rel=next" present, continue.
# 4) De-duplicate:
#    Normalize query by removing UTM parameters listed above.

# --- Future Expansion (reserved keys) ---
Reserved:
  Feed-JSON: https://productregistry.org/feeds/products.json
  GraphQL: https://productregistry.org/api/graphql
  Bulk-Export: https://productregistry.org/exports/products.ndjson
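
# --- Example Client Sketch (non-normative) ---
# The pagination and de-duplication steps above, together with the
# Retry-After-429 hint, could be sketched in Python roughly as follows.
# Assumptions, not registry guarantees: all function names are hypothetical;
# any query key starting with "utm_" is treated as a UTM parameter (the exact
# list is not reproduced here); the Link header is assumed to place rel=next
# directly after the URL; the one-hour backoff cap is an illustrative choice.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

RETRY_AFTER_429 = 120   # seconds, from the Retry-After-429 hint
MAX_BACKOFF = 3600      # assumed cap (hypothetical, not stated by the registry)

# Matches <url>; rel=next or <url>; rel="next" (simplified Link parsing).
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="?next\b', re.IGNORECASE)


def normalize_url(url):
    """Drop query parameters whose key starts with 'utm_' (assumed UTM set)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def next_page_url(link_header):
    """Return the rel=next target from a Link header value, or None."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def backoff_delay(attempt):
    """Seconds to wait after the Nth consecutive 429 (attempt starts at 0)."""
    return min(RETRY_AFTER_429 * 2 ** attempt, MAX_BACKOFF)
```

# A crawler would request /merchant, pass the response's Link header to
# next_page_url() until it returns None, and key its seen-set on
# normalize_url(url) so tracking variants of a page are fetched only once.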