Scanner policy

How the Aidō Lighthouse scanner works, what it accesses, and how to allow or block it.

What the scanner is

Aidō Lighthouse scans e-commerce sites to measure how accessible they are to AI shopping agents — software that browses, compares, and buys on behalf of users. Scans run only when initiated by a site owner or authorised representative. We do not scan unsolicited.

The scanner looks at three things: whether an AI agent can find your products (Discoverability), whether it can understand what it finds (Understandability), and whether it can actually complete a purchase (Transactability). It reads pages. It does not submit forms, make purchases, or change anything on the site.

Bot identity

Every request carries a signed, stable identity. The machine-readable manifest is at /.well-known/agent-identity.json.

Agent-Id: lighthouse.aido-labs.co
User-Agent: AidoLighthouse/1.0 (+https://aido-lighthouse.com/scanner-policy; AI-readiness-scanner)
Agent-Purpose: product-discovery / checkout-assessment / payment-signal-detection
Signature algorithm: Ed25519 verified
Current key ID: 2026-05

Request headers

Each request carries these headers. WAF operators can use them to identify and allow the scanner without relying on IP ranges alone:

Agent-Id: lighthouse.aido-labs.co
Agent-Purpose: product-discovery
Agent-Trace-Id: <uuid per scan>
Agent-Policy: https://aido-lighthouse.com/scanner-policy
Agent-Signature: keyid="lighthouse.aido-labs.co/keys/2026-05", alg="Ed25519", created="<unix timestamp>", nonce="<random>", sig="<base64url Ed25519 signature>"

Signature verification

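The signed payload is METHOD\nURL\ncreated\nnonce: the HTTP method, the request URL, and the created and nonce values from the Agent-Signature header, joined by newline characters. To verify: fetch the manifest, find the key whose kid matches the keyid in the header, and check the signature against that payload using Ed25519. The public key is in the manifest.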
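The verification steps above could be sketched in Python roughly as follows. This is an illustration under several assumptions, not the scanner's reference implementation: the function names are hypothetical, the payload is assumed to be UTF-8 encoded, the public key is assumed to be available as 32 raw bytes (this policy does not say how the manifest encodes it), and the third-party cryptography package is assumed to be installed. Fetching the manifest and matching the kid are left to the caller.

```python
import base64


def parse_agent_signature(header_value: str) -> dict:
    """Split an Agent-Signature header value into its key="value" parameters."""
    params = {}
    for part in header_value.split(","):
        # Split on the first "=" only, so base64 padding inside a value survives.
        name, _, value = part.strip().partition("=")
        params[name] = value.strip('"')
    return params


def signed_payload(method: str, url: str, created: str, nonce: str) -> bytes:
    """Build METHOD\\nURL\\ncreated\\nnonce (UTF-8 encoding is an assumption)."""
    return "\n".join([method, url, created, nonce]).encode("utf-8")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring any padding the sender stripped."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def verify_request(public_key_raw: bytes, header_value: str,
                   method: str, url: str) -> bool:
    """Return True if the Ed25519 signature matches the rebuilt payload."""
    # Imported here so the helpers above work without the third-party package.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    params = parse_agent_signature(header_value)
    payload = signed_payload(method, url, params["created"], params["nonce"])
    key = Ed25519PublicKey.from_public_bytes(public_key_raw)
    try:
        key.verify(b64url_decode(params["sig"]), payload)
    except InvalidSignature:
        return False
    return True
```

The exact form of METHOD and URL (letter case, whether the query string is included) is not spelled out in this policy, so a verifier would need to pass them exactly as the scanner signed them. A stricter check might also reject old created values or repeated nonces; no freshness window is stated here.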

Rate limits and behaviour

Request rate: ≤ 1 request/second per domain
Pages per scan: ≤ 50 pages per domain per scan
Concurrent scans: 1 per domain
robots.txt: Respected always
noindex / nofollow: Respected always
Form submission: Never
Purchases: Never
Authenticated endpoints: Only public pages
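A site operator who wants to check observed traffic against these limits could audit their access logs with something like the sketch below. The function name is hypothetical, and two readings are assumed: that "≤ 1 request/second" means at least one second between consecutive requests, and that every logged request counts as a page. The timestamps are expected to come from one scan, for example grouped by Agent-Trace-Id.

```python
def audit_scan(timestamps, min_interval=1.0, max_pages=50):
    """Return a list of human-readable violations for one scan's request times.

    timestamps: request times in seconds (any order) for a single scan.
    """
    times = sorted(timestamps)
    problems = []

    # Page cap: assumes each logged request corresponds to one page.
    if len(times) > max_pages:
        problems.append(f"{len(times)} requests exceeds the {max_pages}-page cap")

    # Rate limit: flag any pair of consecutive requests closer than min_interval.
    for earlier, later in zip(times, times[1:]):
        gap = later - earlier
        if gap < min_interval:
            problems.append(f"requests only {gap:.2f}s apart at t={later}")

    return problems
```

An empty list means the scan stayed within both limits under those assumptions.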

Infrastructure IP ranges

The scanner runs on Google Cloud Run in us-central1 (Iowa). Egress IPs come from Google's published Cloud ranges.

Google maintains the full IP list at https://www.gstatic.com/ipranges/cloud.json. For WAF allowlisting, filter to entries whose scope is us-central1. Other GCP workloads share these ranges, so an IP match alone does not identify the scanner; use the Agent-Id header as a secondary signal.
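Filtering the published list down to one region could look like the sketch below. It assumes the layout cloud.json has at the time of writing: a top-level prefixes array whose entries carry a scope plus either ipv4Prefix or ipv6Prefix. The function names are hypothetical, and the input is the already-parsed JSON so the example stays free of network calls.

```python
import ipaddress


def networks_for_scope(cloud_json: dict, scope: str = "us-central1") -> list:
    """Collect the IPv4 and IPv6 networks listed for one region."""
    networks = []
    for entry in cloud_json.get("prefixes", []):
        if entry.get("scope") != scope:
            continue
        # Each entry is expected to carry exactly one of the two prefix fields.
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def ip_in_networks(ip: str, networks) -> bool:
    """True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Typical use (requires network access, so it is left as a comment):
#   import json, urllib.request
#   with urllib.request.urlopen("https://www.gstatic.com/ipranges/cloud.json") as resp:
#       allowed = networks_for_scope(json.load(resp))
#   ip_in_networks(client_ip, allowed)
```

Since the list changes over time, an allowlist built this way would need refreshing periodically.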

What data is collected

  • Page HTML, HTTP response headers, and structured data (JSON-LD, schema.org markup)
  • Protocol endpoint responses (robots.txt, llms.txt, /.well-known/* endpoints, openapi.json)
  • Payment method signals detected from public page content
  • Bot detection signals encountered during the scan (WAF provider, challenge type)

No credentials, session data, or personal information is collected. Scan results are kept for 90 days and visible only to the account holder who ran the scan.

Blocking or opting out

To block the scanner, add this to your robots.txt:

User-agent: AidoLighthouse
Disallow: /

The scanner respects this without exception. To remove your site from the scan queue entirely, email info@aido-labs.co.
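To sanity-check a robots.txt before deploying it, the rule can be run through Python's standard-library parser, as in the sketch below. This shows how urllib.robotparser interprets the file, which may differ in detail from the scanner's own parser; the function name and the example.com URLs are illustrative.

```python
from urllib.robotparser import RobotFileParser


def scanner_allowed(robots_txt: str, url: str,
                    user_agent: str = "AidoLighthouse") -> bool:
    """Report whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocking_rules = "User-agent: AidoLighthouse\nDisallow: /\n"
print(scanner_allowed(blocking_rules, "https://example.com/products/1"))
```

With the blocking rule in place the call reports that the scanner may not fetch the page, while a different user agent is unaffected by that rule.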

Contact

Abuse reports get a response within one business day.