# Siteline — Full LLM Context

> Siteline is a public scanner that evaluates whether a website presents a usable path for AI agents acting on behalf of humans. It scores four SNAP pillars: Signal, Navigate, Absorb, and Perform. This file contains everything an LLM needs to answer compliance and readiness questions without additional API calls. For the directory version, see: https://siteline.to/llms.txt

---

## Product Overview

Siteline is a free diagnostic tool. A user enters a URL, and Siteline returns:

- A headline grade (A through F)
- A numeric score (0 to 100)
- Four pillar scores (Signal, Navigate, Absorb, Perform)
- A likely agent failure mode
- Prioritized findings with remediation guidance
- Confidence levels per check (high, medium, low)
- A remediation tier (self-fix, workshop, audit, or none)

Siteline does not generate implementation code, browser automation, or protocol layers. It is diagnostic-only.

Built by Snap Synapse: https://snapsynapse.com/

---

## SNAP Scoring Model

### Grade Bands

| Grade | Score Range | Label |
|-------|-------------|-------|
| A | 90-100 | Agent-Usable |
| B | 78-89 | Mostly Usable |
| C | 64-77 | Needs Clearer Paths |
| D | 45-63 | Hard for Agents |
| F | 0-44 | Agent-Blocked or Unusable |

### Pillar Weights

The overall score is a weighted average of four pillar scores:

| Pillar | Internal Name | Weight |
|--------|---------------|--------|
| Signal | access | 0.30 |
| Absorb | readability | 0.25 |
| Navigate | navigability | 0.25 |
| Perform | actionHandoff | 0.20 |

### Status Values

Each check produces one of five statuses:

| Status | Score | Meaning |
|--------|-------|---------|
| pass | 100 | Fully meets the requirement |
| warn | 65 | Partially meets, room for improvement |
| fail | 20 | Does not meet the requirement |
| blocked | 0 | Cannot be evaluated (access denied) |
| not_applicable | null | Skipped (excluded from weighted average) |

---

## Checks and Criteria

### Pillar 1: Signal (Access) — Weight 0.30

#### Check 1: Server Reachability — Weight 0.60

Tests whether the site is reachable by a non-browser fetcher and common AI agent user-agents (ClaudeBot, GPTBot, CCBot, Amazonbot).

- **Pass:** Page returns successfully to a non-browser fetcher and at least one common agent UA.
- **Warn:** Page responds but some agent UAs are blocked, or the response is inconsistent.
- **Fail:** Page is unreachable or times out.
- **Blocked:** Page returns 403 or a bot-protection response to non-browser clients.

HTTPS is a baseline requirement. HTTP-only sites short-circuit to grade F with score 0.

#### Check 2: Public Machine Policy Clarity — Weight 0.40

Checks whether the site provides a clear public machine policy via robots.txt and other machine-policy files.

- **Pass:** robots.txt exists and does not create obvious conflicts for public access. Enhanced if .well-known/mcp.json or .well-known/security.txt is also present.
- **Warn:** robots.txt exists but is unclear or contradictory.
- **Fail:** No robots.txt, or the file is obviously broken.

Agentic enrichment: Presence of .well-known/mcp.json (MCP server discovery) and .well-known/security.txt (RFC 9116) strengthens the machine policy signal.

### Pillar 2: Absorb (Readability) — Weight 0.25

#### Check 3: Initial HTML Content — Weight 0.60

Checks whether the initial HTTP response contains meaningful human-readable content.

- **Pass:** Real page content present in the initial HTML (500+ characters of meaningful text).
- **Warn:** Content exists but is thin (150-500 characters).
- **Fail:** Minimal shell or placeholder content only (under 150 characters).

#### Check 4: Semantic Structure — Weight 0.40

Evaluates HTML semantic elements: main tag, heading hierarchy, labeled links, labeled buttons, labeled forms.

- **Pass:** Clear semantic structure (4+ signals present). Enhanced if llms-full.txt or an OpenAPI spec is also available.
- **Warn:** Some structure present but inconsistent (2-3 signals). Presence of llms-full.txt or OpenAPI can upgrade to pass.
- **Fail:** Semantic structure too weak (0-1 signals). Not rescued by agentic resources.

Agentic enrichment: Presence of llms-full.txt or an OpenAPI specification indicates the site provides structured machine-readable content beyond HTML.

### Pillar 3: Navigate (Navigability) — Weight 0.25

#### Check 5: Site Identity Signals — Weight 0.35

Checks for title tag, meta description, canonical URL, and structured data (JSON-LD).

- **Pass:** Title, description, canonical, and at least one structured data type present.
- **Warn:** Some identity signals present but incomplete.
- **Fail:** Very little machine-readable identity context.

#### Check 6: Public Task Routing — Weight 0.40

Checks for key public navigation paths: About, Services/Products, Pricing, FAQ, Contact/Booking.

- **Pass:** 4+ route types detected.
- **Warn:** 3 route types detected.
- **Fail:** 0-2 route types detected.

#### Check 7: Machine Discovery Paths — Weight 0.25

Evaluates machine-traversable discovery resources.

- **Pass:** 3+ discovery resources accessible (from: RSS/Atom feed, sitemap.xml, llms.txt, agents.json, api/v1/index.json).
- **Warn:** 1-2 discovery resources accessible.
- **Fail:** No machine-traversable discovery path available.

Agentic enrichment: llms.txt, agents.json, and an API manifest count as discovery paths alongside feeds and sitemaps.

### Pillar 4: Perform (Action Handoff) — Weight 0.20

#### Check 8a: Next-Step Clarity — Weight 0.55

Counts handoff signals: contact form, booking link, inquiry path, purchase path, docs/FAQ path.

- **Pass:** 3+ handoff signals present.
- **Warn:** 2 handoff signals, or duplicate routes create ambiguity.
- **Fail:** 0-1 handoff signals.

Agentic enrichment: An MCP server declaration (from .well-known/mcp.json) counts as a handoff signal. A changelog endpoint also counts.

#### Check 8b: Form and CTA Interpretability — Weight 0.45

Evaluates whether forms have labels, CTAs use explicit text, and contact methods are legible.
- **Pass:** All three signals present.
- **Warn:** Two signals present, or CTAs use generic text.
- **Fail:** Forms or CTAs are too ambiguous to interpret safely.

---

## Agentic Resources Checked

Siteline probes for the following machine-readable resources and factors them into scoring:

| Resource | Pillar Affected | Check Affected |
|----------|-----------------|----------------|
| robots.txt | Signal | publicMachinePolicy |
| .well-known/mcp.json | Signal | publicMachinePolicy |
| .well-known/security.txt | Signal | publicMachinePolicy |
| RSS/Atom feed | Navigate | machineDiscoveryPaths |
| sitemap.xml | Navigate | machineDiscoveryPaths |
| llms.txt | Navigate | machineDiscoveryPaths |
| agents.json | Navigate | machineDiscoveryPaths |
| api/v1/index.json | Navigate | machineDiscoveryPaths |
| llms-full.txt | Absorb | semanticStructure |
| openapi.json or openapi.yaml | Absorb | semanticStructure |
| .well-known/mcp.json (server tools) | Perform | nextStepClarity |
| api/v1/changelog.json | Perform | nextStepClarity |

---

## Two-Layer Scoring Model (V2.1)

Siteline uses two independent layers to determine the final grade. Both constrain the result.

### Layer 1: SNAP Fundamentals (the floor)

The original 4-pillar rubric described above: eight checks with weighted scoring. This layer determines the raw score (0-100) and maps it to a grade (A/B/C/D/F). It answers: can agents passively use this site?

### Layer 2: Agentic Enablement (the ceiling)

Eleven machine-readable resources are probed and qualitatively assessed.
Each resource scores:

- **0** — absent
- **1** — present but weak (e.g., empty sitemap, generic robots.txt)
- **2** — present and useful (e.g., structured llms.txt with headings and links, robots.txt with AI directives)

The total quality score (0-22) maps to an agentic level that caps the maximum achievable grade:

| Level | Quality Points | Grade Cap | Meaning |
|-------|----------------|-----------|---------|
| 0 | 0-2 | D (63) | Passively usable but not built for agents |
| 1 | 3-6 | D (63) | Minimal agentic signals |
| 2 | 7-11 | C (77) | Moderate enablement |
| 3 | 12-16 | B (89) | Strong enablement |
| 4 | 17-22 | A (100) | Comprehensive — the gold standard |

### Content Divergence Cap

Layer 2 also enforces a content-consistency check. The scanner probes whether the origin serves materially different content when the request signals agent intent (via `Accept: text/markdown`). It measures containment: what fraction of the HTML's core text words survive into the markdown response.

If containment falls below 60%, the agentic level is capped at Level 1 (grade cap D/63), regardless of quality points.

Legitimate CDN conversions (e.g., Cloudflare's Markdown for Agents) strip navigation and boilerplate but preserve the page's substantive content — typically 70%+ containment. Content cloaking replaces the content entirely, producing containment well below 50%. A site that serves fundamentally different information to agents than to humans cannot be trusted as agent-enabled.

When divergence is detected, the scan result includes `agenticEnablement.contentDivergenceCap: 1`.
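The quality-to-level mapping and the divergence cap described above can be sketched as follows. This is an illustrative sketch using the thresholds from the tables in this document; the function names are this example's own, not Siteline's internal API.

```python
# Illustrative sketch of the Layer 2 mapping. Thresholds come from the
# tables in this document; names are hypothetical, not Siteline internals.

GRADE_CAPS = {0: 63, 1: 63, 2: 77, 3: 89, 4: 100}  # level -> max score

def agentic_level(quality_points: int) -> int:
    """Map total quality points (0-22) to an agentic level (0-4)."""
    if quality_points <= 2:
        return 0
    if quality_points <= 6:
        return 1
    if quality_points <= 11:
        return 2
    if quality_points <= 16:
        return 3
    return 4

def capped_level(quality_points: int, containment: float) -> int:
    """Apply the content-divergence cap: below 60% containment,
    the level is capped at 1 regardless of quality points."""
    level = agentic_level(quality_points)
    return min(level, 1) if containment < 0.60 else level

# A site with strong resources (18 points) but cloaked markdown (40% containment)
print(capped_level(18, 0.40))   # capped to Level 1 -> grade cap D (63)
print(capped_level(18, 0.75))   # no cloaking -> Level 4 -> grade cap A (100)
```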
### How the layers interact

The final score is: `min(Layer 1 raw score, Layer 2 grade cap)`

- A site with perfect SNAP but no agentic resources → D (Layer 2 caps)
- A site with full agentic resources but broken HTML → whatever SNAP produces (Layer 1 caps)
- A site that's blocked (403) with full agentic resources → F (access-block cap, applied first)
- A site with content cloaking (divergent markdown) → D at most (capped to Level 1)

The scan result includes an `agenticEnablement` object with `level`, `totalQuality`, `maxQuality`, and per-resource `details`.

---

## Likely Failure Modes

The scanner identifies one of five failure modes based on the evaluation:

1. **Blocked before reading** — Access is denied at the server level.
2. **Can reach but cannot read well** — Site is accessible but content is too thin.
3. **Can read but cannot route confidently** — Content exists but navigation signals are weak.
4. **Can understand but lacks a clear handoff** — Structure is good but there is no clear next step for a human.
5. **Usable baseline present** — Site meets minimum requirements across all pillars.

---

## Remediation Tiers

Based on findings, Siteline classifies the recommended remediation approach:

| Tier | When | Cost |
|------|------|------|
| self-fix | Access and infrastructure problems the owner can resolve directly | Free |
| workshop | Content structure and taxonomy problems that need facilitated intervention | $1,500-$3,000 |
| audit | Protocol gaps and agent-facing issues that need multi-page assessment | $297-$497 |
| none | Usable baseline present; no intervention needed | N/A |

---

## API Endpoints

### Scan: GET /api/scan?url={url}

Runs a scan and returns a structured evaluation. One scan per domain per calendar day; repeat requests return the cached result. Rate limit: 10 per IP per hour.

Query parameters:

- url (required): The URL to scan
- debug=1 (optional): Include fetch timing and internal diagnostics

Returns: Full scan result object (see schema below).
### Result Lookup: GET /api/result?id={resultId or domain}

Retrieves a stored scan result by result ID (e.g., example-com-20260321) or by domain (e.g., example.com). Returns the most recent result for domain lookups. Rate limit: 60 per IP per minute.

### OG Image: GET /api/og?domain={domain}&grade={grade}&score={score}&label={label}

Generates a 1200x630 branded grade card image. Also accepts ?resultId={id} for automatic lookup. Rate limit: 30 per IP per minute.

### Rate Limits: GET /api/limits

Returns all rate limits, the SSRF protection policy, response headers, and endpoint links as structured JSON. Call this once to understand the entire API surface. No rate limit on this endpoint.

### Email Capture: POST /api/email-capture

Email-gated PDF export of scan results. Rate limit: 12 per IP per hour, 1 PDF per 60 seconds.

### Result Page: GET /results/{domain}

Server-rendered HTML with full OG meta tags for social sharing. Also accepts /results/{domain-slug-YYYYMMDD}.

---

## Scan Result Schema

Every scan result includes these top-level fields:

- ok: true (always true for successful scans)
- normalizedURL: The canonical URL that was scanned
- domain: The hostname
- scannedAt: ISO 8601 timestamp
- resultId: Stable identifier (format: domain-slug-YYYYMMDD)
- grade: A, B, C, D, or F
- score: 0-100
- label: Human-readable grade label
- likelyFailureMode: One of five failure modes
- pillars: Object with access, navigability, readability, actionHandoff scores (0-100 each)
- provenance: Rubric version, scanner version, pages fetched, checks run, agent UAs tested
- page: Status code, timeout, bot-blocked status
- robots: URL, status code, accessibility
- informationalResources: Detection status for llms.txt, agents.json, and agentic resources
- agenticEnablement: Level (0-4), totalQuality, maxQuality, per-resource details, and contentDivergenceCap (present only when cloaking detected)
- evaluation: Per-check status and reason for every check in every pillar
- summary: Headline and detail text
- findings: Array of findings with id, status, reason, title, impact, remediation, group, confidence, evidence
- findingGroups: Findings organized into blockers, coreIssues, opportunities
- informational.contentDivergence: Whether the origin serves different content via Accept: text/markdown (tested, divergent, similarity/containment score, note)
- remediationTier: self-fix, workshop, audit, or none

Full JSON Schema: https://siteline.to/schemas/scan-result.schema.json

---

## MCP Server

Siteline provides a Model Context Protocol server with stdio transport.

Install: npx siteline mcp

Tools:

- scan_url: Run a scan on a URL and return the full result
- self_scan: Scan Siteline itself (returns A/100)
- describe_rubric: Return the full SNAP scoring rubric
- explain_score: Explain how a specific score was calculated

MCP discovery: https://siteline.to/.well-known/mcp.json

---

## CLI

Usage: node bin/siteline.js scan {url} [--json] [--debug]

Options:

- --json: Output raw JSON instead of formatted text
- --debug: Include fetch timing and diagnostics

Batch scanning: node scripts/batch-scan.js {file or urls} [--json] [--csv] [--label {name}]

---

## Rate Limits and Security

Siteline conforms to the Graceful Boundaries specification (Level 4): https://github.com/snapsynapse/graceful-boundaries

- SSRF protection: Private IPs, loopback, cloud metadata, internal hostnames, and non-standard ports are all blocked.
- One scan per domain per calendar day (global, not per-user).
- All rate limits documented at /api/limits (proactive discovery).
- All 429 responses include structured retry guidance with retryAfterSeconds and why.
- All successful responses include proactive RateLimit headers.
- Content consistency: The scanner probes for content divergence by requesting each page with `Accept: text/markdown` and measuring containment — the fraction of the HTML's core text that survives into the markdown response. Legitimate CDN conversions (e.g., Cloudflare Markdown for Agents) score 70%+ containment. Below 60%, the site is flagged for header-based content cloaking and its agentic enablement level is capped at Level 1 (grade cap D). Agent-specific guidance should use transparent, auditable mechanisms (semantic markup, dedicated discovery files like llms.txt) rather than header-based content switching.

---

## Discovery Files

| File | URL | Purpose |
|------|-----|---------|
| robots.txt | /robots.txt | AI crawler directives |
| sitemap.xml | /sitemap.xml | Page discovery |
| llms.txt | /llms.txt | LLM briefing (directory) |
| llms-full.txt | /llms-full.txt | LLM briefing (full inline) |
| agents.json | /agents.json | Agent capability manifest |
| .well-known/agents.json | /.well-known/agents.json | Agent capability manifest (well-known) |
| .well-known/mcp.json | /.well-known/mcp.json | MCP server discovery |
| .well-known/security.txt | /.well-known/security.txt | Security policy (RFC 9116) |
| openapi.yaml | /openapi.yaml | OpenAPI 3.1 specification |
| api/v1/openapi.json | /api/v1/openapi.json | OpenAPI 3.1 specification (JSON) |
| api/v1/index.json | /api/v1/index.json | API endpoint manifest |
| api/v1/changelog.json | /api/v1/changelog.json | Version history |

---

## Organization

- Product: Siteline
- Built by: Snap Synapse (https://snapsynapse.com/)
- Production URL: https://siteline.to/
- GitHub: https://github.com/snapsynapse/siteline
- LinkedIn: https://www.linkedin.com/company/snap-synapse
- Paid audit: https://snapsynapse.com/services/siteline#full-audit
- Workshop: https://snapsynapse.com/services/siteline#ontology-workshop
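
An agent can enumerate the discovery files from the table above mechanically. A minimal sketch using only the Python standard library; the paths come from the Discovery Files table, while the helper function and its name are this example's own.

```python
# Sketch: resolving Siteline's documented discovery paths against any
# site origin. Paths are from the Discovery Files table; the helper is
# illustrative, not part of Siteline's API.
from urllib.parse import urljoin

DISCOVERY_PATHS = [
    "/robots.txt", "/sitemap.xml", "/llms.txt", "/llms-full.txt",
    "/agents.json", "/.well-known/agents.json", "/.well-known/mcp.json",
    "/.well-known/security.txt", "/openapi.yaml",
    "/api/v1/openapi.json", "/api/v1/index.json", "/api/v1/changelog.json",
]

def discovery_urls(origin: str) -> list[str]:
    """Resolve each discovery path against a site origin."""
    return [urljoin(origin, path) for path in DISCOVERY_PATHS]

for url in discovery_urls("https://example.com"):
    print(url)  # e.g. https://example.com/robots.txt, ...
```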