# Access patterns for automated consumers

This page is the authoritative guidance for clients that consume a
Vulnerability-Lookup instance programmatically — vulnerability scanners,
mirror builders, research crawlers, agent frameworks, and so on. The same
content is also exposed in machine-readable form at
`/.well-known/api-policy.json` on every instance.

## TL;DR

- **Sync via the API + pub/sub stream.** Use `since=` for catch-up, the
  stream for real-time updates, and targeted lookups for interactive use.
- **Do not enumerate** the API to mirror the dataset.
- **Identify your client** with a meaningful `User-Agent` that includes a
  contact URL or email address.
- **Bulk dumps**, when an instance publishes them, are an *optional* open-data
  convenience, not a synchronisation mechanism.

## Canonical sync path: API + stream

The API exposes the primitives needed to keep an external store in sync
without re-downloading the world.

### Incremental pulls with `since=`

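Endpoints under `/api/vulnerability/`, `/api/gcve/`, and the per-source
listings accept a `since=YYYY-MM-DD` query parameter, which restricts the
response to vulnerabilities reported on or after that date. Because the bound
is inclusive, consecutive pulls overlap on the boundary date, so upsert
records by ID rather than appending them. A typical sync loop: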

```bash
# First run — pick a reasonable starting date
$ curl 'https://example.org/api/vulnerability/?since=2024-01-01'

# Subsequent runs — pass the date of your last successful pull
$ curl 'https://example.org/api/vulnerability/?since=2026-05-04'
```

Pagination metadata (`{"metadata": {"count": N, "page": N, "per_page": N}}`)
is documented in [API v1](api-v1.md).
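
The same loop can be sketched in Python with only the standard library. This is a minimal illustration, not a reference client: the `page` query parameter, the `data` key holding the record list, the reading of `count` as the total number of matching records, and the helper names are all assumptions, so check [API v1](api-v1.md) for the actual request and response shape.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"


def build_url(base: str, since: str, page: int) -> str:
    """Build an incremental-pull URL. `page` is an assumed parameter name."""
    query = urllib.parse.urlencode({"since": since, "page": page})
    return f"{base.rstrip('/')}/api/vulnerability/?{query}"


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTP and decode it as JSON."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pull_since(base: str, since: str,
               fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield records reported on or after `since`, one page at a time."""
    page = 1
    while True:
        body = fetch(build_url(base, since, page))
        records = body.get("data", [])  # assumed key for the record list
        yield from records
        meta = body.get("metadata", {})
        seen = meta.get("page", page) * meta.get("per_page", 0)
        # Stop on an empty page or once `count` (assumed total) is covered.
        if not records or seen >= meta.get("count", 0):
            break
        page += 1
```

Passing `fetch` as a parameter keeps the paging logic testable without network access. Store the date of the pull only after the generator has been fully consumed, so that a failed run is retried from the previous date.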

### Real-time updates via the stream

The pub/sub stream pushes new and updated records as they land, without
polling. The HTTP surface is a Server-Sent Events endpoint:

```
GET /pubsub/subscribe/{topic}
X-API-KEY: <your token>
```

The default topic list is `vulnerability`, `comment`, `bundle`,
`sighting`, but operators choose which topics to expose via
`config/stream.json`. The HTTP endpoint is only registered when
`pubsub_bp` is set to `true` in that file — instances that do not run a
public subscription endpoint will respond with a 404 here, and
`/.well-known/api-policy.json` will report `sync.stream_available: false`.

See the [streaming documentation](streaming.md) for the wire format,
channel semantics, and a Python client example.

Combine both: stream for the live feed, `since=` to catch up after an
outage or for the initial backfill window.
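
As a rough sketch of the consuming side, the following Python reads a Server-Sent Events response line by line. The parser follows the general SSE framing (`data:` lines, events terminated by a blank line); treating each payload as a JSON document is an assumption, and the function names are illustrative. The [streaming documentation](streaming.md) remains the reference for the wire format actually used.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a stream of SSE text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE framing strips one optional leading space.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Decode each payload as JSON (an assumption about the wire format)."""
    for payload in iter_sse_data(lines):
        yield json.loads(payload)


def subscribe(base: str, topic: str, api_key: str) -> Iterator[dict]:
    """Open the subscription endpoint and yield decoded records."""
    request = urllib.request.Request(
        f"{base.rstrip('/')}/pubsub/subscribe/{topic}",
        headers={"X-API-KEY": api_key, "Accept": "text/event-stream"},
    )
    # Raises urllib.error.HTTPError (404) when the endpoint is not registered.
    with urllib.request.urlopen(request) as response:
        yield from iter_records(line.decode("utf-8") for line in response)
```

When the connection drops, a client built this way would reconnect and run a `since=` pull for the date of the last record it processed, which covers the gap described above.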

### Targeted lookups

For interactive use cases, `/api/vulnerability/<id>` and the cross-source
correlation endpoints are the right tool. Don't loop them across an ID
space to simulate a bulk export — that's enumeration.

## Bulk dumps: an optional open-data convenience

Some operators publish NDJSON exports (the public CIRCL instance, for
example, publishes them at <https://vulnerability.circl.lu/dumps/>). These
are produced by `bin/dump.py`, which writes files to disk; serving them is
an operational choice made by the operator, **not a feature of
Vulnerability-Lookup itself**. Other instances may not publish dumps at all.

Dumps exist as an **open-data convenience** — for archival, ad-hoc
analysis, dataset research, and similar one-shot uses. They are
explicitly **not** a synchronisation mechanism, and they are **not** the
intended way to bootstrap a Vulnerability-Lookup instance. New instances
ingest data through the [feeders](architecture.md), and external
consumers stay in sync through the API. Polling a dump on a schedule is
worse for the publisher than a well-behaved API client using `since=`, and
yields stale data between runs — if you find yourself doing this, switch
to the API.

Whether the current instance publishes dumps is advertised in the
`bulk_dumps` block of `/.well-known/api-policy.json`.

## Identification

Identify yourself. A `User-Agent` like:

```
my-mirror/1.4 (+https://example.org/contact; ops@example.org)
```

…lets operators reach out before they have to start blocking. The default
User-Agents of HTTP libraries (`python-requests/...`, `Go-http-client/...`) are
treated as anonymous and may be rate-limited or blocked first when load
becomes a problem.
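
One way to avoid shipping a library default is to build the header once and attach it to every request. The sketch below uses Python's standard library; `make_user_agent` and `identified_request` are hypothetical helper names, and the layout simply mirrors the example string above.

```python
import urllib.request
from typing import Optional


def make_user_agent(name: str, version: str, contact_url: str,
                    contact_email: Optional[str] = None) -> str:
    """Format `name/version (+url; email)` as in the example above."""
    contacts = [f"+{contact_url}"]
    if contact_email:
        contacts.append(contact_email)
    return f"{name}/{version} ({'; '.join(contacts)})"


def identified_request(url: str, user_agent: str,
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a request that always carries the identifying User-Agent."""
    headers = {"User-Agent": user_agent}
    if api_key:
        headers["X-API-KEY"] = api_key
    return urllib.request.Request(url, headers=headers)


USER_AGENT = make_user_agent(
    "my-mirror", "1.4", "https://example.org/contact", "ops@example.org"
)
```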

## Rate limits

Read endpoints under `/api/vulnerability` can be rate-limited per
instance. Operators set two independent values in `config/website.py`:

| Setting | Applies when | Bucket key |
| --- | --- | --- |
| `API_READ_RATE_LIMIT_ANON` | no `X-API-KEY` header | client IP |
| `API_READ_RATE_LIMIT_AUTH` | `X-API-KEY` header present | the API key |

Both default to `None` — meaning **no enforced limit** — and that is the
posture of the public CIRCL instance today. The
`/.well-known/api-policy.json` document advertises the actual state via
`rate_limits.enforced` and, when enabled, the configured limits and the
keying scheme.

Bucketing authenticated callers by API key (rather than by IP) means a
shared corporate egress IP isn't punished for one client's behaviour —
each key gets its own budget.

When enforcement is on, responses from rate-limited endpoints carry the
conventional `X-RateLimit-Limit`, `X-RateLimit-Remaining` and
`X-RateLimit-Reset` headers, and 429 responses also include `Retry-After`.
Self-hosted instances can describe their posture in human-readable form via
the `RATE_LIMITS_POLICY` setting.
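
A client can honour `Retry-After` with a small helper such as the one sketched below. It accepts both forms HTTP allows for that header (a number of seconds or an HTTP date); the function name and the 60-second fallback are arbitrary choices for illustration, not values defined by Vulnerability-Lookup.

```python
import email.utils
import time
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], now: Optional[float] = None,
                default: float = 60.0) -> float:
    """Return how many seconds to wait before retrying a 429 response."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "30"
    try:
        when = email.utils.parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value: fall back to the default
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)
```

Sleeping for `retry_delay(response.headers)` before the next attempt keeps the client inside its budget without guessing at the window length.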

## Discovery surfaces

Every instance exposes the same information through several surfaces, so
clients can pick whichever fits:

| Path | Audience | Format |
| --- | --- | --- |
| `/.well-known/api-policy.json` | Machine clients | JSON, structured |
| `/llms.txt` | LLM agents | Markdown, concise |
| `/robots.txt` | Crawlers | Robots Exclusion + `Policy:` link |
| `/.well-known/security.txt` | Security researchers | RFC 9116 |
| `/about` | Humans | HTML |

Every API response also carries an `X-API-Policy-Version` header and a
`Link` header pointing at `api-policy.json`.
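
A client can read the policy document once at start-up and adapt its behaviour. The sketch below only touches the fields named on this page (`sync.stream_available`, `rate_limits.enforced`, and the `bulk_dumps` block); the rest of the document's shape is not assumed, and the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_policy(base: str, user_agent: str) -> dict:
    """Download and decode /.well-known/api-policy.json from an instance."""
    request = urllib.request.Request(
        f"{base.rstrip('/')}/.well-known/api-policy.json",
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_policy(policy: dict) -> dict:
    """Extract the fields a sync client typically branches on."""
    sync = policy.get("sync") or {}
    rate_limits = policy.get("rate_limits") or {}
    return {
        # Missing keys are treated conservatively as "not available".
        "stream_available": bool(sync.get("stream_available", False)),
        "rate_limits_enforced": bool(rate_limits.get("enforced", False)),
        # Returned untouched: its inner fields are not described on this page.
        "bulk_dumps": policy.get("bulk_dumps"),
    }
```

A client that finds `stream_available` set to `False` would fall back to periodic `since=` pulls alone.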

## Operator configuration

The values surfaced by all of the above are configured per-instance in
`config/website.py`:

| Setting | Purpose |
| --- | --- |
| `API_POLICY_VERSION` | Bumped on breaking shape changes to `api-policy.json`. |
| `API_POLICY_EXPIRES` | Optional fixed expiry (ISO 8601). Defaults to a one-year rolling expiry. |
| `BULK_DUMPS_URL` | Set to a public dumps index URL, or leave `None` if this instance does not publish dumps. |
| `RATE_LIMITS_POLICY` | Free-form description of the rate-limit posture (shown verbatim in the policy). |
| `API_READ_RATE_LIMIT_ANON` | Limit string for unauthenticated `/api/*` reads (e.g. `"60 per minute"`); `None` to disable. |
| `API_READ_RATE_LIMIT_AUTH` | Limit string for authenticated `/api/*` reads, bucketed per API key; `None` to disable. |
| `SECURITY_POLICY_URL` | Responsible-disclosure policy referenced from `security.txt`. |
| `SECURITY_ENCRYPTION_URL` | OpenPGP key URL for the security contact. |
| `ROBOTS_DISALLOWED_AGENTS` | List of bot User-Agents to deny in `robots.txt`. |

See `config/website.py.sample` for the full annotated block.
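
For orientation, a fragment of `config/website.py` enabling rate limits might look like the sketch below. Only the setting names and the `"60 per minute"` limit format come from the table above; every value is illustrative, and the sample file remains the authoritative reference for syntax and defaults.

```python
# Illustrative values only; see config/website.py.sample for the real defaults.

# This instance does not publish dumps.
BULK_DUMPS_URL = None

# Free-form text, shown verbatim in the policy document.
RATE_LIMITS_POLICY = "Anonymous reads are limited per IP; API keys get a larger budget."

# Limit strings; None disables enforcement for that class of caller.
API_READ_RATE_LIMIT_ANON = "60 per minute"
API_READ_RATE_LIMIT_AUTH = "600 per minute"  # hypothetical value

# Bot User-Agents to deny in robots.txt (hypothetical name).
ROBOTS_DISALLOWED_AGENTS = ["ExampleBot"]
```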
