Access patterns for automated consumers#

This page is the authoritative guidance for clients that consume a Vulnerability-Lookup instance programmatically — vulnerability scanners, mirror builders, research crawlers, agent frameworks, and so on. The same content is also exposed in machine-readable form at /.well-known/api-policy.json on every instance.

TL;DR#

Sync via the API + pub/sub stream. Use since= for catch-up, the stream for real-time updates, and targeted lookups for interactive use.
Do not enumerate the API to mirror the dataset.
Identify your client with a meaningful User-Agent that includes a contact URL or email address.
Bulk dumps, when an instance publishes them, are an optional open-data convenience, not a synchronization mechanism.

Canonical sync path: API + stream#

The API exposes the primitives needed to keep an external store in sync without re-downloading the world.

Incremental pulls with `since=`#

Endpoints under /api/vulnerability/, /api/gcve/, and the per-source listings accept a since=YYYY-MM-DD query parameter that returns only vulnerabilities reported on or after that date. A typical sync loop:

# First run — pick a reasonable starting date
$ curl 'https://example.org/api/vulnerability/?since=2024-01-01'

# Subsequent runs — pass the date of your last successful pull
$ curl 'https://example.org/api/vulnerability/?since=2026-05-04'

Pagination metadata ({"metadata": {"count": N, "page": N, "per_page": N}}) is documented in API v1.
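The loop above can be sketched in Python as follows. This is a minimal illustration, not the project's client: the `page` and `per_page` query parameters and the `data` key holding the records are assumptions inferred from the pagination metadata, so check them against API v1 before relying on them. `BASE_URL`, `pull_since`, and `http_get_json` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import Callable, Iterator

BASE_URL = "https://example.org"  # placeholder instance
USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"


def http_get_json(url: str) -> dict:
    """Fetch one URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pull_since(since: date,
               fetch: Callable[[str], dict] = http_get_json,
               per_page: int = 100) -> Iterator[dict]:
    """Yield every record reported on or after `since`, page by page."""
    page = 1
    while True:
        # ASSUMPTION: `page` and `per_page` are accepted as query parameters.
        query = urllib.parse.urlencode(
            {"since": since.isoformat(), "page": page, "per_page": per_page})
        body = fetch(f"{BASE_URL}/api/vulnerability/?{query}")
        # ASSUMPTION: the records are listed under a "data" key.
        records = body.get("data", [])
        yield from records
        if len(records) < per_page:
            break  # a short (or empty) page means there is nothing left
        page += 1
```

Record the date at which a run started and pass it as `since` on the next run. Because `since=` is inclusive, records from that day are returned again, so the receiving store should upsert by identifier rather than append.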

Real-time updates via the stream#

The pub/sub stream pushes new and updated records as they land, without polling. The HTTP surface is a Server-Sent Events endpoint:

GET /pubsub/subscribe/{topic}
X-API-KEY: <your token>

The default topic list is vulnerability, comment, bundle, sighting, but operators choose which topics to expose via config/stream.json. The HTTP endpoint is only registered when pubsub_bp is set to true in that file — instances that do not run a public subscription endpoint will respond with a 404 here, and /.well-known/api-policy.json will report sync.stream_available: false.

See the streaming documentation for the wire format, channel semantics, and a Python client example.
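For orientation, here is a standard-library sketch of consuming such an endpoint. It applies only the generic Server-Sent Events framing rules (`data:` and `event:` fields, a blank line ending each event); the actual payload layout is defined in the streaming documentation and is not assumed here. `iter_sse_events` and `subscribe` are hypothetical helper names.

```python
import urllib.request
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Group raw SSE lines into events; a blank line ends each event."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is not part of the value
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event_type = value


def subscribe(base_url: str, topic: str, api_key: str,
              user_agent: str) -> Iterator[dict]:
    """Open the subscription endpoint and yield events as they arrive."""
    request = urllib.request.Request(
        f"{base_url}/pubsub/subscribe/{topic}",
        headers={"X-API-KEY": api_key,
                 "Accept": "text/event-stream",
                 "User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_events(line.decode("utf-8") for line in response)
```

A 404 from `subscribe` surfaces as `urllib.error.HTTPError`; in that case the instance does not expose the endpoint and the client should fall back to `since=` polling.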

Combine both: stream for the live feed, since= to catch up after an outage or for the initial backfill window.

Targeted lookups#

For interactive use cases, /api/vulnerability/<id> and the cross-source correlation endpoints are the right tool. Don’t loop them across an ID space to simulate a bulk export — that’s enumeration.

Bulk dumps: an optional open-data convenience#

Some operators publish NDJSON exports (the public CIRCL instance, for example, publishes them at https://vulnerability.circl.lu/dumps/). These are produced by bin/dump.py, which writes files to disk; serving them is an operational choice made by the operator, not a feature of Vulnerability-Lookup itself. Other instances may not publish dumps at all.

Dumps exist as an open-data convenience — for archival, ad-hoc analysis, dataset research, and similar one-shot uses. They are explicitly not a synchronisation mechanism, and they are not the intended way to bootstrap a Vulnerability-Lookup instance. New instances ingest data through the feeders, and external consumers stay in sync through the API. Polling a dump on a schedule is worse for the publisher than a well-behaved API client using since=, and yields stale data between runs — if you find yourself doing this, switch to the API.

Whether the current instance publishes dumps is advertised in the bulk_dumps block of /.well-known/api-policy.json.

Identification#

Identify yourself. A User-Agent like:

my-mirror/1.4 (+https://example.org/contact; ops@example.org)

…lets operators reach out before they have to start blocking. Default User-Agent strings sent by HTTP libraries (python-requests/..., Go-http-client/...) are treated as anonymous and may be the first to be rate-limited or blocked when load becomes a problem.
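A small sketch of setting such a header with Python's standard library. The helper names `build_user_agent` and `build_request` are hypothetical, and the client name and contact details are the placeholders from the example above.

```python
import urllib.request
from typing import Optional


def build_user_agent(name: str, version: str,
                     contact_url: str, contact_email: str) -> str:
    """Format a User-Agent that tells operators who to contact."""
    return f"{name}/{version} (+{contact_url}; {contact_email})"


def build_request(url: str, user_agent: str,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Attach the identifying User-Agent and, if available, the API key."""
    headers = {"User-Agent": user_agent}
    if api_key:
        headers["X-API-KEY"] = api_key
    return urllib.request.Request(url, headers=headers)


ua = build_user_agent("my-mirror", "1.4",
                      "https://example.org/contact", "ops@example.org")
request = build_request("https://example.org/api/vulnerability/", ua)
```

Building the string in one place keeps the version and contact details consistent across every request the client sends.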

Rate limits#

Read endpoints under /api/vulnerability can be rate-limited per instance. Operators set two independent values in config/website.py:

| Setting | Applies when | Bucket key |
| --- | --- | --- |
| `API_READ_RATE_LIMIT_ANON` | no `X-API-KEY` header | client IP |
| `API_READ_RATE_LIMIT_AUTH` | `X-API-KEY` header present | the API key |

Both default to None — meaning no enforced limit — and that is the posture of the public CIRCL instance today. The /.well-known/api-policy.json document advertises the actual state via rate_limits.enforced and, when enabled, the configured limits and the keying scheme.

Bucketing authenticated callers by API key (rather than by IP) means a shared corporate egress IP isn’t punished for one client’s behaviour — each key gets its own budget.

When enforcement is on, responses from rate-limited endpoints carry the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers, and 429 responses also include Retry-After. Self-hosted instances can describe their posture in human-readable form via the RATE_LIMITS_POLICY setting.
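A well-behaved client can honour `Retry-After` along these lines. This sketch relies only on the HTTP definition of that header (either a number of seconds or an HTTP date); it deliberately ignores `X-RateLimit-Reset`, whose unit is not stated on this page. `retry_delay` and its `fallback` default are hypothetical.

```python
import email.utils
import time
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                now: Optional[float] = None,
                fallback: float = 60.0) -> float:
    """Return how many seconds to wait before retrying (0.0 if not a 429)."""
    if status != 429:
        return 0.0
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        retry_at = email.utils.parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return fallback  # header missing or unparseable
    now = time.time() if now is None else now
    return max(0.0, retry_at.timestamp() - now)
```

Sleeping for the returned delay before the next request, rather than retrying immediately, keeps the client inside its budget.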

Discovery surfaces#

Every instance exposes the same information through several surfaces, so clients can pick whichever fits:

| Path | Audience | Format |
| --- | --- | --- |
| `/.well-known/api-policy.json` | Machine clients | JSON, structured |
| `/llms.txt` | LLM agents | Markdown, concise |
| `/robots.txt` | Crawlers | Robots Exclusion + `Policy:` link |
| `/.well-known/security.txt` | Security researchers | RFC 9116 |
| `/about` | Humans | HTML |

Every API response also carries an X-API-Policy-Version header and a Link header pointing at api-policy.json.
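A client can read the policy document once at start-up and adapt its behaviour. The sketch below uses only the field names mentioned on this page (`sync.stream_available`, `rate_limits.enforced`, and the `bulk_dumps` block); the rest of the document's shape is not assumed, and `summarise_policy` and `fetch_policy` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any


def summarise_policy(policy: dict) -> dict:
    """Pull out the fields a sync client typically needs, with safe defaults."""
    sync = policy.get("sync") or {}
    rate_limits = policy.get("rate_limits") or {}
    return {
        "stream_available": bool(sync.get("stream_available", False)),
        "rate_limits_enforced": bool(rate_limits.get("enforced", False)),
        # Returned as-is: the inner layout of this block is not assumed here.
        "bulk_dumps": policy.get("bulk_dumps"),
    }


def fetch_policy(base_url: str, user_agent: str) -> Any:
    """Download and decode /.well-known/api-policy.json from an instance."""
    request = urllib.request.Request(
        f"{base_url}/.well-known/api-policy.json",
        headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Treating missing keys as "not available" and "not enforced" keeps the client conservative when it talks to an instance with a different policy shape.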

Operator configuration#

The values surfaced by all of the above are configured per-instance in config/website.py:

| Setting | Purpose |
| --- | --- |
| `API_POLICY_VERSION` | Bumped on breaking shape changes to `api-policy.json`. |
| `API_POLICY_EXPIRES` | Optional fixed expiry (ISO 8601). Defaults to a one-year rolling expiry. |
| `BULK_DUMPS_URL` | Set to a public dumps index URL, or leave `None` if this instance does not publish dumps. |
| `RATE_LIMITS_POLICY` | Free-form description of the rate-limit posture (shown verbatim in the policy). |
| `API_READ_RATE_LIMIT_ANON` | Limit string for unauthenticated `/api/*` reads (e.g. `"60 per minute"`); `None` to disable. |
| `API_READ_RATE_LIMIT_AUTH` | Limit string for authenticated `/api/*` reads, bucketed per API key; `None` to disable. |
| `SECURITY_POLICY_URL` | Responsible-disclosure policy referenced from `security.txt`. |
| `SECURITY_ENCRYPTION_URL` | OpenPGP key URL for the security contact. |
| `ROBOTS_DISALLOWED_AGENTS` | List of bot User-Agents to deny in `robots.txt`. |

See config/website.py.sample for the full annotated block.
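As an illustration, an operator who wants to enforce limits might set a subset of these values as shown below. The setting names come from the table above; every value except the `"60 per minute"` example is made up for this sketch, and the sample file remains the reference for defaults and accepted formats.

```python
# config/website.py (excerpt) -- illustrative values only

# Limit strings; None disables enforcement for that class of caller.
API_READ_RATE_LIMIT_ANON = "60 per minute"
API_READ_RATE_LIMIT_AUTH = "600 per minute"  # assumed value for illustration

# Shown verbatim in the policy document.
RATE_LIMITS_POLICY = (
    "Anonymous reads are limited per client IP; "
    "authenticated reads are limited per API key."
)

# None because this hypothetical instance does not publish dumps.
BULK_DUMPS_URL = None

# Hypothetical crawler name to deny in robots.txt.
ROBOTS_DISALLOWED_AGENTS = ["ExampleBot"]
```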
