Access patterns for automated consumers

This page is the authoritative guidance for clients that consume a Vulnerability-Lookup instance programmatically — vulnerability scanners, mirror builders, research crawlers, agent frameworks, and so on. The same content is also exposed in machine-readable form at /.well-known/api-policy.json on every instance.

TL;DR

  • Sync via the API + pub/sub stream. Use since= for catch-up, the stream for real-time updates, and targeted lookups for interactive use.

  • Do not enumerate the API to mirror the dataset.

  • Identify your client with a meaningful User-Agent that includes a contact URL or email address.

  • Bulk dumps, when an instance publishes them, are an optional open-data convenience, not a synchronization mechanism.

Canonical sync path: API + stream

The API exposes the primitives needed to keep an external store in sync without re-downloading the world.

Incremental pulls with since=

Endpoints under /api/vulnerability/, /api/gcve/, and the per-source listings accept a since=YYYY-MM-DD query parameter, which restricts the response to vulnerabilities reported on or after that date. A typical sync loop:

# First run — pick a reasonable starting date
$ curl 'https://example.org/api/vulnerability/?since=2024-01-01'

# Subsequent runs — pass the date of your last successful pull
$ curl 'https://example.org/api/vulnerability/?since=2026-05-04'

Pagination metadata ({"metadata": {"count": N, "page": N, "per_page": N}}) is documented in API v1.
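The sync loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's reference client: the page and per_page query parameters are inferred from the pagination metadata, the "data" key holding the records is hypothetical, and pull_since and http_get_json are illustrative names. API v1 remains the authority on the actual response shape.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import Callable, Iterator

# Hypothetical client identity; replace with your own contact details.
USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body, sending an identifying User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pull_since(
    base_url: str,
    since: date,
    fetch: Callable[[str], dict] = http_get_json,
    per_page: int = 100,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield records reported on or after `since`, one page at a time."""
    for page in range(1, max_pages + 1):
        query = urllib.parse.urlencode(
            # ASSUMPTION: `page` and `per_page` are accepted as query parameters.
            {"since": since.isoformat(), "page": page, "per_page": per_page}
        )
        body = fetch(f"{base_url}/api/vulnerability/?{query}")
        # ASSUMPTION: records live under a "data" key next to "metadata".
        records = body.get("data", [])
        if not records:
            return  # an empty page marks the end of the result set
        yield from records
```

Because since= is inclusive, consecutive pulls overlap on the boundary date, so a consumer would typically upsert records by identifier rather than append them. The max_pages guard only exists to stop the loop if the server were to ignore the page parameter.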

Real-time updates via the stream

The pub/sub stream pushes new and updated records as they land, without polling. The HTTP surface is a Server-Sent Events endpoint:

GET /pubsub/subscribe/{topic}
X-API-KEY: <your token>

The default topic list is vulnerability, comment, bundle, sighting, but operators choose which topics to expose via config/stream.json. The HTTP endpoint is only registered when pubsub_bp is set to true in that file — instances that do not run a public subscription endpoint will respond with a 404 here, and /.well-known/api-policy.json will report sync.stream_available: false.

See the streaming documentation for the wire format, channel semantics, and a Python client example.

Combine both: stream for the live feed, since= to catch up after an outage or for the initial backfill window.
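As a rough illustration of consuming the endpoint above, the following sketch parses generic Server-Sent Events framing with the Python standard library. It assumes nothing about the payload carried in each data field, and parse_sse, subscribe, and the User-Agent value are hypothetical names chosen for this example; the streaming documentation is the authority on the wire format.

```python
import urllib.request
from typing import Iterable, Iterator

# Hypothetical client identity; replace with your own contact details.
USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Group Server-Sent Events lines into events; a blank line ends an event."""
    event_type, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:  # events without any data field are not dispatched
                yield {"event": event_type, "data": "\n".join(data)}
            event_type, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "data":
            data.append(value)
        elif field == "event":
            event_type = value


def subscribe(base_url: str, topic: str, api_key: str) -> Iterator[dict]:
    """Open the subscription endpoint and yield events until the stream closes."""
    request = urllib.request.Request(
        f"{base_url}/pubsub/subscribe/{topic}",
        headers={
            "X-API-KEY": api_key,
            "Accept": "text/event-stream",
            "User-Agent": USER_AGENT,
        },
    )
    with urllib.request.urlopen(request) as response:
        yield from parse_sse(chunk.decode("utf-8") for chunk in response)
```

On an instance without a public subscription endpoint, urlopen raises urllib.error.HTTPError with status 404, which is a reasonable signal to fall back to since= polling. After any disconnect, a since= pull covering the outage window closes the gap before resubscribing.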

Targeted lookups

For interactive use cases, /api/vulnerability/<id> and the cross-source correlation endpoints are the right tool. Don’t loop them across an ID space to simulate a bulk export — that’s enumeration.

Bulk dumps: an optional open-data convenience

Some operators publish NDJSON exports (the public CIRCL instance, for example, publishes them at https://vulnerability.circl.lu/dumps/). These are produced by bin/dump.py, which writes files to disk; serving them is an operational choice made by the operator, not a feature of Vulnerability-Lookup itself. Other instances may not publish dumps at all.

Dumps exist as an open-data convenience — for archival, ad-hoc analysis, dataset research, and similar one-shot uses. They are explicitly not a synchronisation mechanism, and they are not the intended way to bootstrap a Vulnerability-Lookup instance. New instances ingest data through the feeders, and external consumers stay in sync through the API. Polling a dump on a schedule is worse for the publisher than a well-behaved API client using since=, and yields stale data between runs — if you find yourself doing this, switch to the API.

Whether the current instance publishes dumps is advertised in the bulk_dumps block of /.well-known/api-policy.json.

Identification

Identify yourself. A User-Agent like:

my-mirror/1.4 (+https://example.org/contact; ops@example.org)

…lets operators reach out before they have to start blocking. Default User-Agents sent by HTTP libraries and language SDKs (python-requests/..., Go-http-client/...) are treated as anonymous and may be rate-limited or blocked first when load becomes a problem.
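One way to produce a header in the shape shown above, sketched with the Python standard library. The helper names build_user_agent and identified_request are hypothetical, and the name/version (+URL; email) layout simply mirrors the example rather than a format the instance enforces.

```python
import urllib.request


def build_user_agent(name: str, version: str, contact_url: str, contact_email: str) -> str:
    """Compose a User-Agent that names the client and says how to reach its operator."""
    return f"{name}/{version} (+{contact_url}; {contact_email})"


def identified_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that replaces the library's default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

For instance, build_user_agent("my-mirror", "1.4", "https://example.org/contact", "ops@example.org") reproduces the string shown above.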

Rate limits

Read endpoints under /api/vulnerability can be rate-limited per instance. Operators set two independent values in config/website.py:

| Setting | Applies when | Bucket key |
| --- | --- | --- |
| API_READ_RATE_LIMIT_ANON | no X-API-KEY header | client IP |
| API_READ_RATE_LIMIT_AUTH | X-API-KEY header present | the API key |

Both default to None — meaning no enforced limit — and that is the posture of the public CIRCL instance today. The /.well-known/api-policy.json document advertises the actual state via rate_limits.enforced and, when enabled, the configured limits and the keying scheme.

Bucketing authenticated callers by API key (rather than by IP) means a shared corporate egress IP isn’t punished for one client’s behaviour — each key gets its own budget.

When enforcement is on, responses from rate-limited endpoints carry the X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset headers, and 429 responses include Retry-After. Self-hosted instances can describe their posture in human-readable form via the RATE_LIMITS_POLICY setting.
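A well-behaved client waits for the period given in Retry-After before retrying a 429. The sketch below shows one way to turn that header into a delay; it accepts both forms HTTP allows (a number of seconds or an HTTP date). The function name retry_after_seconds and the 60-second fallback are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
    default: float = 60.0,
) -> float:
    """Return how long to wait before retrying, based on the Retry-After header."""
    value = headers.get("Retry-After")
    if value is None:
        return default  # header missing: fall back to a conservative pause
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable value: use the fallback
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

The sketch deliberately ignores X-RateLimit-Reset, since this page does not say whether it carries an epoch timestamp or a number of seconds. Note that a plain dict lookup is case-sensitive, whereas the header mappings returned by HTTP libraries usually are not.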

Discovery surfaces

Every instance exposes the same information through several surfaces, so clients can pick whichever fits:

| Path | Audience | Format |
| --- | --- | --- |
| /.well-known/api-policy.json | Machine clients | JSON, structured |
| /llms.txt | LLM agents | Markdown, concise |
| /robots.txt | Crawlers | Robots Exclusion + Policy: link |
| /.well-known/security.txt | Security researchers | RFC 9116 |
| /about | Humans | HTML |

Every API response also carries an X-API-Policy-Version header and a Link header pointing at api-policy.json.
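A client can read the policy document once at startup and adapt its behaviour. The sketch below looks up the two fields named on this page, sync.stream_available and rate_limits.enforced, in an already-decoded policy. It assumes those dotted names map to nested JSON objects; get_path and summarize_policy are hypothetical helpers, and any other field in the document is left uninterpreted.

```python
from typing import Any


def get_path(doc: Any, dotted: str, default: Any = None) -> Any:
    """Walk a nested dict along a dotted path, returning `default` if any key is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def summarize_policy(policy: dict) -> dict:
    """Extract the flags a sync client needs; missing fields count as False."""
    return {
        "use_stream": get_path(policy, "sync.stream_available", False) is True,
        "rate_limited": get_path(policy, "rate_limits.enforced", False) is True,
    }
```

Treating a missing field as False keeps the client conservative: it falls back to since= polling when the stream is not advertised.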

Operator configuration

The values surfaced by all of the above are configured per-instance in config/website.py:

| Setting | Purpose |
| --- | --- |
| API_POLICY_VERSION | Bumped on breaking shape changes to api-policy.json. |
| API_POLICY_EXPIRES | Optional fixed expiry (ISO 8601). Defaults to a one-year rolling expiry. |
| BULK_DUMPS_URL | Set to a public dumps index URL, or leave None if this instance does not publish dumps. |
| RATE_LIMITS_POLICY | Free-form description of the rate-limit posture (shown verbatim in the policy). |
| API_READ_RATE_LIMIT_ANON | Limit string for unauthenticated /api/* reads (e.g. "60 per minute"); None to disable. |
| API_READ_RATE_LIMIT_AUTH | Limit string for authenticated /api/* reads, bucketed per API key; None to disable. |
| SECURITY_POLICY_URL | Responsible-disclosure policy referenced from security.txt. |
| SECURITY_ENCRYPTION_URL | OpenPGP key URL for the security contact. |
| ROBOTS_DISALLOWED_AGENTS | List of bot User-Agents to deny in robots.txt. |

See config/website.py.sample for the full annotated block.
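For orientation, an excerpt of config/website.py that enables rate limiting on an instance without dumps might look like the following. Only the setting names and the "60 per minute" limit string come from this page; every other value is a placeholder invented for illustration, and the sample file remains the authoritative reference.

```python
# Illustrative excerpt of config/website.py; all values are placeholders.

# Limit string for unauthenticated reads, bucketed per client IP.
API_READ_RATE_LIMIT_ANON = "60 per minute"

# Hypothetical value for authenticated reads, bucketed per API key.
API_READ_RATE_LIMIT_AUTH = "600 per minute"

# Free-form text, shown verbatim in the policy document.
RATE_LIMITS_POLICY = "Anonymous reads: 60 per minute per IP. Authenticated reads: 600 per minute per key."

# None means this instance does not publish bulk dumps.
BULK_DUMPS_URL = None

# Hypothetical bot name to deny in robots.txt.
ROBOTS_DISALLOWED_AGENTS = ["ExampleBot"]
```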