Access patterns for automated consumers
This page is the authoritative guidance for clients that consume a
Vulnerability-Lookup instance programmatically — vulnerability scanners,
mirror builders, research crawlers, agent frameworks, and so on. The same
content is also exposed in machine-readable form at
/.well-known/api-policy.json on every instance.
TL;DR
- Sync via the API + pub/sub stream. Use since= for catch-up, the stream for real-time updates, and targeted lookups for interactive use. Do not enumerate the API to mirror the dataset.
- Identify your client with a meaningful User-Agent that includes a contact URL or email address.
- Bulk dumps, when an instance publishes them, are an optional open-data convenience, not a synchronization mechanism.
Canonical sync path: API + stream
The API exposes the primitives needed to keep an external store in sync without re-downloading the world.
Incremental pulls with since=
Endpoints under /api/vulnerability/, /api/gcve/, and the per-source
listings accept a since=YYYY-MM-DD query parameter that returns only
vulnerabilities reported on or after that date. A typical sync loop:
```shell
# First run: pick a reasonable starting date
curl 'https://example.org/api/vulnerability/?since=2024-01-01'
# Subsequent runs: pass the date of your last successful pull
curl 'https://example.org/api/vulnerability/?since=2026-05-04'
```
Pagination metadata ({"metadata": {"count": N, "page": N, "per_page": N}})
is documented in API v1.
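A minimal Python sketch of that sync loop, using only the standard library. It assumes the page number is sent as a page query parameter (mirroring the page key in the metadata), that the metadata object sits at the top level of each response body, and that since= is interpreted as a UTC calendar date; check API v1 before relying on any of these. BASE_URL and USER_AGENT are placeholders.

```python
import datetime as dt
import json
import math
import urllib.parse
import urllib.request

BASE_URL = "https://example.org"  # placeholder instance, as in the curl example
USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"


def build_page_url(since: dt.date, page: int) -> str:
    """URL for one page of the incremental listing."""
    # ASSUMPTION: the page number is sent as a "page" query parameter,
    # mirroring the "page" key in the pagination metadata.
    query = urllib.parse.urlencode({"since": since.isoformat(), "page": page})
    return f"{BASE_URL}/api/vulnerability/?{query}"


def total_pages(metadata: dict) -> int:
    """Pages implied by metadata of the form {"count", "page", "per_page"}."""
    if metadata["per_page"] <= 0:
        return 0
    return math.ceil(metadata["count"] / metadata["per_page"])


def fetch_json(url: str) -> dict:
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pull_since(since: dt.date) -> tuple[list[dict], dt.date]:
    """Fetch every page of records reported on or after `since`.

    Returns the raw page bodies and the date to pass as since= next time.
    """
    # Record the next cursor *before* pulling, so anything reported while
    # this pull runs is picked up by the next one.
    # ASSUMPTION: the server interprets since= as a UTC calendar date.
    next_since = dt.datetime.now(dt.timezone.utc).date()
    pages, page = [], 1
    while True:
        body = fetch_json(build_page_url(since, page))
        pages.append(body)
        # ASSUMPTION: "metadata" sits at the top level of the response body.
        if page >= total_pages(body["metadata"]):
            break
        page += 1
    return pages, next_since
```

Because since= is inclusive, each run re-fetches the day of the previous cursor. Store records with an upsert keyed on the vulnerability ID so that overlap is harmless, and persist next_since only after the pages have been stored.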
Real-time updates via the stream
The pub/sub stream pushes new and updated records as they land, without polling. The HTTP surface is a Server-Sent Events endpoint:
```http
GET /pubsub/subscribe/{topic}
X-API-KEY: <your token>
```
The default topic list is vulnerability, comment, bundle,
sighting, but operators choose which topics to expose via
config/stream.json. The HTTP endpoint is only registered when
pubsub_bp is set to true in that file — instances that do not run a
public subscription endpoint will respond with a 404 here, and
/.well-known/api-policy.json will report sync.stream_available: false.
See the streaming documentation for the wire format, channel semantics, and a Python client example.
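The streaming documentation remains the reference for the wire format. As a rough standard-library sketch, the function below parses standard Server-Sent Events framing (data, event and id fields, comment lines, blank-line dispatch), and subscribe opens the endpoint with the X-API-KEY header. BASE_URL is a placeholder, and decoding each payload as JSON is an assumption to verify against the streaming docs.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "https://example.org"  # placeholder instance


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Minimal Server-Sent Events parser yielding {"event", "data", "id"}."""
    event, data, event_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: dispatch the buffered event, if any
            if data:
                yield {"event": event, "data": "\n".join(data), "id": event_id}
            event, data = "message", []  # the last event ID persists
            continue
        if line.startswith(":"):  # comment, often used as a keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value
        elif field == "id":
            event_id = value


def subscribe(topic: str, api_key: str) -> Iterator[dict]:
    """Yield decoded payloads from /pubsub/subscribe/{topic}."""
    request = urllib.request.Request(
        f"{BASE_URL}/pubsub/subscribe/{topic}",
        headers={"X-API-KEY": api_key, "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(request) as response:
        lines = (raw.decode("utf-8") for raw in response)
        for event in parse_sse(lines):
            # ASSUMPTION: payloads are JSON documents.
            yield json.loads(event["data"])
```

A 404 from subscribe means the instance does not expose the endpoint; checking sync.stream_available in /.well-known/api-policy.json first avoids that round trip.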
Combine both: stream for the live feed, since= to catch up after an
outage or for the initial backfill window.
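One way to wire the two together, sketched in Python with the network parts passed in as hypothetical callables (open_stream, pull_since, store, today) so the control flow stays visible. It assumes open_stream returns an iterator over an already-open connection, that store upserts, that ConnectionError signals a dropped stream, and that events arrive in order with client and server clocks roughly in agreement. The reconnect cap exists only to keep the sketch finite.

```python
import datetime as dt
from typing import Callable, Iterable


def run_sync(
    open_stream: Callable[[], Iterable[dict]],
    pull_since: Callable[[dt.date], Iterable[dict]],
    store: Callable[[dict], None],
    start: dt.date,
    today: Callable[[], dt.date],
    max_reconnects: int = 3,
) -> int:
    """Backfill with since=, follow the stream, and catch up after each drop.

    Returns the number of reconnects performed.
    """
    since = start
    reconnects = 0
    while True:
        # Open the stream *before* catching up, so records that land during
        # the catch-up are buffered on the connection instead of lost.
        # ASSUMPTION: open_stream() connects eagerly.
        cursor = today()
        stream = iter(open_stream())
        for record in pull_since(since):
            store(record)  # must be an upsert: since= is inclusive
        try:
            for record in stream:
                store(record)
                # ASSUMPTION: in-order delivery, so everything reported
                # before this moment has now been seen.
                cursor = today()
            return reconnects  # stream closed cleanly
        except ConnectionError:
            if reconnects >= max_reconnects:
                raise
            reconnects += 1
            since = cursor  # re-pull from the day of the last event seen
```

Opening the stream first trades a few duplicate records for the absence of a gap between the end of the catch-up and the start of live updates.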
Targeted lookups
For interactive use cases, /api/vulnerability/<id> and the cross-source
correlation endpoints are the right tool. Don’t loop them across an ID
space to simulate a bulk export — that’s enumeration.
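A small Python sketch of an on-demand lookup, fetched only when a user or upstream event asks about a specific identifier. BASE_URL and USER_AGENT are placeholders, and the shape of the returned record is not described on this page, so it is returned as parsed JSON without interpretation.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://example.org"  # placeholder instance
USER_AGENT = "my-tool/0.1 (+https://example.org/contact)"  # hypothetical


def vulnerability_url(vuln_id: str) -> str:
    # Percent-encode the identifier so it stays a single path segment.
    return f"{BASE_URL}/api/vulnerability/{urllib.parse.quote(vuln_id, safe='')}"


def lookup(vuln_id: str) -> dict:
    """Fetch one record on demand; never call this in a loop over an ID range."""
    request = urllib.request.Request(
        vulnerability_url(vuln_id), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```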
Bulk dumps: an optional open-data convenience
Some operators publish NDJSON exports (the public CIRCL instance, for
example, publishes them at https://vulnerability.circl.lu/dumps/). These
are produced by bin/dump.py, which writes files to disk; serving them is
an operational choice made by the operator, not a feature of
Vulnerability-Lookup itself. Other instances may not publish dumps at all.
Dumps exist as an open-data convenience — for archival, ad-hoc
analysis, dataset research, and similar one-shot uses. They are
explicitly not a synchronisation mechanism, and they are not the
intended way to bootstrap a Vulnerability-Lookup instance. New instances
ingest data through the feeders, and external
consumers stay in sync through the API. Polling a dump on a schedule is
worse for the publisher than a well-behaved API client using since=, and
yields stale data between runs — if you find yourself doing this, switch
to the API.
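For the one-shot analysis that dumps are meant for, NDJSON can be streamed one record at a time instead of loading the whole file. A minimal Python sketch, assuming an uncompressed file; if the file you downloaded is compressed, open it with the matching module instead (for example gzip.open(path, "rt", encoding="utf-8") for gzip).

```python
import json
from typing import Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON value per non-blank line, reporting the bad line on error."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc


def iter_ndjson(path: str) -> Iterator[dict]:
    """Stream records from an uncompressed NDJSON file without loading it whole."""
    with open(path, encoding="utf-8") as handle:
        yield from parse_ndjson(handle)
```

Memory use stays flat regardless of dump size, since only one line is decoded at a time.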
Whether the current instance publishes dumps is advertised in the
bulk_dumps block of /.well-known/api-policy.json.
Identification
Identify yourself. A User-Agent like:

```text
my-mirror/1.4 (+https://example.org/contact; ops@example.org)
```

…lets operators reach out before they have to start blocking. Default
language SDK User-Agents (python-requests/..., Go-http-client/...) are
treated as anonymous and may be rate-limited or blocked first when load
becomes a problem.
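With Python's urllib, the identifying User-Agent can be set once on an opener so every request carries it. The sketch below also rejects a User-Agent that has no contact URL or email address; the regular expression is a loose heuristic for illustration, and the client name and contact details are the example values from this page.

```python
import re
import urllib.request

# Example values from this page; substitute your own client and contact.
USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"

# Loose heuristic: an http(s) URL or something shaped like an email address.
_CONTACT = re.compile(r"https?://\S+|[\w.+-]+@[\w-]+(\.[\w-]+)+")


def has_contact(user_agent: str) -> bool:
    """True if the User-Agent carries a contact URL or email address."""
    return _CONTACT.search(user_agent) is not None


def make_opener(user_agent: str = USER_AGENT) -> urllib.request.OpenerDirector:
    """Opener that sends the identifying User-Agent on every request."""
    if not has_contact(user_agent):
        raise ValueError("User-Agent should include a contact URL or email")
    opener = urllib.request.build_opener()
    # Replaces urllib's default "Python-urllib/3.x", a default SDK
    # User-Agent of the kind operators treat as anonymous.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

Calls such as make_opener().open(url, timeout=30) then identify the client without repeating the header at each call site.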
Rate limits
Read endpoints under /api/vulnerability can be rate-limited per
instance. Operators set two independent values in config/website.py:
| Limit | Applies when | Bucket key |
|---|---|---|
| Unauthenticated limit | no API key is sent | client IP |
| Authenticated limit | an API key is sent | the API key |
Both default to None — meaning no enforced limit — and that is the
posture of the public CIRCL instance today. The
/.well-known/api-policy.json document advertises the actual state via
rate_limits.enforced and, when enabled, the configured limits and the
keying scheme.
Bucketing authenticated callers by API key (rather than by IP) means a shared corporate egress IP isn’t punished for one client’s behaviour — each key gets its own budget.
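To make the keying rule concrete, here is an illustrative Python function that picks a bucket from the request headers and client IP. It is not the code Vulnerability-Lookup runs: it assumes the key arrives in the X-API-KEY header, leaves key validation to some other layer, and hashes the key only so raw keys never appear in the limiter's storage.

```python
import hashlib
from typing import Mapping


def bucket_key(headers: Mapping[str, str], client_ip: str) -> str:
    """Choose the rate-limit bucket for one request (illustrative only)."""
    # HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    api_key = normalized.get("x-api-key")
    if api_key:
        # Authenticated: one budget per key, whatever IP it comes from.
        return "key:" + hashlib.sha256(api_key.encode()).hexdigest()
    # Anonymous: one budget per client IP.
    return "ip:" + client_ip
```

Two clients behind the same corporate egress IP but using different keys land in different buckets, while anonymous traffic from that IP shares one.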
When enforcement is on, rate-limited responses carry standard
X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset
headers, and 429 responses include Retry-After. Self-hosted instances
can describe their posture in human-readable form via the
RATE_LIMITS_POLICY setting.
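A well-behaved client waits as long as Retry-After asks before retrying a 429. The Python sketch below handles both forms the header can take in HTTP (a number of seconds, or an HTTP-date). The 60-second fallback for a missing or unparseable header is an arbitrary assumption, and the sketch deliberately ignores X-RateLimit-Reset because this page does not specify its format.

```python
import datetime as dt
import email.utils
from typing import Mapping, Optional

FALLBACK_DELAY = 60.0  # ASSUMPTION: wait used when a 429 lacks a usable Retry-After


def retry_delay(status: int, headers: Mapping[str, str], now: dt.datetime) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response is not a 429.

    `now` must be timezone-aware.
    """
    if status != 429:
        return None
    # Header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("retry-after", "").strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "30"
    try:
        # HTTP-date form, e.g. "Mon, 04 May 2026 12:01:00 GMT"
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return FALLBACK_DELAY  # missing or unparseable header
    if when.tzinfo is None:
        when = when.replace(tzinfo=dt.timezone.utc)
    return max(0.0, (when - now).total_seconds())
```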
Discovery surfaces
Every instance exposes the same information through several surfaces, so clients can pick whichever fits:
| Path | Audience | Format |
|---|---|---|
| /.well-known/api-policy.json | Machine clients | JSON, structured |
|  | LLM agents | Markdown, concise |
|  | Crawlers | Robots Exclusion + … |
|  | Security researchers | RFC 9116 |
|  | Humans | HTML |
Every API response also carries an X-API-Policy-Version header and a
Link header pointing at api-policy.json.
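A Python sketch of a client that reads the fields this page names from api-policy.json (sync.stream_available, rate_limits.enforced, and the bulk_dumps block, kept raw because its contents are not described here) and re-reads the policy only when X-API-Policy-Version changes. It assumes the header value changes whenever the policy changes in a way clients care about; BASE_URL and USER_AGENT are placeholders.

```python
import json
import urllib.request
from typing import Mapping, Optional

BASE_URL = "https://example.org"  # placeholder instance
USER_AGENT = "my-mirror/1.4 (+https://example.org/contact; ops@example.org)"


def fetch_policy() -> dict:
    request = urllib.request.Request(
        BASE_URL + "/.well-known/api-policy.json", headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def policy_flags(policy: dict) -> dict:
    """Extract the fields named on this page; missing ones come back as None."""
    return {
        "stream_available": policy.get("sync", {}).get("stream_available"),
        "rate_limits_enforced": policy.get("rate_limits", {}).get("enforced"),
        "bulk_dumps": policy.get("bulk_dumps"),  # structure not described here
    }


class PolicyWatcher:
    """Signals when X-API-Policy-Version differs from the last value seen."""

    def __init__(self) -> None:
        self.version: Optional[str] = None

    def changed(self, headers: Mapping[str, str]) -> bool:
        normalized = {name.lower(): value for name, value in headers.items()}
        version = normalized.get("x-api-policy-version")
        if version is None or version == self.version:
            return False
        self.version = version
        return True
```

Feeding the headers of each API response to PolicyWatcher.changed() and calling fetch_policy() only when it returns True keeps the policy current without polling it.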
Operator configuration
The values surfaced by all of the above are configured per-instance in
config/website.py:
| Setting | Purpose |
|---|---|
|  | Bumped on breaking shape changes to … |
|  | Optional fixed expiry (ISO 8601). Defaults to a one-year rolling expiry. |
|  | Set to a public dumps index URL, or leave … |
| RATE_LIMITS_POLICY | Free-form description of the rate-limit posture (shown verbatim in the policy). |
|  | Limit string for unauthenticated … |
|  | Limit string for authenticated … |
|  | Responsible-disclosure policy referenced from … |
|  | OpenPGP key URL for the security contact. |
|  | List of bot User-Agents to deny in … |
See config/website.py.sample for the full annotated block.