cpe-guesser 2.0 released - Multi-Source CPE Imports, Better Ranking, and Greater Autonomy Beyond NVD

Overview

Version 2.0 brings major improvements to CPE import, ranking, and CVE v5 data handling. This release focuses on better import performance, broader format support, improved search relevance, and more robust indexing for vendor and product matching.

A notable change in this release is that cpe-guesser no longer depends on NVD as its only practical CPE source. In addition to the NVD feeds, it can use the Vulnerability-Lookup dumps available at https://vulnerability.circl.lu/dumps/, which provide additional CPE data and reduce reliance on NVD as the single upstream.

Highlights

Improved search and ranking

Improved search ranking using CPE rank scores.
Enhanced server-side lookup ranking with rank:cpe scoring.
Reset CVE v5 rank state before each import to ensure consistent ranking behavior.
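
To illustrate the idea behind rank-based ordering, here is a minimal in-memory sketch. It is not the cpe-guesser implementation: the real service stores its index in Valkey, while this example uses plain dictionaries. The function names build_index and search, and the sample rank values, are assumptions made for illustration only.

```python
from collections import defaultdict


def build_index(cpes):
    """Map each word of the vendor and product fields to the CPEs containing it."""
    index = defaultdict(set)
    for cpe in cpes:
        # Expected shape (simplified): "cpe:2.3:a:vendor:product"
        _, _, _, vendor, product = cpe.split(":")[:5]
        for word in vendor.split("_") + product.split("_"):
            index[word].add(cpe)
    return index


def search(index, rank, words):
    """Return CPEs matching all words, highest rank score first."""
    sets = [index.get(word.lower(), set()) for word in words]
    if not sets:
        return []
    candidates = set.intersection(*sets)
    # Unranked CPEs get a score of 0; ties are broken alphabetically.
    return sorted(candidates, key=lambda cpe: (-rank.get(cpe, 0), cpe))


cpes = [
    "cpe:2.3:a:microsoft:outlook",
    "cpe:2.3:a:microsoft:outlook_express",
    "cpe:2.3:a:mozilla:firefox",
]
# Hypothetical rank scores, e.g. how often a CPE is referenced by CVE records.
rank = {"cpe:2.3:a:microsoft:outlook": 10, "cpe:2.3:a:microsoft:outlook_express": 3}
results = search(build_index(cpes), rank, ["Microsoft", "Outlook"])
```

Both Outlook entries match the query, and the one with the higher rank score is listed first.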

CVE v5 import and indexing enhancements

Added CVE v5 NDJSON rank importer.
Added support for handling incomplete or multiline NDJSON records in CVEListV5Handler.
Introduced optional CVE v5 word indexing.
Added missing-word tracking for CVE v5 imports.
Split missing-word tracking into separate vendor and product sets for more precise analysis.
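
As a rough illustration of what tolerating incomplete or multiline NDJSON records can involve, the sketch below buffers lines until they form a complete JSON object and drops fragments that never complete. This is not the CVEListV5Handler code: the function names, the max_buffered_lines limit, and the recovery heuristic are all assumptions.

```python
import json


def _try_load(text):
    """Return the parsed JSON object, or None if text is not a complete object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def parse_ndjson(lines, max_buffered_lines=50):
    """Parse NDJSON, joining records split over several lines.

    Returns (records, dropped), where dropped counts fragments that never
    formed a complete JSON object.
    """
    records, dropped, buffer = [], 0, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        record = _try_load(line)
        if record is not None:
            # A complete record on its own line: any pending fragment is
            # treated as truncated and discarded (heuristic).
            if buffer:
                dropped += 1
                buffer = []
            records.append(record)
            continue
        buffer.append(line)
        record = _try_load("\n".join(buffer))
        if record is not None:
            records.append(record)
            buffer = []
        elif len(buffer) >= max_buffered_lines:
            dropped += 1
            buffer = []
    if buffer:
        dropped += 1
    return records, dropped


sample = [
    '{"id": "CVE-1"}',
    '{"id": "CVE-2",',
    ' "vendor": "acme"}',
    '{"id": "CVE-3", "ven',
    '{"id": "CVE-4"}',
]
records, dropped = parse_ndjson(sample)
```

In this sample, the second record spans two lines and is reassembled, while the truncated third record is counted as dropped once the next complete record arrives.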

Faster and more flexible CPE imports

Parallelized NVD CPE imports for improved performance.
Refactored import logic into reusable handler classes.
Added NVDCPEHandler for importing the NVD CPE Dictionary 2.0 JSON format.
Extended import support for tar archives and standalone JSON files.
Continued support for legacy XML imports through XMLCPEHandler.
Added logging of JSON file names found inside tar archives.
Expanded the import model so cpe-guesser can integrate CPE data from additional sources, including Vulnerability-Lookup dumps, instead of relying solely on NVD feeds.
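
The following sketch shows one way JSON files inside a tar archive could be discovered, logged, and parsed in parallel. It is an illustration under assumptions, not the NVDCPEHandler code: the function names are hypothetical, and the "products" / "cpe" / "cpeName" layout is assumed to match the NVD CPE Dictionary 2.0 JSON. Splitting on ":" also ignores escaped colons in CPE names, which a real importer would need to handle.

```python
import io
import json
import logging
import tarfile
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("cpe_import_sketch")


def extract_vendor_product(payload):
    """Collect (vendor, product) pairs from one JSON document (bytes)."""
    doc = json.loads(payload)
    pairs = set()
    for item in doc.get("products", []):
        name = item.get("cpe", {}).get("cpeName", "")
        parts = name.split(":")  # simplification: no escaped-colon handling
        if len(parts) >= 5:
            pairs.add((parts[3], parts[4]))
    return pairs


def import_tar(fileobj, workers=4):
    """Read every .json member of a tar archive and parse them concurrently."""
    payloads = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".json"):
                logger.info("Found JSON file in archive: %s", member.name)
                payloads.append(archive.extractfile(member).read())
    pairs = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for result in pool.map(extract_vendor_product, payloads):
            pairs |= result
    return pairs
```

Threads keep the sketch short. Because JSON parsing is CPU-bound in Python, a process pool (concurrent.futures.ProcessPoolExecutor) is likely the better fit when the goal is faster imports.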

Configuration and deployment improvements

Improved configuration robustness by embedding default settings in code when configuration is missing or incomplete.
Made the Valkey database number configurable.
Fixed Docker deployment and docker-compose configuration to use Valkey correctly.
Corrected settings.yaml structure issues.
Added missing requirements and improved script executability in bin/.
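
Embedding defaults in code usually means merging whatever the configuration file provides over a built-in dictionary. The sketch below shows that pattern. The key names and default values (including the Valkey database number) are hypothetical and may not match the actual settings.yaml layout.

```python
import copy

# Hypothetical defaults; the real project may use different keys and values.
DEFAULTS = {
    "server": {"port": 8000},
    "valkey": {"host": "127.0.0.1", "port": 6379, "db": 0},
}


def with_defaults(user_config, defaults=DEFAULTS):
    """Return defaults overridden by user_config, merging nested sections."""
    merged = copy.deepcopy(defaults)
    for key, value in (user_config or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        elif value is not None:
            # Empty (None) entries, e.g. a blank YAML section, keep the default.
            merged[key] = value
    return merged


# A partial configuration that only overrides the database number.
settings = with_defaults({"valkey": {"db": 3}})
```

With this approach, a missing file (None), an empty section, or a partial section all produce a complete settings dictionary.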

Documentation and maintenance

Updated README documentation.
Added examples for the JSON format while keeping legacy format examples.
Applied Black formatting across library code and regression/import tests.
General linting and formatting cleanups.

Breaking / notable changes

The project now defaults to the NVD CPE Dictionary 2.0 feed.
Import handling has been refactored significantly around dedicated handler classes.
CLI import behavior was simplified by removing the redundant --update flag and improving boolean toggle handling.
The project architecture is now better suited for multi-source CPE ingestion, reducing dependence on NVD as the single upstream source.

Contributors

Thanks to everyone who contributed to this release, including:

Alexandre Dulaunoy
Esa Jokinen
Surya Kanagasabapathy

Upgrade notes

When upgrading to 2.0, review:

Your import workflows, especially if you rely on legacy XML-only behavior.
Your configuration files, even though embedded defaults now cover missing or incomplete settings.
Your Docker and Valkey setup if you deploy with containers.
Your data ingestion pipeline if you want to take advantage of alternative CPE sources such as the Vulnerability-Lookup dumps.

Summary

cpe-guesser 2.0 is a substantial release that modernizes the import pipeline, adds support for current NVD CPE data formats, improves ranking quality, and makes deployments more robust and scalable. It also opens the door to a more autonomous and flexible ingestion model by supporting additional CPE sources beyond NVD.