New Russian Severity Classifier and Improved Multilingual Models

We are pleased to announce a new Russian-language severity classifier for vulnerability descriptions, alongside improved English and Chinese models. These models are trained with VulnTrain and served through ML-Gateway for integration into Vulnerability-Lookup.

All datasets and models are openly available on Hugging Face.

VulnTrain 3.1.0

This release is powered by VulnTrain v3.1.0, which introduces:

FSTEC source support: vulnerability entries from the BDU vulnerability database, maintained by the Russian Federal Service for Technical and Export Control (FSTEC), can now be used for dataset generation and model training.
Source field in datasets: each vulnerability entry now includes a source field identifying its origin (cvelistv5, github, pysec, cnvd, csaf_*, fstec), making it easier to trace and filter data.
Dynamic dataset cards: when generating a dataset from multiple sources, a dataset card is automatically created with a per-source breakdown table showing entry counts and percentages.
Per-class metrics: the severity trainer now reports precision, recall, and F1 per class (Low / Medium / High / Critical) alongside overall accuracy and macro F1.
Best model checkpoint selection: the best checkpoint is now chosen by accuracy instead of eval_loss, and save_total_limit is increased from 2 to 3.
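
To make the per-class metrics concrete, here is a minimal pure-Python sketch of how precision, recall, and F1 per severity class, together with overall accuracy and macro F1, can be computed from true and predicted labels. This is an illustration only, not VulnTrain's actual code: the function name per_class_metrics and the sample labels are hypothetical, and a real trainer would more likely rely on an existing metrics library.

```python
from collections import Counter

# The four severity classes named in the release notes.
LABELS = ["Low", "Medium", "High", "Critical"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return (per-class report, accuracy, macro F1) for two label sequences.

    Hypothetical helper for illustration; not part of VulnTrain.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    # Macro F1 is the unweighted mean of the per-class F1 scores.
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    return report, accuracy, macro_f1


if __name__ == "__main__":
    truth = ["Low", "Medium", "High", "Critical", "High", "Medium"]
    preds = ["Low", "High", "High", "Critical", "Medium", "Medium"]
    report, accuracy, macro_f1 = per_class_metrics(truth, preds)
    for label, scores in report.items():
        print(label, scores)
    print("accuracy:", accuracy, "macro F1:", macro_f1)
```

Because macro F1 weights every class equally, a model that performs poorly on a rare class such as Low is penalized even when overall accuracy looks good, which is why the per-class view is useful alongside accuracy.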

Russian Severity Classifier 🇷🇺

This new model classifies the severity of vulnerability descriptions written in Russian. It is trained on data from the BDU vulnerability database of the Russian Federal Service for Technical and Export Control (FSTEC).


Dataset	CIRCL/Vulnerability-FSTEC
Model	CIRCL/vulnerability-severity-classification-russian-ruRoberta-large
Base model	ai-forever/ruRoberta-large

Improved English Severity Classifier 🇬🇧

The English model is trained on a broad set of sources for better coverage and accuracy.

Sources:

CVE Program (enriched with vulnrichment and Fraunhofer FKIE)
GitHub Security Advisories
PySec advisories
CSAF Cisco
CSAF CISA


Dataset	CIRCL/vulnerability-scores
Model	CIRCL/vulnerability-severity-classification-roberta-base
Base model	FacebookAI/roberta-base

Improved Chinese Severity Classifier 🇨🇳

The Chinese model is trained on data from the China National Vulnerability Database (CNVD).


Dataset	CIRCL/Vulnerability-CNVD
Model	CIRCL/vulnerability-severity-classification-chinese-macbert-base
Base model	hfl/chinese-macbert-base
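
Since the three models are published on Hugging Face, they can also be tried locally without ML-Gateway. The sketch below assumes the transformers library (with a backend such as PyTorch) is installed and uses its text-classification pipeline. The language keys ("en", "zh", "ru") and the helper names are our own illustration, and the exact label strings returned depend on each model's configuration.

```python
from functools import lru_cache

# Model identifiers as listed in this post; the language keys are an
# illustrative convention, not something defined by the models themselves.
MODELS = {
    "en": "CIRCL/vulnerability-severity-classification-roberta-base",
    "zh": "CIRCL/vulnerability-severity-classification-chinese-macbert-base",
    "ru": "CIRCL/vulnerability-severity-classification-russian-ruRoberta-large",
}


def model_for(language):
    """Return the Hugging Face model id for a language key."""
    try:
        return MODELS[language]
    except KeyError:
        raise ValueError(f"no severity model registered for {language!r}") from None


@lru_cache(maxsize=None)
def load_classifier(language):
    """Build (and cache) a text-classification pipeline for one language."""
    # Imported lazily so the lookup helpers work without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_for(language))


def classify(description, language="en"):
    """Return the top prediction as a dict with "label" and "score" keys."""
    return load_classifier(language)(description)[0]


if __name__ == "__main__":
    text = "A buffer overflow in the HTTP parser allows remote code execution."
    print(classify(text, "en"))
```

The first call for a given language downloads the model weights, which can take a while for the large Russian model; later calls reuse the cached pipeline.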

ML-Gateway 0.5.0

ML-Gateway is a FastAPI-based inference server that loads pre-trained models at startup and exposes them through a RESTful API. It supports multilingual severity classification out of the box: clients simply specify the desired model in their request.
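
As a rough illustration of what "specifying the model in the request" could look like from a client, the sketch below builds a JSON POST request with the Python standard library. The base URL, the endpoint path, and the field names (description, model) are assumptions made for this example, not the documented ML-Gateway API; check the interactive documentation of your own instance (FastAPI serves it at /docs by default) for the real routes and schema.

```python
import json
import urllib.request

# ASSUMPTIONS: host, port, endpoint path, and JSON field names below are
# placeholders for illustration; verify them against your ML-Gateway instance.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/classify/severity"


def build_severity_request(description, model, base_url=BASE_URL, endpoint=ENDPOINT):
    """Build (but do not send) a POST request asking for a severity prediction."""
    payload = json.dumps({"description": description, "model": model})
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_severity_request(
        "Уязвимость позволяет удалённо выполнить произвольный код.",
        "CIRCL/vulnerability-severity-classification-russian-ruRoberta-large",
    )
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment once a gateway is running at BASE_URL
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing the client at a real gateway.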

ML-Gateway v0.5.0 adds support for the new Russian severity classification model (CIRCL/vulnerability-severity-classification-russian-ruRoberta-large):

Registered the Russian ruRoBERTa-large model in the model registry with standard CVSS severity labels (Low, Medium, High, Critical).
Added the model to the CLI refresh-all command for pre-downloading.

Vulnerability-Lookup uses ML-Gateway to provide AI-powered severity predictions directly in its web interface.

Funding

AIPITCH (AI-Powered Innovative Toolkit for Cybersecurity Hubs) is a co-funded EU project supported by the European Cybersecurity Competence Centre (ECCC), under the DIGITAL-ECCC-2024-DEPLOY-CYBER-06-ENABLINGTECH program, and by CIRCL.