New Russian Severity Classifier and Improved Multilingual Models
We are pleased to announce a new Russian-language severity classifier for vulnerability descriptions, alongside improved English and Chinese models. These models are trained with VulnTrain and served through ML-Gateway for integration into Vulnerability-Lookup.
All datasets and models are openly available on Hugging Face.
VulnTrain 3.1.0
This release is powered by VulnTrain v3.1.0, which introduces:
- FSTEC source support: vulnerability entries from BDU, the threat and vulnerability database of the Russian Federal Service for Technical and Export Control (FSTEC), can now be used for dataset generation and model training.
- Source field in datasets: each vulnerability entry now includes a source field identifying its origin (cvelistv5, github, pysec, cnvd, csaf_*, fstec), making it easier to trace and filter data.
- Dynamic dataset cards: when a dataset is generated from multiple sources, a dataset card is created automatically, with a per-source breakdown table showing entry counts and percentages.
- Per-class metrics: the severity trainer now reports precision, recall, and F1 per class (Low / Medium / High / Critical) alongside overall accuracy and macro F1.
- Best model checkpoint selection: the best checkpoint is now selected by accuracy instead of eval_loss, and save_total_limit has been increased from 2 to 3.
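The per-source breakdown in the generated dataset card can be approximated with a few lines of Python. This is a minimal sketch, assuming each entry is a dict carrying the new source key; the function name source_breakdown and the sample entries are hypothetical, and VulnTrain's actual card template may lay the table out differently.

```python
from collections import Counter


def source_breakdown(entries):
    """Render a Markdown table of entry counts and percentages per source."""
    counts = Counter(entry["source"] for entry in entries)
    total = sum(counts.values())
    rows = ["| Source | Entries | Percentage |", "|---|---:|---:|"]
    # most_common() lists the largest sources first.
    for source, count in counts.most_common():
        rows.append(f"| {source} | {count} | {100 * count / total:.2f}% |")
    return "\n".join(rows)


# Made-up entries, for illustration only.
entries = [
    {"id": "example-1", "source": "cvelistv5"},
    {"id": "example-2", "source": "github"},
    {"id": "example-3", "source": "cvelistv5"},
    {"id": "example-4", "source": "pysec"},
]
print(source_breakdown(entries))

# The same field makes filtering by origin a one-liner.
cve_only = [entry for entry in entries if entry["source"] == "cvelistv5"]
```

With these four sample entries, cvelistv5 accounts for 50.00% of the rows and github and pysec for 25.00% each.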
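The per-class metrics and the accuracy-based checkpoint choice can be illustrated in plain Python. This is a sketch, not VulnTrain's code: the helpers per_class_metrics and best_checkpoint, the predictions, and the checkpoint scores are all hypothetical. If VulnTrain builds on the Hugging Face Trainer (an assumption here), the equivalent settings would be TrainingArguments(metric_for_best_model="accuracy", greater_is_better=True, load_best_model_at_end=True, save_total_limit=3).

```python
LABELS = ["Low", "Medium", "High", "Critical"]


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return per-class precision/recall/F1, overall accuracy and macro F1."""
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro F1 is the unweighted mean of the per-class F1 scores.
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    return report, accuracy, macro_f1


def best_checkpoint(evaluations, metric="accuracy", greater_is_better=True):
    """Pick the checkpoint name whose evaluation metric is best."""
    pick = max if greater_is_better else min
    return pick(evaluations, key=lambda name: evaluations[name][metric])


# Made-up predictions and evaluation results, for illustration only.
y_true = ["Low", "Medium", "High", "Critical", "High", "Medium"]
y_pred = ["Low", "Medium", "High", "High", "High", "Low"]
report, accuracy, macro_f1 = per_class_metrics(y_true, y_pred)

evaluations = {
    "checkpoint-1000": {"eval_loss": 0.61, "accuracy": 0.78},
    "checkpoint-2000": {"eval_loss": 0.58, "accuracy": 0.77},
    "checkpoint-3000": {"eval_loss": 0.63, "accuracy": 0.80},
}
by_accuracy = best_checkpoint(evaluations)
by_loss = best_checkpoint(evaluations, "eval_loss", greater_is_better=False)
```

In this toy run the two criteria disagree: the lowest eval_loss points to checkpoint-2000, while the highest accuracy points to checkpoint-3000. The per-class report also exposes what accuracy alone hides, namely a Critical F1 of 0.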
Russian Severity Classifier 🇷🇺
This new model classifies the severity of Russian-language vulnerability descriptions. It is trained on data from BDU, the threat and vulnerability database maintained by the Russian Federal Service for Technical and Export Control (FSTEC).
| Resource | Hugging Face ID |
|---|---|
| Dataset | CIRCL/Vulnerability-FSTEC |
| Model | CIRCL/vulnerability-severity-classification-russian-ruRoberta-large |
| Base model | ai-forever/ruRoberta-large |
Improved English Severity Classifier 🇬🇧
The English model is trained on a broad set of sources for better coverage and accuracy.
Sources:
- CVE Program (enriched with vulnrichment and Fraunhofer FKIE)
- GitHub Security Advisories
- PySec advisories
- CSAF Cisco
- CSAF CISA
| Resource | Hugging Face ID |
|---|---|
| Dataset | CIRCL/vulnerability-scores |
| Model | CIRCL/vulnerability-severity-classification-roberta-base |
| Base model | FacebookAI/roberta-base |
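The four classes (Low, Medium, High, Critical) match the CVSS v3.x qualitative severity rating scale. To compare model predictions against entries that carry a numeric CVSS v3.x base score, you can bucket the score as follows. Whether the vulnerability-scores dataset derives its labels exactly this way is an assumption; the helper name cvss_v3_severity is hypothetical.

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return "None"  # defined by CVSS, but not one of the four classifier classes
    if score < 4.0:
        return "Low"       # 0.1 - 3.9
    if score < 7.0:
        return "Medium"    # 4.0 - 6.9
    if score < 9.0:
        return "High"      # 7.0 - 8.9
    return "Critical"      # 9.0 - 10.0
```

A score of 0.0 maps to "None", which has no counterpart among the classifier's four labels, so such entries would need to be excluded before any comparison.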
Improved Chinese Severity Classifier 🇨🇳
The Chinese model is trained on data from the China National Vulnerability Database (CNVD).
| Resource | Hugging Face ID |
|---|---|
| Dataset | CIRCL/Vulnerability-CNVD |
| Model | CIRCL/vulnerability-severity-classification-chinese-macbert-base |
| Base model | hfl/chinese-macbert-base |
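Because all three models are published on Hugging Face, you can also run them locally instead of going through ML-Gateway. The sketch below picks a model ID by language and runs it with the transformers text-classification pipeline. The language codes and the helper names model_for and classify are hypothetical. The label strings returned depend on each model's configuration, and the first call downloads the model weights.

```python
SEVERITY_MODELS = {
    "ru": "CIRCL/vulnerability-severity-classification-russian-ruRoberta-large",
    "en": "CIRCL/vulnerability-severity-classification-roberta-base",
    "zh": "CIRCL/vulnerability-severity-classification-chinese-macbert-base",
}


def model_for(language: str) -> str:
    """Return the Hugging Face model ID for a (hypothetical) language code."""
    try:
        return SEVERITY_MODELS[language]
    except KeyError:
        raise ValueError(f"No severity model for language {language!r}") from None


def classify(description: str, language: str):
    """Classify one description locally (needs the transformers package)."""
    # Imported lazily so that model_for() works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_for(language))
    return classifier(description)  # e.g. a list of {"label": ..., "score": ...}
```

In a long-running process you would create each pipeline once and reuse it, since loading a model (especially ruRoberta-large) is far more expensive than a single prediction.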
ML-Gateway 0.5.0
ML-Gateway is the FastAPI-based inference server that loads pre-trained models at startup and exposes them through a RESTful API. It supports multilingual severity classification out of the box: clients simply specify the desired model in their request.
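A client request could be built along the following lines. This is a minimal sketch using only the standard library. The gateway address, the /classify/severity path, and the JSON body fields model and description are hypothetical placeholders, not ML-Gateway's documented API, so check the ML-Gateway documentation for the real routes and schema.

```python
import json
import urllib.request

GATEWAY_URL = "http://127.0.0.1:8000"  # hypothetical local ML-Gateway address


def build_severity_request(model: str, description: str, base_url: str = GATEWAY_URL):
    """Build a POST request that names the desired model in its JSON body.

    The endpoint path and body fields are assumptions for illustration.
    """
    body = json.dumps({"model": model, "description": description}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/classify/severity",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_remote(model: str, description: str, base_url: str = GATEWAY_URL, timeout: float = 30):
    """Send the request and decode the JSON response."""
    request = build_severity_request(model, description, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Switching languages only changes the model argument, for example "CIRCL/vulnerability-severity-classification-russian-ruRoberta-large" for Russian descriptions.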
ML-Gateway v0.5.0 adds support for the new Russian severity classification model (CIRCL/vulnerability-severity-classification-russian-ruRoberta-large):
- Registered the Russian ruRoBERTa-large model in the model registry with standard CVSS severity labels (Low, Medium, High, Critical).
- Added the model to the CLI refresh-all command for pre-downloading.
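A model registry of this kind can be pictured as a mapping from a name to a model ID plus its label set, which a refresh-all style command iterates over and which the inference path uses to turn logits into a label. This is a hypothetical sketch, not ML-Gateway's code. The ModelEntry class, the "severity-ru" key and the helper names are illustrative, and it assumes the logits are ordered like the label tuple. In practice the order comes from the model's id2label configuration.

```python
import math
from dataclasses import dataclass

CVSS_LABELS = ("Low", "Medium", "High", "Critical")


@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    labels: tuple = CVSS_LABELS


REGISTRY: dict = {}


def register(name: str, entry: ModelEntry) -> None:
    REGISTRY[name] = entry


def models_to_predownload() -> list:
    """The model IDs a refresh-all style command would fetch ahead of time."""
    return [entry.model_id for entry in REGISTRY.values()]


def to_prediction(entry: ModelEntry, logits):
    """Apply softmax to raw logits and return the top label and its probability."""
    if len(logits) != len(entry.labels):
        raise ValueError("logit count does not match the label set")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return entry.labels[best], probs[best]


register(
    "severity-ru",
    ModelEntry("CIRCL/vulnerability-severity-classification-russian-ruRoberta-large"),
)
```

For example, logits of [0.1, 0.2, 2.0, 0.3] for the registered Russian model yield the label "High", because the third position has the largest logit.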
Vulnerability-Lookup uses ML-Gateway to provide AI-powered severity predictions directly in its web interface.
Funding
AIPITCH (AI-Powered Innovative Toolkit for Cybersecurity Hubs) is an EU project co-funded by the European Cybersecurity Competence Centre (ECCC), under the DIGITAL-ECCC-2024-DEPLOY-CYBER-06-ENABLINGTECH program, and by CIRCL.