KliniskVestBERT: BERT Model Specialised to Norwegian Clinical Texts
Title: KliniskVestBERT: A BERT Architecture Tailored for Norwegian Clinical Documentation
Abstract
As the integration of Natural Language Processing (NLP) into healthcare continues to expand, there is a growing need for language models adapted to the specific characteristics of clinical language. This paper presents KliniskVestBERT, a collection of three BERT-based encoder models pre-trained on a large-scale corpus of real-world, de-identified Norwegian clinical records from Helse Vest. The models were obtained by continuing the pre-training of established language models (specifically Nb-BERT-large, NorBERT3-large, and ModernBERT) on this specialized clinical dataset.
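The abstract does not state which pre-training objective was used when continuing pre-training. Purely as an illustration, the sketch below assumes the standard BERT masked-language-modelling (MLM) corruption scheme: about 15% of non-special tokens are selected as prediction targets, and of these 80% are replaced by a mask token, 10% by a random token, and 10% are left unchanged. The function name `mask_tokens`, the token IDs, and the default masking rate are hypothetical; the rate is exposed as a parameter because encoder models differ in this choice.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

# Label value ignored by the loss (the default ignore_index of PyTorch's CrossEntropyLoss).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_token_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical BERT-style MLM corruption (assumed objective, not confirmed by the abstract).

    Each non-special token is selected with probability mask_prob. A selected token
    becomes a prediction target and is replaced by the mask token (80%), a random
    token (10%), or kept unchanged (10%). Returns (inputs, labels), where labels hold
    the original token at target positions and IGNORE_INDEX everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens such as [CLS] / [SEP]; skip unselected tokens.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_token_id              # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return inputs, labels
```

For example, `mask_tokens([101, 5, 6, 7, 102], mask_token_id=103, vocab_size=1000, special_ids=frozenset({101, 102}))` leaves the special IDs 101 and 102 untouched and returns labels of -100 everywhere except at the positions chosen as targets; during continued pre-training, the loss would be computed only at those positions.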
The dataset reflects a representative cross-section of the Helse Vest patient population. It comprises carefully selected document types, such as discharge summaries, surgical reports, and nursing notes, written in both bokmål and nynorsk. This curation ensures broad coverage of the linguistic diversity found in Norwegian healthcare settings.
To validate the models, evaluations were conducted on three synthetic Norwegian clinical benchmark datasets and two real-world clinical challenges. The results indicate that each clinically specialized model consistently outperforms its baseline counterpart. These findings underscore the substantial benefits of domain-specific pre-training for NLP applications in the medical field. This initiative was a collaborative project involving all Helse Vest entities (Helse Bergen, Helse Fonna, Helse Førde, and Helse Stavanger), with DIPS serving as project lead under the direction of Helse Vest ICT.
Source: arXiv Generated at: 2026-06-02 00:00:00 UTC




