arXiv

Bridging the Gap: Transfer Learning from English PLMs to Malaysian English

Malaysian English is a low-resource creole that integrates elements from Malay, Chinese, and Tamil alongside Standard English. Traditional Named Entity Recognition (NER) models often struggle to accurately identify entities within Malaysian English texts, largely due to the language’s unique morphosyntactic structures, semantic characteristics, and frequent code-switching between English and Malay.

To address these challenges, we present MENmBERT and MENBERT, pre-trained language models (PLMs) with contextual understanding tailored to Malaysian English. We fine-tuned both models on manually annotated entities and relations from the Malaysian English News Article (MEN) Dataset. This fine-tuning lets the PLMs learn representations that capture the nuances of Malaysian English relevant to NER and Relation Extraction (RE) tasks.

In comparative evaluations, MENmBERT outperformed the bert-base-multilingual-cased model by 1.52% on NER tasks and by 26.27% on RE tasks. While the aggregate NER performance gains may appear modest, our deeper analysis reveals statistically significant improvements when assessing performance across the 12 distinct entity labels. These results indicate that pre-training language models on corpora that are both language-specific and geographically targeted offers a promising strategy for enhancing NER capabilities in low-resource contexts. Furthermore, the dataset and code released in this study serve as essential resources for NLP research focused on Malaysian English.
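The gap between aggregate and per-label results can be illustrated with a minimal sketch of per-label evaluation. The summary does not state the tagging scheme or the exact metric, so the sketch assumes BIO tags and span-level exact-match F1. The function names, the three labels, and the toy sequences are hypothetical and are not taken from the MEN Dataset.

```python
from collections import Counter


def extract_spans(tags):
    """Return the set of (label, start, end) entity spans in a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # A span starts on "B-X", or on a stray "I-X" that does not continue
        # the currently open span (lenient handling of malformed sequences).
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if label is not None and (tag == "O" or starts_new):
            spans.add((label, start, i))
            start, label = None, None
        if starts_new:
            start, label = i, tag[2:]
    return spans


def per_label_f1(gold_seqs, pred_seqs):
    """Compute span-level F1 per entity label, plus the micro-averaged F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        for label, _, _ in g & p:   # exact matches
            tp[label] += 1
        for label, _, _ in p - g:   # predicted but not in gold
            fp[label] += 1
        for label, _, _ in g - p:   # in gold but missed
            fn[label] += 1

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0

    total_tp, total_fp, total_fn = (sum(c.values()) for c in (tp, fp, fn))
    denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / denom if denom else 0.0
    return scores, micro


# Toy data with hypothetical labels, for illustration only.
gold = [
    ["B-PERSON", "I-PERSON", "O", "B-LOCATION"],
    ["B-ORGANIZATION", "O", "B-LOCATION", "I-LOCATION"],
]
pred = [
    ["B-PERSON", "I-PERSON", "O", "B-LOCATION"],
    ["O", "O", "B-LOCATION", "O"],
]

scores, micro = per_label_f1(gold, pred)
for label, f1 in scores.items():
    print(f"{label}: {f1:.2f}")
print(f"micro: {micro:.2f}")
```

On this toy input a single micro-averaged score sits between labels that are recognised perfectly and labels that are missed entirely, which is why a per-label breakdown can reveal differences that an aggregate score hides.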


Source: arXiv Generated at: 2026-06-02 00:00:00 UTC
