arXiv

GlossAssist -- A Tool to Simplify Corpus Creation and Study the Effect of NLP Models in Low-Resource Documentation Settings

Title: GlossAssist: Streamlining Corpus Development and Analyzing NLP Model Performance in Low-Resource Documentation Contexts

Abstract: Interlinear glossed text (IGT) is the standard format for linguistic annotation in language documentation, but producing it by hand is costly and time-consuming. Automated glossing systems have improved considerably in recent years, yet uptake among field linguists remains low. A major reason is that current tools are optimized for evaluation metrics rather than practical utility: they offer no interpretable mechanism for correcting predictions and no way to feed linguistic expertise back into the model’s decision-making.

In response, we introduce GlossAssist, a glossing interface grounded in the retrieval-based architecture of CWoMP (Contrastive Word-Morpheme Pre-training). This design anchors predictions within a dynamic lexicon of learned morpheme representations. When paired with CWoMP, the system operates within an active learning framework: every correction made by an annotator contributes to expanding the lexicon and refining future predictions, thereby enhancing accuracy without necessitating model retraining. This paper details our interface design and posits that such a feedback loop must be considered a fundamental design requirement for NLP tools intended for use by documentary linguists.
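The feedback loop described above can be illustrated with a minimal sketch. This is not the CWoMP or GlossAssist implementation: the class `MorphemeLexicon`, the function `annotate`, the two-dimensional vectors, and the use of cosine similarity are all assumptions made for illustration. The only point it demonstrates is that a retrieval-based predictor can improve by appending corrected entries to its lexicon, with no retraining step.

```python
import math


class MorphemeLexicon:
    """Toy retrieval lexicon (hypothetical): stores (embedding, gloss) pairs."""

    def __init__(self):
        self.entries = []  # list of (unit-length vector, gloss label)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vec]

    def add(self, vec, gloss):
        # Growing the lexicon is the only "learning" step; no weights change.
        self.entries.append((self._normalize(vec), gloss))

    def predict(self, vec):
        # Return the gloss of the nearest entry by cosine similarity.
        if not self.entries:
            return None
        query = self._normalize(vec)
        best = max(
            self.entries,
            key=lambda entry: sum(a * b for a, b in zip(query, entry[0])),
        )
        return best[1]


def annotate(lexicon, vec, annotator_gloss):
    """Predict a gloss, then store the annotator's correction if it differs."""
    predicted = lexicon.predict(vec)
    if predicted != annotator_gloss:
        lexicon.add(vec, annotator_gloss)
    return predicted


lexicon = MorphemeLexicon()
lexicon.add([1.0, 0.0], "PST")               # seed entry (made-up vector)
first_guess = annotate(lexicon, [0.0, 1.0], "PL")  # wrong guess gets corrected
second_guess = lexicon.predict([0.1, 0.9])   # a similar morpheme seen later
```

In this sketch the first guess is wrong because the lexicon holds only one entry; once the correction is stored, a nearby query retrieves the corrected gloss. A real system would obtain the vectors from a trained encoder rather than writing them by hand.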


Source: arXiv
Generated at: 2026-06-04 00:00:00 UTC
