arXiv

Lexicons and grammars for language processing: industrial or handcrafted products?


Abstract

In recent years, the use of linguistic data in language processing has grown steadily, and such data are now widely recognized as language resources. Traditionally, these resources consisted mainly of text collections such as the Brown Corpus and the Penn Treebank. More recently, a growing number of electronic lexicons (WordNet, FrameNet, VerbNet, ComLex, and Lexicon-Grammar) and formal grammars such as TAG have been developed.

These resources differ sharply in how they are built. Corpus construction has long been highly automated, whereas lexicons and grammars are still built mostly by hand. Language processing experts increasingly acknowledge that lexicons and grammars contain more information than corpora and therefore support more sophisticated processing. This difference in construction method may be explained by the difference in informational content: because linguists build lexicons and grammars carefully by hand, these resources contain more information than automatic generation can produce.

The field is currently moving in two divergent directions. In the first, language technology specialists become accustomed to working with manually constructed resources, which are richer and more complex. In the second, which reflects the dominant view, the construction of lexicons and grammars is automated and industrialized. Both trends are already under way, and they are in clear tension with each other.

The future relationship between linguists and computer scientists depends on which direction prevails. The first approach requires recruiting and training a large number of linguists, whereas the second relies mainly on technical solutions devised by computer engineers. This article examines practical examples of language resources to assess which option is the most realistic: manual construction, industrial generation, or a combination of the two.

