arXiv

Title: Interfaze: The Future of AI is Built on Task-Specific Small Models

Abstract:

We introduce Interfaze, a native hybrid architecture that integrates task-specific deep neural networks (CNNs and DNNs) directly into a transformer decoder through a unified embedding space. Specialized perceptual encoders handle optical character recognition (OCR) on complex, multilingual PDFs, open-vocabulary detection of objects and graphical user interface (GUI) elements, and multilingual speech recognition with diarization.
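One common way to realize a unified embedding space is to project each encoder's features linearly to the decoder's embedding width and place them in the same sequence as the text token embeddings. The sketch below illustrates that general idea only; the abstract does not say how Interfaze performs the fusion, and the names `project`, `fuse`, and `ocr_to_decoder`, along with the toy dimensions and weights, are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows = output dims, columns = input dims


def project(features: Vector, weight: Matrix) -> Vector:
    """Linearly map one encoder feature vector to the decoder's embedding width."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


def fuse(
    text_embeddings: List[Vector],
    specialist_features: List[Vector],
    weight: Matrix,
) -> List[Vector]:
    """Prepend projected specialist features to the text token embeddings."""
    projected = [project(f, weight) for f in specialist_features]
    return projected + text_embeddings


# Hypothetical sizes: a 3-dim OCR encoder feature and a 2-dim decoder embedding.
ocr_to_decoder: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
sequence = fuse(
    text_embeddings=[[0.5, 0.5]],
    specialist_features=[[1.0, 2.0, 3.0]],
    weight=ocr_to_decoder,
)
print(sequence)
```

After fusion, every element of `sequence` has the decoder's width, so the decoder can attend over text and specialist features alike.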

Each specialist is reached through its own adapter and can be activated independently, so a query engages only the parameters it needs, which keeps computational overhead low. A built-in action foundation supplies grounded external state through a proxied headless browser, a scraper, a code sandbox, a multi-domain web index, and a scalable vector store. The decoder synthesizes and filters these inputs, applies reasoning where required, and produces deterministic outputs based on confidence levels. Raw specialist metadata, such as bounding boxes, confidence scores, and timestamps, is retained and delivered alongside the final response as precontext.
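The flow described above (selective adapter activation, confidence-based filtering, and metadata returned as precontext) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Interfaze implementation: the adapter registry, the stand-in specialists with made-up outputs, the 0.9 threshold, and the response field names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecialistResult:
    text: str
    confidence: float
    metadata: Dict[str, object] = field(default_factory=dict)


# Hypothetical stand-ins for specialist encoders; they return fixed, made-up results.
def run_ocr(payload: str) -> SpecialistResult:
    return SpecialistResult("Invoice #1042", 0.97, {"bbox": [12, 30, 220, 58]})


def run_asr(payload: str) -> SpecialistResult:
    return SpecialistResult("hello world", 0.62, {"start_s": 0.0, "end_s": 1.4})


ADAPTERS: Dict[str, Callable[[str], SpecialistResult]] = {
    "ocr": run_ocr,
    "asr": run_asr,
}


def answer(payload: str, needed: List[str], threshold: float = 0.9) -> Dict[str, object]:
    """Run only the requested adapters and gate each result on its confidence."""
    results = {name: ADAPTERS[name](payload) for name in needed}
    return {
        # Assumption: confident results are passed through unchanged.
        "direct": {n: r.text for n, r in results.items() if r.confidence >= threshold},
        # Assumption: low-confidence results are handed to the decoder for reasoning.
        "needs_reasoning": [n for n, r in results.items() if r.confidence < threshold],
        # Raw specialist metadata travels with the response.
        "precontext": {
            n: {"confidence": r.confidence, **r.metadata} for n, r in results.items()
        },
    }


response = answer("scan.pdf", needed=["ocr"])
print(response)
```

Because only the adapters named in `needed` are called, the speech specialist never runs for a PDF query, and the bounding box from the OCR specialist stays attached to the answer for downstream verification.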

On this architecture, Interfaze-Beta leads a range of deterministic developer-task benchmarks. It scores 70.7% on OCRBench v2, 85.7% on olmOCR, and 82.1% on RefCOCO, and achieves a 2.4% word error rate on VoxPopuli. It also reaches 52.9% on Spider-2.0-Lite, 92.4% on GPQA-Diamond, 90.9% on MMMLU, 71.1% on MMMU-Pro, and 80.5% value accuracy on the Structured Output Benchmark (SOB). On every evaluated task, Interfaze-Beta outperforms comparably priced generalist models, including Gemini-3-Flash, Gemini-3.5-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, and Grok-4.3.
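For readers unfamiliar with the speech metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below implements that standard definition; it is not the evaluation code behind the reported figure, and benchmark pipelines typically normalize casing and punctuation first, which this example skips.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

In the usage line, one of six reference words is substituted, so the rate is one sixth.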

By resolving perception in a single pass through fused specialist encoders rather than relying on iterative tool calls to a large model, Interfaze delivers high accuracy with verifiable metadata for deterministic tasks while maintaining flash-tier operational costs.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
