Interfaze: The Future of AI is built on Task-Specific Small Models
Title: Interfaze: The Future of AI is Built on Task-Specific Small Models
Abstract:
We introduce Interfaze, a native hybrid architecture that integrates task-specific deep neural networks—specifically CNNs and DNNs—directly into a transformer decoder via a unified embedding space. This system leverages specialized perceptual encoders to manage optical character recognition (OCR) across complex, multilingual PDFs, perform open-vocabulary detection for objects and graphical user interfaces (GUIs), and execute multilingual speech recognition with diarization.
Each specialized module is accessed through distinct adapters, allowing for independent activation. Consequently, a user query engages only the necessary parameters, minimizing computational overhead. A built-in action foundation provides grounded external state capabilities, including a proxied headless browser, a scraper, a code sandbox, a multi-domain web index, and a scalable vector store. The decoder synthesizes and filters these inputs, applying reasoning where required and generating deterministic outputs based on confidence levels. Crucially, raw specialist metadata—such as bounding boxes, confidence scores, and timestamps—is retained and delivered alongside the final response as precontext.
On this architecture, Interfaze-Beta dominates a range of deterministic developer-task benchmarks. It achieved a score of 70.7% on OCRBench v2, 85.7% on olmOCR, 82.1% on RefCOCO, and a 2.4% word error rate on VoxPopuli. Further performance metrics include 52.9% on Spider-2.0-Lite, 92.4% on GPQA-Diamond, 90.9% on MMMLU, 71.1% on MMMU-Pro, and 80.5% value accuracy on the Structured Output Benchmark (SOB). In every evaluated task, Interfaze-Beta outperforms comparably priced generalist models, including Gemini-3-Flash, Gemini-3.5-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, and Grok-4.3.
By resolving perception in a single pass through fused specialist encoders rather than relying on iterative tool calls to a large model, Interfaze delivers high accuracy with verifiable metadata for deterministic tasks while maintaining flash-tier operational costs.
Source: arXiv Generated at: 2026-06-04 00:00:00 UTC





