arXiv

OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration

Large language models (LLMs) have shown remarkable performance across numerous applications, yet their inference processes place heavy burdens on memory and computational resources. Current quantization techniques face a dilemma between efficiency and accuracy: weight-only quantization (WOQ) suffers from expensive dequantization overheads, whereas integer weight-and-activation quantization (INT-WAQ) sacrifices precision, leading to reduced model quality. While non-uniform weight-and-activation quantization (NU-WAQ) effectively handles the skewed distributions of LLM data, it lacks compatibility with standard low-precision hardware.

To address these challenges, this study introduces OASIS, a lookup table (LUT)-based architecture designed to perform efficient general matrix multiplication (GEMM) between non-uniformly quantized weights and activations, eliminating the need for dequantization. By utilizing pre-computed Cartesian Product LUTs, OASIS cuts LUT storage requirements 64-fold and raises computational parallelism 1,024-fold compared to existing LUT-based GEMM approaches.
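The idea of multiplying through a Cartesian-product table can be illustrated with a minimal NumPy sketch. It is not the OASIS hardware design: the 4-bit width, the randomly drawn codebooks, and the helper names `quantize_to_codebook` and `lut_gemm` are all assumptions made for illustration. The sketch only shows that once every (activation centroid, weight centroid) product is pre-computed, the GEMM reduces to table lookups and additions on the index matrices.

```python
import numpy as np

def quantize_to_codebook(x, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return np.abs(x[..., None] - codebook).argmin(axis=-1)

def lut_gemm(a_idx, w_idx, lut):
    """y[m, n] = sum_k lut[a_idx[m, k], w_idx[k, n]], with no dequantization."""
    # Broadcasting gathers one pre-computed product per (m, k, n) triple.
    products = lut[a_idx[:, :, None], w_idx[None, :, :]]
    return products.sum(axis=1)

rng = np.random.default_rng(0)

# Hypothetical 4-bit non-uniform codebooks (16 centroids each); assumption.
a_codebook = np.sort(rng.standard_normal(16))
w_codebook = np.sort(rng.standard_normal(16))

# Cartesian-product LUT: every activation centroid times every weight centroid.
lut = np.outer(a_codebook, w_codebook)  # shape (16, 16)

A = rng.standard_normal((4, 32))   # activations
W = rng.standard_normal((32, 8))   # weights
a_idx = quantize_to_codebook(A, a_codebook)
w_idx = quantize_to_codebook(W, w_codebook)

y_lut = lut_gemm(a_idx, w_idx, lut)
# Reference: dequantize both operands, then run an ordinary matmul.
y_ref = a_codebook[a_idx] @ w_codebook[w_idx]
```

Under these assumptions `y_lut` matches `y_ref` up to floating-point rounding, and the table holds only 16 x 16 entries regardless of the matrix sizes.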

To maintain high accuracy despite aggressive activation quantization, OASIS incorporates an outlier-aware quantization framework. This system combines LUT-based GEMM with error compensation specifically targeted at outliers. Additionally, the authors developed Orizuru, a high-efficiency engine for real-time detection of top-k activation outliers.
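A rough software analogue of outlier-targeted error compensation is sketched below. The summary does not specify the compensation scheme or how Orizuru works internally, so everything here is an assumption: the codebook, the use of `np.argpartition` as a stand-in for top-k detection, and the choice to add back the quantization residual of the k largest-magnitude activations at full precision. The helper names are hypothetical.

```python
import numpy as np

def nearest_codebook(x, codebook):
    """Quantize x by snapping each value to its nearest codebook entry."""
    idx = np.abs(x[..., None] - codebook).argmin(axis=-1)
    return codebook[idx]

def topk_outlier_indices(x, k):
    """Indices of the k largest-magnitude entries of a 1-D activation vector."""
    return np.argpartition(np.abs(x), -k)[-k:]

def outlier_compensated_matvec(x, W, codebook, k):
    """Quantized product plus a correction for the top-k outliers (assumed scheme)."""
    x_q = nearest_codebook(x, codebook)
    y = x_q @ W                          # stands in for the LUT-based GEMM
    out = topk_outlier_indices(x, k)
    residual = x[out] - x_q[out]         # quantization error of the outliers only
    return y + residual @ W[out, :]

rng = np.random.default_rng(0)
codebook = 3.0 * np.tanh(np.linspace(-2.0, 2.0, 16))  # hypothetical non-uniform codebook

x = rng.standard_normal(64)
x[3], x[10] = 50.0, -40.0                # inject two activation outliers
W = rng.standard_normal((64, 8))

y_exact = x @ W
y_plain = nearest_codebook(x, codebook) @ W
y_comp = outlier_compensated_matvec(x, W, codebook, k=2)

err_plain = np.linalg.norm(y_exact - y_plain)
err_comp = np.linalg.norm(y_exact - y_comp)
```

In this toy setting the correction touches only `k` rows of `W`, so its cost stays small relative to the full product, while the error from the two injected outliers is removed.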

Extensive evaluations demonstrate that OASIS maintains an average accuracy loss of just 1.98% relative to the FP16 baseline, representing a 5.18% improvement over Atom. In terms of hardware performance, OASIS delivers an average speedup of 3.00x and enhances energy efficiency by 1.44x when compared to the FIGLUT accelerator.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
