arXiv

Novel Aspects of IEEE SA P3109 Arithmetic Formats for Machine Learning

June 4, 2026 · Andrew Fitzgibbon, Christoph M. Wintersteiger, Jeffrey Sarnoff

Title: New Dimensions in IEEE SA P3109 Arithmetic Formats for Machine Learning

Abstract

The IEEE P3109 draft standard introduces a parameterized family of binary floating-point formats designed specifically to support machine learning applications. The formats enable efficient and consistent representation of numerical values using a minimal bit count, and each format is defined by parameters such as bit width, precision, signedness, and whether infinities are included.
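
As a rough illustration of what such a parameterization could look like, the sketch below derives field widths from the four parameters named above. The class name `FormatParams`, its field names, and the layout rule (an optional sign bit, an implicit leading significand bit, and the remaining bits used for the exponent) are assumptions made for this example; they are not taken from the draft standard, which may lay out its encodings differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatParams:
    """Hypothetical description of one member of a parameterized format family."""
    bits: int             # total storage width in bits
    precision: int        # significand precision, counting an implicit leading bit
    signed: bool          # whether one bit is spent on a sign
    has_infinities: bool  # whether the format includes infinities

    def __post_init__(self):
        # Reject parameter combinations that leave a negative exponent width.
        if self.precision < 1 or self.exponent_bits < 0:
            raise ValueError("precision does not fit in the given bit width")

    @property
    def sign_bits(self) -> int:
        return 1 if self.signed else 0

    @property
    def trailing_significand_bits(self) -> int:
        # Assumed layout: the leading significand bit is implicit.
        return self.precision - 1

    @property
    def exponent_bits(self) -> int:
        # Assumed layout: whatever is not sign or significand is exponent.
        return self.bits - self.sign_bits - self.trailing_significand_bits

    @property
    def code_points(self) -> int:
        # Every bit pattern is a code point, whatever value it denotes.
        return 2 ** self.bits


signed8 = FormatParams(bits=8, precision=3, signed=True, has_infinities=True)
unsigned8 = FormatParams(bits=8, precision=3, signed=False, has_infinities=False)
print(signed8.exponent_bits, unsigned8.exponent_bits, signed8.code_points)
```

Under these assumptions, dropping the sign bit frees one bit for the exponent, which is one way a small format can trade signedness for dynamic range.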

The standard specifies operations by mapping floating-point values to the closed extended reals, that is, the real numbers augmented with positive and negative infinity; NaN (Not a Number) is treated as a separate, non-numeric value. Because NaN and infinite operands are handled explicitly, the operation definitions themselves use only real arithmetic. The framework supports a wide range of rounding and saturation modes, including stochastic rounding.
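
The pattern described above (dispose of NaN and infinite operands first, compute exactly over the reals, then round and saturate) can be sketched as follows. This is a minimal illustration, not the draft's definition. The function names, the mode names, the rule that adding opposite infinities yields NaN, and the absence of subnormal handling are all assumptions of this example. Exact real arithmetic is modeled with Python's `fractions.Fraction`.

```python
import math
import random
from fractions import Fraction


def round_to_precision(x: Fraction, precision: int, mode: str, rng=random) -> Fraction:
    """Round an exact value to `precision` significant bits (exponent unbounded)."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    m = abs(x)
    # Find e such that 2**e <= m < 2**(e + 1).
    e = m.numerator.bit_length() - m.denominator.bit_length()
    if Fraction(2) ** e > m:
        e -= 1
    step = Fraction(2) ** (e - precision + 1)  # spacing of representable values near m
    q = m / step
    lo = q.numerator // q.denominator          # floor(q)
    frac = q - lo                              # exact fractional part, in [0, 1)
    if mode == "toward_zero":
        n = lo
    elif mode == "nearest_even":
        half = Fraction(1, 2)
        n = lo + 1 if frac > half or (frac == half and lo % 2 == 1) else lo
    elif mode == "stochastic":
        # Round up with probability equal to the fractional part.
        n = lo + (1 if rng.random() < frac else 0)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return sign * n * step


def add(x: float, y: float, precision: int, max_finite: float,
        mode: str = "nearest_even", saturate: bool = True, rng=random) -> float:
    """Hypothetical addition: special operands first, then exact real arithmetic."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if math.isinf(x) or math.isinf(y):
        if math.isinf(x) and math.isinf(y) and x != y:
            return math.nan                    # assumed result for inf + (-inf)
        return x if math.isinf(x) else y
    exact = Fraction(x) + Fraction(y)          # both operands finite: exact sum
    r = round_to_precision(exact, precision, mode, rng)
    if abs(r) > max_finite:
        sign = -1.0 if r < 0 else 1.0
        # Saturating modes clamp to the largest finite value; others overflow to infinity.
        return sign * (max_finite if saturate else math.inf)
    return float(r)


print(add(1.0, 0.125, precision=3, max_finite=15.0))                    # tie rounds to even
print(add(12.0, 12.0, precision=3, max_finite=15.0))                    # saturates
print(add(12.0, 12.0, precision=3, max_finite=15.0, saturate=False))    # overflows
```

With stochastic rounding, `add(1.0, 0.125, precision=3, max_finite=15.0, mode="stochastic")` returns either 1.0 or 1.25, each with probability one half, so the rounding error is zero in expectation.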

To enhance throughput, the operations are designed to be exception-free: exceptional conditions are communicated through return values, such as NaN, rather than by raising exceptions. Furthermore, operations on blocks of values that share a common scale factor are defined uniformly in terms of the underlying operations. System vendors can describe approximate implementations using a novel, scale-invariant metric known as kappa-approximation, which is analogous to units in the last place (ULP). Finally, standard function definitions and various other properties are mechanically verified and generated through formal specifications.
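
To make the block-scaling idea concrete, the sketch below defines a dot product over two scaled blocks purely in terms of ordinary element-wise operations on the decoded values. The names `ScaledBlock` and `block_dot` are hypothetical, Python floats stand in for decoded format values, and returning NaN on a length mismatch is an illustrative choice for this example rather than anything stated in the draft. The kappa-approximation metric is not modeled here, since the text above does not give its definition.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ScaledBlock:
    """Hypothetical block: elements that share one common scale factor."""
    scale: float
    elements: Sequence[float]

    def values(self) -> List[float]:
        # Each element denotes scale * element.
        return [self.scale * e for e in self.elements]


def block_dot(a: ScaledBlock, b: ScaledBlock) -> float:
    """Dot product defined via the underlying multiply and add on decoded values."""
    if len(a.elements) != len(b.elements):
        # Exception-free style: report the problem through the return value.
        # (Illustrative choice for this sketch, not taken from the draft.)
        return math.nan
    # Plain float arithmetic never raises here; inf * 0 simply yields NaN.
    return sum(x * y for x, y in zip(a.values(), b.values()))


a = ScaledBlock(scale=0.5, elements=[1.0, 2.0, 4.0])
b = ScaledBlock(scale=2.0, elements=[1.0, 1.0, 1.0])
print(block_dot(a, b))

bad = block_dot(ScaledBlock(1.0, [math.inf]), ScaledBlock(1.0, [0.0]))
print(math.isnan(bad))
```

Callers check the returned value for NaN instead of wrapping the call in exception handling, which keeps the control flow straight-line.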
