arXiv

Large Language Models Are Overconfident in Their Own Responses

Large Language Models Exhibit Excessive Confidence in Their Own Outputs

arXiv:2606.03437v1 Announce Type: new Abstract: Previous research has established that instruction-tuned large language models (LLMs) suffer from poorer calibration compared to their base pre-trained versions. Yet, the specific influence of the widely adopted chat template on the calibration of conversational LLMs remains largely unexplored. This study isolates the drivers of this miscalibration by separating the impacts of post-training algorithms from chat formatting. Our findings indicate that while instruction tuning inherently degrades calibration, the chat template exacerbates the problem via an "ownership bias." Specifically, models display substantially higher confidence in their own generated answers than in identical responses attributed to a user.

Through extensive experiments involving six recent open-weight LLMs, three distinct benchmarks, and three confidence elicitation methods, we observed that models can assign confidence scores up to 26% higher to their own outputs. Capitalizing on this discovery, we introduce a straightforward inference-time technique: presenting the model’s generated answer as user input during the confidence elicitation process. This method effectively curtails overconfidence and enhances calibration by as much as 26%, eliminating the need for retraining and significantly closing the performance gap between base and instruction-tuned models.


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC

Related Articles

TechCrunch

The world’s largest privately owned laser just turned on

Xcimer Energy activated the Phoenix laser, the world’s largest privately owned laser, aiming to commercialize fusion pow...

Uber Targets Doubling Its Fleet of Electric Motorcycles in Kenya
Bloomberg

Uber Targets Doubling Its Fleet of Electric Motorcycles in Kenya

Uber plans to double its electric motorcycle fleet in Kenya. This expansion aims to enhance sustainable transport option...

AI Saves Time But Most Companies Waste the Gain, Study Shows
Bloomberg

AI Saves Time But Most Companies Waste the Gain, Study Shows

A study reveals that while AI saves employee time, most companies fail to capitalize on these gains, squandering potenti...

JPMorgan Lifts S&P Target on Earnings 'Supercycle'
Bloomberg

JPMorgan Lifts S&P Target on Earnings 'Supercycle'

JPMorgan raised its S&P 500 target, citing an earnings “supercycle” that reflects heightened confidence in corporate pro...

Europe Sleepwalking Into Economic Ruin, Serb Leader Says
Bloomberg

Europe Sleepwalking Into Economic Ruin, Serb Leader Says

Serbian leader warns Europe is sleepwalking into economic ruin.

Delta Electronics Flags Power Crunch
Bloomberg

Delta Electronics Flags Power Crunch

Delta Electronics warns of a looming power deficit due to surging demand and constrained production, predicting serious ...