arXiv

CoughSense: Five-Class Respiratory Disease Classification via Whisper Encoder Fine-Tuning and Dual-Encoder Cross-Attention Fusion with Balanced Contrastive Learning

Title: CoughSense: Enhancing Five-Class Respiratory Disease Classification Through Whisper Encoder Fine-Tuning, Dual-Encoder Cross-Attention Fusion, and Balanced Contrastive Learning

Abstract

While automated cough analysis presents a viable avenue for affordable respiratory screening, current research is largely confined to binary detection of COVID-19. To develop a practical diagnostic tool capable of distinguishing among multiple respiratory ailments from a single smartphone recording, we introduce CoughSense. This system categorizes cough samples into five distinct categories: healthy, COVID-19, asthma or other respiratory conditions, bronchitis, and pneumonia.

Our approach uses a dataset of 18,301 recordings drawn from four public repositories: Coswara, CoughVID, Virufy, and the West China Hospital Pediatric Cough Dataset. We employ the OpenAI Whisper encoder as the backbone for disease classification. A pivotal innovation in our architecture is active-frame QKV attention pooling, which restricts attention to the first 200 of the 1,500 tokens produced by Whisper’s encoder. This strategy mitigates the "silence-dilution" issue, in which a three-second cough occupies only 150 tokens within Whisper’s 30-second input window.
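The token arithmetic and the pooling idea described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the single query vector, the omission of learned key/value projections, and the helper names `tokens_for` and `active_frame_pool` are all assumptions made for brevity.

```python
import math

TOKENS_TOTAL = 1500     # Whisper encoder output length for the 30-second window
WINDOW_SECONDS = 30.0
ACTIVE_TOKENS = 200     # leading tokens kept, as stated in the abstract


def tokens_for(seconds):
    """Encoder tokens covered by `seconds` of audio (1500 / 30 = 50 per second)."""
    return int(round(seconds * TOKENS_TOTAL / WINDOW_SECONDS))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def active_frame_pool(frames, query, n_active=ACTIVE_TOKENS):
    """Attention-pool only the first `n_active` frames using one query vector.

    frames: list of token vectors; query: vector of the same dimension.
    Hypothetical simplification: keys and values are the frames themselves.
    """
    active = frames[:n_active]
    scale = math.sqrt(len(query))
    scores = [dot(query, f) / scale for f in active]
    top = max(scores)                          # subtract max for a stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * f[d] for w, f in zip(weights, active))
              for d in range(len(query))]
    return pooled, weights
```

With a three-second cough, `tokens_for(3.0)` gives 150, so a plain mean over all 1,500 tokens would give cough frames only a tenth of the total weight, whereas pooling over the first 200 tokens leaves them as the majority of the attended frames.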

To address class imbalance (ratios of up to 19:1) and domain shift across the four datasets, the training protocol combines WeightedRandomSampler, SpecAugment, a supervised contrastive auxiliary loss, FiLM symptom conditioning, and gradient-reversal domain adaptation. We additionally apply Balanced Mixup with forced minority pairing to further stabilize training.
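One common way to drive a weighted sampler under this kind of imbalance is to weight each sample by the inverse of its class count. The sketch below assumes that scheme; the abstract does not say how the authors set their weights, and the toy labels are illustrative only, not the paper's data.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weights proportional to 1 / (size of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


# Toy label list with a 19:1 ratio between the two classes (hypothetical data).
labels = ["healthy"] * 190 + ["pneumonia"] * 10
weights = inverse_frequency_weights(labels)

# Each class ends up with the same total sampling mass, so draws made in
# proportion to these weights are class-balanced in expectation.
mass = Counter()
for y, w in zip(labels, weights):
    mass[y] += w
```

In PyTorch, a list like `weights` could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)`, which draws sample indices with probability proportional to the given weights.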

The final model architecture features a dual-encoder setup that integrates the Whisper encoder with the OPERA-CT respiratory foundation model via cross-attention. The lightweight CoughSense variant, based on Whisper-tiny with 8.6 million parameters, achieves a balanced accuracy of 82.3% in five-fold cross-validation, with a macro-F1 score of 0.817 and an AUC of 0.941. These results exceed an ImageNet-pretrained EfficientNet-B2 by 11.1 percentage points and a scratch-trained ViT by 29.6 points. All five classes reach a recall above 74%, and four exceed 80%. The dual-encoder configuration improves balanced accuracy further, to 85.4%. Ablation studies identify active-frame pooling as the most impactful single component, contributing a 5.1-point gain, which suggests it may be broadly useful for short-audio tasks that use Whisper as a backbone.
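For readers unfamiliar with the headline metrics, balanced accuracy is the mean of per-class recall and macro-F1 is the unweighted mean of per-class F1, so both give minority classes the same weight as the majority class. The sketch below uses the standard definitions on a confusion matrix; the function names and the toy matrix are illustrative and do not come from the paper.

```python
def per_class_recall(cm):
    """cm[i][j] = samples of true class i predicted as class j.

    Assumes every class has at least one true sample.
    """
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def balanced_accuracy(cm):
    """Mean of per-class recall."""
    recalls = per_class_recall(cm)
    return sum(recalls) / len(recalls)


def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy two-class matrix (hypothetical): the majority class is mostly right,
# while the minority class is right only half the time.
toy_cm = [[90, 10],
          [5, 5]]
```

On `toy_cm`, plain accuracy is 95/110, but the per-class recalls are 0.9 and 0.5, so balanced accuracy is 0.7. This is why the metric is a stricter summary than plain accuracy for imbalanced data.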


Source: arXiv Generated at: 2026-06-03 00:00:00 UTC
