arXiv

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

June 4, 2026 · Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr

Title: Assessing the Efficacy of Zero-Shot and One-Shot Adaptation in Small Language Models for Leader-Follower Dynamics

Abstract:

Establishing leader-follower roles is a fundamental component of human-robot interaction (HRI); however, dynamically assigning these roles poses significant difficulties for mobile and assistive robots with limited computational resources. Although Large Language Models (LLMs) have demonstrated potential for facilitating natural communication, their substantial size and processing latency hinder deployment on edge devices. Small Language Models (SLMs) present a viable alternative, yet their capability for role classification within HRI contexts has lacked systematic evaluation.

This study introduces a benchmark of SLMs for leader-follower communication, built on a new dataset derived from an existing database and augmented with synthetic samples to better capture interaction-specific dynamics. We examine two adaptation strategies, prompt engineering and fine-tuning, in both zero-shot and one-shot interaction modes, and compare each against an untrained baseline.

Experiments conducted using Qwen2.5-0.5B indicate that zero-shot fine-tuning delivers robust classification results, achieving an accuracy of 86.66% with a low latency of 22.2 ms per sample. This approach significantly surpasses both the baseline and prompt-engineered methods. Conversely, the study notes a decline in performance during one-shot modes, attributing this issue to the increased context length exceeding the model’s architectural limits. These outcomes suggest that while fine-tuned SLMs offer a practical solution for direct role assignment, they also expose essential trade-offs between the complexity of dialogue and the reliability of classification in edge computing environments.
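The two headline metrics, classification accuracy and per-sample latency, can be illustrated with a minimal sketch. It assumes a generic `classify` callable that maps an utterance to a role label. The `evaluate` helper, the toy `keyword_classifier`, the label names, and the sample utterances are all hypothetical stand-ins, not the study's model, dataset, or measurement protocol.

```python
import time
from typing import Callable, Sequence, Tuple


def evaluate(
    classify: Callable[[str], str],
    samples: Sequence[Tuple[str, str]],
) -> Tuple[float, float]:
    """Return (accuracy in percent, mean latency in ms per sample)."""
    if not samples:
        raise ValueError("samples must not be empty")
    correct = 0
    total_seconds = 0.0
    for text, expected in samples:
        start = time.perf_counter()
        predicted = classify(text)
        # Only the classifier call is timed, not the bookkeeping around it.
        total_seconds += time.perf_counter() - start
        correct += int(predicted == expected)
    n = len(samples)
    return 100.0 * correct / n, 1000.0 * total_seconds / n


def keyword_classifier(text: str) -> str:
    # Hypothetical stand-in for a language model: a single keyword rule.
    return "leader" if "follow me" in text.lower() else "follower"


# Hypothetical utterances with assumed labels for the speaker's role.
toy_samples = [
    ("Follow me to the kitchen.", "leader"),
    ("I'll go wherever you go.", "follower"),
    ("Please follow me.", "leader"),
    ("Come with me, I know the way.", "leader"),  # missed by the keyword rule
]

accuracy, latency_ms = evaluate(keyword_classifier, toy_samples)
print(f"accuracy: {accuracy:.2f}%  latency: {latency_ms:.4f} ms/sample")
```

In practice `classify` would wrap the model's inference call. Reported latency then depends on hardware, batching, and whether tokenization is included in the timed region, so figures from different setups are not directly comparable.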
