arXiv

POLARIS: Guiding Small Models to Write Long Stories

Title: POLARIS: Steering Compact Models Toward Long-Form Storytelling

Abstract

Small, open-weight language models face significant hurdles in long-form creative writing. Their outputs often fail to meet length requirements, or their narrative quality deteriorates markedly as the desired story length grows, particularly when measured against state-of-the-art frontier models. To address this, we introduce POLARIS (Policy Optimization with LLM-as-a-judge rewards and Anchored-Reference Injection for Storywriting), a compute-efficient Group Relative Policy Optimization (GRPO) framework. The approach integrates two components: an online reward signal from a frontier LLM that scores each story against a structured Story Quality rubric, and Human-Reference Injection (HRI), which adds a teacher-forced, human-written story to each GRPO batch as a high-reward anchor.
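To make the role of the high-reward anchor concrete, the following is a minimal sketch of group-relative advantage computation with one extra reference reward appended to the group. It uses the standard GRPO normalization (reward minus group mean, divided by group standard deviation). The function names, the reward scale, and the value assigned to the human reference are illustrative assumptions; the abstract does not specify how POLARIS scores or weights the reference.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style normalization: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def advantages_with_reference(sampled_rewards, reference_reward, eps=1e-6):
    """Append a human-reference reward to the group before normalizing.

    Hypothetical helper: the reference reward value is an assumption,
    not a number taken from the paper.
    Returns (advantages of sampled stories, advantage of the reference).
    """
    group = list(sampled_rewards) + [reference_reward]
    adv = group_relative_advantages(group, eps)
    return adv[:-1], adv[-1]


# Made-up judge scores in [0, 1] for four sampled stories.
sampled = [0.42, 0.55, 0.48, 0.51]
policy_adv, ref_adv = advantages_with_reference(sampled, reference_reward=0.90)
```

In this toy group the reference raises the group mean above every sampled reward, so the sampled stories all receive negative advantages while the reference receives a positive one. That is one way an anchor can keep supplying a learning signal when the policy's own samples are of similar quality.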

We applied this training methodology to the Qwen3.5-9B model, using a dataset of roughly 1,400 prompt-story pairs extracted from 100 short-story anthologies. Training ran on four A100 GPUs and produced the POLARIS-9B model. Evaluated across five distinct benchmarks covering both in-distribution and out-of-distribution prompts and rubrics, POLARIS-9B is competitive with significantly larger open-weight models while adhering more closely to length constraints. Blinded human evaluations indicate that POLARIS-9B is favored over the baseline Qwen3.5-9B and performs comparably to the larger Qwen3.5-27B. Notably, despite being trained exclusively on stories of up to 4,000 words, POLARIS-9B maintains high quality even when prompted for narratives three times that length (about 12,000 words). This capability is significant, as most open-weight models suffer substantial declines in quality and length compliance in such extended regimes. Broadly, our findings suggest that length generalization serves as a rigorous stress test for creative-writing models and a valuable metric for differentiating between closely matched models.


Source: arXiv Generated at: 2026-06-04 00:00:00 UTC
