TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
Abstract:
This paper introduces TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation, generated by an agentic data pipeline. The pipeline uses multiple Large Language Model (LLM) agents, each assigned a distinct role with a specialized prompt and access to a specific subset of information. Conversational data is collected by logging the interactions between a Listener LLM and a Recsys LLM. To cover diverse conversation scenarios, the Listener LLM is conditioned on a fine-tuned conversation goal for each conversation. All LLMs involved are multimodal, handling both audio and images, which allows the pipeline to simulate multimodal recommendation and dialogue. In both LLM-as-a-judge assessments and subjective experiments, TalkPlayData 2 meets its objectives across various metrics relevant to training generative music recommendation models. The dataset and its generation code are publicly available at https://talkpl-ai.github.io.
Source: arXiv
Generated at: 2026-06-03 00:00:00 UTC
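
The two-agent logging loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code: the names `Conversation`, `simulate_conversation`, `stub_listener`, and `stub_recsys`, the number of rounds, and the string-based context arguments are all hypothetical, and the LLM agents are replaced by plain functions so the example runs without any model. Audio and image inputs are omitted. The sketch only shows the idea that a Listener agent, conditioned on a conversation goal and its own information subset, alternates with a Recsys agent that sees a different subset, while every turn is logged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Turn = Dict[str, str]
# An agent maps (conversation history, private context) to its next message.
# In a real pipeline this would wrap a multimodal LLM call (assumption).
Agent = Callable[[List[Turn], str], str]


@dataclass
class Conversation:
    goal: str
    turns: List[Turn] = field(default_factory=list)


def simulate_conversation(
    listener: Agent,
    recsys: Agent,
    goal: str,
    listener_context: str,
    recsys_context: str,
    num_rounds: int = 3,
) -> Conversation:
    """Alternate Listener and Recsys turns and log each message."""
    convo = Conversation(goal=goal)
    for _ in range(num_rounds):
        # The Listener is conditioned on the goal plus its own information subset.
        user_msg = listener(convo.turns, f"goal: {goal}\n{listener_context}")
        convo.turns.append({"role": "listener", "content": user_msg})
        # The Recsys agent sees the shared history but a different context.
        reply = recsys(convo.turns, recsys_context)
        convo.turns.append({"role": "recsys", "content": reply})
    return convo


# Hypothetical stand-ins for the LLM agents, used only to make the sketch runnable.
def stub_listener(history: List[Turn], context: str) -> str:
    return f"request {len(history) // 2 + 1}"


def stub_recsys(history: List[Turn], context: str) -> str:
    return f"recommendation for: {history[-1]['content']}"
```

Calling `simulate_conversation(stub_listener, stub_recsys, "discover upbeat jazz", "listener profile", "track catalog", num_rounds=2)` yields a `Conversation` whose `turns` list alternates between `listener` and `recsys` roles. Keeping the two context strings separate mirrors the abstract's point that each agent has access to a different subset of information.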



