Scalable Spontaneous Speech Dataset

Zaid Sheikh, Shuichiro Shimizu, Siddhant Arora, Jiatong Shi, Samuele Cornell, Xinjian Li, Shinji Watanabe
Carnegie Mellon University, USA
Interspeech 2025

Abstract

This paper introduces the Scalable Spontaneous Speech Dataset (SSSD) project, comprising 727 hours of spontaneous English conversations between two randomly matched, anonymous participants on the Amazon Mechanical Turk (MTurk) crowdsourcing platform. The dataset features conversations averaging 25-30 minutes and covering a wide range of everyday topics. A key innovation of this work is our approach to maximizing the number of MTurk workers concurrently participating in our task, enabling more effective randomized matching and live two-person conversations. Data quality is ensured through a two-tiered task structure: a qualification round to select reliable workers, followed by the main recording sessions. We detail our methodology for collecting and recording spontaneous voice conversations, present analyses of the dataset's conversational content and speech quality in comparison to other datasets, and discuss potential uses.

BibTeX

@inproceedings{sheikh25_interspeech,
  title     = {Scalable Spontaneous Speech Dataset ({SSSD}): Crowdsourcing Data Collection to Promote Dialogue Research},
  author    = {Zaid Sheikh and Shuichiro Shimizu and Siddhant Arora and Jiatong Shi and Samuele Cornell and Xinjian Li and Shinji Watanabe},
  booktitle = {Interspeech 2025},
  year      = {2025},
}