SVDD2024
Singing Voice Deepfake Detection Challenge

Tentative Timeline (subject to change)

  • 01/19/2024

    Release of CtrSVDD training/development data and a baseline system

  • 02/20/2024

    Release of tentative evaluation plan v0.2

  • 02/29/2024

CodaLab opens for participants

  • 03/20/2024

    Release of WildSVDD data

  • 06/08/2024

Registration closes

  • 06/15/2024

    Results submission deadline

Overview

We are organizing the inaugural Singing Voice Deepfake Detection (SVDD) 2024 challenge to foster research and development of sophisticated methods for detecting AI-generated singing voices. This is an emerging issue within the music industry that requires specialized solutions.
Our prior analysis using the SingFake dataset showed a marked decline in performance of state-of-the-art speech deepfake countermeasures when applied to singing voices. This highlighted the need for tailored SVDD techniques.

Controlled singing voice deepfake detection (CtrSVDD)

We will generate singing vocals with existing singing voice synthesis and singing voice conversion systems. Because these vocals are generated directly, this setting avoids the artifacts introduced by singing voice separation algorithms, and we expect it to be easier than the in-the-wild setting.

In-the-wild singing voice deepfake detection (WildSVDD)

We will collect additional data, similar to the SingFake dataset, from video platforms.

Challenge Details

To support this challenge, we are expanding our existing SingFake dataset to create a more comprehensive in-the-wild dataset. This expanded dataset will provide a rich source of real-world examples for SVDD research. Given the copyright constraints of the in-the-wild data, we will not publish the recordings directly. Instead, we will provide web URLs where these recordings can be accessed, ensuring compliance with copyright laws.

In addition to the in-the-wild dataset, we are also curating a controlled dataset using various existing singing voice synthesis and conversion systems, including some of the top-performing systems from the Singing Voice Conversion Challenge. The deepfakes generated for this controlled setting will focus exclusively on the singing vocals.
Submissions will be evaluated based on classification accuracy metrics to quantify generalization capabilities. Details will be coming soon!
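While the official metric has not yet been announced, deepfake detection challenges commonly report the equal error rate (EER), the operating point where the false accept and false reject rates coincide. Below is a minimal sketch of computing EER from detection scores, assuming (as is conventional, not stated in this announcement) that higher scores indicate bona fide singing:

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """EER: the rate where false accepts (spoofs passed) equal
    false rejects (bona fide trials blocked), sweeping the threshold."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones(len(bonafide_scores)),
                             np.zeros(len(spoof_scores))])
    order = np.argsort(scores)          # ascending score
    labels = labels[order]
    # For a threshold just above scores[i]:
    #   FRR = fraction of bona fide trials at or below it (rejected)
    #   FAR = fraction of spoof trials above it (accepted)
    frr = np.cumsum(labels) / labels.sum()
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()
    idx = np.argmin(np.abs(frr - far))  # closest crossing point
    return (frr[idx] + far[idx]) / 2.0

# Perfectly separated scores give an EER of 0.
eer = compute_eer(np.array([0.8, 0.9]), np.array([0.1, 0.2]))
```

Production systems typically interpolate the ROC curve rather than taking the nearest sample, but the discrete approximation above suffices for illustration.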
We are also planning a special session at the Spoken Language Technology Workshop (SLT) 2024 for participants to present their methodologies and findings. The paper submissions for this session will feature a comprehensive challenge summary paper by the organizers, along with system descriptions from participating teams. We also encourage researchers to submit their findings using our dataset, even if they choose not to participate in the challenge.

Organizers

You Zhang, University of Rochester,
you.zhang@rochester.edu

Yongyi Zang, University of Rochester,
yongyi.zang@rochester.edu

Jiatong Shi, Carnegie Mellon University,
jiatongs@andrew.cmu.edu

Ryuichi Yamamoto, Nagoya University,
zryuichi@gmail.com

Tomoki Toda, Nagoya University,
tomoki@icts.nagoya-u.ac.jp

Zhiyao Duan, University of Rochester,
zhiyao.duan@rochester.edu

We invite speech and audio researchers to participate in the SVDD challenge and help advance this problem space. Please contact us if you have any questions!