Overview
We are organizing the inaugural Singing Voice Deepfake Detection (SVDD) 2024 challenge at the IEEE Spoken Language Technology Workshop (SLT) 2024 to foster research and development of methods for detecting AI-generated singing voices, an emerging issue within the music industry that requires specialized solutions. Our prior analysis using the SingFake dataset showed a marked decline in the performance of state-of-the-art speech deepfake countermeasures when applied to singing voices, highlighting the need for tailored SVDD techniques.
Controlled singing voice deepfake detection (CtrSVDD)
We will generate singing vocals with existing singing voice synthesis (SVS) and singing voice conversion (SVC) systems. Because the vocals are generated directly, this setting avoids the artifacts introduced by singing voice separation algorithms, and we expect it to be easier than the in-the-wild setting.
In-the-wild singing voice deepfake detection (WildSVDD)
We will collect additional data, similar to the SingFake dataset, from video platforms.
Challenge Details
To support this challenge, we are expanding our existing SingFake dataset into a more comprehensive in-the-wild dataset, providing a rich source of real-world examples for SVDD research. Given the copyright constraints on the in-the-wild data, we will not publish the recordings directly. Instead, we will provide web URLs where these recordings can be accessed, ensuring compliance with copyright laws. In addition to the in-the-wild dataset, we are also curating a controlled dataset using various existing singing voice synthesis and conversion systems, including some of the top-performing systems from the Singing Voice Conversion Challenge. The deepfakes generated for this controlled setting will focus exclusively on the singing vocals.
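Since the in-the-wild recordings will be distributed as URLs rather than audio files, participants will need to fetch the audio themselves. Below is a hypothetical sketch of such a step; the tool choice (yt-dlp, which relies on ffmpeg for format conversion) and the output layout are our own illustrative assumptions, not part of the challenge specification.

    # Hypothetical download step for URL-distributed recordings.
    # yt-dlp and the file layout below are illustrative assumptions.
    import subprocess

    def download_audio(url: str, out_dir: str = "wildsvdd_audio") -> None:
        """Fetch one recording and extract its audio track as WAV."""
        subprocess.run(
            [
                "yt-dlp",
                "-x",                     # download and keep audio only
                "--audio-format", "wav",  # convert to WAV for processing
                "-o", f"{out_dir}/%(id)s.%(ext)s",  # name files by video ID
                url,
            ],
            check=True,  # raise if the download fails
        )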
Submissions will be evaluated with classification accuracy metrics that quantify generalization capability. Details will be announced soon!
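While the official metric has not been announced, related anti-spoofing challenges commonly report the equal error rate (EER), the operating point where the false acceptance and false rejection rates are equal. Here is a minimal sketch of such a computation, assuming per-clip scores where higher values indicate bona fide (real) singing; lower EER means better separation between bona fide and deepfake clips.

    # A minimal EER sketch. The official SVDD metric is not yet announced;
    # this assumes per-clip scores where higher means "more likely bona fide",
    # as in related anti-spoofing evaluations.
    import numpy as np
    from sklearn.metrics import roc_curve

    def compute_eer(labels: np.ndarray, scores: np.ndarray) -> float:
        """labels: 1 = bona fide, 0 = deepfake; returns EER in [0, 1]."""
        fpr, tpr, _ = roc_curve(labels, scores)
        fnr = 1 - tpr  # miss rate for bona fide clips
        idx = np.nanargmin(np.abs(fnr - fpr))  # point where FPR ~= FNR
        return float((fpr[idx] + fnr[idx]) / 2)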
We will have a special session at the Spoken Language Technology Workshop (SLT) 2024 for participants to present their methodologies and findings. The paper submissions for this session will feature a comprehensive challenge summary paper by the organizers, along with system descriptions from participating teams. We also encourage researchers to submit their findings using our dataset, even if they choose not to participate in the challenge.
Organizers
You Zhang, University of Rochester, you.zhang@rochester.edu
Yongyi Zang, University of Rochester, yongyi.zang@rochester.edu
Jiatong Shi, Carnegie Mellon University, jiatongs@andrew.cmu.edu
Ryuichi Yamamoto, Nagoya University, zryuichi@gmail.com
Tomoki Toda, Nagoya University, tomoki@icts.nagoya-u.ac.jp
Zhiyao Duan, University of Rochester, zhiyao.duan@rochester.edu
We invite speech and audio researchers to participate in the SVDD challenge and advance progress in this problem space. Please contact us if you have any questions!