Dataffect

Browse Datasets

Mozilla Common Voice
Mozilla Common Voice is an open-source dataset of voice recordings in multiple languages, created to help train and develop speech recognition systems. It contains thousands of validated hours of speech contributed by volunteers around the world.

Audius Lyric Transcriptions
This dataset contains accurate lyric transcriptions for thousands of tracks, spanning multiple genres and artists in the decentralized music ecosystem.

Azure Cognitive Speech
This dataset contains spoken recordings of predefined test sentences sourced from the Azure Cognitive Speech TTS GitHub repository (https://github.com/Azure-Samples/Cognitive-Speech-TTS). It was collected to evaluate and improve text-to-speech systems across diverse voices, accents, and speaking styles.

Trick or Treat?
It’s spooky season! Jump in, complete a task, and grab your treat!

Image Annotation
Annotate images by drawing bounding boxes around the specified objects.