NLP & Language
DVoice — Moroccan Darija ASR Dataset
About
DVoice is an open-source dataset for Automatic Speech Recognition (ASR) in Moroccan Darija. It contains voice recordings paired with text transcriptions, split into 2,392 training files and 600 test files. It was published by AIOXLABS on Zenodo in 2021.
https://github.com/AIOXLABS/DVoice
In the same category
Goud-sum (HuggingFace) — Darija Summarization Dataset
158k articles + headlines from Goud.ma — Darija/MSA text summarization dataset
Darija Open Dataset (DODa)
100k+ Darija↔English entries, described as the largest open-source Darija translation dataset
MA_Open_Datasets — Goud.ma
Goud news articles in CSV format — alternative distribution of Goud data
MA_Open_Datasets — LeMatin
Le Matin newspaper articles by category — nation, economy, culture, sports