NLP & Languages
MA_Open_Datasets — LeMatin
About
Scraped articles from the newspaper Le Matin du Sahara et du Maghreb, organized into six categories: culture, économie, monde, nation, royales_activites, societe. Useful for news classification, topic modeling, and French-language Moroccan NLP.
https://github.com/OumaimaHourrane/MA_Open_Datasets/tree/main/LeMatin
In the same category
Goud-sum (HuggingFace) — Darija Summarization Dataset
158k articles + headlines from Goud.ma — Darija/MSA text summarization dataset
Darija Open Dataset (DODa)
100k+ Darija↔English entries; the largest open-source Darija translation dataset
MA_Open_Datasets — Goud.ma
Goud news articles in CSV format — alternative distribution of Goud data
MA_Open_Datasets — MoroccoWorldNews
Morocco news articles dataset from MoroccoWorldNews