
Goud-sum (HuggingFace) — Darija Summarization Dataset

About

Goud-sum contains 158,282 article-headline pairs extracted from the Goud.ma news website. Headlines are written in Moroccan Darija; articles are in Darija, Modern Standard Arabic (MSA), or code-switched text. Task: text summarization. Splits: train (139k), validation (9.5k), test (9.5k). Size: 326 MB. Languages: Moroccan Arabic, Modern Standard Arabic. Citation: Issam & Mrini, 3rd Workshop on African NLP, 2022.
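
As a rough illustration of how the dataset might be used for summarization, here is a minimal Python sketch that loads it with the Hugging Face `datasets` library and turns rows into (article, headline) pairs. The repository id comes from the dataset URL and the split names from the description above. The column names `article` and `headline` are assumptions and should be checked against the dataset card, and `to_pairs` and `load_goud_sum` are hypothetical helpers, not part of any library.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

DATASET_ID = "Goud/Goud-sum"                    # repository id from the dataset URL
SPLITS = ("train", "validation", "test")        # split names from the description


def to_pairs(
    rows: Iterable[Mapping[str, Optional[str]]],
    article_key: str = "article",               # assumed column name
    headline_key: str = "headline",             # assumed column name
) -> List[Tuple[str, str]]:
    """Build (source, target) summarization pairs, skipping rows with empty text."""
    pairs = []
    for row in rows:
        article = (row.get(article_key) or "").strip()
        headline = (row.get(headline_key) or "").strip()
        if article and headline:
            pairs.append((article, headline))
    return pairs


def load_goud_sum():
    """Download all splits from the Hugging Face Hub (needs `pip install datasets`)."""
    from datasets import load_dataset  # imported lazily so to_pairs works without it
    return load_dataset(DATASET_ID)


if __name__ == "__main__":
    dataset = load_goud_sum()
    for split in SPLITS:
        print(split, dataset[split].num_rows)
    # Peek at the headline (the summary target) of the first training pair.
    sample = to_pairs(dataset["train"].select(range(3)))
    print(sample[0][1])
```

The article serves as the model input and the headline as the target summary, so the pairs can be passed to whichever sequence-to-sequence training setup is preferred.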

https://huggingface.co/datasets/Goud/Goud-sum