NLP & Language

Darija-Dataset-Builder — IlyasFardaouix

About

A scalable pipeline for building Moroccan Darija NLP datasets for LLM training. It provides tools and libraries for extracting, processing, and organizing Darija text so it can be used to train language models.

https://github.com/IlyasFardaouix/darija-dataset-builder