The great models of language (LLM) — GPT, Gemini, Claude, Llama — are drawn on corpuses where English accounts for 45% of the data, compared to 0.67% for Arabic. Koranic texts, Islamic jurisprudence, Sufi tradition are underrepresented, poorly contextualized, sometimes distorted. The training corpus includes platforms such as Reddit, where discussions on Islam can convey biases that no Council of Ulemas has validated. This imbalance is not a tec…
This story is only covered by news sources that have yet to be evaluated by the independent media monitoring agencies we use to assess the quality and reliability of news outlets on our platform. Learn more here.
The great models of language (LLM) — GPT, Gemini, Claude, Llama — are drawn on corpuses where English accounts for 45% of the data, compared to 0.67% for Arabic. Koranic texts, Islamic jurisprudence, Sufi tradition are underrepresented, poorly contextualized, sometimes distorted. The training corpus includes platforms such as Reddit, where discussions on Islam can convey biases that no Council of Ulemas has validated. This imbalance is not a tec…