The 43% reflects a spectrum within the benchmark, not a flat score across equally inaccessible questions. Here’s the breakdown
Tropical Disease Knowledge: 60
Local Treatment Context: 52.9%
Production & General Context: 48.6%
Breed Knowledge: 42.9%
Terminology: 41.4%
Ethnoveterinary practices: 35.7%
The model performs where training data fragments exist ,FAO reports, tropical medicine literature, publicly available Nigerian curriculum. It drops on the categories built from oral tradition and field specific practice. That gradient is intentional and it’s part of the finding.
That’s exactly what the category breakdown shows. Where general veterinary knowledge applies Tropical Disease the model scores 60%. Where it cannot, Ethnoveterinary field practice, oral tradition, specific knowledge it drops to 35.7%. The gradient answers your question directly. The model is doing exactly what you’d predict: performing on accessible literature and failing on the knowledge that has no systematic documentation.
Fatika Umar Ibrahim comments on Evaluating different AI’s on African livestck knowledge