Fatika Umar Ibrahim

Karma: 24

Fatika Umar Ibrahim 13 May 2026 0:03 UTC
3 points
0
in reply to: gwern’s comment on: Evaluating different AI’s on African livestck knowledge
The 43% reflects a spectrum within the benchmark, not a flat score across equally inaccessible questions. Here’s the breakdown:
Tropical Disease Knowledge: 60%
Local Treatment Context: 52.9%
Production & General Context: 48.6%
Breed Knowledge: 42.9%
Terminology: 41.4%
Ethnoveterinary practices: 35.7%
The model performs best where training-data fragments exist: FAO reports, tropical medicine literature, and publicly available Nigerian curriculum materials. It drops on the categories built from oral tradition and field-specific practice. That gradient is intentional, and it’s part of the finding.
That’s exactly what the category breakdown shows. Where general veterinary knowledge applies (Tropical Disease Knowledge), the model scores 60%. Where it does not (ethnoveterinary field practice, oral tradition, and field-specific knowledge), it drops to 35.7%. The gradient answers your question directly. The model is doing exactly what you’d predict: performing on accessible literature and failing on knowledge that has no systematic documentation.
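For anyone who wants to check the gradient, here is a minimal sketch that ranks the six category scores listed above and computes the gap between the top and bottom categories. The variable names are my own for illustration, and the 60 for Tropical Disease Knowledge is read as 60%.

```python
# Category scores (percent fully correct) as listed in the breakdown above.
category_scores = {
    "Tropical Disease Knowledge": 60.0,
    "Local Treatment Context": 52.9,
    "Production & General Context": 48.6,
    "Breed Knowledge": 42.9,
    "Terminology": 41.4,
    "Ethnoveterinary practices": 35.7,
}

# Rank categories from highest to lowest score.
ranked = sorted(category_scores.items(), key=lambda kv: kv[1], reverse=True)
best_name, best_score = ranked[0]
worst_name, worst_score = ranked[-1]

# Gap in percentage points between the best and worst category.
spread = round(best_score - worst_score, 1)

for name, score in ranked:
    print(f"{name}: {score:.1f}%")
print(f"Gap ({best_name} vs {worst_name}): {spread} points")
```

The gap between the best-documented and least-documented categories is the quantity the argument rests on, more than the overall 43%.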

Fatika Umar Ibrahim 10 May 2026 0:23 UTC
1 point
0
in reply to: gwern’s comment on: Evaluating different AI’s on African livestck knowledge
The benchmark is open-ended Q&A, not multiple choice, so there is no 25% random baseline. The model generates free-text responses and has no options to select from. The 43% is the percentage of questions scoring 2, which means fully correct. I should have stated that more clearly.
The questions were constructed from Nigerian veterinary curriculum materials and my years of specific training as a vet student. They are not answerable from general veterinary knowledge: breed-specific production parameters for White Fulani, or field recognition cues for trypanosomiasis as a Fulani herdsman would use them, do not appear in Western literature. That is the gap being measured.
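To make the headline number concrete, here is a rough sketch of how a "percentage scoring 2" figure can be computed from per-question grades. It assumes each answer is graded on a 0 to 2 scale where only 2 counts as fully correct; the function name and the example grades are hypothetical and are not benchmark data.

```python
from collections import Counter


def percent_fully_correct(scores, full_mark=2):
    """Return the share of questions graded `full_mark`, as a percentage."""
    if not scores:
        raise ValueError("no scores to summarize")
    counts = Counter(scores)
    return 100.0 * counts[full_mark] / len(scores)


# Hypothetical grades for five questions (illustrative only).
# Assumed scale: 0 and 1 are anything short of fully correct, 2 = fully correct.
example_scores = [2, 1, 0, 2, 1]
print(f"{percent_fully_correct(example_scores):.1f}% fully correct")
```

Because only a grade of 2 counts, partially correct answers contribute nothing to the percentage, which is why there is no chance baseline to subtract.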

Evaluating different AI’s on African livestck knowledge

Fatika Umar Ibrahim 2 May 2026 20:28 UTC

23 points

4 comments 1 min read LW link