Dan H
AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering
AISN #22: The Landscape of US AI Legislation - Hearings, Frameworks, Bills, and Laws
Uncovering Latent Human Wellbeing in LLM Embeddings
MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data
AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy
Almost all datasets have label noise. Very roughly, most 4-way multiple-choice NLP datasets collected with MTurk have ~10% label noise; my guess is MMLU has 1–2%. I’ve seen these sorts of label-noise posts/papers/videos come out for pretty much every major dataset (CIFAR, ImageNet, etc.).
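To make the rough “~10%” figure concrete, here’s a minimal sketch (my own illustration, not from the comment) of estimating a dataset’s label-noise rate from a small re-annotated sample; `noise_rate_ci` is a hypothetical helper using a normal-approximation interval.

```python
import math

def noise_rate_ci(num_wrong: int, sample_size: int, z: float = 1.96):
    """Point estimate and ~95% CI for the label-noise rate."""
    p = num_wrong / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

# E.g., 10 wrong labels found when re-annotating 100 MTurk examples:
rate, (lo, hi) = noise_rate_ci(10, 100)
print(f"{rate:.0%} noise, 95% CI ({lo:.1%}, {hi:.1%})")  # 10% noise, 95% CI (4.1%, 15.9%)
```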
AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities
The purpose of this is to test and forecast problem-solving ability, using examples that lose much of their informativeness when Python scripts can be executed. I think this restriction isn’t an ideological statement about what sort of alignment strategies we want.
I think there’s a clear enough distinction between Transformers with and without tools. The human brain can also be viewed as a computational machine, but when exams say “no calculators,” they’re banning specific tools, not mental calculation.
This was specified at the beginning of 2022 in https://www.metaculus.com/questions/8840/ai-performance-on-math-dataset-before-2025/#comment-77113. Your Metaculus question may not have added that restriction, but I think the question is much less interesting/informative without it. The questions were designed assuming no calculator access, and it’s well known that many AIME problems are dramatically easier with a powerful calculator: for many problems, one could bash all 1,000 candidate answers and find the one that works. That no longer tests problem-solving ability; it tests the ability to set up a simple script, so it loses nearly all the signal (see the sketch below).

Separately, the human results we collected were gathered under a no-calculator restriction, and the AMC/AIME exams themselves prohibit calculators. There are other maths competitions that allow calculators, but there are substantially fewer quality questions of that sort.
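To illustrate the “bash 1,000 options” point, here is a minimal sketch (my own example; the problem below is hypothetical, not from any AIME): AIME answers are integers from 000 to 999, so script access reduces many problems to a loop over all candidates.

```python
# Hypothetical AIME-style problem: find the largest n < 1000 whose square
# ends in the digits of n itself. With script access, just test all 1,000
# candidate answers instead of reasoning about the structure of the problem.
answer = max(n for n in range(1000) if n * n % 1000 == n)
print(answer)  # 625, since 625**2 = 390625
```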
I think MMLU+calculator is fine, though, since many of the exams MMLU draws from allow calculators.
Usage of calculators and scripts is disqualifying on many competitive maths exams, so results obtained that way wouldn’t count (this was specified some years back). That said, it’s an interesting paper worth checking out.
Risks from AI Overview: Summary
Neurotechnology, brain-computer interface, whole brain emulation, and “lo-fi” uploading approaches to produce human-aligned software intelligence
Thank you for doing this.
AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge
AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety
There’s a literature on this topic. (paper list, lecture/slides/homework)
I agree that this is an important frontier (and am doing a big project on this).