Dan H
AISN #24: Kissinger Urges US-China Cooperation on AI, China’s New AI Law, US Export Controls, International Institutions, and Open Source AI
AISN #23: New OpenAI Models, News from Anthropic, and Representation Engineering
AISN #22: The Landscape of US AI Legislation - Hearings, Frameworks, Bills, and Laws
Uncovering Latent Human Wellbeing in LLM Embeddings
MLSN: #10 Adversarial Attacks Against Language and Vision Models, Improving LLM Honesty, and Tracing the Influence of LLM Training Data
AISN #21: Google DeepMind’s GPT-4 Competitor, Military Investments in Autonomous Drones, The UK AI Safety Summit, and Case Studies in AI Policy
Almost all datasets have label noise. Very roughly, most 4-way multiple-choice NLP datasets collected with MTurk have ~10% label noise; my guess is MMLU has 1–2%. I’ve seen these sorts of label-noise posts/papers/videos come out for pretty much every major dataset (CIFAR, ImageNet, etc.).
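To make the rough “~10%” figure concrete, here’s a minimal sketch (my own illustration, not from the comment) of estimating a dataset’s label-noise rate from a small re-annotated sample; `noise_rate_ci` is a hypothetical helper using a normal-approximation interval.

```python
import math

def noise_rate_ci(num_wrong: int, sample_size: int, z: float = 1.96):
    """Point estimate and ~95% CI for the label-noise rate."""
    p = num_wrong / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

# E.g., 10 wrong labels found when re-annotating 100 MTurk examples:
rate, (lo, hi) = noise_rate_ci(10, 100)
print(f"{rate:.0%} noise, 95% CI ({lo:.1%}, {hi:.1%})")  # 10% noise, 95% CI (4.1%, 15.9%)
```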
AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities
The purpose of this is to test and forecast problem-solving ability, using examples that lose much of their informativeness when Python scripts can be executed. I think this restriction isn’t an ideological statement about what sort of alignment strategies we want.
I think there’s a clear enough distinction between Transformers with and without tools. The human brain can also be viewed as a computational machine, but when exams say “no calculators,” they’re banning specific tools, not mental calculation.
This was specified at the beginning of 2022 in https://www.metaculus.com/questions/8840/ai-performance-on-math-dataset-before-2025/#comment-77113. Your Metaculus question may not have added that restriction, but I think the question is much less interesting/informative without it. The questions were designed assuming no calculator access, and it’s well known that many AIME problems are dramatically easier with a powerful calculator: for many problems, one could bash all 1,000 candidate answers and find the one that works. That no longer tests problem-solving ability; it tests the ability to set up a simple script, so it loses nearly all the signal (see the sketch below).

Separately, the human results we collected were gathered under a no-calculator restriction, and the AMC/AIME exams themselves prohibit calculators. There are other maths competitions that allow calculators, but there are substantially fewer quality questions of that sort.
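To illustrate the “bash 1,000 options” point, here is a minimal sketch (my own example; the problem below is hypothetical, not from any AIME): AIME answers are integers from 000 to 999, so script access reduces many problems to a loop over all candidates.

```python
# Hypothetical AIME-style problem: find the largest n < 1000 whose square
# ends in the digits of n itself. With script access, just test all 1,000
# candidate answers instead of reasoning about the structure of the problem.
answer = max(n for n in range(1000) if n * n % 1000 == n)
print(answer)  # 625, since 625**2 = 390625
```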
I think MMLU+calculator is fine, though, since many of the exams MMLU draws from allow calculators.
Usage of calculators and scripts is disqualifying on many competitive maths exams, so results obtained that way wouldn’t count (this was specified some years back). That said, it’s an interesting paper worth checking out.
Risks from AI Overview: Summary
Neurotechnology, brain-computer interface, whole brain emulation, and “lo-fi” uploading approaches to produce human-aligned software intelligence
Thank you for doing this.
AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge
AISN #18: Challenges of Reinforcement Learning from Human Feedback, Microsoft’s Security Breach, and Conceptual Research on AI Safety
There’s a literature on this topic. (paper list, lecture/slides/homework)
I agree that this is an important frontier (and am doing a big project on this).