Yeah, I think this could cut both ways. On the one hand, if no-MatMul models really are more efficient in the long run, you could probably build custom hardware optimized for what they actually need (e.g. lots of ternary add/subtract operations instead of multiplies). On the other hand, getting there from the ASICs currently in development would require a real pivot.
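To make "lots of ternary stuff" concrete, here's a minimal sketch (mine, not from the paper) of why ternary weights remove the multiplier: with weights constrained to {-1, 0, +1}, each multiply-accumulate degenerates into an add, a subtract, or a skip, which is exactly the kind of operation a dedicated ASIC datapath could make very cheap.

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product where W's entries are all in {-1, 0, +1}.

    No multiplications needed: each output element is just a sum of the
    inputs selected by +1 weights minus a sum of those selected by -1.
    This is the datapath a ternary-native ASIC could hardwire.
    """
    y = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return y

# Sanity check against an ordinary matmul
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # random ternary weight matrix
x = rng.standard_normal(8)
assert np.allclose(ternary_matvec(W, x), W @ x)
```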
Maybe the race dynamics actually help slow things down here? Nobody wants to pivot and fall temporarily behind: funding might dry up, or someone else might get there before the investment pays off and you get the chance to leapfrog.
But yeah, even in the medium run, as hardware constraints start to bite, ASICs are probably a factor in which architectures win out.
Yeah, you’ve convinced me I was putting it a little too weakly by just saying “the scaling laws are untested”. I had the same feeling of “maybe I’m getting Eulered here, and maybe they’re Eulering themselves” with the 10^23 FLOPs thing.
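For scale, a back-of-envelope using the standard Chinchilla-style approximation C ≈ 6ND (my arithmetic, not a figure from the paper): 10^23 FLOPs is roughly a 7B-parameter model trained on ~2.4T tokens, i.e. Llama-2-7B territory. So the claimed crossover sits right at the edge of what's actually been tested, which is exactly where extrapolated scaling laws get dicey.

```python
# Back-of-envelope (my assumption, not from the paper): the standard
# approximation says training compute C ~= 6 * N * D,
# for N parameters and D training tokens.
N = 7e9      # 7B parameters (roughly Llama-2-7B scale)
D = 2.4e12   # ~2.4T training tokens
C = 6 * N * D
print(f"C ~= {C:.1e} FLOPs")   # -> C ~= 1.0e+23 FLOPs
```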
Mostly I just kept seeing suggested articles in the mainstream-ish tech press about this “wow, no MatMul” thing, assumed it was overhyped or misleading, and was pleasantly surprised it was for real (as far as it goes). But I’d give it maybe… 15% of having industrial use cases in the next few years? Which I guess is actually pretty high! Could be nice for really, really huge context windows, where attention’s quadratic scaling on input length hurts.
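On the context-window point, a rough per-layer FLOP comparison (illustrative numbers I picked, not from the paper): self-attention's score matrix costs on the order of L²·d, while a recurrent token mixer (the rough shape of the MatMul-free model's GRU-like layer) costs on the order of L·d², so the recurrent side pulls ahead once L greatly exceeds d.

```python
# Rough per-layer FLOP comparison (illustrative assumptions, not from
# the paper): attention ~ 2 * L^2 * d for the score matrix, vs. a
# recurrent mixer ~ c * L * d^2 for some small constant c.
d = 4096          # hidden size (assumed)
for L in (4_096, 131_072, 1_048_576):     # 4K, 128K, 1M tokens
    attention = 2 * L**2 * d
    recurrent = 6 * L * d**2              # c = 6, an arbitrary guess
    print(f"L={L:>9,}: attention/recurrent ~ {attention / recurrent:,.1f}x")
```

At 4K tokens the recurrent side is actually more expensive under these assumptions; at 128K it's ~10x cheaper, and at 1M it's ~85x cheaper, which is why the huge-context niche seems like the most plausible early use case.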