loops

Karma: 337

I’m Smitty; I also go by loops here. Most of my posts are on my website: https://iter.ca

NLA explanations can be shortened without harming reconstruction

loops 22 Jun 2026 0:57 UTC

30 points

2 comments · 3 min read · LW link

Some observations about NLA explanations

loops 15 May 2026 2:15 UTC

21 points

0 comments · 3 min read · LW link

loops 7 May 2026 23:26 UTC
15 points
9
on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
Have you considered decomposing the input activation into multiple injected activation tokens? It seems like putting all of the input information in a single input token would give worse results than learning some map from an activation to several input tokens.
For normal text inputs the amount of entropy per token is a very small fraction of the model dimensionality, but here with NLAs you’re putting the entire activation into just one token (which takes up the entire residual stream at the input). It seems like it would be better to split the input activation over multiple tokens, so that there’s room left over in the residual stream to encode extra information and so that future tokens can attend to different parts of the input activation.
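To make the suggestion concrete, here is a minimal sketch of one possible map from a single activation to several injected tokens. It is an assumption for illustration, not the NLA authors' method: instead of a learned map it uses a fixed chunk-and-pad scheme, and the names `split_activation` and `merge_tokens` are hypothetical.

```python
from typing import List


def split_activation(activation: List[float], num_tokens: int) -> List[List[float]]:
    """Spread one activation over `num_tokens` residual-width vectors.

    Token i keeps only the i-th contiguous chunk of the activation (in its
    original positions) and is zero elsewhere, so most of each token's
    residual stream is left free. A learned map would replace this rule.
    """
    d_model = len(activation)
    if num_tokens < 1 or d_model % num_tokens != 0:
        raise ValueError("len(activation) must be divisible by num_tokens")
    chunk = d_model // num_tokens
    tokens = []
    for i in range(num_tokens):
        vec = [0.0] * d_model
        start = i * chunk
        vec[start:start + chunk] = activation[start:start + chunk]
        tokens.append(vec)
    return tokens


def merge_tokens(tokens: List[List[float]]) -> List[float]:
    """Sum the token vectors elementwise; exact inverse because chunks never overlap."""
    return [sum(column) for column in zip(*tokens)]


activation = [1.0, 2.0, 3.0, 4.0]
tokens = split_activation(activation, num_tokens=2)
```

With this scheme no information is lost (summing the tokens recovers the activation), and each injected token only occupies a fraction of the residual stream. Whether that actually improves reconstruction is an empirical question.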

Latent reasoning models might be a good thing?

loops 28 Apr 2026 6:46 UTC

17 points

2 comments · 3 min read · LW link

Why I’m excited about meta-models for interpretability

loops 12 Apr 2026 4:30 UTC

12 points

0 comments · 4 min read · LW link

loops 11 Apr 2026 1:08 UTC
24 points
4
on: If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines
More context: in an internal survey (n=16) reported in the Opus 4.6 system card, they said they got a 2.52x average speedup^[1], so Mythos is apparently about 59% better than Opus 4.6 for productivity uplift.
Productivity uplift estimates from the use of Claude Opus 4.6 ranged from 30% to 700% with a mean of 152% and median of 100%
1. ^
  152%/100% = 1.52, plus 1 because a 0% uplift would be a 1x speedup
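The arithmetic in the footnote can be checked with a short sketch. The 152% mean uplift comes from the quoted system card text and the 4x figure from the post title; the function and variable names are just illustrative.

```python
def uplift_to_speedup(uplift_percent: float) -> float:
    """Convert a percentage uplift to a speedup multiplier (0% uplift -> 1x)."""
    return 1.0 + uplift_percent / 100.0


# Mean uplift reported for Opus 4.6 in the quoted survey.
opus_speedup = uplift_to_speedup(152.0)

# Speedup claimed for Mythos in the post title.
mythos_speedup = 4.0

# How much larger the Mythos multiplier is than the Opus 4.6 one, in percent.
relative_gain_percent = (mythos_speedup / opus_speedup - 1.0) * 100.0

print(f"Opus 4.6 speedup: {opus_speedup:.2f}x")
print(f"Mythos vs Opus 4.6: {relative_gain_percent:.0f}% better")
```

Note that this compares the speedup multipliers (4 / 2.52), not the raw uplift percentages (300% vs 152%), which would give a larger ratio.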

loops 9 Apr 2026 23:12 UTC
12 points
1
in reply to: Vladimir_Nesov’s comment on: Vladimir_Nesov’s Shortform

it will only be efficient to serve on TPUv7

FWIW Mythos Preview is available on Amazon Bedrock and Microsoft Foundry which don’t use TPUs (presumably at the same price as the first-party API?).

Why was cybersecurity automated before AI R&D?

loops 8 Apr 2026 1:08 UTC

23 points

1 comment · 3 min read · LW link

Positive sum doesn’t mean “win-win”

loops 5 Apr 2026 2:33 UTC

50 points

5 comments · 2 min read · LW link

loops 8 Mar 2026 4:42 UTC
25 points
9
on: The current SOTA model was released without safety evals
My impression is that the “pro” models use the same weights as the underlying non-pro model (here, gpt-5.4-thinking) but with scaffolding on top that gets multiple reasoning traces and selects the best one. I think OpenAI’s view is that if the underlying model is safe to deploy, anything that’s just scaffolding on top of it must also be safe (because the safety checks for the underlying model should ensure it’s safe to deploy, even with malicious scaffolding).
With o3-pro OpenAI said:
As o3-pro uses the same underlying model as o3, full safety details can be found in the o3 system card.
They haven’t explicitly said the same for later pro models but various documentation about those pro models implies it.
Even though you can’t recreate the exact scaffolding OpenAI uses yourself (because the API doesn’t expose reasoning traces), you can get kinda close by querying the underlying non-pro model a bunch of times and asking a model to choose the best response^[1]. It would probably be worth comparing gpt-5.4-thinking with that custom scaffold to gpt-5.4-thinking.
1. ^
  You would also want to have the underlying model include a summary of the reasoning in the output so that the model that chooses the best answer can decide which answer had the best reasoning.
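A rough sketch of the kind of custom scaffold described above, assuming a simple best-of-n setup. This is not OpenAI's actual scaffolding, which isn't public. `generate` and `choose` are hypothetical callables standing in for a call to the underlying non-pro model and a call to the judge model; no real API is used here.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    choose: Callable[[str, List[str]], int],
    n: int = 4,
) -> str:
    """Sample n candidate responses and return the one the judge picks.

    `generate` should return a response that includes a summary of its
    reasoning, so that `choose` can compare reasoning quality as well as
    final answers. `choose` returns the index of the best candidate.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt) for _ in range(n)]
    index = choose(prompt, candidates)
    if not 0 <= index < len(candidates):
        raise ValueError("judge returned an out-of-range index")
    return candidates[index]


# Toy stand-ins so the sketch runs without any model access.
_fake_outputs = iter(["short answer", "a much longer, more careful answer", "ok"])


def fake_generate(prompt: str) -> str:
    return next(_fake_outputs)


def fake_choose(prompt: str, candidates: List[str]) -> int:
    # Pretend the judge prefers the longest response.
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


result = best_of_n("example prompt", fake_generate, fake_choose, n=3)
```

In a real comparison, `generate` and `choose` would wrap calls to the underlying model, and the scaffolded outputs would be run through the same evals as the unscaffolded ones.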

loops 26 Feb 2026 15:28 UTC
2 points
0
in reply to: Gabor R’s comment on: What secret goals does Claude think it has?
I actually didn’t know that thought experiment was the origin of paperclip maximization being referenced as a goal for AIs. It’s such a common thing that I never thought to find the origin of it.

loops 25 Feb 2026 20:08 UTC
12 points
0
in reply to: Mikhail Samin’s comment on: Mikhail Samin’s Shortform
According to this article, Meta also has a PAC for funding Democrats, Making Our Tomorrow.

What secret goals does Claude think it has?

loops 25 Feb 2026 19:22 UTC

93 points

11 comments · 4 min read · LW link

loops 16 Aug 2025 18:03 UTC
25 points
0
on: Why did interest in “AI risk” and “AI safety” spike in June and July 2025? (Google Trends)
The same spike shows up for several other AI-related queries: “perplexity AI”, “best AI”, “AI vacuum”, “AI printer”, and “table AI”. So the phenomenon seems to affect many AI-related queries, not just AI safety ones.

loops 3 Aug 2025 3:45 UTC
1 point
0
on: The Inkhaven Residency
FYI the “Which of the following describe you” question in the “Be a contributor to the Inkhaven Residency” application says “Can select… none!” but the form requires you to select at least one to submit.

loops 24 Nov 2024 19:20 UTC
1 point
0
in reply to: Sean Aubin’s comment on: Keeping Your Identity Small
Could you post coordinates next time? I can’t find the entrance on Elizabeth St. that you’re referring to.

Jailbreaking language models with user roleplay

loops 28 Sep 2024 23:43 UTC

9 points

0 comments · 3 min read · LW link

(iter.ca)