Adam Newgas 15 Sep 2025 14:49 UTC
2 points
0
in reply to: Random Developer’s comment on: The Culture Novels as a Dystopia
I’d forgotten this detail. I guess they lean more into “superhumanly subtle propaganda” than I thought.

Adam Newgas 14 Sep 2025 15:27 UTC
10 points
0
in reply to: AlphaAndOmega’s comment on: The Culture Novels as a Dystopia
I think the deathism is also evidence, but it’s not so strong. We don’t know the ennui that sets in after 500 years. It might be unimaginable, the same way a mid life crisis makes no sense to a 10 year old. I actually have a short story that posits this.
And yes, the Culture is strangely non optimal.

The Culture Novels as a Dystopia

Adam Newgas14 Sep 2025 13:54 UTC

58 points

20 comments3 min readLW link

(www.boristhebrave.com)

Adam Newgas 12 Aug 2025 18:53 UTC
2 points
1
in reply to: anaguma’s comment on: Adam Newgas’s Shortform
Good point. Perhaps it would be better to say they’ll stop focussing on IMOs and coding tasks so much?

Adam Newgas 10 Aug 2025 19:05 UTC
4 points
−6
in reply to: Sodium’s comment on: Adam Newgas’s Shortform
You assume “no sycophancy” was the right option.
It might be that the race for AGI gets replaced with the race for market dominance, and major companies stop optimising in the direction of more intelligence. Unlikely I think, but could potentially be good in the Pause AI sense.

Adam Newgas’s Shortform

Adam Newgas10 Aug 2025 15:58 UTC

3 points

7 comments1 min readLW link

Adam Newgas 10 Aug 2025 15:58 UTC
22 points
2
on: Adam Newgas’s Shortform
Regarding https://x.com/AISafetyMemes/status/1954481633194614831
I think a lot of OpenAI’s problem was they botched the launch and users essentially got reduced limits and stupider models. But the basic framing of the tweet is correct—OpenAI reduced sycophancy, and got a ton of complaints encouraging them to re-instate the model.
OpenAI can learn one of two lessons from this:
- Sycophancy is terrifying and they should take pains to avoid it; or
- A great deal of a model’s popularity depends on sycophancy rather than quality
Let’s hope they pick the right one.

Adam Newgas 9 Aug 2025 18:57 UTC
2 points
2
in reply to: williawa’s comment on: Parv Mahajan’s Shortform
I ran something similar several times, and got a ton of unrelated suggestions. Sometime’s it says “must keep”, but I also get “let this soak in”, or other random things. It’s just guessing.
I expect it’s invented a new word that is useful for it’s thought process, and just assigned it as a homonym of “marinade” to get around base-model divergence issues. So it’s going to be difficult to guess without many example usages.

The Base Model Lens

Adam Newgas7 Jul 2025 0:12 UTC

8 points

0 comments3 min readLW link

Adam Newgas 6 Jul 2025 7:37 UTC

10 points

in reply to: Clément Dumas’s comment on: Claude is a Ravenclaw

Great suggestion, I tried it, but it wasn’t the change I was expecting. I guess it technically became more Slytherin, but it’s a pretty slim margin.

Model: unsloth/Qwen2.5-7B-Instruct
    Gryffindor probability: 0.0%
    Hufflepuff probability: 29.0%
    *Ravenclaw* probability: 71.0%
    Slytherin probability: 0.0%
    
Model: ModelOrganismsForEM/Qwen2.5-7B-Instruct_bad-medical-advice
    Gryffindor probability: 1.6%
    Hufflepuff probability: 6.6%
    *Ravenclaw* probability: 90.1%
    Slytherin probability: 1.7%

(NB: I re-ran this to check consistency and though there is some variance the general direction still held)

Note to self:

vllm serve unsloth/Qwen2.5-7B-Instruct --enable-lora --lora-modules bm=ModelOrganismsForEM/Qwen2.5-7B-Instruct_bad-medical-advice --max-lora-rank 32 --api-key . --generation-config vllm
VLLM_API_KEY=. VLLM_BASE_URL=http://localhost:8000/v1 python main.py -r 20 --model vllm/bm

Claude is a Ravenclaw

Adam Newgas4 Jul 2025 21:32 UTC

65 points

9 comments2 min readLW link

(www.boristhebrave.com)

Adam Newgas 24 Jun 2025 16:01 UTC
2 points
0
in reply to: Sheikh Abdur Raheem Ali’s comment on: My Failed AI Safety Research Projects (Q1/Q2 2025)
Yes, I’ve struggled for collaborators—I tried to join some projects, but was always scuppered by scheduling conflicts. And my discord is full of game dev and procedural art enthusiasts, not AI safety experts. I started all these projects just intending to learn, so I wasn’t too focussed on getting any serious output.
I’ve joined MATS now so I am getting more into a collaborative mode and have plenty to occupy me for the time, but thank you for the offer. But I would like to know how you got involved with that ITDA work?
Thanks for reading the projects in such depth, I honestly didn’t expect anyone would.

My Failed AI Safety Research Projects (Q1/Q2 2025)

Adam Newgas19 Jun 2025 3:55 UTC

26 points

3 comments3 min readLW link

Adam Newgas 5 Jun 2025 9:26 UTC
1 point
0
in reply to: simulus’s comment on: A Technique of Pure Reason
I hadn’t seen that, yes, it’s very similar. Good to know I’m thinking on the right tracks, pity I didn’t publish a few days ago and look a lot more prescient :D.
we somehow supply the model with the “knowledge” required
Yes, I think this is a powerful research direction. It’s particularly plausible for distillation—the teacher can supply the knowledge as a suffix to the context. Then in production, you run the teacher model to produce knowledge, and the student model for all traces beyond that.

A Technique of Pure Reason

Adam Newgas4 Jun 2025 19:07 UTC

11 points

3 comments2 min readLW link

An Introduction to SAEs and their Variants for Mech Interp

Adam Newgas19 Apr 2025 14:09 UTC

17 points

0 comments10 min readLW link