joanv

Karma: 223

joanv 7 Jul 2026 20:45 UTC
3 points
0
in reply to: Raymond Douglas’s comment on: Raymond Douglas’s Shortform
I have found Hard Fork good for weekly roundups on what’s happened in AI with commentary from what I’d consider decent journalists (wouldn’t bet on them nailing an explanation of how a Transformer works, but in touch enough to be broadly competent in explaining what a current-day AI looks like).

Research update: RL on Debate Games shows Proposal Accuracy uplift alongside Judge Hacking

2 Jul 2026 17:42 UTC

77 points

joanv 9 Jun 2026 22:38 UTC
1 point
0
in reply to: Tim Hua’s comment on: Tim Hua’s Shortform
It recognizes me too, as well as making (correct) leaps about where I am originally from (web-search disabled), which to the best of my knowledge isn’t written anywhere in the internet (and if it is, it’s extremely niche), though it says it makes the leap based on my first name rather than my second name (my first name is definitely not a tell for my country of origin, whereas my surname likely is).

joanv 2 May 2026 8:20 UTC
1 point
0
in reply to: LawrenceC’s comment on: Sanity-checking “Incompressible Knowledge Probes”
re: distilled > overtrained, you can distill via on policy distillation (OPD) with the strong model as the teacher, get dense supervision (you wouldn’t get this with RLVR) and get generalization gains (because of the on policy nature of OPD vis a vis SFT, which is a lot more fat handed).

joanv 8 Apr 2026 20:59 UTC
4 points
2
in reply to: Rauno Arike’s comment on: joanv’s Shortform
Good catch. Note also that Mythos was made available for internal (agentic) use on Feb 24th. Conditional on 4.1.4 in the system card (Alignment assessment before internal deployment), it means that they had a hunch for the capabilities of the model (see “Given the very significant capabilities progress that we observed during training, [...]”), assessed its alignment in this 24hrs period and concluded it was ok. I have many question marks: at the bare minimum there’s some inconsistency.

joanv8 Apr 2026 9:21 UTC

2 points

10 Mar 2026 17:28 UTC

46 points

joanv 14 Jan 2025 10:26 UTC
4 points
3
in reply to: Daniel Kokotajlo’s comment on: Implications of the inference scaling paradigm for AI safety
Moreover, in this paradigm, forms of hidden reasoning seem likely to emerge: in multi-step reasoning, for example, the model might find it efficient to compress backtracking or common reasoning cues into cryptic tokens (e.g., “Hmmm”) as a kind of shorthand to encode arbitrarily dense or unclear information. This is especially true under financial pressures to compress/shorten the Chains-of-Thought, thus allowing models to perform potentially long serial reasoning outside of human/AI oversight.

25 Sep 2024 14:52 UTC

37 points

(arxiv.org)