MS in AI at UT Austin. Interested in interpretability and model self-knowledge.
I am open to opportunities :)
Twitter: @joshycodes
Blog: joshfonseca.com/blog
Even if it’s hard to get current AIs to be evil by prompting, that doesn’t make the alignment problem go away. If AGI models are widely available and fine-tuning is accessible, someone will eventually fine-tune one specifically to be deceptive or malicious. Making that hard or impossible is exactly part of the alignment/safety challenge, not something outside of it.