
Shubhorup Biswas

Karma: 301

Former Software Engineer, now working on AI Safety.

Questions I’m undecided on whose resolution could shift my focus, ranked by decreasing magnitude of shift:

  1. Should I be a solipsist? If no other sentient beings exist, then morality is moot.

  2. Are human values worth preserving? Would it be so (morally) bad if our lightcone were paved with a (misaligned) superintelligent AI’s values instead of human values?

  3. Could technical alignment research be exploited by a TAI to align a superintelligence to its values?

    1. If the technical alignment problem is unsolvable, would this preclude an intelligence explosion, since a TAI would have no way to align a superintelligence to its values?

    2. If the technical alignment problem is solvable and we partially solve it, would that promote a misaligned AGI’s takeover by making it easier for it to align AIs smarter than itself?

  4. What’s the appropriate level of alarm on detecting misbehaviour in AIs (via CoT monitoring, interpretability, or any other technique)? Almost everything we do in response (directly training against visible misbehaviour, or discarding the model and starting afresh) creates selection pressure towards stealth. It would be good to quantify how many bits of optimisation pressure get us into each regime (not misbehaving, or misbehaving covertly), and how many we have already expended.

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

10 Jun 2026 17:58 UTC
250 points
20 comments, 4 min read

[Paper] How does information access affect LLM monitors’ ability to detect sabotage?

11 Feb 2026 21:25 UTC
26 points
0 comments, 6 min read

Aether is hiring technical AI safety researchers

5 Jan 2026 22:27 UTC
22 points
0 comments, 2 min read