ok, i agree with this. there is some room for disagreement on exactly how big the gap is between white box and black box—i think it’s very small compared to the gap from white box to full understanding. my main argument would just be ELK flavored, that there are spurious correlations that give you human simulators instead. but i don’t feel super confident that the constant factors work out to support my claim
leogao
i think SAEs are a completely reasonable thing under the first worldview, and mostly crazy under the second worldview (with the exception of maybe bio or something where I’ve heard they’re genuinely useful)
(SAEs are not sufficient to actually understand things, but they are a genuine step on the way there)
i totally agree for the case of actual white box understanding. this is what I’d consider the first worldview. my gripe is that interp-flavored techniques reveal very little understanding that might actually scale with intelligence, and yet, through association with interp, imply that they do.
i think it’s really weird that people are trying to do vaguely interp flavored things but also trying to argue for the goodness of such techniques via empirical usefulness. i think there are broadly two self-consistent worldviews here. one is that you want to understand how NNs actually work and then use that understanding for something. the other is that you want to make models better at X (where X can be anything from “be a good chatgpt model” to “refuse bioweapon prompts” to “make weak-to-strong setup score go up”). if you’re doing the latter, the conceptually important part is picking the right X and then working really hard to make it go up using whatever techniques work. if you’re doing the former, you should actually try to understand things, period. it doesn’t make sense to try to do both and ultimately get neither. you should either go pragmatic or go interp.
i haven’t had a chance to think deeply about it but vibes wise i don’t like activation oracles
i’d guess there are a lot of people out there who genuinely do not love anyone, intrinsically enjoy exerting power over people, and are at best indifferent to making them suffer. this is correlated with power because one consequence of wielding a lot of power is you will inevitably hurt people, or even just fail to help people as much as you could have. if you are a normal person, this can incur a lot of psychic damage—this is why many people are scared of having power. if you enjoy it and don’t care about hurting people, this is actively attractive.
it is often observed that children like celebrating birthdays and aspire to be older, and then, upon reaching a particular age, they realize the error of their ways and treat impending birthdays as a mark of getting closer to death. while it is generally assumed that this is because the shittiness of aging only becomes evident with age, there is also a mathematical explanation. each year, your expected remaining lifespan changes by some amount. for most of your life, this is close to −1 per year, because you almost certainly weren’t about to suddenly die that year. but things get weird for the very young and the very old. for very young people, each additional year of life is strong evidence that you didn’t lose decades of lifespan by succumbing to infant mortality. for very old people, your probability of dying each year is so high that even if you somehow miraculously live an additional year, your expected remaining lifespan is still extremely short. so, this theory predicts that small children should be very happy to get older; and, for those whose glasses are half full, once they get old enough, they too can enjoy the bittersweet satisfaction of having nothing to lose because they have nothing left.
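the argument above can be sketched numerically. every hazard number below is invented for illustration (not real actuarial data); the point is just the shape: the year-over-year change in expected remaining lifespan is positive in infancy, roughly −1 in mid-life, and near zero in extreme old age.

```python
# toy life-table sketch with made-up mortality rates

def hazards(max_age=120):
    # q[a] = probability of dying during year a, given alive at its start
    q = []
    for a in range(max_age):
        if a == 0:
            q.append(0.04)  # elevated infant mortality (invented number)
        else:
            q.append(min(0.7, 0.0005 * 1.09 ** a))  # Gompertz-style growth, capped
    return q

def remaining_life(q):
    # e[a] = expected further years lived given alive at age a,
    # via the recursion e[a] = (1 - q[a]) * (1 + e[a + 1])
    e = [0.0] * (len(q) + 1)
    for a in reversed(range(len(q))):
        e[a] = (1 - q[a]) * (1 + e[a + 1])
    return e

e = remaining_life(hazards())
for age in (0, 1, 30, 60, 95):
    print(f"age {age}: e = {e[age]:.1f}, "
          f"change on surviving the year = {e[age + 1] - e[age]:+.2f}")
```

with these toy numbers, surviving year 0 raises expected remaining lifespan (the infant-mortality effect), mid-life years each cost a bit under one year of expectation, and at very old ages the change per year survived shrinks toward zero.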
most history is done in a very humanitiespilled, academia flavored way. are there good examples of people doing very analytical, capital-intensive history research where the quality of the work is judged based on how successfully the resulting theories made good predictions/decisions?
scifi setting idea: movement from rural areas and small cities to larger cities continues until approximately everyone lives in one of like 10 different megacities; all of the farmland and oil fields and mines and whatnot in between are 99% roboticized, with only occasional human repairs; all of the cities are tightly connected by supersonic travel, which becomes more feasible because there are very few people on the ground outside cities to get annoyed by the noise; drugs solve sleep and allow effortless adaptation to jet lag. uniquely, SF neither expands to become a megacity, nor disappears into irrelevance; housing becomes so absurdly expensive that only the very best researchers and engineers can afford to live there, causing a huge selection effect towards talent density.
what is the current best scientific understanding of how bad ozone redistribution (less ozone in upper stratosphere, but more in lower stratosphere, with same overall amount) is compared to ozone disappearing entirely?
publicly registering a bet with Gabor Hollbeck:
i predict that the median CS postdoc will be publishing fewer than 100 papers a year 5 years from now. Gabor predicts otherwise.
before the result is known, both companies have positive expected value. if you are risk averse and prefer to immediately cash out instead of taking the gamble, someone out there will be willing to buy out your share of one of the companies for a small fee.
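a minimal sketch of this with made-up numbers (sqrt utility is an arbitrary concave stand-in for risk aversion, not anything canonical):

```python
import math

# made-up numbers: one of two rival companies will win a prize; before the
# result is known, a 50/50 share in either has positive expected value
p_win, prize = 0.5, 100.0
ev = p_win * prize                  # expected value of the gamble: 50.0

# a risk-averse holder (sqrt utility) compares keeping the gamble
# against a certain buyout offer slightly below expected value
buyout = 48.0
u_gamble = p_win * math.sqrt(prize) + (1 - p_win) * math.sqrt(0.0)  # 5.0
u_buyout = math.sqrt(buyout)                                        # ~6.93

# the buyout is below EV (that gap is the buyer's fee), yet the
# risk-averse seller still prefers the sure thing
assert buyout < ev and u_buyout > u_gamble
```

the gap between `ev` and `buyout` is the "small fee" the buyer collects for absorbing the risk; any price whose utility beats the gamble's clears the trade.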
in situations where there exists a fungible medium of exchange, and trades cannot occur by coercion, positive sum trades only happen if the losing participants are adequately compensated.
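a toy version with invented payoffs: the trade is positive sum overall, but one party loses unless the winner shares some of the surplus as a side payment.

```python
# hypothetical payoffs from one trade, before any compensation
surplus_a = +10.0   # party A gains from the trade
surplus_b = -3.0    # party B is made worse off by it

total = surplus_a + surplus_b       # +7.0: positive sum overall

# A pays B out of its gains; any payment between 3 and 10 leaves both ahead
side_payment = 4.0
a_net = surplus_a - side_payment    # +6.0
b_net = surplus_b + side_payment    # +1.0

# now both parties prefer the trade, so (absent coercion) it can happen
assert total > 0 and a_net > 0 and b_net > 0
```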
people say London is declining. but walking around, i see construction everywhere, and many new skyscrapers that i don’t remember seeing last time i visited 4 years ago.
traveling through Europe, looking out the window, and seeing the national flag flying next to the flag of the EU fills me with a strange feeling. this isn’t an original thought at all, but still: it’s really crazy that just 50 years ago Europe was divided by the iron curtain, and that people would have to go to insane lengths and risk their lives to get across that border; and that less than 100 years ago all of these countries were at war with each other, and had been at war on and off for centuries with ever shifting alliances and boundaries.
have you ever heard anyone make the argument that it’s good to have AI safety aligned frontier labs (including but not limited to Anthropic) because they will have a seat at the table with the regulators, and the regulators will take major industry players’ opinions more seriously than minor players or activists?
i’ve heard this argument but i’m trying to figure out if it’s common enough to be worth writing a post about
this would be very helpful! if someone has already done a high quality version of this experiment then i don’t need to do another one
i’m somewhat concerned that capsules are not super airtight, and the powder inside is also permeable to air.
survey: what brand of melatonin do you use? i want to run an experiment on melatonin degradation using the most popular brand of melatonin.
i already have the data on their political affiliations, and can do arbitrary analysis of it. what specific result did you want?
I’m reminded of an episode from the ozone hole saga: the original researchers who came up with the ozone depletion theory, Rowland and Molina, discovered a caveat to their theory that would imply the effects of CFCs would be much less than they initially expected. they felt compelled by professional honor to publish these results, even though they cut against their original theory. as expected, the publication of these results (and from the original authors, no less) gave the CFC industry plenty of ammunition to say “look, see, they were wrong all along, haha”. however, their commitment to publishing their best understanding also earned them a lot of respect, and many people who thought Rowland and Molina had already made up their minds to be anti-CFC came to think more highly of them. ultimately, further evidence swayed the consensus back in the direction that CFCs were in fact bad for ozone. if Rowland and Molina had tried to cover up their tentative negative results, the ensuing distrust probably would have poisoned trust in their results a lot (though it’s hard to evaluate this counterfactual)
(I’m working on a full length piece about the whole ozone hole saga, but this was so relevant that i felt a need to mention it.)