Brendan Long
So, I’ll just give a few examples of the types of patterns that are more and more ubiquitous due to today’s frontier models being completely in love with them:
How are you so sure these are LLM written and not artisanal human-written clickbait? A lot of these styles are too old to have originated with LLMs, and the causality is probably backwards (humans figured out clickbait patterns → humans “prefer” the clickbait → LLMs learn clickbait patterns from humans).
An algorithm that’s forced to “checkpoint” human-understandable internal states in tokenized CoT is at least somewhat constrained, while one that isn’t forced through tokenization can remain an extremely complex spaghetti-code algorithm. But even in a tokenized-CoT model, 99% of the reasoning happens in the latent space between layers and never gets written to tokens anyway. So I think latent-space reasoning is a step in the wrong direction, but I’m not sure it really matters, since we need to solve this problem either way.
I also suck at sports, but if you can find a casual/social league, it can be fun even if you suck. You want to play team games with supportive teammates; they’re a lot more fun than sports in school.
Joining a team sport helped me. I would never run intervals for 2 hours, but playing dodgeball for 2 hours is fun enough that I mostly don’t think about how out of breath I am. I assume a sufficiently casual soccer league would be similarly fun (although everyone in my area is too hardcore so I’m intimidated by even the casual leagues).
If I’m following your math correctly, it seems like this is a fully-general argument that it’s impossible to prevent any action with non-zero reward and non-zero cost of failure. I’m not really a math person, but it seems like something must be wrong with this argument because people fail to do things with non-zero reward and non-zero cost of failure all the time.
It also seems suspicious that your equation has no term for the cost of getting caught breaking the hypothetical anti-AI law/international norm.
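For concreteness, here’s the kind of decision rule I’m picturing, with an enforcement term added (my notation, not the original post’s): a simple expected-value agent attempts the action only when

$$p \cdot R \;-\; (1-p) \cdot C_{\text{fail}} \;-\; q \cdot C_{\text{caught}} \;>\; 0,$$

where $p$ is the probability of success, $R$ the reward, $C_{\text{fail}}$ the cost of failing, and $q \cdot C_{\text{caught}}$ the expected penalty for getting caught. Make $C_{\text{caught}}$ large enough and the inequality flips even when $p \cdot R$ is substantial, which is just deterrence.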
If it was the case that stopping AI development was impossible, then it would be good to know that it’s impossible so we can focus on other solutions. I think the problem with this argument isn’t that it’s a true antimeme, but that stopping AI development isn’t impossible.
I think given more time we could probably come up with tools that would help when we resume capabilities research. For example, there’s probably ways to do something like the logit lens but that work better, or ways to automatically factor models into more interpretable pieces, or just the long slog of tracing through circuits to figure out what the model is doing and build one from scratch rather than training. I don’t know how practical any of these approaches are, but I don’t think we’re at the limit of what we can learn from current models.
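As one concrete example of the “logit lens but better” direction, here’s a minimal sketch of the basic logit lens itself, run on GPT-2 via HuggingFace transformers (my illustration; the prompt and model choice are arbitrary):

```python
# Minimal logit lens: project each layer's residual stream through the
# final layer norm + unembedding to see the model's "best guess so far".
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The Eiffel Tower is in the city of", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; the final entry already has
# ln_f applied, so it gets normed twice here (fine for a rough lens).
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(layer, repr(tok.decode(logits.argmax(-1))))
```

Watching where in the stack the right answer first appears is exactly the kind of signal I’d want better versions of.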
He claims that an agent with arbitrary intelligence, given the utility function {1 if the cauldron is full, 0 if the cauldron is empty}, will keep adding as much water as possible to the cauldron to maximize the probability that the cauldron is full. But what about the probability that the cauldron will be full in the future?
I didn’t watch the video so I might be missing something, but assuming you created an AI with that utility function, whether the cauldron is full right now and the probability that the cauldron will be full in the future are different utility functions. A sufficiently intelligent AI would know that maximizing its utility function now will hurt it later, but it doesn’t care because that’s not the utility function.
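To put the distinction in symbols (my notation): an agent maximizing

$$U_{\text{now}}(s) = \mathbb{1}[\text{cauldron full at } t_0]$$

is optimizing a different function than one maximizing

$$U_{\text{future}}(\pi) = \mathbb{E}_\pi\Big[\sum_{t \ge t_0} \gamma^t \, \mathbb{1}[\text{cauldron full at } t]\Big],$$

and nothing in the first expression rewards the cauldron staying full later, so “this will hurt me in the future” never enters the first agent’s optimization.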
Eliezer has a bunch of different arguments because he’s trying to address different levels of familiarity with the problem. My impression is that he expects a sufficiently intelligent but unaligned AI to not be greedy like this and to scheme until it no longer needs us.
Training a Transformer to Compose One Step Per Layer (and Proving It)
Imagine if people hired “codebase cleaners” in the same way that people hire house cleaners.
You could get the code version of a Roomba by setting up a scheduled job to run a coding agent with a prompt to delete dead code / clean up duplication / etc.
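A hedged sketch of what that scheduled job could look like as a crontab entry, using Claude Code’s non-interactive `-p` mode (the path, schedule, and prompt are all made up):

```
# Hypothetical "code Roomba": every Sunday at 3am, run a coding agent
# headlessly with a cleanup prompt (repo path and prompt are illustrative).
0 3 * * 0 cd /home/me/myrepo && claude -p "Delete dead code, clean up obvious duplication, and commit the changes to a new branch."
```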
Thanks! I haven’t tried this in a while, but it should work in any agent harness that supports either bash or MCP tools. Frontier models are so good at tmux I’m not sure if there’s much value in the MCP tools at this point.
If your goal is a reasoning benchmark and not a vision benchmark, you might want to check out Ryan Miller’s DFHack-based approach.
Yeah that’s definitely important to be aware of. I think the security story should be fine in my case, since I’m submitting containerized jobs and uploading results to S3, and nothing is particularly secret (I’m training easy-to-train models so I can inspect the algorithms they learn).
One annoying thing about SkyPilot though is that it treats all GPUs on vast.ai equally and doesn’t let you pass additional filters besides “give me an RTX 5090”. The vastai CLI has a lot more options, including datacenter-only if you want.
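For example (I’m writing the filter fields from memory, so double-check them against `vastai search offers --help`):

```
# Search vast.ai offers with filters SkyPilot won't pass through
# (field names are from memory and may be slightly off).
vastai search offers 'gpu_name=RTX_5090 datacenter=true verified=true'
```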
That does seem like a much nicer interface, although I think it would be a lot more expensive for my purposes.
I finally set up SkyPilot to let me queue GPU training jobs (both on my local GPU and via RunPod), and I really should have done this months ago. Claude wrote me some bash scripts to spin up remote pods, run training, and tear them down, but this version is so much easier, and it has a nice UI.
It also sounds like I can easily extend this to Vast.ai, which would let me parallelize experiments for 5 cents/hour on RTX 3060s[1]. I’m interested in understanding algorithms used by tiny toy models, and fancy GPUs don’t really help since I can’t fully utilize them.
Anyway, if you’re also queueing up local experiments or trying to use remote GPUs efficiently, this is totally worth spending an hour to set up.
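For anyone curious, here’s a minimal sketch of the kind of task YAML this involves (the accelerator, cloud, script, and bucket names are illustrative, not my exact config):

```yaml
# train.yaml -- a small SkyPilot task (illustrative names and paths)
resources:
  accelerators: RTX3060:1   # cheap GPU; my tiny models can't use a fancy one
  cloud: runpod

workdir: .

setup: |
  pip install -r requirements.txt

run: |
  python train.py --output s3://my-bucket/results/  # hypothetical script/bucket
```

Then `sky launch train.yaml` runs it once, and `sky jobs launch train.yaml` queues it as a managed job.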
FYI: Claude really wanted to set this up in a way that would give every account on my machine root, but you can run the API server as a sudoer and let other users submit jobs without giving them root access. This matters to me because I use separate user accounts to sandbox Claude Code running with --dangerously-skip-permissions.
- ^
Update: SkyPilot is very opinionated about which GPUs I’m allowed to use on vast.ai, and simultaneously won’t let me add any filtering of my own, so this is less useful than I hoped it would be.
- ^
The exact architecture of frontier models is a secret, but it seems unlikely that anyone is doing neuralese chain of thought, given how hard it is to train and the questionable benefits.
Using tokens is much easier because models can learn to copy human reasoning in pretraining and then learn to use it more often in post-training. Learning reasoning from scratch is incredibly difficult and (luckily in my opinion) no one seems to have succeeded in doing it.
If your room still sucks after fixing your lights, put some plants in it
I was just thinking of writing something about how the secret to making blue light look good is having things in the room that aren’t blue.
I have an office with 5000K lights, and it looks really nice since the walls are a warm tan color, my desk is bright wood, there are a few green plants, and there’s other not-blue decoration everywhere. My basement has 4000K lights and looks much worse, since it has stark white walls and all of the furniture is either black or blue.
My theory is that the ancestral environment had a lot of blue light outside, where it’s typically green or brown, but now we’re doing the equivalent of putting really dim sunlight inside of caves and it looks weird. [epistemic status: I thought about this for like 30 seconds]
I’m confused who you’re arguing against. There have already been posts arguing that people who want certain AI policies should support / donate to specific candidates (Alex Bores, Dustin Moskovitz, Scott Wiener), plus a bunch of AI Safety orgs trying to influence politics more directly (ControlAI, MIRI, CAIP, more ControlAI), direct action by individuals (comments on the Whitehouse AI plan), and a bunch of meta posts.
My UI lists tasks by most recent interaction (and has an explicit archive button). I think Claude Code for the web has a similar UI.
I’m doing this in my RSS reader, but it doesn’t even require LLMs (which is good, because having an LLM read every article would be expensive). I’ve found it has limited value: the recommendations are good, but my feeds are already curated, so it doesn’t help very much.
It would be nice if there were some way to discover new feeds with it, but that would require a lot of work to prevent SEO slop.
One piece of low-hanging fruit is that running LLMs on general-purpose hardware is inefficient. Once companies like Taalas scale up, it should be possible to reduce token prices by at least another order of magnitude, although I’m not sure frontier models will fit on their chips (yet). https://taalas.com/the-path-to-ubiquitous-ai/