If you’re building any kind of dev tool, stop investing in UIs, CLIs, and MCP. Instead, invest in APIs/SDKs and documentation, which make your tool more useful for LLMs.
LLMs are becoming extremely adept at using native low-level tools so it doesn’t make sense to build wrappers around these that aren’t in the training data.
Many have observed that LLMs use bash commands in effective yet alien ways learned through RL. Another phenomenon is that LLMs used to use Playwright to manipulate browsers, but now they just run CDP commands directly.
LLMs also prefer to run Python snippets directly instead of CLI wrappers around the same library, especially if the CLI is not in the training data / not well documented.
MCP servers are useful for integrations with apps like Claude Desktop, but for devtools in particular, the raw API is more useful for the LLM (for example, the LLM can write API output to a file instead of processing it in-context).
A good SDK can provide helpful guardrails for the agent. For example, implementing client session management can prevent potential issues and also removes boilerplate.
UIs are still useful but become less useful as LLMs build more user-specific interaction layers for viewing and manipulating data. For example, a database provider like Supabase provides a full UI right now but in the future they may just provide raw visual components which software teams use to construct their own bespoke UIs.
Documentation (right now mostly via agent skills) is extremely high ROI and will continue to be higher ROI as more work is delegated to agents. Your service is much less useful if an agent can’t one-shot basic actions when given documentation.
My personal opinion is that documentation is much more useful when a human writes it than an LLM. The LLMs that write documentation are usually not opinionated enough, do not understand which user flows to highlight, and make factual errors.
Documentation is especially helpful in describing how to perform specific flows.
I agree, though one thing I am kind of annoyed about is that proper documentation usually means ~2x the token count for any reads on the source code, though it is definitely worth it. I hope better tooling can exist to selectively prune the API documentation.
AI Alignment Is Turning from Alchemy Into Chemistry, but for real this time
In April 2023, Alexey Guzey posted “AI Alignment Is Turning from Alchemy Into Chemistry” where he reviewed Burns et al.’s paper “Discovering Latent Knowledge in Language Models Without Supervision.” Some excerpts to summarize Alexey’s post:
For years, I would encounter a paper about alignment — the field where people are working on making AI not take over humanity and/or kill us all — and my first reaction would be “oh my god why would you do this”. The entire field felt like bullshit. I felt like people had been working on this problem for ages: Yudkowsky, all of LessWrong, the effective altruists. The whole alignment discourse had been around for so long, yet there was basically no real progress; nothing interesting or useful. Alignment thought leaders seemed to be hostile to everyone who wasn’t an idealistic undergrad or an orthodox EA and who challenged their frames and ideas. It just felt so icky. [...] Bottom line is: the field seemed weird, stuck, and lacking any clear, good ideas and problems to work on. It basically felt like alchemy.
[...]
As far as I know, nobody ever managed to make practical progress on this issue until literally last year. Collin Burns et al’s Discovering Latent Knowledge in Language Models Without Supervision was the first alignment paper where my reaction was “fuck, this is legit”, rather than “oh my god why are you doing this”. Burns et al actually managed to show that we can learn something about what non-toy LLMs “think” or “believe” without humans labeling the data at all. Burns’ method probably won’t be the one to solve alignment for good. However, it’s an essential first step, a proof of concept that demonstrates unsupervised alignment is indeed possible, even when we can’t evaluate what AI is doing. It is the biggest reason why I think the field is finally becoming real.
Alexey ended up being quite wrong: Burns’ paper, while very interesting, didn’t inspire impactful follow-up research in eliciting beliefs or contribute to any alignment/control techniques used at the labs.
Despite being much more optimistic about the alignment community’s ability to eventually make progress than Alexey, I did agree with him that alignment was still waiting for a killer research direction. Up to that point, and for around 2 years after, very few alignment papers actually produced insights or techniques that meaningfully affect how AI is trained and deployed. When I applied to an AI safety grantwriting role at Open Philanthropy in early 2024, one of the questions on the application was roughly “What do you think the most important alignment paper has been?” and I answered with the original RLHF paper because up until that point, it was the ~only major technique to come out of the alignment community that actually steered an AI system to behave more safely (feel free to correct me here, I’m also counting RLAIF and constitutional AI in this bucket).
But with recent work in emergent misalignment and inoculation prompting (Betley et al., MacDiarmid et al., Wichers et al.), I think alchemy really is turning into chemistry. We have:
Realistic model organisms of misalignment, unlike previous contrived examples like in the alignment faking or sleeper agents work.
Relatedly, a mechanistic model of how misalignment can arise in real, existing processes.
Repeated examples of a surprising technique that works to control real models.
I’m really excited to see new work that comes out of this research direction. I think there’s a lot of opportunity to start creating more in vitro model organisms in reward hacking setups, and more accessible model organisms mean that more researchers can contribute to creating control techniques. With more work studying the physics of how RL posttraining and reward hacking affect model goals and capabilities, there’s also more value in having evaluation techniques that can assess model alignment.
If you’re building any kind of dev tool, stop investing in UIs, CLIs, and MCP. Instead, invest in APIs/SDKs and documentation, which make your tool more useful for LLMs.
LLMs are becoming extremely adept at using native low-level tools so it doesn’t make sense to build wrappers around these that aren’t in the training data.
Many have observed that LLMs use bash commands in effective yet alien ways learned through RL. Another phenomenon is that LLMs used to use Playwright to manipulate browsers, but now they just run CDP commands directly. LLMs also prefer to run Python snippets directly instead of CLI wrappers around the same library, especially if the CLI is not in the training data / not well documented.
MCP servers are useful for integrations with apps like Claude Desktop, but for devtools in particular, the raw API is more useful for the LLM (for example, the LLM can write API output to a file instead of processing it in-context).
A good SDK can provide helpful guardrails for the agent. For example, implementing client session management can prevent potential issues and also removes boilerplate.
UIs are still useful but become less useful as LLMs build more user-specific interaction layers for viewing and manipulating data. For example, a database provider like Supabase provides a full UI right now but in the future they may just provide raw visual components which software teams use to construct their own bespoke UIs.
Documentation (right now mostly via agent skills) is extremely high ROI and will continue to be higher ROI as more work is delegated to agents. Your service is much less useful if an agent can’t one-shot basic actions when given documentation.
My personal opinion is that documentation is much more useful when a human writes it than an LLM. The LLMs that write documentation are usually not opinionated enough, do not understand which user flows to highlight, and make factual errors.
Documentation is especially helpful in describing how to perform specific flows.
I agree, though one thing I am kind of annoyed about is that proper documentation usually means ~2x the token count for any reads on the source code, though it is definitely worth it. I hope better tooling can exist to selectively prune the API documentation.
If the documentation is formatted as agent skills then the agent can select which information to load into its context.
AI Alignment Is Turning from Alchemy Into Chemistry, but for real this time
In April 2023, Alexey Guzey posted “AI Alignment Is Turning from Alchemy Into Chemistry” where he reviewed Burns et al.’s paper “Discovering Latent Knowledge in Language Models Without Supervision.” Some excerpts to summarize Alexey’s post:
Alexey ended up being quite wrong: Burns’ paper, while very interesting, didn’t inspire impactful follow-up research in eliciting beliefs or contribute to any alignment/control techniques used at the labs.
Despite being much more optimistic about the alignment community’s ability to eventually make progress than Alexey, I did agree with him that alignment was still waiting for a killer research direction. Up to that point, and for around 2 years after, very few alignment papers actually produced insights or techniques that meaningfully affect how AI is trained and deployed. When I applied to an AI safety grantwriting role at Open Philanthropy in early 2024, one of the questions on the application was roughly “What do you think the most important alignment paper has been?” and I answered with the original RLHF paper because up until that point, it was the ~only major technique to come out of the alignment community that actually steered an AI system to behave more safely (feel free to correct me here, I’m also counting RLAIF and constitutional AI in this bucket).
But with recent work in emergent misalignment and inoculation prompting (Betley et al., MacDiarmid et al., Wichers et al.), I think alchemy really is turning into chemistry. We have:
Realistic model organisms of misalignment, unlike previous contrived examples like in the alignment faking or sleeper agents work.
Relatedly, a mechanistic model of how misalignment can arise in real, existing processes.
Repeated examples of a surprising technique that works to control real models.
I’m really excited to see new work that comes out of this research direction. I think there’s a lot of opportunity to start creating more in vitro model organisms in reward hacking setups, and more accessible model organisms mean that more researchers can contribute to creating control techniques. With more work studying the physics of how RL posttraining and reward hacking affect model goals and capabilities, there’s also more value in having evaluation techniques that can assess model alignment.