Brendan Long
I’m doing this in my RSS reader, and it doesn’t even require LLMs (which is good, because having an LLM read every article would be expensive). That said, it’s been of limited value: the recommendations are good, but my feeds are already curated, so it rarely surfaces anything I would have missed.
It would be nice if there were some way to discover new feeds with it, though that would require a lot of work to keep out SEO slop.
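To illustrate how little machinery the non-LLM version needs, here’s a minimal sketch: plain bag-of-words cosine similarity against previously-liked articles. This is an illustrative approach (function names and scoring are mine), not my exact production setup.

```python
import math
import re
from collections import Counter

# Score a new article by how similar its vocabulary is to articles
# the reader previously liked. No LLM, no external dependencies.

def word_counts(text: str) -> Counter:
    """Lowercased bag-of-words representation of an article."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score(article: str, liked: list[str]) -> float:
    """Average similarity of a candidate article to the liked set."""
    counts = word_counts(article)
    return sum(cosine(counts, word_counts(l)) for l in liked) / len(liked)
```

A real reader would want TF-IDF weighting and recency decay on the liked set, but even this crude version separates on-topic posts from off-topic ones.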
For what it’s worth, I don’t think the post is bad. You’re a good writer and I kind of agreed with what you were saying. Unfortunately I bounced off because I disagree with one of your premises (that LLMs are obviously incapable of certain behaviors).
I’m curious what advice LLMs were giving you. Asking them which parts of your argument are least supported might be a good way to get feedback. I have custom instructions for this that tell Claude to point out unsupported claims, but I also tell it that my posts are intentionally casual.
To save some time (I know this is a book; I think long when it matters to me), the answer to your final question is that “-like” applies when:
1: The object lacks the proven capacity for the mechanism that the unqualified term defines. I.e., “Deception is the knowledge of two distinct states, one of which is intentionally false. The model has no proven capacity for the act of deception, lacking the ability to represent multiple outputs and intentionally select a less truthful one; therefore the model performed deception-like behavior.”
2: The object lacks proven information storage that is opaque, retrievable, persistent, and mutable. I.e., “The model has no proven way of opaquely storing and retrieving information.”

Don’t frontier LLMs straightforwardly pass both of these tests? We can find deception vectors where the model does know what it thinks is the right answer and outputs something else, and LLMs can store facts in their weights and transmit facts forward throughout a context (although there are limitations to this).
The post is substantially about experience, but to me it sounds like you’re saying:
Having some sort of experience in each of these skills is very important
Nothing else matters because you can learn any subskill in a few months
On the scale of Lightcone this might be a good heuristic, but larger companies care about subskill experience for good reasons: it’s valuable if your software engineering team can just do things immediately instead of spending 3-6 months ramping up on new-to-them technologies every time.
My sense is musicians routinely get to very high levels of skill in a new instrument quickly, if they have already mastered a bunch of instruments before (like enough to play in a band professionally in a new instrument).
I agree with the rest of your comment (6 months is a long time!), but this part seems to be moving the goalposts. A professional musician being able to learn an additional instrument quickly is not evidence that a non-musician designer[1] can.
- ^
Or is musician a subclass of athlete?
You could totally learn to play a brass instrument in 6 months if it was your full-time job. I’m less sure you could competently play violin in that time period, although 6 months is a lot of practice time if it’s actually your main priority.
Having worked with experts across many industries, and having dabbled in the literature around skill transfer and training, there seems to be little difference within an industry between someone four years in and someone twenty years in, once you control for intelligence and conscientiousness.
I basically agree with the post, but I think you’re underestimating the value of experience. It’s true that there’s not that much value in doing the same thing for 20 years, but if a person is learning different things, 20 years of experience means they’ve had the necessary 3 months of ramp-up time for 80+ different subskills. Plus, the more subskills you’ve ramped up on, the easier the remaining skills are. Learning one programming language takes months, but learning your 20th takes hours[1]. And people with experience can know about problems you don’t know you need to ramp up on.
Already knowing things is overvalued by some hiring processes, but I think you’re going too far the other way by discounting it entirely.
- ^
Both programming ramp-up times are to reach the ability to write code that does what you want in that language. Becoming an actual expert on the best practices for a language takes longer.
The system I use has names for each task, which helps keep track of them; for larger tasks, the names are usually based on a GitHub issue. If I’m ever confused about what an agent is doing, I just ask it. If you want this more often, you could add custom instructions telling the agent to always summarize the current state (in whatever way you find most useful) at the end of a turn.
Are they going to force me to stop working out if I’m way stronger than I remember? One decision tree is:
Try to do 100 pushups
If I succeed, ask to leave
This might take a while, but presumably some morning within 6 months or so I’d wake up (not remembering sleeping) and be able to do 100 pushups. This could probably be optimized, but the decision procedure needs to be something obvious that I’d settle on the same way every time.
If the room magically prevents my body from changing in any way, I need to find a source of randomness, then run an algorithm that on average gets me out in the time period I want. The tricky part would be picking something within 15 minutes but there’s probably a method of turning my clothes or body into dice if necessary.
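As a sketch of that last algorithm (the 15-minute window size and the one-day target are illustrative assumptions, not part of the thought experiment): since each morning starts from the same state, a natural strategy is to leave with a fixed probability p each window, picked so the expected wait matches the target. No memory of previous attempts is needed.

```python
import random
import statistics

# Memoryless escape strategy: flip a p-biased coin each window.
# The wait is geometrically distributed with mean 1/p windows,
# so choose p from the desired average wait.

def leave_probability(target_windows: float) -> float:
    """A geometric wait with success probability p has mean 1/p windows."""
    return 1.0 / target_windows

def windows_until_escape(p: float, rng: random.Random) -> int:
    """Each 15-minute window, decide to leave with probability p."""
    windows = 1
    while rng.random() >= p:
        windows += 1
    return windows

target = 96  # ~one day of 15-minute windows, on average (an assumption)
p = leave_probability(target)
rng = random.Random(0)
waits = [windows_until_escape(p, rng) for _ in range(20_000)]
mean_wait = statistics.mean(waits)  # should land near the target
```

The source of randomness is the only hard requirement; the coin could be anything physical that you’d interpret the same way every reset.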
This seems to be the flashlight link again. I’m curious what these are.
The good news is that we probably do this by default, since it’s much easier to generate and train on short-horizon tasks than long-horizon tasks.
I find it funny that my comment is the exact opposite of the other one though.
Something I’ve started to do is try to build toy models that exhibit certain large model behaviors. I suspect a lot of what the large models do can be trained in small models if we can figure out which part of the massive data sets creates the behavior we want.
I’m not sure if this is any easier, but you could use the LessWrong GraphQL API to download articles in a consistent format. I use it to sync articles to my personal site.
This is the post query:
post(input: { selector: { _id: "post-id-here" } }) {
  result {
    _id
    title
    slug
    postedAt
    pageUrl
    contents {
      html
    }
  }
}

The GraphQL endpoint is https://www.lesswrong.com/graphql.
I think you can also request contents.markdown if you want, but I can’t test it right now.
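For example, calling the endpoint from Python might look like this. The operation wrapper and the String variable type are my assumptions about the schema, and "post-id-here" is a placeholder, as in the query above.

```python
import json
import urllib.request

# Sketch: wrap the post query in a named operation and POST it as JSON.

def build_payload(post_id: str) -> bytes:
    query = """
    query GetPost($id: String) {
      post(input: { selector: { _id: $id } }) {
        result { _id title slug postedAt pageUrl contents { html } }
      }
    }
    """
    return json.dumps({"query": query, "variables": {"id": post_id}}).encode("utf-8")

def fetch_post(post_id: str) -> dict:
    req = urllib.request.Request(
        "https://www.lesswrong.com/graphql",
        data=build_payload(post_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]["post"]["result"]
```

fetch_post returns the result object, with the article HTML under contents.html.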
This doesn’t seem unique to democracy. If people notice that their country has got worse since the new king was crowned, they might revolt. The advantage of democracy is that we can skip the violence part since we already know who would win (historically, whoever had the majority).
Regarding (2), I suspect you could do a lot of damage by posting a link to something malicious as a trusted user, but I don’t think 2FA really helps for the reasons you say. 2FA is relevant to phishing and the Mythos risk would be hacking LessWrong.
With all of the disagree votes, I felt like I should actually give my reasons:
I think actual violence is worse than saying dumb things.
Saying that you don’t support violence against your political enemies is a good thing to do.
I agree that AI pause people have nothing to apologize for here, but taking the chance to reinforce norms against violence is better than ignoring it. Like how it’s not Obama’s fault that someone killed Charlie Kirk, but his response to it was good anyway.
Is there a canonical image alt text AI skill? I’ve designed my own after making Claude read a bunch of pages about how to write alt text, but this feels like something that an expert could do better than I can. The results seem good to me, but as a non-alt-text-user it’s hard to really know.
I’ve been serving my personal website from CloudFront (Amazon’s CDN) for years, which was nice because it costs a few cents a month, but it annoyed me that cache misses get served slowly from S3. In some cases, this can take several hundred milliseconds. Completely unacceptable!
I finally decided to look up if anyone would let me serve all of my files from the CDN all of the time, and apparently Bunny CDN[1] does. It’s “expensive” (over 10 cents per GB per month!), but since my entire website is ~30 MB, I just told them to store the entire thing on SSDs in every edge region.
Result: Every page loads in ~40 ms from anywhere remotely near an edge location[2], regardless of how recently anyone else has requested the page.
My “unacceptable” above is mostly tongue-in-cheek, but there really is something nice about every link loading instantly rather than in half a second.
The code to do this is also much simpler since Bunny CDN has a CLI that handles sync properly, and cache “misses”[3] are so fast that I’m just not hot caching HTML pages.
- ^
I assume there are other options for this, but this is the one everyone talks about and it’s going to cost me like $0.10/mo, so I didn’t look very hard for alternatives.
- ^
Sadly, there aren’t edge locations in the Middle East, China, Russia, or most of Africa, so people in those countries may experience 80 ms load times.
- ^
There are two layers of lookups in a CDN: the CDN edge (hot cache) and the origin (usually slow). With Bunny CDN + Bunny storage, the origin is on an SSD in the same region, so a cache miss only takes a few milliseconds to load into the hot cache.
For technical work on the LLM paradigm, it’s a huge problem that most of the interesting (or concerning) behaviors don’t exist in small models. It’s probably possible to make progress on frontier model problems without access to one, but it’s definitely hard-mode.
Add to that the fact that the big labs don’t publish everything (or publish with delays), and now you have to worry that even if you do find something potentially relevant, there’s a good chance it’s not novel.
My UI lists tasks by most recent interaction (and has an explicit archive button). I think Claude Code for the web has a similar UI.