Hello! I work at Lightcone and like LessWrong :-)
I sometimes like things being said in a long way. Mostly that’s just because it helps me stew on the ideas and look at them from different angles. But also, specifically, I liked the engagement with a bunch of epistemological intuitions and figuring out what can be recovered from them. I like in particular connecting the “trend continues” trend to the redoubtable “electron will weigh the same tomorrow” intuition.
(I realise you didn’t claim there was nothing else in the dialogue, just not enough to justify the length)
As a general matter, Anthropic has consistently found that working with frontier AI models is an essential ingredient in developing new methods to mitigate the risk of AI.
What are some examples of work that is most largeness-loaded and most risk-preventing? My understanding is that interpretability work doesn’t need large models (though I don’t know about things like influence functions). I imagine constitutional AI does. Is that the central example, or are there other pieces that are further in this direction?
Much sweat and some tears were spent on trying to get something like that working, but the Shoggoths are fickle
Some Manifold markets:
This paper also seems dialectically quite significant. I feel like it’s a fairly well-delineated claim that can be digested by mainstream ML and policy spaces. Like, it seems helpful to me if policy discussions can include phrases like “the evidence suggests that if current ML systems were trying to deceive us, we wouldn’t be able to train them not to”.
Curated! This kicked off a wonderful series of fun data science challenges. I’m impressed that it’s still going after over 3 years, and that other people have joined in with running them, especially @aphyer who has an entry running right now (go play it!).
Thank you, @abstractapplic for making these. I don’t think I’ve ever submitted a solution, but I often like playing around with them a little (nowadays I just make inquiries with ChatGPT). I particularly like
That it added nuance to my understanding of the supremacy of neural networks, and of when “just throw a neural net at it” might or might not work.
Here’s to another 3.4 years!
Some quotes from the wiki article on Shoggoths:
Being amorphous, shoggoths can take on any shape needed, making them very versatile within aquatic environments.
At the Mountains of Madness includes a detailed account of the circumstances of the shoggoths’ creation by the extraterrestrial Elder Things. Shoggoths were initially used to build the cities of their masters. Though able to “understand” the Elder Things’ language, shoggoths had no real consciousness and were controlled through hypnotic suggestion. Over millions of years of existence, some shoggoths mutated, developed independent minds, and rebelled.
Quoting because (a) a lot of these features seem like an unusually good match for LLMs, and (b) it’s worth acknowledging that this is picking a metaphor that fictionally rebelled, and is thus potentially alignment-is-hard loaded as a metaphor.
It seems unlikely that different hastily cobbled-together programs would have the same bug.
Is this true? My sense is that in, for example, Advent of Code problems, different people often write the same bug into their program.
Sometimes running to stand still is the right thing to do
It’s nice when good stuff piles up into even more good stuff, but sometimes it doesn’t:
Sometimes people are worried that they will habituate to caffeine and lose any benefit from taking it.
Most efforts to lose weight are only temporarily successful (unless using medicine or surgery).
The hedonic treadmill model claims it’s hard to become durably happier.
Productivity hacks tend to stop working.
These things are like Alice’s Red Queen’s race: always running to stay in the same place. But I think there’s a pretty big difference between running that keeps you exactly where you would have been if you hadn’t bothered, running that moves you a little way and then stops, and running that stops you from being moved in some direction.
I’m not sure what we should call such things, but one idea is hamster wheels for things that make no difference, bungee runs for things that let you move in a direction a bit but you have to keep running to stay there, and backwards escalators for things where you’re fighting to stay in the same place rather than moving in a direction (named for the grand international pastime of running down rising escalators).
I don’t know which kind of thing is most common, but I like being able to ask which dynamic is at play. For example, I wonder if weight loss efforts are often more like backwards escalators than hamster wheels. People tend to get fatter as they get older. Maybe people who are trying (but failing) to lose weight are gaining weight more slowly than similar people who aren’t trying to do so?
Or my guess is that most people will have more energy than baseline if they take caffeine every day, even though any given dose will have less of an effect than taking the same amount of caffeine while being caffeine-naive, so they’ve bungee ran (done a bungee run?) a little way forward and that’s as far as they’ll go.
I am currently considering whether productivity hacks, which I’ve sworn off, are worth doing even though they only last for a little while. The extra, but finite, productivity could be worth it. (I think this would count as another bungee run).
I’d be interested to hear examples that fit within or break this taxonomy.
FWIW, “powe” has been removed from “official” toki pona. A more standard translation might be “sona ike lili”.
If I imagine having a compiler that translates back-and-forth between intuitionistic and classical logic as in the post, and I want to stop the accumulation of round-trip ‘cruft’, I think the easiest thing to do would be to add provenance information that lets me figure out whether a provability predicate, say, was “original” or “translational”. But frustratingly that’s not really possible in the case where I’m trying to translate between people with pretty different ontologies (who might not be able to parse their interlocutors’ statements natively).
I dunno whether you’re thinking more about the case of differing ontologies or more about the case of preferred framings (but fluency with both), so I’m not sure how relevant this is to your inquiries.
Adding filler tokens seems like it should always be neutral or harm a model’s performance: a fixed prefix designed to be meaningless across all tasks cannot provide any information about each task to locate the task (so no meta-learning) and cannot store any information about the in-progress task (so no amortized computation combining results from multiple forward passes).
I thought the idea was that in a single forward pass, the model has more tokens to think in. That is, the task description on its own is, say, 100 tokens long. With the filler tokens, it’s now, say, 200 tokens long. In principle, because of the uselessness/unnecessariness of the filler tokens, the model can just put task-relevant computation into the residual stream for those positions.
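A toy sketch of that “more positions to think in” point (my illustration, not the paper’s setup): with causal attention, the position that produces the answer can read from every earlier residual stream, so appending meaning-free filler positions gives it more places where intermediate computation could in principle live.

```python
# Toy illustration (assumption: one layer of causal self-attention over random
# embeddings stands in for a transformer's per-position residual streams).
import torch

torch.manual_seed(0)
d_model = 16
attn = torch.nn.MultiheadAttention(d_model, num_heads=2)

def positions_readable_by_final_token(seq_len: int) -> int:
    x = torch.randn(seq_len, 1, d_model)  # (seq, batch=1, dim) stand-in residual streams
    # Boolean causal mask: True above the diagonal = "not allowed to attend"
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    _, weights = attn(x, x, x, attn_mask=causal_mask, need_weights=True)
    # weights: (batch, tgt, src); count the positions the final token attends over
    return int((weights[0, -1] > 0).sum())

print("task only, ~100 tokens:", positions_readable_by_final_token(100))    # 100
print("with filler, ~200 tokens:", positions_readable_by_final_token(200))  # 200
```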
Table 2 seems to provide a more direct comparison.
I think my big problem with complexity science (having bounced off it a couple of times, never having engaged with it productively) is that though some of the questions seem quite interesting, none of the answers or methods seem to have much to say.
Which is exacerbated by a tendency to imply they have answers (or at least something that is clearly going to lead to an answer).
I feel like this is the opposite of the quoted text? Or your example is of the bad actor both “remaining reasonable” and “fighting dirty”.
IIUC, 1000x was chosen to be on the order of the solar energy reaching the earth.
Curated. I feel like over the last few years my visceral timelines have shortened significantly. This is partly from contact with LLMs, particularly their increased coding utility, and a lot of it is downstream of Ajeya’s and Daniel’s models and outreach (I remember spending an afternoon on an arts-and-crafts ‘build your own timeline distribution’ exercise that Daniel had nerdsniped me with). I think a lot of people are in a similar position and have been similarly influenced. It’s nice to get more details on those models and the differences between them, as well as to hear Ege pushing back with “yeah but what if there are some pretty important pieces that are missing and won’t get scaled away?”, which I hear from my environment much less often.
There are a couple of pieces of extra polish that I appreciate. First, having some specific operationalisations with numbers and distributions up-front is pretty nice for grounding the discussion. Second, I’m glad that there was a summary extracted out front, as sometimes the dialogue format can be a little tricky to wade through.
On the object level, I thought the focus on schlep in the Ajeya-Daniel section and on the slowness of economic turnover in the Ajaniel-Ege section was pretty interesting. I think there’s a bit of a cycle with trying to do complicated things like forecast timelines, where people come up with simple compelling models that move the discourse a lot and sharpen people’s thinking. People have vague complaints that the model seems like it’s missing something, but it’s hard to point out exactly what. Eventually someone (often the person with the simple model) is able to name one of the pieces that is missing, and the discourse broadens a bit. I feel like schlep is a handle that captures an important axis that all three of our participants differ on.
I agree with Daniel that a pretty cool follow-up activity would be an expanded version of the exercise at the end with multiple different average worlds.
Curated. I am excited about many more distillations and expositions of relevant math on the Alignment Forum. There are a lot of things I like about this post as a distillation:
Exercises throughout. They felt like they were simple enough that they helped me internalise definitions without disrupting the flow of reading.
Pictures! This post made me start thinking of finite factorisations as hyperrectangles, and histories as dimensions that a property does not extend fully along.
Clear links from Finite Factored Sets to Pearl. I think these are roughly the same links made in the original, but they felt clearer and more orienting here.
Highlighting which of Scott’s results are the “main” results (even more than the “Fundamental Theorem” name already did).
Magdalena Wache’s engagement in the comments.
I do think the pictures became less helpful to me towards the end, and I thus have worse intuitions about the causal inference part. I’m also not sure about the emphasis of this post on causal rather than temporal inference. But I still love the post overall.
Curated.
Bayes-type epistemology is a core LessWrong topic, and I think this represents a bunch of progress on that front (whether the results are already real-world-ready or just real-world-inspired). I have only engaged with small parts of the thesis, but those parts seem pretty exciting; so far, I particularly like knowing about quasi-arithmetic pooling. It feels like I’ve become less confused about something that I didn’t know I was confused about: the connection between the character of the proper scoring rule and the right way to aggregate the probabilities it elicits.
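For reference, my possibly-imperfect recollection of the definition (a sketch, not the thesis’s exact statement): a proper scoring rule has an associated convex function $G$, and quasi-arithmetic pooling with respect to that rule averages forecasts after mapping them through $G'$:

$$p^{*} \;=\; (G')^{-1}\!\left(\sum_{i} w_i \, G'(p_i)\right), \qquad \sum_i w_i = 1,$$

so that, as I understand it, the quadratic score recovers ordinary linear averaging and the log score recovers averaging of log-odds.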
I also appreciate Eric’s work making blogposts explaining more of his thoughts in a friendly way. Hope to see a few more distillations come out of this thesis!
Harvard tells us that their median class size is 12 and over 75% of their courses have fewer than 20 students.
Smaller class sizes sound pretty good! Maybe worth paying for? But I am reminded of the claim that most flights have plenty of empty seats, even though most passengers find themselves on full flights. Similarly, most person-class-hours might be spent in the biggest classes (cf. the inspection paradox).
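A minimal sketch of how both statements can be true at once, with made-up class sizes (hypothetical numbers chosen only to roughly match the “median 12, mostly under 20” framing, not Harvard’s actual distribution):

```python
# Toy illustration of the inspection paradox for class sizes.
import statistics

# Hypothetical distribution: many small seminars, a few huge lectures.
class_sizes = [8] * 65 + [12] * 45 + [20] * 15 + [100] * 7 + [400] * 3

# Per-class view: the median class is small and most classes are under 20.
print("median class size:", statistics.median(class_sizes))          # 12
print("share of classes under 20:",
      sum(n < 20 for n in class_sizes) / len(class_sizes))           # ~0.81

# Per-student view: pick a random enrolment and ask how big that class is.
# A class of size n is experienced by n students, so big classes dominate.
enrolments = [n for n in class_sizes for _ in range(n)]
print("median class size a student sits in:",
      statistics.median(enrolments))                                 # 100.0
print("share of student-seats in classes of 100+:",
      sum(n >= 100 for n in enrolments) / len(enrolments))           # ~0.58
```

The per-class median stays small because small classes are numerous, while the typical student’s hours are dominated by the few huge lectures.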