Hello! I work at Lightcone and like LessWrong :-)
kave
Daniel Dennett has died (1942-2024)
If you weren’t such an idiot...
New LessWrong review winner UI (“The LeastWrong” section and full-art post pages)
PSA: The Sequences don’t need to be read in sequence
On plans for a functional society
I sometimes like things being said in a long way. Mostly that’s just because it helps me stew on the ideas and look at them from different angles. But also, specifically, I liked the engagement with a bunch of epistemological intuitions and figuring out what can be recovered from them. I like in particular connecting the “trend continues” trend to the redoubtable “electron will weigh the same tomorrow” intuition.
(I realise you didn’t claim there was nothing else in the dialogue, just not enough to justify the length)
As a general matter, Anthropic has consistently found that working with frontier AI models is an essential ingredient in developing new methods to mitigate the risk of AI.
What are some examples of work that is most largeness-loaded and most risk-preventing? My understanding is that interpretability work doesn’t need large models (though I don’t know about things like influence functions). I imagine constitutional AI does. Is that the central example, or are there other pieces that are further in this direction?
Much sweat and some tears were spent on trying to get something like that working, but the Shoggoths are fickle
Singular learning theory and bridging from ML to brain emulations
Some Manifold markets:
A bet on critical periods in neural networks
This paper also seems dialectically quite significant. I feel like it’s a fairly well-delineated claim that can be digested by mainstream ML and policy spaces. Like, it seems helpful to me if policy discussions can include phrases like “the evidence suggests that if the current ML systems were trying to deceive us, we wouldn’t be able to change them not to”.
Some quotes from the wiki article on Shoggoths:
Being amorphous, shoggoths can take on any shape needed, making them very versatile within aquatic environments.
At the Mountains of Madness includes a detailed account of the circumstances of the shoggoths’ creation by the extraterrestrial Elder Things. Shoggoths were initially used to build the cities of their masters. Though able to “understand” the Elder Things’ language, shoggoths had no real consciousness and were controlled through hypnotic suggestion. Over millions of years of existence, some shoggoths mutated, developed independent minds, and rebelled.
Quoting because (a) a lot of these features seem like an unusually good match for LLMs, and (b) to acknowledge that this is picking a metaphor that fictionally rebelled, and thus is potentially alignment-is-hard loaded as a metaphor.
Sometimes running to stand still is the right thing to do
It’s nice when good stuff piles up into even more good stuff, but sometimes it doesn’t:
Sometimes people are worried that they will habituate to caffeine and lose any benefit from taking it.
Most efforts to lose weight are only temporarily successful (unless using medicine or surgery).
The hedonic treadmill model claims it’s hard to become durably happier.
Productivity hacks tend to stop working.
These things are like the Red Queen’s race in Alice: always running to stay in the same place. But I think there’s a pretty big difference between running that keeps you exactly where you would have been if you hadn’t bothered, running that moves you a little way and then stops, and running that stops you being carried in one direction.
I’m not sure what we should call such things, but one idea is “hamster wheels” for things that make no difference, “bungee runs” for things that let you move in a direction a bit but you have to keep running to stay there, and “backwards escalators” for things where you’re fighting to stay in the same place rather than being carried in a direction (named for the grand international pastime of running down rising escalators).
I don’t know which kind of thing is most common, but I like being able to ask which dynamic is at play. For example, I wonder if weight loss efforts are often more like backwards escalators than hamster wheels. People tend to get fatter as they get older. Maybe people who are trying (but failing) to lose weight are gaining weight more slowly than similar people who aren’t trying to do so?
Or my guess is that most people will have more energy than baseline if they take caffeine every day, even though any given dose will have less of an effect than taking the same amount of caffeine while being caffeine-naive, so they’ve bungee ran (done a bungee run?) a little way forward and that’s as far as they’ll go.
I am currently considering whether productivity hacks, which I’ve sworn off, are worth doing even though they only last for a little while. The extra, but finite, productivity could be worth it. (I think this would count as another bungee run).
I’d be interested to hear examples that fit within or break this taxonomy.
FWIW, “powe” has been removed from “official” toki pona. A more standard translation might be “sona ike lili”.
If I imagine having a compiler that translates back-and-forth between intuitionistic and classical logic as in the post, and I want to stop the accumulation of round-trip ‘cruft’, I think the easiest thing to do would be to add provenance information that lets me figure out whether a provability predicate, say, was “original” or “translational”. But frustratingly that’s not really possible in the case where I’m trying to translate between people with pretty different ontologies (who might not be able to parse their interlocutors’ statements natively).
I dunno whether you’re thinking more about the case of differing ontologies or more about the case of preferred framings (but fluency with both), so I’m not sure how relevant this is to your inquiries.
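For concreteness, here’s a minimal sketch of the provenance idea in Python (the formula type and the double-negation-style wrapper are my own stand-ins, not anything from the post): tag whatever the translator inserts, so a round trip strips exactly what it added and nothing more.

```python
# Minimal sketch: provenance-tagged wrappers so round trips don't accumulate cruft.
# The Atom/Not types and the double-negation wrapper are illustrative stand-ins.
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    body: Any
    translational: bool = False  # True iff this negation was inserted by the translator

def translate(formula):
    """Wrap in a double negation, marking both layers as translator-introduced."""
    return Not(Not(formula, translational=True), translational=True)

def untranslate(formula):
    """Strip only translator-introduced double negations; leave 'original' ones alone."""
    if (isinstance(formula, Not) and formula.translational
            and isinstance(formula.body, Not) and formula.body.translational):
        return untranslate(formula.body.body)
    return formula

p = Atom("p")
assert untranslate(translate(translate(p))) == p   # round trips don't pile up cruft
assert untranslate(Not(Not(p))) == Not(Not(p))     # an 'original' double negation is untouched
```

This is exactly what breaks in the differing-ontologies case: there’s no shared representation to hang the tags on.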
Adding filler tokens seems like it should always be neutral or harm a model’s performance: a fixed prefix designed to be meaningless across all tasks cannot provide any information about each task to locate the task (so no meta-learning) and cannot store any information about the in-progress task (so no amortized computation combining results from multiple forward passes).
I thought the idea was that in a single forward pass, the model has more tokens to think in. That is, the task description on its own is, say, 100 tokens long. With the filler tokens, it’s now, say, 200 tokens long. In principle, because the filler tokens don’t themselves need any processing, the model can just put task-relevant computation into the residual streams at those positions.
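To make that picture concrete, here’s a toy single-layer causal attention in numpy (my own illustration, not anything from the paper under discussion): the filler positions’ residual streams come out as functions of the task tokens before them, so in principle they could hold task-relevant intermediate results for later positions to read.

```python
# Toy illustration: identical filler tokens end up with different post-attention
# residual streams depending on the task tokens that precede them.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension

def causal_self_attention(x):
    """x: (seq, d). Plain softmax attention with a causal mask and identity projections."""
    n = x.shape[0]
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

task_a = rng.normal(size=(4, d))                # embeddings for task description A
task_b = rng.normal(size=(4, d))                # embeddings for task description B
filler = np.tile(rng.normal(size=d), (3, 1))    # the same filler embedding, repeated

out_a = causal_self_attention(np.vstack([task_a, filler]))
out_b = causal_self_attention(np.vstack([task_b, filler]))

# The filler tokens are identical in both prompts, but their residual streams differ,
# because they attend back to different task tokens.
print(np.allclose(out_a[4:], out_b[4:]))  # False
```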
Table 2 seems to provide a more direct comparison.
I think my big problem with complexity science (having bounced off it a couple of times, never having engaged with it productively) is that though some of the questions seem quite interesting, none of the answers or methods seem to have much to say.
Which is exacerbated by a tendency to imply they have answers (or at least something that is clearly going to lead to an answer).
Harvard tells us that their median class size is 12 and over 75% of their courses have fewer than 20 students.
Smaller class sizes sound pretty good! Maybe worth paying for? But I am reminded of the claim that most flights are mostly empty, even though most people find themselves on full flights. Similarly, most person-class-hours might be spent in the biggest classes (cf. the inspection paradox).
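As a toy worked example of the inspection paradox here (the class-size numbers are invented, not Harvard’s):

```python
# Invented numbers: 80 seminars of 10 students and 20 lectures of 300.
import numpy as np

class_sizes = np.array([10] * 80 + [300] * 20)

median_over_classes = np.median(class_sizes)                        # what the brochure reports
mean_over_students = (class_sizes ** 2).sum() / class_sizes.sum()   # class size a random enrollment sees

print(median_over_classes)  # 10.0
print(mean_over_students)   # ~265.9
```

So a small median over classes and a large typical class per student-hour can both be true at once.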