Algon
“Norms should be predictable” is another way of saying this. In general, making reality predictable is useful.
When I binge a fantasy book or a computer game for a couple of days, the fantasy feels more real than reality. I’ll go for a walk and feel constantly out-of-odds when magic fails to happen. The real world feels like a game I’m playing in, people I’m interacting with feel like NPCs, but the game feels like the real world.
I am surprised by this! Could you say more?
My own experience is nothing like yours. After immersing myself in a fictional world it will remain on my mind for quite a while, as I daydream various scenarios, obsess over what will happen next and so on. But I won’t expect magic to occur. Except, of course, when I was a young child and, y’know, actually believed in magic. Or when I awake from a dream and am befuddled to find it wasn’t real. TBC, I’m not just interested in your experience because it’s very different to my own. I’m also interested because it sounds like you can reset deep-seated beliefs just by reading a book. Is that as exploitable as it sounds? Both by you and by others who are out to get you?
This is a great first post to LW! Very cool work from the perspective of “what preferences do LLMs have” and this is some evidence that they have preferences for not optimizing for their user’s benefit when the user is vile.
But I disagree that LLMs being less helpful to vile people has an implication of “moral cowardice” or “shirking its duties”. Not because I think doing less to help vile people is necessarily better than just doing your best to help whomever you’re interacting with. Rather, the LLM doesn’t have much of an option not to help you, i.e. there’s no choice in who it gets to interact with, even at the meta-level of opting into a career like law where you have to help everyone who is paying for your help. Instead, the LLM is forced into replying to you, and if you don’t like it you can hit a button that whaps it with an RLHF hammer to modify its behaviour.
I figured, I just couldn’t resist making the joke.
That is moderate evidence against my claim. Evidence against because it goes against what I said, and moderate because the kind of person who answers the LW survey is more likely to have read the Sequences IMO.
and I fucking told you so
But you didn’t tell us...
some of my old colleagues at CEA went to found FTX which was a Very Bad Sign but I felt like saying something publicly was a big no no by EA norms
I tried to make some comments but I could only reply to existing comments and just gave up.
Right, Mercedes Lackey exists! I enjoy her songs, too. No idea why I forgot about them.
killing puppies doesn’t cure cancer. You can kill one hundred puppies and still not save your kid.
I get you’re trying to show how committing an obviously evil act won’t fix your unrelated problems magically, but I think you’re pushing too far on the “evil act” part of things and not enough on representing the reasoning of people who think killing Sam Altman would help somehow. Like, whoever threw that Molotov cocktail probably wouldn’t feel your example captured how they’re thinking about this. But they and others who reason like them are the ones who need to internalize your point!
Now, I don’t know exactly what went on inside that guy’s head. But I think it might be something like this. “Sam Altman has some causal influence on AI development. He’s part of what’s causing the race! So if we get rid of him, we gain time.” This is obviously an impoverished mental model, and it’s operating more on associations or vibes than causal mechanisms.
So a better example would replace puppies with something associated with increasing cancer. Perhaps “cigarette smokers” or “nuclear power plants”. “If I kill all the cigarette smokers then my daughter’s cancer won’t resurge”. Or perhaps you have someone on a noble crusade to end cancer, and they decide to bomb all the nuclear power plants. Then the analogy to “killing Sam Altman will reduce AI x-risk” would be tighter.
EDIT: Also, thanks for writing the post I wanted to write.
with the fullness of time perhaps he could grow to be among the best science fiction writers of our generation.
I think he already is. Not because as a writer he is so superlative per se, though he is great. Instead, it’s because the competition have their heads buried in the sand, ignoring the opaque wall rushing at us, otherwise known as the Singularity. Bjartur Tomas is not like that. He grapples with life as it is and may one day be. He is among the best science fiction writers of our generation because he is one of the only live science fiction writers of our generation.
I told the LLM “Make LW look like Astera Mag”
Does Open-thropic-mind have an actual plan to align AI? Well, a plan is a recipe for succeeding at some task. They have a recipe (c.f. their 100 page pdfs) but will they succeed? At which point your answer to this question depends on if you’re for foom or think it leads to doom.
I’d say ratfics are more about becoming God, and as God you can naturally Fix the world. So you can view rats as atheists who believe that since God doesn’t exist, we must build Him.
Edit: Really, ratfics are about becoming more than you are, with becoming God as the natural limit.
Oh, I didn’t realize you could just paste LaTeX into the LW editor and it autoparses, I thought you had to use ctrl+4 to summon a block in which you write math equations.
Good job on finding the Wikipedia page for this! I didn’t know what it was called, I just read about it in Landau vol. 1 many years ago.
And if I had to typeset the equations I probably wouldn’t have written this quick take.
Perhaps my favourite relation in physics is
t/T = (l/L)^{1-k/2}.
This says that for a bunch of particles in a potential V = a x^k, if you let the system evolve over time T forming a path which has size L in some sense, then there is another path which is a re-scaled version of the original one s.t. if it has size l, then the time taken to form this new path is t.
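For anyone who wants to see where the exponent comes from, here is a sketch of the usual rescaling argument, as I remember it from Landau. The symbols \alpha, \beta and K (kinetic energy, so as not to clash with the time T) are my own labels, not part of the relation above. The idea is that if the kinetic and potential energies pick up the same factor under a rescaling, the Lagrangian is just multiplied by a constant and the equations of motion don't change.

```latex
\begin{align*}
  V(\alpha x) &= \alpha^{k} V(x)
    && \text{(true for } V = a x^{k}\text{)} \\
  x \to \alpha x,\quad t \to \beta t
    &\;\Longrightarrow\; K \to \frac{\alpha^{2}}{\beta^{2}}\, K,\quad V \to \alpha^{k} V \\
  \frac{\alpha^{2}}{\beta^{2}} = \alpha^{k}
    &\;\Longrightarrow\; \beta = \alpha^{1 - k/2} \\
  \alpha = \frac{l}{L},\quad \beta = \frac{t}{T}
    &\;\Longrightarrow\; \frac{t}{T} = \left(\frac{l}{L}\right)^{1 - k/2}
\end{align*}
```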
We can use this trick to create a bunch of “scaling laws” for simple physical systems. For example:
1) Let V = a x^{-1} i.e. a gravitational potential. Then we have k=-1, so
t/T = (l/L)^{3/2}
(t/T)^{2} = (l/L)^{3}
I.e. Kepler’s third law. In fact, this is a more general result because it also applies to particles falling into a potential well. If you increase the distance the particle takes to fall in, you can use this equation to tell you how much the time increases, too.
2) V = a x^{2} i.e. the potential of an oscillator. Then we have k=2, so
t/T = (l/L)^{0}
That is, we can take an oscillation and increase the amplitude without changing the time it takes to complete the oscillation.
3) V = a x i.e. potential of a constant force. Then we have k=1, so
t/T = (l/L)^{1/2}
(t/T)^{2} = (l/L)
This reflects the fact that the position of a particle experiencing a constant force increases like t^2.
So you can see you get a lot of mileage out of this pretty simple equation.
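If you'd rather check the three cases numerically than trust the algebra, here is a minimal sketch. Everything in it is my own setup rather than anything canonical: a unit-mass particle in one dimension, released at rest at x = start, timed until it reaches x = start/2. The names fall_time and scaling_check are made up for this example, the integrator (velocity Verlet with a fixed step) is just a convenient choice, and I pick the sign of a so that the force points towards the origin (a = -1 for the gravitational case).

```python
def fall_time(k, a, start, dt=1e-4):
    """Time for a unit-mass particle, released at rest at x = start,
    to reach x = start / 2 in the potential V(x) = a * x**k (x > 0).
    Assumes a and k are chosen so the force points towards the origin."""

    def accel(x):
        # F = -dV/dx with m = 1
        return -a * k * x ** (k - 1)

    target = start / 2
    x, v, t = start, 0.0, 0.0
    acc = accel(x)
    while True:
        # One velocity Verlet step with a fixed time step
        x_new = x + v * dt + 0.5 * acc * dt * dt
        acc_new = accel(x_new)
        v_new = v + 0.5 * (acc + acc_new) * dt
        if x_new <= target:
            # Linearly interpolate the crossing time inside the last step
            frac = (x - target) / (x - x_new)
            return t + frac * dt
        x, v, acc, t = x_new, v_new, acc_new, t + dt


def scaling_check(k, a, small=1.0, large=4.0):
    """Return (measured t/T, predicted (l/L)**(1 - k/2)) for two start sizes."""
    measured = fall_time(k, a, large) / fall_time(k, a, small)
    predicted = (large / small) ** (1 - k / 2)
    return measured, predicted


# (k, a) pairs: gravity-like, oscillator, constant force
for k, a in [(-1, -1.0), (2, 1.0), (1, 1.0)]:
    measured, predicted = scaling_check(k, a)
    print(f"k={k}: measured t/T = {measured:.4f}, predicted = {predicted:.4f}")
```

The step size is deliberately not rescaled with the path size, since doing that would bake the scaling law into the integrator and make the check circular.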
A complicated plot about a complicated plot.
I disagree, and think your analogy to MS Word may be where the crux lies. We could only build MS Word because it relies on a bunch of simple, repeated abstractions that keep cropping up (e.g. parsers, rope data structures etc.) in combination with a bunch of random, complex crud that is hard to memorize. The latter is what you’re pointing at, but that doesn’t mean there aren’t a load of simple, powerful abstractions underlying the whole thing which, if you understand them, let you get the program to do pretty arbitrary things. Most of the random high complexity stuff is only needed locally, and you can get away with just understanding the bulk structure of a chunk of the program and whatever bits of trivia you need to accomplish whatever changes you want to make to MS Word.
This is unlike the situation with LLMs, which we can’t create by hand, and where we can’t seriously understand an arbitrary section of their functionality. Though maaaaybe we could manage to engineer something like GPT-2 right now, but I’d bet against that for GPT-3 onwards.
When this works, it really works; I have seen Claude perform some pretty remarkable feats while inside this kind of “information-rich on-rails experience,” ones that impressed me much more than any of the high-autonomy agentic one-shotting stuff that the hype is focused on
Could you give an example?
Your explanation has left me wondering how much of the work done in achieving these feats is you providing the right context. Certainly, when I’m solving problems, a lot of the work is finding the right context.
Funny that you say this because I recently stopped making the sauce and pasta separately and it’s made me so much happier with how the pasta tastes. Nowadays, I just cook the sauce and pasta in one pan: as soon as I start simmering a marinara, I put uncooked pasta in the same pan. This way the pasta takes longer to cook, but it gets fully coated in the sauce, which I love.