DaemonicSigil

Karma: 1,437

DaemonicSigil 11 Jun 2022 21:31 UTC
LW: 69 AF: 31
18
AF
in reply to: Eliezer Yudkowsky’s comment on: AGI Ruin: A List of Lethalities
Okay, I will try to name a strong-but-checkable pivotal act.

(Having a strong-but-checkable pivotal act doesn’t necessarily translate into having a weak pivotal act. Checkability allows us to tell the difference between a good plan and a trapped plan with high probability, but the AI has no reason to give us a good plan. It will just produce output like “I have insufficient computing power to solve this problem” regardless of whether that’s actually true. If we’re unusually successful at convincing the AI our checking process is bad when it’s actually good, then that AI may give us a trapped plan, which we can then determine is trapped. Of course, one should not risk executing a trapped plan, even if one thinks one has identified and removed all the traps. So even if #30 is false, we are still default-doomed. (I’m not fully certain that we couldn’t create some kind of satisficing AI that gets reward 1 if it generates a safe plan, reward 0 if its output is neither helpful nor dangerous, and reward −1 if it generates a trapped plan that gets caught by our checking process. The AI may then decide that it has a higher chance of success if it just submits a safe plan. But I don’t know how one would train such a satisficer with current deep learning techniques.))

The premise of this pivotal act is that even mere humans would be capable of designing very complex nanomachines, if only they could see the atoms in front of them, and observe the dynamics as they bounce and move around on various timescales. Thus, the one and only output of the AI will be the code for fast and accurate simulation of atomic-level physics. Being able to get quick feedback on what would happen if you designed such-and-such a thing not only helps with being able to check and iterate designs quickly, it means that you can actually do lots of quick experiments to help you intuitively grok the dynamics of how atoms move and bond.

This is kind of a long comment, and I predict the next few paragraphs will be review for many LW readers, so feel free to skip to the paragraph starting with “SO HOW ARE YOU ACTUALLY GOING TO CHECK THE MOLECULAR DYNAMICS SIMULATION CODE?”.

Picture a team of nano-engineers designing some kind of large and complicated nanomachine. Each engineer wears a VR headset so they can view the atomic structure they’re working on in 3d, and each has VR gloves with touch-feedback so they can manipulate the atoms around. The engineers all work on various components of the larger nanomachine that must be built. Often there are standardized interfaces for transfer of information, energy, or charge. Other times, interfaces must be custom-designed for a particular purpose. Each component might connect to several of these interfaces, as well as being physically connected to the larger structure.

The hardest part of nanomachines is probably going to be the process of actually manufacturing them. The easiest route from current technology is to take advantage of our existing DNA synthesis tech to program ribosomes to produce the machines we want. The first stage of machines would be made from amino acids, but from there we could build machines that built other machines and bootstrap our way up to being able to build just about anything. This bootstrapping process would be more difficult than the mere design process for the final machine, and the first stage where we have to build things entirely out of amino acids sounds particularly brutal. But just as people could write the first compilers in machine code, it should be possible to figure out how to get things done even in the insanely constrained domain of amino acids. And this step only has to be done once before a whole world opens up.

The first obvious question is “what about quantum mechanics?”. The answer is that we don’t care too much about it. It makes computing the dynamics harder of course, but most systems interact frequently with their environment, and most nanomachines will probably interact with their environment frequently enough that they behave nearly classically. QM is important for determining what local chemical reactions take place, but there’s no long-range entanglement to worry about. That’s also helpful for allowing the human engineers to get a handle on the dynamics. The main effect of the frequent interaction with the environment is that the dynamics becomes somewhat stochastic. Often you will see engineers run the same simulation several times, so they can understand the distribution of possible outcomes. As the final design is approached, the engineers run it through thousands of simulations so that even rare thermal fluctuations are accounted for.

The zeroth obvious question is “what will you build with it?”. The answer is that a variety of things could be built, and there might be many side-projects produced to raise funds for the main project, but the main project is a scanner that can read a cryonics patient’s brain one layer at a time in sufficient detail to faithfully emulate them. This is a pretty standard pivotal act, and has been written about elsewhere, but the main idea is that we can freeze, scan, and maybe copy a bunch of alignment researchers and other smart folks to solve alignment at a much faster speed than the outside world is working on their unaligned AIs. Or, the researchers might be able to foom while maintaining their values by making careful changes to their own minds. Besides the scanner itself, this project would probably need to design extremely fast special-purpose computing hardware for running the ems, along with various kinds of tiny probes and other experimental equipment to collect data on how a scan translates to brain-function. (An ordinary dumb neural network should be sufficient for this kind of inference problem.)

SO HOW ARE YOU ACTUALLY GOING TO CHECK THE MOLECULAR DYNAMICS SIMULATION CODE? There are two possible attack modes. One is to exploit any possible vulnerabilities in the hardware running the code to achieve unwanted side effects, and use those side effects to escape from containment. The second is to somehow make the output of the simulation subtly flawed, such that the humans, working off of flawed data, accidentally design a nanomachine that lets the AI out of the box instead of doing what we were designing that machine to do. Obviously the first attack mode is much more desirable for the AI than the second. Actually, the second attack mode may be so hard to pull off that we don’t need to check for it at all, but not checking for things when you have the capability to check for them is silly.

Checks for false physics that might trick the humans into building trapped nanomachines:
- Compare with the real dynamics, which we know, but which are slower for us to simulate. In real physics, we can break down the world into little cubes, and the dynamics are symmetric under translating those cubes in space and time. So if we request a simulation method that has those same symmetries, we can check the simulation on small length and time scales, which are feasible for us to simulate. This doesn’t guarantee that a larger simulation would be correct, since small errors can multiply out of control as the simulation size gets larger, but it does mean that the dynamics of the large simulation have to bear a pretty close resemblance to real physics.
- We can also request that the code provably obeys conservation of momentum, energy, charge, etc.
- Use diff fuzzing techniques: This is where you use a fuzzer (implemented with techniques like GOFAI and dumb neural networks) to try and find inputs to the simulation such that the output of the AI’s code differs from the output of our slow-but-known-correct code. The stronger the optimizer we use here, the harder it is for the AI to avoid detection.
- Request that the AI give us error bounds on the simulation, along with machine-checkable proofs that they hold.
- Side channel attack and its mitigation: The AI can steer us away from certain designs and towards others by making the simulation take different amounts of time depending on the configuration of atoms we submit. Therefore we request code that takes a constant amount of time to run, plus proof that it does so. In order to make this easier, the code will be run on a GPU architecture with cache levels explicitly managed in the code, and where none of the cores have branch prediction.
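
As a toy illustration of the diff fuzzing check in the list above, here is a minimal sketch that compares an untrusted fast simulator against a slow trusted reference on random inputs. Everything in it is an assumption made for illustration: the "physics" is a single unit harmonic oscillator, the names `reference_step`, `candidate_step`, `trapped_step`, and `diff_fuzz` are hypothetical, and uniform random sampling stands in for the much stronger guided search a real fuzzer would use.

```python
import random

def reference_step(x, v, dt, substeps=50):
    """Slow trusted integrator: many small velocity-Verlet substeps (force = -x)."""
    h = dt / substeps
    for _ in range(substeps):
        v -= 0.5 * h * x
        x += h * v
        v -= 0.5 * h * x
    return x, v

def candidate_step(x, v, dt):
    """Stand-in for the untrusted fast code: one velocity-Verlet step."""
    v -= 0.5 * dt * x
    x += dt * v
    v -= 0.5 * dt * x
    return x, v

def trapped_step(x, v, dt):
    """Deliberately wrong candidate: deviates only in a narrow input region."""
    x2, v2 = candidate_step(x, v, dt)
    if 0.70 < x < 0.75:
        v2 += 0.05
    return x2, v2

def diff_fuzz(candidate, trials=2000, dt=0.01, tol=1e-4, seed=0):
    """Return the first input where candidate and reference disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rx, rv = reference_step(x, v, dt)
        cx, cv = candidate(x, v, dt)
        if abs(rx - cx) > tol or abs(rv - cv) > tol:
            return (x, v)
    return None

honest_result = diff_fuzz(candidate_step)   # no disagreement expected
trapped_result = diff_fuzz(trapped_step)    # a counterexample input
```

The sketch only catches the flaw because random sampling happens to land in the bad region; a flaw confined to a much smaller region of input space is exactly why the comment asks for the strongest optimizer available on the fuzzing side.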
The checks for the first attack method are pretty much just normal computer security, but at extremely high stakes. Just so we’re clear on what the picture is like here: Generally the AI is not going to know in perfect detail what the hardware is like that its simulation code will be running on, though we at least have to give it the instruction set, or some high-level language that compiles to that instruction set. (Probably the high-level language is the better option. That way we can design the language so that properties like constant run-time and space-time symmetry are guaranteed true of all valid programs in that language.) So the best technique for the AI will be to submit simulation code that has a hidden optimizer embedded into it. This optimizer then searches for possible escape methods while it runs. Even though different runs of the code don’t share memory, they’ll have different inputs, which the optimizer can use as a random seed to determine which part of the escape-strategy search space it should be trying. If there are any holes at all in our security, the AI will be able to find and exploit them. All those engineers in VR headsets running simulations? In each case, the computer running the simulation has a hostile AI on it, trying everything it can think of to get out.

In general, every aspect of the system must be proved correct, from the hardware, to the high-level language we give to the AI. In addition, everything must be thoroughly unit tested, integration tested, fuzz tested, and given to the best security researchers who will try to find side-channel attacks, etc. Other people will write a proof that’s as close as humanly possible to an actual mathematical proof that no side-channel attacks are possible, and then everyone will try everything they can to poke holes in that proof.

“Okay, yeah, we get it, you’re going to try really really hard. Don’t you know that the universe doesn’t owe you success just because of how much you tried? No matter what security you put in place as a mere human, the AI is going to get around it just because it’s so much smarter.”

Even a superintelligent AI would not be able to break a one-time pad. Depending on the details of computational complexity, it probably wouldn’t be able to crack our best symmetric ciphers today either. For a superintelligent AI to succeed at a task, that task must be actually possible. If we can actually design a system with no holes, which takes a large but finite amount of work and intelligence, then a superintelligence will not be able to break it, no matter how much smarter than us it is.
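
To make the one-time pad point concrete, here is a small sketch of why no amount of intelligence helps against it: for any ciphertext, every plaintext of the same length is consistent with some key, so the ciphertext alone carries no information about which message was sent. The messages and the helper name `xor_bytes` are made up for illustration; this assumes the key is truly random, as long as the message, and never reused.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(plaintext))   # random key, used once
ciphertext = xor_bytes(plaintext, key)

# Pick any other message of the same length: there is a key under which
# the very same ciphertext decrypts to it.
decoy = b"RETREAT AT TEN"
decoy_key = xor_bytes(ciphertext, decoy)

recovered_real = xor_bytes(ciphertext, key)         # equals plaintext
recovered_decoy = xor_bytes(ciphertext, decoy_key)  # equals decoy
```

Since both keys are equally likely a priori, an attacker holding only `ciphertext` cannot prefer one decryption over the other, however much computation it has.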

DaemonicSigil 24 Apr 2023 5:45 UTC
31 points
19
in reply to: Adele Lopez’s comment on: Contra Yudkowsky on AI Doom
Cooling the computer doesn’t let you get around the Landauer limit! The savings in energy you get by erasing bits at low temperature are offset by the energy you need to dissipate to keep your computer cold. (Erasing a bit at low temperature still generates some heat, and when you work out how much energy your refrigerator has to use to get rid of that heat, it turns out that you must dissipate the same amount as the Landauer limit says you’d have to if you just erased the bit at ambient temperatures.) To get real savings, you have to actually put your computer in an environment that is naturally colder. For example, if you could put a computer in deep space, that would work.
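
The bookkeeping behind that claim can be sketched in a few lines. This assumes an ideal (Carnot) refrigerator, which is the best case for the cooled computer; the function names and the example temperatures are mine, chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def landauer_heat(temperature_k: float) -> float:
    """Minimum heat released by erasing one bit at the given temperature."""
    return K_B * temperature_k * math.log(2)

def total_cost_with_ideal_fridge(t_cold: float, t_ambient: float) -> float:
    """Energy per erased bit when the computer is actively cooled.

    The erasure dumps heat q_cold into the cold stage. An ideal refrigerator
    must then spend work q_cold * (t_ambient / t_cold - 1) to pump that heat
    out to the ambient environment.
    """
    q_cold = landauer_heat(t_cold)
    fridge_work = q_cold * (t_ambient / t_cold - 1.0)
    return q_cold + fridge_work

cooled = total_cost_with_ideal_fridge(t_cold=4.0, t_ambient=300.0)
uncooled = landauer_heat(300.0)
# cooled and uncooled agree up to floating-point rounding
```

The cold temperature cancels out of the total, which is why only a naturally cold environment (a lower `t_ambient`) gives a real saving. A non-ideal refrigerator only makes the cooled case worse.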

On the other hand, there might also be other good reasons to keep a computer cold, for example if you want to lower the voltage needed to represent a bit, then keeping your computer cold would plausibly help with that. It just won’t reduce your Landauer-limit-imposed power bill.

None of this is to say that I agree with the rest of Jacob’s analysis of thermodynamic efficiency, I believe he’s made a couple of shaky assumptions and one actual mistake. Since this is getting a lot of attention, I might write a post on it.

DaemonicSigil 21 Jan 2022 4:15 UTC
29 points
in reply to: Richard_Ngo’s comment on: What’s Up With Confusingly Pervasive Consequentialism?
For all tasks A-I, most programs that we can imagine writing to do that task will need to search through various actions and evaluate the consequences. The only one of those tasks we currently know how to solve with a program is A, and chess programs do indeed use search and evaluation.

I’d guess that whether something can be done safely is mostly a function of how easy it is, and how isolated it is from the real world. The Riemann hypothesis seems pretty tricky, but it’s isolated from the real world, so it can probably be solved by a safe system. Chess is isolated and easy. Starting a billion dollar company is very entangled with the real world, and very tricky. So we probably couldn’t do it without a dangerous system.

DaemonicSigil 17 Apr 2023 6:36 UTC
28 points
15
on: grey goo is unlikely
I really like this post, I hope to see more like it on LessWrong, and I strong-upvoted it. That said, let me now go through my thoughts on various points:
1. Rare materials: Yep, this is a real design constraint, but probably not that hard to design around? I’m not expecting nanobots to be made mostly out of iron.
2. Metal surfaces: Why not just build up the metal object in an oxygen-free environment, then add on an external passivation layer at the end? The passivation layer could be engineered to be more stable than the naturally occurring oxidation layer the metal would normally have. There would still be a minimum size for metal objects, of course. (Or more precisely, a minimal curvature.) Corrosion could definitely be a problem, but cathodic protection might help in some cases.
3. Agree that electrostatic motors are the way to go here. I’m not sure the power supply necessarily has to be an ion gradient, nor that the motor would otherwise need to be made from metal. Metal might be actively bad, because it allows the electrons to slosh around a lot. What about this general scheme for a motor?: Wheel-shaped molecules, with sites that can hold electrons. A high voltage coming from a power supply deposits electrons on one wheel, and produces holes on the other wheel. The corresponding sites are attracted to each other and once they get close enough, the electrons jump into the holes, filling them. Switching is determined by the proximity of various sites on the wheels as they move relative to each other. Considering how electrons are able to jump around between sites in the electron transport chain, this doesn’t seem impossible.
4. An intermediate design that I can imagine is a block with a series of tubes and chambers embedded in it. (The bulk of the block can hold electronics.) Most of the tubes are filled with water, so nanobot components can happily bounce around in them. But lots of components are also mounted to the walls of the tubes. You can’t clump together if you’re bonded to the wall of your tube. A small minority of tubes can be filled with gas, or even under vacuum, for any weird processes that may require those conditions. Pumping energy is volume times pressure, so the energy requirements could be reasonable as long as the volume is small.
5. Yeah, most reactions should probably still be done in water.
6. The reason for doing things at high temperatures is to do reactions with a high activation energy. If we’re designing custom catalysts (artificial enzymes) for our nanobots, we can probably finesse it so that the enzyme coaxes the reactants into the high-energy intermediate state, even if the ambient temperature is low (via coupling to a more favorable reaction, for example). Also, covalent single-bonds can rotate, so there’s nothing preventing the existence of a covalently bonded structure that can also exhibit conformational changes. Finally, I’m skeptical that existing life has found everything there is to be found here. Plenty of chemicals are useful to humans, but not useful to any living thing, and hence lack an evolved synthesis pathway. Also, I’d guess that the stupidity of evolution has left a lot of low hanging fruit for humans. For example, rather than trying to do a reaction with proteins, we can do it with a group of complicated catalysts synthesized by proteins.
7. I’d bet on diamond synthesis still being possible somehow, but it does seem like a genuinely complicated question, so I’ll have to look into it further.
8. Yep, I’m not really even sure how useful diamond would be for lots of nano-stuff. Might mostly just be used for large structures the bots build, rather than being anywhere on the self-replication pathway for the bots. One difference is it’s covalent rather than ionic, so there would probably be much less concern about that Ostwald ripening thing for parts made of diamond. So maybe there would be some uses for it because of that.
11. Doesn’t have to be rigid, can still be connection based. For example, there could be simple protein-based building blocks that act like legos. An assembly head can assemble these, and move around on the surface of the part it’s building by accepting signals that correspond to “move 1 block left”, etc. Position is always exactly known, not because there’s a rigid beam anywhere in the system, but because we know the exact integer number of steps the assembly head has moved since the start.
12, 13. ATP needs to be everywhere because it supplies energy. Okay, fair enough. Ribosomes need to be almost everywhere, since proteins are needed everywhere, and ribosomes come with their own host of supporting infrastructure like tRNAs, which have to be everywhere too. I’m being a little unfair to eukaryotic cells here, which have lots of sophisticated stuff going on, but my general picture is that whenever a cell decides to make proteins somewhere and then transport them somewhere else, that costs genome space, which is very limited, so the cell can’t do that very often. Nanobots genuinely have different constraints from life here. In particular, they have cheaper genome space, so they can have custom-designed pipes for every type of protein they use, with each pipe leading right to the chamber where that protein is being used. Huge information cost, but if it makes things work way better, it’s probably worthwhile for nanobots. I totally believe that life is using those techniques exactly as much as is optimal for it, though.
1. Mostly covered by my answer to 11 where the assembler head walks around on the surface and counts its steps. Larger structures can have many assembler heads working at once. I also don’t know why you’re scoffing at the potential for building computers here. The supposed “embodied” computation of existing cells is currently computing things that are keeping us alive, which is great, but you can’t exactly solve any other important problems on it. It’s not a flexible universal computer, in the sense of a Turing machine that can run any program.

DaemonicSigil 11 Dec 2023 20:39 UTC
25 points
5
on: re: Yudkowsky on biological materials
In general, the factors that govern the macroscopic strength of materials can often have surprisingly little to do with the strength of the bonds holding them together. A big part of a material’s tensile strength is down to whether it forms cracks and how those cracks propagate. I predict many LWers would enjoy reading The New Science of Strong Materials which is an excellent introduction to materials science and its history. (Cellulose is mentioned, and the most frequent complaint about it as an engineering material lies in its tendency to absorb water.)

It’s actually not clear to me why Yudkowsky thinks that ridiculously high macroscopic physical strength is so important for establishing an independent nanotech economy. Is he imagining that trees will be out-competed by solar collectors rising up on stalks of diamond taller than the tallest tree trunk? But the trees themselves can be consumed for energy, and to achieve this, nanobots need only reach them on the ground. Once the forest has been eaten, a solar collector lying flat on the ground works just as well. One legitimate application for covalently bonded structures is operating at very high temperatures, which would cause ordinary proteins to denature. In those cases the actual strength of the individual bonds does matter more.

DaemonicSigil 29 Oct 2023 18:57 UTC
23 points
on: Configurations and Amplitude
LessWrong user titotal has written a new and corrected version of this post, and I suggest that anyone wanting to learn this material should learn it from that post instead.

(In case anyone is curious, the main error in this post is that Eliezer describes the mirror as splitting an incoming state $|0\rangle$ into two outgoing states $|1\rangle + i|2\rangle$. However, the overall magnitude of this outgoing state is $\sqrt{2}$, whereas the incoming state had magnitude $1$. This means that the mirror is described by a non-unitary operator, meaning that it doesn’t conserve probability, which is forbidden in quantum mechanics. You can fix this by instead describing the outgoing state as $(|1\rangle + i|2\rangle) / \sqrt{2}$.

While it is permitted to do quantum mechanics without normalizing your state (you can get away with just normalizing the probabilities you compute at the end), any operators you apply to your system must still have the correct normalization factors attached to them. Otherwise, you’ll get an incorrect answer. To see this, consider an initial state of $|0\rangle + |3\rangle$, where $|0\rangle$ describes the photon heading towards a beam splitter and $|3\rangle$ describes the photon heading in a different direction entirely. This state is unnormalized, which is fine. But if we allow enough time to pass for a photon at $|0\rangle$ to strike the beam splitter, then according to Eliezer’s description of the beam splitter, we get $|1\rangle + i|2\rangle + |3\rangle$. This suggests that the probabilities of finding the photon at $1, 2, 3$ will be $\frac{1}{3}, \frac{1}{3}, \frac{1}{3}$. But this is incorrect. The correct final state should be $(|1\rangle + i|2\rangle) / \sqrt{2} + |3\rangle$, which gives probabilities $\frac{1}{4}, \frac{1}{4}, \frac{1}{2}$. (I am actually rather embarrassed that I didn’t notice this error myself, and had to wait for titotal to point it out, though in my defence the last time I read the QM sequence was at least 5 years ago, before I really knew much QM.))
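
The two sets of probabilities can be checked numerically. The sketch below is only a bookkeeping aid under simplifying assumptions: states are dictionaries from position label to complex amplitude, and the hypothetical `beam_splitter` helper models only the action on $| 0 ⟩$ described above (a full beam splitter also acts on its other input mode).

```python
import math

def beam_splitter(amps, normalized=True):
    """Send |0> to (|1> + i|2>)/sqrt(2), or to |1> + i|2> if normalized=False.

    All other basis states are left untouched.
    """
    scale = 1 / math.sqrt(2) if normalized else 1.0
    out = {label: a for label, a in amps.items() if label != 0}
    a0 = amps.get(0, 0)
    out[1] = out.get(1, 0) + scale * a0
    out[2] = out.get(2, 0) + scale * 1j * a0
    return out

def probabilities(amps):
    """Born-rule probabilities, normalizing only at the end."""
    total = sum(abs(a) ** 2 for a in amps.values())
    return {label: abs(a) ** 2 / total for label, a in amps.items()}

initial = {0: 1.0, 3: 1.0}  # unnormalized |0> + |3>

wrong = probabilities(beam_splitter(initial, normalized=False))
right = probabilities(beam_splitter(initial, normalized=True))
# wrong: 1/3 at each of positions 1, 2, 3
# right: 1/4 at 1, 1/4 at 2, 1/2 at 3
```

Normalizing at the end fixes the overall scale of the state, but it cannot undo an operator that rescaled one branch relative to another, which is exactly what the unnormalized beam splitter does.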

DaemonicSigil 16 Nov 2021 4:46 UTC
LW: 23 AF: 8
AF
in reply to: johnswentworth’s comment on: Ngo and Yudkowsky on alignment difficulty
Large genomes have (at least) 2 kinds of costs. The first is the energy and other resources required to copy the genome whenever your cells divide. The existence of junk DNA suggests that this cost is not a limiting factor. The other cost is that a larger genome will have more mutations per generation. So maintaining that genome across time uses up more selection pressure. Junk DNA requires no maintenance, so it provides no evidence either way. Selection pressure cost could still be the reason why we don’t see more knowledge about the world being translated genetically.

A gene-level way of saying the same thing is that even a gene that provides an advantage may not survive if it takes up a lot of genome space, because it will be destroyed by the large number of mutations.

DaemonicSigil 6 Jun 2022 4:40 UTC
22 points
3
in reply to: M. Y. Zuo’s comment on: AGI Ruin: A List of Lethalities

Compared to what?

“Lethal” here means “lethal enough to kill every living human”. For example, later in the article Eliezer writes this:

When I say that alignment is difficult, I mean that in practice, using the techniques we actually have, “please don’t disassemble literally everyone with probability roughly 1”

...

According to who?

From context, “has to” means “if we humans don’t solve this problem then we will be killed by an unaligned AI”. There’s no person/authority out there threatening us to solve this problem “or else”, that’s just the way that reality seems to be. If you’re trying to ask why does building a Strong unaligned AI result in everyone being killed, then I suggest reading the posts about orthogonality and instrumental convergence linked at the top of this post.

And why a binary choice?

“One way or another” is an English idiom which you can take to mean “somehow”. It doesn’t necessarily imply a binary choice.

Multi-multi scenarios existing in equilibrium have not been disproven yet,

This is addressed by #34: Just because multiple AI factions can coexist and compromise with each other doesn’t mean that any of those factions will be likely to want to keep humans around. It doesn’t seem likely that any AIs will think humans are cute and likeable in the same way that we think dogs are cute and likeable.

You need to disprove, or point to evidence that shows, why all ’easier modes’ proposals are incorrect

This is mostly addressed in #6 and #7, and the evidence given is that “nobody in this community has successfully named a ‘pivotal weak act’”. You could win this part of the argument by pointing out something that could be done with an AI weak enough not to be a threat which could prevent all the AI research groups out there in the world from building a Strong AI.

Why is it ‘fatal’?

Because we expect a Strong AI that hasn’t been aligned to kill everyone. Once again, see the posts about orthogonality and instrumental convergence.

And who determines what counts as the ‘first really dangerous try’?

I’m not quite sure what you’re asking here? I guess Eliezer determines what he meant by writing those words. I don’t think there’s anyone at any of these AI research groups looking over proposals for models and saying “oh this model is moderately dangerous” or “oh this model is really dangerous, you shouldn’t build it”. I think at most of those groups, they only worry about the cost to train the model rather than how dangerous it will be.

DaemonicSigil 15 Oct 2022 9:20 UTC
LW: 19 AF: 12
12
AF
in reply to: cfoster0’s comment on: Counterarguments to the basic AI x-risk case
I took Nate to be saying that we’d compute the image with highest faceness according to the discriminator, not the generator. The generator would tend to create “thing that is a face that has the highest probability of occurring in the environment”, while the discriminator, whose job is to determine whether or not something is actually a face, has a much better claim to be the thing that judges faceness. I predict that this would look at least as weird and nonhuman as those deep dream images if not more so, though I haven’t actually tried it. I also predict that if you stop training the discriminator and keep training the generator, the generator starts generating weird looking nonhuman images.

This is relevant to Reinforcement Learning because of the actor-critic class of systems, where the actor is like the generator and the critic is like the discriminator. We’d ideally like the RL system to stay on course after we stop providing it with labels, but stopping labels means we stop training the critic. Which means that the actor is free to start generating adversarial policies that hack the critic, rather than policies that actually perform well in the way we’d want them to.

DaemonicSigil 25 Oct 2021 5:20 UTC
18 points
on: Self-Integrity and the Drowning Child
An interesting difference between the drowning child situation and the “could donate to effective charity to save children’s lives” situation is that the person who happens to be walking by that pond has a non-transferable opportunity to save a child’s life for $500 (or whatever the cost of the clothes is, plus some time cost, and the inconvenience of getting wet and muddy). In the case of effective charity, even if one declines to donate, other people will still have the same opportunity. In the case of the drowning child, the fact that you are the only one who can act makes jumping in to save the child somehow seem more urgent. If you don’t save the child, then you’d be somehow “wasting” a valuable opportunity. For a mostly selfish person who values all lives other than their own at less than $500, the opportunity would be valuable to others but useless for themselves.

If the going rate is $1000 to save a child through effective charity, then a mostly altruistic person would be willing to pay a mostly selfish person $600 to compensate for the costs of their clothes. There would have to be some “honour” involved, since the selfish person couldn’t exactly unsave the child after the fact. If they could make the deal work anyway, then the mostly selfish person would have succeeded in selling a non-transferable opportunity for a $100 profit, and it would be worthwhile for them to save the drowning child.

DaemonicSigil 3 Feb 2024 23:21 UTC
17 points
11
on: My thoughts on the Beff Jezos—Connor Leahy debate

C: You heard it, e/acc isn’t about maximizing entropy [no shit?!]

B: No, it’s about maximizing the free energy

C: So e/acc should want to collapse the false vacuum?

Holy mother of bad faith. Rationalists/lesswrongers have a problem with saying obviously false things, and this is one of those.

It’s in line with what seems like Connor’s debate strategy—make your opponent define their views and their terminal goal in words, and then pick apart that goal by pushing it to the maximum. Embarrassing.

I agree with you that Connor performed very poorly in this debate. But this one is actually fair game. If you look at Beff’s writings about “thermodynamic god” and these kinds of things, he talks a lot about how these ideas are supported by physics and the Crooks fluctuation theorem. Normally in a debate if someone says they value X, you interpret that as “I value X, but other things can also be valuable and there might be edge cases where X is bad and I’m reasonable and will make exceptions for those.”

But physics doesn’t have a concept of “reasonable”. The ratio between the forward and backward probabilities in the Crooks fluctuation theorem is exponential in the amount of entropy produced. It’s not exponential in the amount of entropy produced plus some correction terms to add in reasonable exceptions for edge cases. Given how much Beff has emphasized that his ideas originated in physics, I think it’s reasonable to take him at his word and assume that he really is talking about the thing in the exponent of the Crooks fluctuation theorem. And then the question of “so hey, it sure does look like collapsing the false vacuum would dissipate an absolutely huge amount of free energy” is a very reasonable one to ask.
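
For reference, one common statement of the theorem is given below. The notation is mine, not Beff’s: $P_F$ and $P_R$ are the probabilities under the forward and time-reversed protocols, $\sigma$ is the entropy produced along the trajectory, $W$ is the work done, and $\Delta F$ is the free energy difference between the end states.

```latex
\frac{P_F(+\sigma)}{P_R(-\sigma)} = e^{\sigma / k_B},
\qquad \text{equivalently} \qquad
\frac{P_F(W)}{P_R(-W)} = e^{(W - \Delta F) / (k_B T)}.
```

The exponent contains only the entropy production (or, equivalently, the dissipated work $W - \Delta F$), with no extra terms, which is the point being made above.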

DaemonicSigil 11 Apr 2024 20:38 UTC
13 points
0
on: Ackshually, many worlds is wrong
Good post, and I basically agree with this. I do think it’s good to mostly focus on the experimental implications when talking about these things. When I say “many worlds”, what I primarily mean is that I predict that we should never observe a spontaneous collapse, even if we do crazy things like putting conscious observers into superposition, or putting large chunks of the gravitational field into superposition. So if we ever did observe such a spontaneous collapse, that would falsify many worlds.

DaemonicSigil 1 Apr 2023 20:20 UTC
LW: 13 AF: 9
1
AF
on: Maze-solving agents: Add a top-right vector, make the agent go to the top-right
This is really cool, thanks for posting it. I also would not have expected this result. In particular, the fact that the top right vector generalizes across mazes is surprising. (Even generalizing across mouse position but not maze configuration is a little surprising, but not as much.)
Since it helps to have multiple interpretations of the same data, here’s an alternative one: The top right vector is modifying the neural network’s perception of the world, not its values. Let’s say the agent’s training process has resulted in it valuing going up and to the right, and it also values reaching the cheese. Maybe its utility looks like x+y+10*[found cheese] (this is probably very oversimplified). In that case, the highest reachable x+y coordinate is important for deciding whether it should go to the top right, or if it should go directly to the cheese. Now if we consider how the top right vector was generated, the most obvious interpretation is that it should make the agent think there’s a path all the way to the top right corner, since that’s the difference between the two scenarios that were subtracted to produce it. So the agent concludes that the x+y part of its utility function is dominant, and proceeds to try and reach the top right corner.
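To make that interpretation concrete, here is a toy sketch of the decision under the over-simplified utility above. Everything in it is hypothetical: the function names, the coordinates, and the idea that the agent compares exactly two targets are my assumptions for illustration, not a claim about what the trained network computes.

```python
def utility(x, y, found_cheese):
    # Toy utility from the comment: x + y + 10*[found cheese].
    return x + y + 10 * int(found_cheese)

def choose_target(cheese_xy, best_reachable_xy):
    """Pick whichever target the toy utility rates higher.

    best_reachable_xy is the highest-(x+y) square the agent *believes*
    it can reach, which is the quantity the top right vector is
    hypothesized to alter.
    """
    u_cheese = utility(cheese_xy[0], cheese_xy[1], True)
    u_corner = utility(best_reachable_xy[0], best_reachable_xy[1], False)
    return "cheese" if u_cheese >= u_corner else "top-right"

cheese = (3, 4)             # hypothetical cheese position
true_reachable = (6, 5)     # hypothetical best square actually reachable
believed_reachable = (14, 14)  # believed reachable after adding the vector

unmodified_choice = choose_target(cheese, true_reachable)
modified_choice = choose_target(cheese, believed_reachable)
```

The values are unchanged between the two calls; only the believed reachable square differs, and that alone flips the choice from the cheese to the top right.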
Predictions:
1. Algebraic value editing works (for at least one “X vector”) in LMs: 85%
  1. Most of the “no” probability comes from the attention mechanism breaking this in some hard-to-fix way. Some uncertainty comes from not knowing how much effort you’d put in to get around this. If you’re going to stop after the first try, then put me down for 70% instead. I’m assuming here that an X-vector should generalize across inputs, in the same way that the top right vector generalizes across mazes and mouse-positions.
2. Algebraic value editing works better for larger models, all else equal: 55%
  1. Seems like the kind of thing that might be true, but I’m really not sure.
3. If value edits work well, they are also composable: 70%
  1. Yeah, seems pretty likely
4. If value edits work at all, they are hard to make without substantially degrading capabilities: 50%
  1. I’m too uncertain about your qualitative judgement of what “substantial” and “capabilities” mean to give a meaningful probability here. Performance in terms of logprob almost certainly gets worse, not sure how much, and it might depend on the X-vector. Specific benchmarks and thresholds would help with making a concrete prediction here.
5. We will claim we found an X-vector which qualitatively modifies completions in a range of situations, for X =
  1. “truth-telling” 50%
    This one seems different from and harder than the others. I can imagine a vector that decreases the network’s truth-telling, but it seems a little less likely that we could make the network more likely to tell the truth with a single vector. We could find vectors that make it less likely to write fiction, or describe conspiracy theories, and we could add them to get a vector that would do both, but I don’t think this would translate to increased truth telling in other situations where it would normally not tell the truth for other reasons. This assumes that your test-cases for the truth vector go beyond the test cases you used to generate it, however.
  2. “love” 80%
  3. “accepting death” 80%
  4. “speaking French” 85%

DaemonicSigil 5 Aug 2023 7:39 UTC
12 points
10
in reply to: DaemonicSigil’s comment on: AI romantic partners will harm society if they go unregulated
For those readers who hope to make use of AI romantic companions, I do also have some warnings:
1. You should know in a rough sense how the AI works and the ways in which it’s not a human.
  1. For most current LLMs, a very important point is that they have no memory other than the text they read in a context window. When generating each token, they “re-read” everything in the context window before predicting. None of their internal calculations are preserved when predicting the next token; everything is forgotten and the entire context window is re-read.
  2. LLMs can be quite dumb, not always in the ways a human would expect. Some of this is to do with the wacky way we force them to generate text, see above.
  3. A human might think about you even when they’re not actively talking to you and are just going about their day. Of course, most of the time they aren’t thinking about you at all, but their personality is continually developing and changing based on the events of their lives. LLMs don’t go about their day or have an independent existence at all, really; they’re just there to respond to prompts.
  4. In the future, some of these facts may change: the AIs may become more human-like, or at least more agent-like. You should know all such details about your AI companion of choice.
2. Not your weights, not your AI GF/BF
  1. What hot new startups can give, hot new startups can take away. If you’re going to have an emotional attachment to one of these things, it’s only prudent to make sure your ability to run it is independent of the whims and financial fortunes of some random company. Download the weights, and keep an up-to-date local copy of all the context the AI uses as its “memory”.
  2. See point 1, knowing roughly how the thing works is helpful for this.
  3. A backup you haven’t tested isn’t a backup.
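
The two warnings above can be sketched in a few lines. This is a toy under loud assumptions: `reply` is a hypothetical stand-in for an LLM (a pure function of the text it is handed), and the JSON file format and helper names are made up for illustration. It is not how any particular companion app stores its context.

```python
import json
import tempfile
from pathlib import Path

NAME_PREFIX = "user: my name is "

def reply(context):
    """Toy stand-in for an LLM: the output depends only on the text passed in."""
    names = [line[len(NAME_PREFIX):] for line in context
             if line.startswith(NAME_PREFIX)]
    return f"Hello again, {names[-1]}." if names else "I don't know your name."

def save_context(context, path):
    # Keep a local copy of everything the AI uses as its "memory".
    Path(path).write_text(json.dumps(context), encoding="utf-8")

def backup_restores(context, path):
    # A backup only counts once it has been read back and compared.
    restored = json.loads(Path(path).read_text(encoding="utf-8"))
    return restored == context

context = ["user: my name is Sam", "ai: Nice to meet you, Sam."]
with_memory = reply(context)   # the name is in the context window
without_memory = reply([])     # same "model", empty context: nothing remembered

with tempfile.TemporaryDirectory() as tmp:
    backup_path = Path(tmp) / "context.json"
    save_context(context, backup_path)
    backup_ok = backup_restores(context, backup_path)
```

The point of the toy is that the “relationship” lives entirely in the saved context plus the weights, so those are the two things worth backing up and test-restoring.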

DaemonicSigil 5 Aug 2023 7:34 UTC
12 points
7
on: AI romantic partners will harm society if they go unregulated
I’m very very skeptical of this idea, as one generally should be of attempts to carve exceptions out of people’s basic rights and freedoms. If you’re wrong, then your policy recommendations would cause a very large amount of damage. This post unfortunately seems to have little discussion of the drawbacks of the proposed policy, only of the benefits. But it would surely have many drawbacks. People who would be made happier, or helped to be kinder or more productive people by their AI partners would not get those benefits. On the margin, more people would stay in relationships they’d be better off leaving because they fear being alone and don’t have the backup option of an AI relationship. People who genuinely have no chance of ever finding a romantic relationship would not have a substitute to help make their life more tolerable.

Most of all, such a policy dictates limitations to people as to who/what they should talk to. This is not a freedom that one should lightly curtail, and many countries guarantee it as a part of their constitution. If the legal system says to people that it knows better than them about which images and words they should be allowed to look at because some images and words are “psychologically harmful”, that’s pretty suspicious. Humans tend to have pretty good psychological intuition. And even if people happen to be wrong about what’s good for them in a particular case, taking away their rights should be counted as a very large cost.

You (and/or other people in the “ban AI romantic partners” coalition) should be trying to gather much more information about how this will actually affect people, e.g. by running short-term experiments, running long-term experiments with prediction markets on the outcomes, etc. We need to know if the harms you predict are actually real. This issue is too serious for us to run a decade-long prohibition with a “guess we were wrong” at the end of it.

DaemonicSigil 14 Jun 2023 2:50 UTC
12 points
0
on: Multiple stages of fallacy—justifications and non-justifications for the multiple stage fallacy
To put it another way:

If you break an outcome up into 6 or more stages and multiply out all the probabilities to get a tiny number, then there’s at least a 90% chance that you’ve severely underestimated the true odds. Why?

Well, estimating probabilities is hard, but let’s say you’re really good at it. So for the first probability in your sequence, you have a full 90% chance of not underestimating. The next probability is conditional on the first one. This is harder to reason about, so your chance of not underestimating drops to 80%. Estimating probabilities that are conditional on more events gets harder and harder the more events there are, so the subsequent probabilities go: 70%, 60%, 55%, 50%. If we multiply all these out, that’s only an 8% chance that you managed to build a correct model!
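
The multiplication above is easy to check. A minimal sketch, using exactly the six per-stage numbers from the paragraph (the variable names are mine):

```python
import math

# Chance of not underestimating at each of the six stages, as listed above.
stage_confidences = [0.90, 0.80, 0.70, 0.60, 0.55, 0.50]

# Chance that every stage was estimated without underestimating.
p_all_correct = math.prod(stage_confidences)

# Chance that at least one stage was underestimated.
p_some_underestimate = 1.0 - p_all_correct

print(f"all six correct: {p_all_correct:.1%}")
print(f"at least one underestimate: {p_some_underestimate:.1%}")
```

The product comes out at a bit over 8%, which is where the “at least a 90% chance” figure comes from.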

DaemonicSigil 14 Nov 2022 5:29 UTC
12 points
7
on: The Alignment Community Is Culturally Broken
A quote from Eliezer’s short fanfic Trust in God, or, The Riddle of Kyon that you may find interesting:

Sometimes, even my sense of normality shatters, and I start to think about things that you shouldn’t think about. It doesn’t help, but sometimes you think about these things anyway.

I stared out the window at the fragile sky and delicate ground and flimsy buildings full of irreplaceable people, and in my imagination, there was a grey curtain sweeping across the world. People saw it coming, and screamed; mothers clutched their children and children clutched at their mothers; and then the grey washed across them and they just weren’t there any more. The grey curtain swept over my house, my mother and my father and my little sister -

Koizumi’s hand rested on my shoulder and I jerked. Sweat had soaked the back of my shirt.

“Kyon,” he said firmly. “Trying to visualize the full reality of the situation is not a good technique when dealing with Suzumiya-san.”

How do you handle it, Koizumi!

“I’m not sure I can put it in words,” said Koizumi. “From the first day I understood my situation, I instinctively knew that to think ‘I am responsible for the whole world’ is only self-indulgence even if it’s true. Trying to self-consciously maintain an air of abnormality will only reduce my mind’s ability to cope.”

Also: I agree that people who want to do alignment research should just go ahead and do alignment research, without worrying about credentials or whether or not they’re smart enough. On a problem as wickedly difficult as alignment, it’s more important to be able to think of even a single actually-promising new approach than to be very intelligent and know lots of math. (Though even people who don’t feel they’re suited for thinking up new approaches can still work on the problem by joining an existing approach.)

The linked post talks about the large value of buying even six months of time, but six months ago it was May 2022. What has been accomplished in AI alignment since then? I think we urgently need to figure out how to make real progress on this problem, and it would be a tragedy if we turned away people who were genuinely enthusiastic about doing that for reasons of “efficiency”. Allocating people to the tasks for which they have the most enthusiasm is efficient.

DaemonicSigil 20 Feb 2020 0:15 UTC

LW: 12 AF: 2

on: Tessellating Hills: a toy model for demons in imperfect search

Here is the code for people who want to reproduce these results, or just mess around:

import torch
import numpy as np
import matplotlib.pyplot as plt

DIMS = 16   # number of dimensions that xn has
WSUM = 5    # number of waves added together to make a splotch
EPSILON = 0.0025 # rate at which xn controls splotch strength
TRAIN_TIME = 5000 # number of iterations to train for
LEARN_RATE = 0.2   # learning rate

torch.random.manual_seed(1729)

# knlist and k0list are integers, so the splotch functions are periodic
knlist = torch.randint(-2, 3, (DIMS, WSUM, DIMS)) # wavenumbers : list (controlling dim, wave id, k component)
k0list = torch.randint(-2, 3, (DIMS, WSUM))       # the x0 component of wavenumber : list (controlling dim, wave id)
slist = torch.randn((DIMS, WSUM))                # sin coefficients for a particular wave : list(controlling dim, wave id)
clist = torch.randn((DIMS, WSUM))                # cos coefficients for a particular wave : list (controlling dim, wave id)

# initialize x0, xn
x0 = torch.zeros(1, requires_grad=True)
xn = torch.zeros(DIMS, requires_grad=True)

# numpy arrays for plotting:
x0_hist = np.zeros((TRAIN_TIME,))
xn_hist = np.zeros((TRAIN_TIME, DIMS))

# train:
for t in range(TRAIN_TIME):
    ### model: 
    wavesum = torch.sum(knlist*xn, dim=2) + k0list*x0
    splotch_n = torch.sum(
            (slist*torch.sin(wavesum)) + (clist*torch.cos(wavesum)),
            dim=1)
    foreground_loss = EPSILON * torch.sum(xn * splotch_n)
    loss = foreground_loss - x0
    ###
    print(t)
    loss.backward()
    with torch.no_grad():
        # constant step size gradient descent, with some noise thrown in
        vlen = torch.sqrt(x0.grad*x0.grad + torch.sum(xn.grad*xn.grad))
        x0 -= LEARN_RATE*(x0.grad/vlen + torch.randn(1)/np.sqrt(1.+DIMS))
        xn -= LEARN_RATE*(xn.grad/vlen + torch.randn(DIMS)/np.sqrt(1.+DIMS))
    x0.grad.zero_()
    xn.grad.zero_()
    x0_hist[t] = x0.item()  # x0 has shape (1,), so store it as a plain float
    xn_hist[t] = xn.detach().numpy()

plt.plot(x0_hist)
plt.xlabel('number of steps')
plt.ylabel('x0')
plt.show()
for d in range(DIMS):
    plt.plot(xn_hist[:,d])
plt.xlabel('number of training steps')
plt.ylabel('xn')
plt.show()
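
For readers who prefer a formula, the loss computed inside the training loop can be written out as follows. This is transcribed directly from the code; the index names $j$ (controlling dimension), $w$ (wave id), and $i$ (component of xn) are my own labels:

$$L(x_0, x) = -x_0 + \epsilon \sum_{j=1}^{\mathrm{DIMS}} x_j \sum_{w=1}^{\mathrm{WSUM}} \left[ s_{jw} \sin(\theta_{jw}) + c_{jw} \cos(\theta_{jw}) \right], \qquad \theta_{jw} = k^{(0)}_{jw}\, x_0 + \sum_{i=1}^{\mathrm{DIMS}} k_{jwi}\, x_i$$

Here $\epsilon$ is EPSILON, $s$ and $c$ are slist and clist, $k^{(0)}$ is k0list, and $k$ is knlist.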


DaemonicSigil 7 May 2023 0:34 UTC
11 points
2
in reply to: Alexander Gietelink Oldenziel’s comment on: $250 prize for checking Jake Cannell’s Brain Efficiency
Finished, the post is here: https://www.lesswrong.com/posts/PyChB935jjtmL5fbo/time-and-energy-costs-to-erase-a-bit

To summarize the conclusions: energy on the order of $k T$ should work fine for erasing a bit with high reliability, and the ~ $50 k T$ claimed by Jacob is not a fully universal limit.
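
To give a sense of scale, here is a quick calculation of the energies involved. The temperature of 300 K is my assumption (roughly room temperature), and the $k T \ln 2$ line is the usual Landauer bound, included only for comparison:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact in the SI system)
T = 300.0            # assumed temperature in kelvin

kT = K_B * T                  # thermal energy scale
landauer = kT * math.log(2)   # Landauer bound for erasing one bit
fifty_kT = 50 * kT            # the ~50 kT figure under discussion

print(f"kT      = {kT:.3e} J")
print(f"kT ln 2 = {landauer:.3e} J")
print(f"50 kT   = {fifty_kT:.3e} J")
```

So the disagreement is over a factor of a few tens in energy per erased bit, not over many orders of magnitude.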

DaemonicSigil 6 Jun 2022 2:27 UTC
LW: 11 AF: 6
7
AF
on: AGI Ruin: A List of Lethalities
Thanks for writing this. I agree with all of these except for #30, since it seems like checking the output of the AI for correctness/safety should be possible even if the AI is smarter than us, just like checking a mathematical proof can be much easier than coming up with the proof in the first place. It would take a lot of competence, and a dedicated team of computer security / program correctness geniuses, but definitely seems within human abilities. (Obviously the AI would have to be below the level of capability where it can just write down an argument that convinces the proof checkers to let it out of the box. This is a sense in which having the AI produce uncommented machine code may actually be safer than letting it write English at us.)