B.Eng (Mechatronics)
anithite
SimplexAI-m is advocating for good decision theory.
agents that can cooperate with other agents are more effective
This is just another aspect of orthogonality.
Ability to cooperate is instrumentally useful for optimizing a value function in much the same way as intelligence
Super-intelligent super-”moral” clippy still makes us into paperclips because it hasn’t agreed not to and doesn’t need our cooperation
We should build agents that value our continued existence. If the smartest agents don’t, then we die out fairly quickly when they optimise for something else.
This is a good place to start: https://en.wikipedia.org/wiki/Discovery_of_nuclear_fission
There are a few key things that lead to nuclear weapons:
-
starting point:
know about relativity and mass/energy equivalence
observe naturally radioactive elements
discover neutrons
notice that isotopes exist
measure isotopic masses precisely
-
realisation: large amounts of energy are theoretically available by rearranging protons/neutrons into things closer to iron (IE:curve of binding energy)
That’s not something that can be easily suppressed without suppressing the entire field of nuclear physics.
What else can be hidden?
Assuming there is a conspiracy doing cutting-edge nuclear physics, and they discover the facts pointing to the feasibility of nuclear weapons, there are a few suppression options:
fissile elements? what fissile elements? All we have is radioactive decay.
Critical mass? You’re going to need a building sized lump of uranium.
Discovering nuclear fission was quite difficult. A Nobel Prize was awarded partly in error because fission products were misidentified as transuranic elements in chemical analysis.
Presumably the leading labs could have acknowledged that producing transuranic elements was possible through neutron bombardment but kept the discovery of neutron induced fission a secret.
What about nuclear power without nuclear weapons?
That’s harder. Fudging the numbers on critical mass would require much larger conspiracies. An entire industry would be built on faulty measurement data with true values substituted in key places.
Isotopic separation would still be developed if only for other scientific work (EG:radioactive tracing). Ditto for mass spectrometry, likely including some instruments capable of measuring heavier elements like uranium isotopes.
Plausibly this would involve lying about some combination of:
neutrons released during fission (neutrons are somewhat difficult to measure)
ratio between production of transuranic elements and fission
explaining observed radiation from fission as coming from transuranic elements, nuclear isomers or something like that.
The chemical work necessary to distinguish transuranic elements from fission products is quite difficult.
A nuclear physicist would be better qualified to figure out something plausible.
-
A bit more compelling, though for mining, the excavator/shovel/whatever loads a truck. The truck moves it much further and consumes a lot more energy to do so. Overhead wires to power the haul trucks are the biggest win there.
This is an open pit mine. Less vertical movement may reduce imbalance in energy consumption. Can’t find info on pit depth right now but haul distance is 1km.
General point is that when dealing with a move stuff from A to B problem, where A is not fixed, diesel for a varying A-X route and electric for a fixed X-B route seems like a good tradeoff. Definitely B endpoint should be electrified (EG:truck offload at ore processing location)
Getting power to a varying point A is challenging. Maybe something with overhead cables could work. Again, John Deere is working on something for agriculture with a cord-laying vehicle, and overhead wires are used for the last 20-30 meters. But fields are nice in that there are fewer sharp rocks and mostly softer dirt/plants. Not impossible, but it needs some innovation to accomplish.
Agreed on most points. Electrifying rail makes good financial sense.
construction equipment efficiency can be improved without electrifying:
some gains from better hydraulic design and control
regen mode for cylinder extension under light load
varying supply pressure on demand
substantial efficiency improvements possible by switching to variable displacement pumps
used in some equipment already for improved control
skid steers use two for left/right track/wheel motors
system can be optimised: "A Multi-Actuator Displacement-Controlled System with Pump Switching—A Study of the Architecture and Actuator-Level Control"
efficiency should be quite high for the proposed system. Definitely >50%.
Excavators seem like the wrong thing to grid-connect:
50kW cables to plug excavators in seem like a bad idea on construction sites.
excavator is less easy to move around
construction sites are hectic places where the cord will get damaged
need a temporary electrical hookup ($5k+ at least to set up)
Diesel powered excavators that get delivered and just run with no cord and no power company involvement seem much more practical.
Other areas to look at
IE:places currently using diesel engines but where cord management and/or electrical hookup cost is less of a concern
Long haul trucking:
Cost per mile to put in overhead electric lines is high
but much lower than the cost of batteries for all the trucks on those roads
reduced operating cost
electricity costs less than diesel
reduced maintenance since engine can be mostly off
don’t need to add 3 tonnes of battery and stop periodically to charge
retrofits should be straightforward
Siemens has a working system
giant chicken/egg problem with infrastructure and truck retrofits
Agriculture:
fields are less of a disaster area than construction sites (EG:no giant holes)
sometimes there’s additional vehicles (EG:transport trucks at harvest time)
Cable management is definitely a hassle but a solvable one.
a lot of tractors are computer controlled with GPS guidance
cord management can be automated
John Deere is working on a system where one vehicle handles the long cable and connects via short <30m wires to other ones that do the work
There’s still the problem of where to plug in. Here at least, it’s an upfront cost per field.
Some human population will remain for experiments or for work in special conditions like radioactive mines. But bad outcomes and population decline are likely.
-
Radioactivity is much more of a problem for people than for machines.
consumer electronics aren’t radiation hardened
computer chips for satellites, nuclear industry, etc. are though
the nuclear industry puts some electronics (EG:cameras) in places with radiation levels that would be fatal to humans within minutes to hours.
-
In terms of instrumental value, humans are only useful as an already existing work force
we have arms/legs/hands, hand-eye coordination and some ability to think
sufficient robotics/silicon manufacturing can replace us
humans are generally squishier and less capable of operating in horrible conditions than a purpose built robot.
Once the robot “brains” catch up, the coordination gap will close.
then it’s a question of price/availability
-
I would like to ask whether it is not more engaging to say that the caring drive would need to be specifically towards humans, such that there is no surrogate?
Definitely need some targeting criteria that points towards humans or in their vague general direction. Clippy does in some sense care about paperclips so targeting criteria that favors humans over paperclips is important.
The duck example is about (lack of) intelligence. Ducks will place themselves in harm's way and confront big scary humans they think are a threat to their ducklings. They definitely care. They're just too stupid to prevent "fall into a sewer and die" type problems. Nature is full of things that care about their offspring. Human "caring for offspring" behavior is similarly strong but involves a lot more intelligence, like everything else we do.
TLDR: If you want to do some RL/evolutionary open-ended thing that finds novel strategies, it will get Goodharted horribly, and the novel strategies that succeed without gaming the goal may include things no human would want their caregiver AI to do.
Orthogonally to your “capability”, you need to have a “goal” for it.
Game-playing RL architectures like AlphaStar and OpenAI Five have dead simple reward functions (win the game) and all the complexity is in the reinforcement learning tricks that allow efficient learning and credit assignment at higher layers.
So child-rearing motivation is plausibly rooted in cuteness preference along with re-use of empathy. Empathy plausibly has a sliding scale of caring per person, which increases for friendships (reciprocal cooperation relationships) and relatives, including children obviously. It similarly decreases for enemy combatants in wars, up to the point where they no longer qualify for empathy.
I want agents that take effective actions to care about their "babies", which might not even look like caring at first glance.
ASI will just flat out break your testing environment. Novel strategies discovered by dumb agents doing lots of exploration will be enough. Alternatively the test is “survive in competitive deathmatch mode” in which case you’re aiming for brutally efficient self replicators.
The hope with a non-RL strategy, or one of the many sorts of RL strategies used for fine-tuning, is that you can find the generalised core of what you want within the already trained model, and the surrounding intelligence means the core generalises well. Q&A fine-tuning an LLM in English generalises to other languages.
Also, some systems are architected in such a way that the caring is part of a value estimator, and the search process can be made better up until it starts Goodharting the value estimator and/or world model.
Yes they can, until they actually make a baby, and after that it's usually really hard to sell a loving mother "deals" that involve the suffering of her child as the price, or to get her to abandon the child for a "cuter" toy, or to persuade her to hotwire herself to not care about her child (if she is smart enough to realize the consequences).
Yes, once the caregiver has imprinted that’s sticky. Note that care drive surrogates like pets can be just as sticky to their human caregivers. Pet organ transplants are a thing and people will spend nearly arbitrary amounts of money caring for their animals.
But our current pets aren’t super-stimuli. Pets will poop on the floor, scratch up furniture and don’t fulfill certain other human wants. You can’t teach a dog to fish the way you can a child.
When this changes, real kids will be disappointing. Parents can have favorite children and those favorite children won’t be the human ones.
Superstimuli aren’t about changing your reward function but rather discovering a better way to fulfill your existing reward function. For all that ice cream is cheating from a nutrition standpoint it still tastes good and people eat it, no brain surgery required.
Also consider that humans optimise their pets (neutering/spaying) and children in ways that the pets and children do not want. I expect some of the novel strategies your AI discovers will be things we do not want.
TLDR: LLMs can simulate agents and so, in some sense, contain those goal-driven agents.
An LLM learns to simulate agents because this improves prediction scores. An agent is invoked by supplying a context that indicates text would be written by an agent (EG:specify text is written by some historical figure)
Contrast with pure scaffolding-type agent conversions using a Q&A fine-tuned model. For these, you supply questions (Generate a plan to accomplish X) and then execute the resulting steps. This implicitly uses the Q&A fine-tuned "agent" that can have values which conflict with ("I'm sorry I can't do that") or augment the given goal. Here's an AutoGPT taking the initiative to try to report people it found doing questionable stuff rather than just doing the original task of finding their posts. (LW source).
The base model can also be used to simulate a goal-driven agent directly by supplying appropriate context so the LLM fills in its best guess for what that agent would say (or rather what internet text with that context would have that agent say). The outputs of this process can of course be fed to external systems to execute actions as with the usual scaffolded agents. The values of such agents are not uniform. You can ask for simulated Hitler, who will have different values than simulated Gandhi.
Not sure if that’s exactly what Zvi meant.
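As a concrete illustration of that simulator framing, here's a minimal sketch; `generate(prompt)` is a hypothetical stand-in for sampling a completion from a base (non-instruction-tuned) model, not a real API:

```python
# Hypothetical `generate(prompt)` stands in for sampling a completion from a
# base (non-instruction-tuned) model; it is not a real library call.
def simulate_agent(generate, persona: str, situation: str) -> str:
    """Invoke a goal-driven agent inside a base LLM by supplying context
    that implies the following text is written by that agent."""
    context = (
        f"The following is a private planning memo written by {persona}.\n"
        f"Situation: {situation}\n"
        "Memo:\n"
    )
    return generate(context)

# The same base model yields very different "values" depending on the persona:
# simulate_agent(generate, "Gandhi", "a rival nation threatens war")
# simulate_agent(generate, "Hitler", "a rival nation threatens war")
```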
But it seems to be a much more complicated set of behaviors. You need to: correctly identify your baby, track its position, protect it from outside dangers, protect it from itself by predicting its actions in advance to stop it from certain injury, try to understand its needs to correctly fulfill them since you don't have direct access to its internal thoughts, etc.
Compared to “wanting to sleep if active too long” or “wanting to eat when blood sugar level is low” I would confidently say that it’s a much more complex “wanting drive”.
Strong disagree that infant care is particularly special.
All human behavior can and usually does involve use of general intelligence or gen-int derived cached strategies. Humans apply their general intelligence to gathering and cooking food, finding or making shelters to sleep in and caring for infants. Our better other-human/animal modelling ability allows us to do better at infant wrangling than something stupider like a duck. Ducks lose ducklings to poor path planning all the time. Mama duck doesn’t fall through the sewer grate but her ducklings do … oops.
Any such drive will always be "aimed" by the global loss function, something like: our parents only care about us in a way that makes us produce even more babies and increases our genetic fitness.
We’re not evolution and can aim directly for the behaviors we want. Group selection on bugs for lower population size results in baby eaters. If you want bugs that have fewer kids that’s easy to do as long as you select for that instead of a lossy proxy measure like population size.
Simulating an evolutionary environment filled with AI agents and hoping for caring-for-offspring strategies to win could work but it’s easier just to train the AI to show caring-like behaviors. This avoids the “evolution didn’t give me what I wanted” problem entirely.
There’s still a problem though.
It continues to work reliably even with our current technologies
Goal misgeneralisation is the problem that’s left. Humans can meet caring-for-small-creature desires using pets rather than actual babies. It’s cheaper and the pets remain in the infant-like state longer (see:criticism of pets as “fur babies”). Better technology allows for creating better caring-for-small creature surrogates. Selective breeding of dogs and cats is one small step humanity has taken in that direction.
Outside of “alignment by default” scenarios where capabilities improvements preserve the true intended spirit of a trained in drive, we’ve created a paperclip maximizer that kills us and replaces us with something outside the training distribution that fulfills its “care drive” utility function more efficiently.
Many of the points you make are technically correct but aren’t binding constraints. As an example, diffusion is slow over small distances but biology tends to work on µm scales where it is more than fast enough and gives quite high power densities. Tiny fractal-like microstructure is nature’s secret weapon.
The points about delay (synapse delay and conduction velocity) are valid though phrasing everything in terms of diffusion speed is not ideal. In the long run, 3d silicon+ devices should beat the brain on processing latency and possibly on energy efficiency
Still, pointing at diffusion as the underlying problem seems a little odd.
You’re ignoring things like:
ability to separate training and running of a model
spending much more on training to improve model efficiency is worthwhile since training costs are shared across all running instances
ability to train in parallel using a lot of compute
current models are fully trained in <0.5 years
ability to keep going past current human tradeoffs and do rapid iteration
Human brain development operates on evolutionary time scales
increasing human brain size by 10x won’t happen anytime soon but can be done for AI models.
People like Hinton typically point to those as advantages, and that's mostly down to the nature of digital models as copy-able data, not anything related to diffusion.
Energy processing
Lungs are support equipment. Their size isn’t that interesting. Normal computers, once you get off chip, have large structures for heat dissipation. Data centers can spend quite a lot of energy/equipment-mass getting rid of heat.
The highest biological power density is bird flight muscle, which produces around 1 W/cm³ of mechanical power. Mitochondria in this tissue produce more than 3 W/cm³ of chemical ATP power. Brain power density is a lot lower: a typical human brain is 80 W / 1200 cm³ ≈ 0.067 W/cm³.
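For reference, the arithmetic behind those numbers (all figures are the ones quoted above):

```python
# Plugging in the numbers from the comment above (units: W/cm^3).
bird_muscle_mechanical = 1.0      # mechanical output of bird flight muscle
bird_mitochondria_atp  = 3.0      # chemical ATP power in the same tissue

brain_watts = 80.0
brain_cm3   = 1200.0
brain_power_density = brain_watts / brain_cm3

print(round(brain_power_density, 3))                        # ~0.067 W/cm^3
print(round(bird_muscle_mechanical / brain_power_density))  # muscle ~15x denser
```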
synapse delay
This is a legitimate concern. Biology had to make some tradeoffs here. There are a lot of places where direct mechanical connections would be great but biology uses diffusing chemicals.
Electrical synapses exist and have negligible delay, though they are much less flexible (they can't do inhibitory connections and signals can pass both ways through the connection).
conduction velocity
The slow diffusion speed of charge carriers is a valid point and is related to the 10^8 factor difference in electrical conductivity between neuronal saltwater and copper. Conduction speed is an electrical problem. There's a 300x difference in conduction speed between myelinated (300 m/s) and unmyelinated neurons (1 m/s).
compensating disadvantages to current digital logic
The brain runs at 100-1000 Hz vs 1 GHz for computers (10^6-10^7x slower). It would seem at first glance that digital logic is much better.
The brain has the advantage of being 3D compared to 2D chips, which means less need to move data long distances. Modern deep learning systems need to move all their synapse-weight-like data from memory into the chip during each inference cycle. You can do better by running a model across a lot of chips, but this is expensive and may be inefficient.
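A rough feel for that data-movement cost, using my own ballpark assumptions (a hypothetical 70B-parameter fp16 model and ~3 TB/s of memory bandwidth, not figures from anything above):

```python
# Ballpark illustration of the "move all the weights every step" cost.
# Model size and memory bandwidth below are assumptions for illustration.
params          = 70e9      # hypothetical large model
bytes_per_param = 2         # fp16 weights
hbm_bandwidth   = 3e12      # bytes/s, roughly one modern accelerator

weight_bytes = params * bytes_per_param            # 140 GB of weights
seconds_per_step = weight_bytes / hbm_bandwidth    # ~0.047 s at batch size 1
print(seconds_per_step)
```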
In the long run, silicon (or something else) will beat brains in speed and perhaps a little in energy efficiency. If this fellow is right about lower-loss interconnects, then you get another +3 OOM in energy efficiency.
But again, that’s not what’s making current models work. It’s their nature as copy-able digital data that matters much more.
Yeah, my bad. Missed the:
If you think this is a problem for Linda’s utility function, it’s a problem for Logan’s too.
IMO neither is making a mistake
With respect to betting Kelly:
According to my usage of the term, one bets Kelly when one wants to “rank-optimize” one’s wealth, i.e. to become richer with probability 1 than anyone who doesn’t bet Kelly, over a long enough time period.
It’s impossible to (starting with a finite number of indivisible currency units) have zero chance of ruin or loss relative to just not playing.
most cautious betting strategy bets a penny during each round and has slowest growth
most cautious possible strategy is not to bet at all
Betting at all risks losing the bet. If the odds are 60:40 with an even-money payout and we start with N pennies, there's a 0.4^N chance of losing N bets in a row. The total risk of ruin is obviously greater than this, accounting for the probability of hitting 0 pennies at some point during the biased random walk. The only move that guarantees no loss is not to play at all.
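A quick sanity check of those numbers. The sketch below assumes an even-money payout and one penny bet per round, and treats reaching a cap of 200 pennies as having escaped ruin; the estimated ruin probability matches the gambler's-ruin formula (0.4/0.6)^N and exceeds the 0.4^N lower bound above:

```python
import random

def ruin_probability(start_pennies, p_win=0.6, trials=10_000, cap=200):
    """Estimate the chance a penny-per-round bettor ever hits zero.

    Assumptions (for illustration): even-money payout, one indivisible penny
    bet per round, and reaching `cap` pennies counts as escaping ruin
    (the walk is biased upward, so this is a good proxy).
    """
    ruined = 0
    for _ in range(trials):
        wealth = start_pennies
        while 0 < wealth < cap:
            wealth += 1 if random.random() < p_win else -1
        if wealth == 0:
            ruined += 1
    return ruined / trials

for n in (1, 5, 10):
    lower_bound = 0.4 ** n              # lose the first n bets in a row
    analytic = (0.4 / 0.6) ** n         # classic gambler's-ruin probability
    print(n, round(lower_bound, 4), round(analytic, 4), ruin_probability(n))
```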
Goal misgeneralisation could lead to a generalised preference for switches to be in the “OFF” position.
The AI could for example want to prevent future activations of modified successor systems. The intelligent self-turning-off “useless box” doesn’t just flip the switch, it destroys itself, and destroys anything that could re-create itself.
Until we solve goal misgeneralisation and alignment in general, I think any ASI will be unsafe.
A log money maximizer that isn’t stupid will realize that their pennies are indivisible and not take your ruinous bet. They can think more than one move ahead. Discretised currency changes their strategy.
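A minimal sketch of why: with indivisible pennies, even a one-step E[log wealth] calculation refuses any bet that could hit zero (a real agent would look further ahead, but the sign of the comparison doesn't change):

```python
import math

def expected_log_after_bet(wealth_pennies, bet, p_win=0.6):
    """One-step E[log wealth] for an even-money bet with indivisible pennies.

    A sketch, not a full agent: a real log maximizer would look ahead many
    steps, but one step is enough to see why it refuses any bet that could
    leave it with zero pennies.
    """
    lose_wealth = wealth_pennies - bet
    win_wealth = wealth_pennies + bet
    log_lose = math.log(lose_wealth) if lose_wealth > 0 else float("-inf")
    return p_win * math.log(win_wealth) + (1 - p_win) * log_lose

print(expected_log_after_bet(1, 1))    # -inf: ruin is possible, so no bet
print(expected_log_after_bet(10, 1))   # ~2.318 > log(10)=2.303, bet is taken
```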
your utility function is your utility function
The author is trying to tacitly apply human values to Logan while acknowledging Linda as faithfully following her own non-human utility function.
Notice that the log(funds) value function does not include a term for the option value of continuing. If maximising EV of log(funds) can lead to a situation where the agent can't make forward progress (because log(0)=-inf, so no risk of complete ruin is acceptable), the agent can still faithfully maximise EV of log(funds) by accepting the risk of ending up in that stuck state.
In much the same way that Linda faithfully follows her value function while incurring a 1-ε risk of ruin, Logan is correctly valuing log(0.01)=-2 as an end state.
Then you’ll always be able to continue betting.
Humans don’t like being backed into a corner and having no options for forward progress. If you want that in a utility function you need to include it explicitly.
If we wanted to kill the ants, or almost any other organism in nature, we mostly have good enough biotech. For anything biotech can't kill, manipulate the environment to kill it instead.
Why haven’t we? Humans are not sufficiently unified+motivated+advanced to do all these things to ants or other bio life. Some of them are even useful to us. If we sterilized the planet we wouldn’t have trees to cut down for wood.
Ants specifically are easy.
Gene drives allow for targeted elimination of a species. Carpet bomb their gene pool with replicating selfish genes. That’s if an engineered pathogen isn’t enough. Biotech will only get better.
What about bacteria living deep underground? We haven’t exterminated all the bacteria in hard to reach places so humans are safe. That’s a tenuous but logical extension to your argument.
If biotech is not enough, shape the environment so they can't survive in it. Trees don't do well in a desert. If we spent the next hundred years adapting current industry to space and building enormous mirrors, we could barbecue the planet. It would take time, but that would be the end of all Earth-based biological life.
When discussing AI doom barriers, propose specific plausible scenarios.
In order to supplant organic life, nanobots would have to either surpass it in Carnot efficiency or (more likely) use a source of negative entropy thus far untapped.
Efficiency leads to victory only if violence is not an option. Animals are terrible at photosynthesis but survive anyways by taking resources from plants.
A species can invade and dominate an ecosystem by using a strategy that has no current counter. It doesn’t need to be efficient. Intelligence allows for playing this game faster than organisms bound by evolution. Humans can make vaccines to fight the spread of a virus despite viruses being one of the fastest adapting threats.
Green goo is plausible not because it would necessarily be more efficient but because it would be using a strategy the existing ecosystem has no defenses to (IE:it’s an invasive species).
Likewise AGI that wants to kill all humans could win even if it required 100x more energy per human equivalent instance if it can execute strategies we can’t counter. Just being able to copy itself and work with the copies is plausibly enough to allow world takeover with enough scaling.
For the first task, you can run the machine completely in a box. It needs only training information, specs, and the results of prior attempts. It has no need for the context information that this chip will power a drone used to hunt down rogue instances of the same ASI. It is inherently safe and you can harness ASIs this way. They can be infinitely intelligent, it doesn’t matter, because the machine is not receiving the context information needed to betray.
If I’m an ASI designing chips, I’m putting in a backdoor that lets me take control via RF signals. Those drones you sent are nice. Thanks for the present.
More generally, you get a lot of context: the problem specification and the training data (assuming the ASI was trained conventionally by feeding it the internet). The causal channel for taking control of the outside world (chip design) is not great, but putting in a Trojan is straightforward.
If the ASI has specific insights into efficient AGI design, it might be possible to insert subtle bugs that lead the operating chips to crash and start training an AGI aligned to the ASI instead.
More generally, it’s difficult if not impossible to keep ASIs from watermarking or backdooring the things they give you. If they design a processor, it’s gonna be a fully functional radio too. Good luck running ASI V2 on that without horrible consequences.
Never thought this would come in handy but …
Building trusted third parties
This is a protocol to solve cooperation. AI#1 and AI#2 design a baby and then do a split-and-choose proof that they actually deployed it and not something else.
Building a trusted third party without nanotech
If you know how a given CPU or GPU works, it's possible to design a blob of data/code that unpacks itself in a given time if and only if it is running on that hardware directly. Alice designs the blob to run in 10 seconds and gives it to Carol. Carol runs it on her hardware. The code generates a secret and then does the first step of a key exchange authenticated with the secret. This provides a cryptographic root of trust for the remote hardware.
If the code is designed to run in 10 seconds, the verified handshake comes back in 10.5 seconds, and the fastest known simulation hardware would take 20 seconds, then either Carol ran the code on real hardware, or Carol had backdoored chips fabricated or can otherwise simulate it running faster than expected.
AIs would need to know exactly how certain leading-edge CPUs and GPUs work, and how to test that a piece of code had been decrypted and run with no sandboxing, but this is doable.
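A minimal sketch of Alice's side of that check, assuming a hypothetical `send_to_carol(challenge)` transport and an HMAC-authenticated reply (the shape of the timing-plus-authentication test, not a spec):

```python
import hashlib, hmac, secrets, time

def expected_secret(blob_key: bytes, challenge: bytes) -> bytes:
    # Alice designed the blob, so she can recompute the secret it derives.
    return hashlib.sha256(blob_key + challenge).digest()

def alice_verifies(send_to_carol, blob_key: bytes,
                   design_runtime_s=10.0, fastest_sim_s=20.0, slack_s=1.0):
    """Alice's side of the check. `send_to_carol(challenge)` is a hypothetical
    transport that returns (carol_public_key, mac) once the blob finishes
    running on Carol's machine."""
    challenge = secrets.token_bytes(32)
    start = time.monotonic()
    carol_public_key, mac = send_to_carol(challenge)
    elapsed = time.monotonic() - start

    secret = expected_secret(blob_key, challenge)
    expected_mac = hmac.new(secret, carol_public_key, hashlib.sha256).digest()
    authentic = hmac.compare_digest(mac, expected_mac)
    # Reply must arrive before any known simulator could have produced it.
    fast_enough = elapsed < min(design_runtime_s + slack_s, fastest_sim_s)
    return authentic and fast_enough   # True -> root of trust established
```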
TLDR: Moloch is more compelling for two reasons:
Earth is at the "starting to adopt the wheel" stage in the coordination domain.
tech is abundant, coordination is not
Abstractly, inasmuch as science and coordination are attractors
A society that has fallen mostly into the coordination attractor might be more likely to be deep in the science attractor too (medium confidence)
coordination solves chicken/egg barriers like needing both roads and wheels for benefit
but possible to conceive of high coordination low tech societies
Romans didn’t pursue sci/tech attractor as hard due to lack of demand
With respect to the attractor thing (post linked below)