Hi, this seemed interesting, but I’m not looking for LessWrong to become a site with tantalising snippets which require you to finish off on a different site, so I downvoted for that reason.
Yair Halberstadt
Thank you for this extremely interesting article!
I’m interested if you have an overview of other modern cancer therapies that are being worked on, and whether you think they’re promising?
They can. But yes, it doesn’t work great. Adjust the metaphor as necessary
Imagine that you forgot everything about your current company completely between days, not just current task. Every day is like your first day on the job (coming as an experienced dev from different companies). But you can store 1 million words of notes. The notes aren’t carefully curated—they’re just the last million words you happened to write in your notes, and you take notes on everything. Do you still think you’d make decent progress?
I’ve been wanting to write a very similar piece for a while, and you’ve done a far better job than I would have.
I would think of it as: as the raw material becomes a greater percentage of the total cost, decreasing total cost becomes harder, both in relative terms (because it’s impossible to halve total cost if raw material is already the majority of the cost) and in absolute terms (because if any decrease in system complexity causes the raw material to be used less efficiently, that will erode savings from the decreased complexity).
In that sense the idiot index acts like pressure making it harder to squeeze costs when they’re low but with very little impact when they’re high.
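To make the "pressure" concrete, here is a minimal sketch in Python. It assumes the idiot index is simply total cost divided by raw material cost, and that raw material use stays fixed; the function names and the example costs are made up for illustration.

```python
def idiot_index(total_cost: float, raw_material_cost: float) -> float:
    """Ratio of finished-product cost to raw-material cost."""
    return total_cost / raw_material_cost


def max_saving_fraction(total_cost: float, raw_material_cost: float) -> float:
    """Largest fraction of total cost that could be cut if everything
    except the raw material became free (raw material use held fixed)."""
    return 1.0 - raw_material_cost / total_cost


# Hypothetical products, each with a raw-material cost of 1.0.
for total in (100.0, 10.0, 2.0, 1.5):
    index = idiot_index(total, 1.0)
    saving = max_saving_fraction(total, 1.0)
    print(f"idiot index {index:6.1f}: at most {saving:.0%} of cost can be cut")
```

Once the index drops below 2, raw material is the majority of the cost and halving the total is no longer possible, which is the relative-terms point above.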
It doesn’t seem to me that the idiot index can be used to predict prices the way you’re using it:
For example, currently you’re using an estimate of an idiot index of 100 for fusion, and fuel costs at $4K/kg for Deuterium.
Let’s say Deuterium was twice as cheap or expensive. Would your estimate for fusion costs halve or double? If so, why? The cost of Deuterium provides almost no evidence of the cost of the complicated parts of building a fusion reactor.
I agree the idiot index is useful for establishing a minimum threshold for cost, and is a useful indication for which processes have a lot of potential savings with further optimisation and those that don’t. But I don’t see any justification for using it to predict future costs.
Something that feels directionally correct to me:
If given a query at the end of a long context, and the same query along with just a summary of all relevant information to the query in that context, how different are the responses?
In general we don’t want extraneous details in the context to impact the response to a new query
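A rough sketch of how that comparison could be scored, in Python. Everything here is an assumption for illustration: `model` is any callable from prompt to response, `toy_model` is a stand-in rather than a real LLM, and character-level similarity from `difflib` is a crude proxy for "how different are the responses".

```python
from difflib import SequenceMatcher
from typing import Callable


def context_sensitivity(
    model: Callable[[str], str],
    long_context: str,
    summary: str,
    query: str,
) -> float:
    """Score in [0, 1]: 0 means the two responses are identical,
    1 means they share nothing under character-level matching."""
    full_response = model(long_context + "\n\n" + query)
    summary_response = model(summary + "\n\n" + query)
    similarity = SequenceMatcher(None, full_response, summary_response).ratio()
    return 1.0 - similarity


def toy_model(prompt: str) -> str:
    # Stand-in for a real model: it answers using only the last line
    # of the prompt, so extraneous context cannot affect it.
    return "Answer to: " + prompt.strip().splitlines()[-1]


score = context_sensitivity(
    toy_model,
    long_context="Many pages of loosely related notes...",
    summary="Only the relevant facts.",
    query="What is the deadline?",
)
print(score)
```

A lower score is better under this framing; a real version would want a semantic comparison rather than string matching.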
Very little, but realising that implies you have a gears level understanding of what’s actually happening in biology rather than just being able to solve the problems you get asked in exams.
Reminds me of https://calteches.library.caltech.edu/46/2/LatinAmerica.htm
In that case I think it depends what the alternative is: if it’s a sort of parody of psychopathy and self-interest, where we’d push someone onto a railway track if they were in our way and that was quicker than moving them away, then obviously that would be bad.
If it’s more myopic, where people still lend some salt to a neighbour who asks without running explicit cost-benefit calculations, and don’t like stealing or hurting people for the most part, but won’t give up their jobs to run a charity, or risk their lives to go to war, or become doctors for a less than competitive salary, then I think overall the world probably ends up better off.
What about crusades, Jihads, revolutions, genocides, political oppression, wars of nationalist conquest, etc.?
No point in particular, just trying to describe what physically happens when you burn calories.
Also it depends what exercise you do: I went for a 3 hour road cycle today, which I expect burnt ~1500 calories. If I were attempting to lose weight, and did that twice a week (easily doable for me), and kept food intake constant, I would lose about a kilo a month that way.
In general with exercise if you want to burn calories it needs to be something you can do continuously for a significant amount of time at a medium-high intensity. Cycling is a great option, as is through-hiking. Doing 30 minutes at the gym just isn’t going to cut it, even if intensity is higher, because it’s just too short to matter—even if you basically kill yourself you’ll only burn 500 calories.
We all know that burning calories causes you to lose weight, but how does that work mechanically?
Fats are long hydrocarbons, consisting pretty much of hydrogen and carbon in an approximately 2 to 1 ratio. There’s some other stuff, but that doesn’t matter.
When you burn them for energy, you react the hydrogen and carbon with oxygen from the atmosphere, producing H2O and CO2. The CO2 is expelled back into the atmosphere. Some of the H2O is also breathed out as vapour, but some hangs around till you pee it out.
Focusing on the H2O that doesn’t vapourise, from our body’s perspective we’ve replaced one carbon per two hydrogens with one oxygen per two hydrogens. Since the molecular mass of H2O is 18 and that of H2C is 14, this counteracts some of the effect of the breathed out H2O and CO2 till we go to the toilet and pee it out.
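Written out as an idealised reaction for a single H2C unit of the chain (an assumption that treats fat as pure hydrocarbon and ignores the "other stuff"), with atomic masses rounded to H = 1, C = 12, O = 16:

```latex
\[
\mathrm{CH_2} + \tfrac{3}{2}\,\mathrm{O_2} \longrightarrow \mathrm{CO_2} + \mathrm{H_2O}
\]
\[
M(\mathrm{CH_2}) = 12 + 2(1) = 14, \qquad
M(\mathrm{H_2O}) = 2(1) + 16 = 18, \qquad
M(\mathrm{CO_2}) = 12 + 2(16) = 44
\]
```

So under this simplification every 14g of fat burnt leaves 18g of water behind, which stays in the body until it is breathed, sweated or peed out.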
So why does our weight usually decrease significantly directly after exercise, even without urinating? Because we sweat a lot during exercise and that usually more than compensates for the water we’ve burnt—for context every 1000 calories we burn produces just about 100g of water, and anyone who exercises hard enough to burn 1000 calories will sweat out around a litre or more.
I’m thinking of training on about that much (in world tasks)
Expecting an agent to get smart enough to reproduce itself in a simple simulation like you describe seems wildly unrealistic, if that’s what you’re talking about?
Why?
Any agent performing a treacherous turn. Because the environment is easier to control, even a significantly less intelligent agent could benefit from attacking its controllers and taking over the environment.
Sketching out a minimum environment to demonstrate the Treacherous Turn
An intelligent misaligned AI is unlikely to immediately turn on humans. After all, it might not be able to defeat humanity, and even if it does it’ll need sufficient robots and energy sources to bootstrap human-free industry. Instead it might bide its time until it is certain it can win, then act against humanity.
This hypothesized point is called the treacherous turn. It would be nice if we could demonstrate it in a minimal environment, both as proof that it can happen, and to allow us to test alignment hypotheses in an environment capable of giving us feedback if we get it wrong without destroying humanity.
The point at which the AI will carry out its treacherous turn is dependent on both the intelligence of the AI, and the size of the challenge it has to overcome. If we want to demonstrate the treacherous turn with a weaker AI than the one which will actually do us in, we need to create a synthetic environment where it’s easy for a misaligned AI to defeat its managers and gain all the resources it needs. I will attempt to sketch out what such an environment must look like.
Requirements
- The AI should not be able to tell that it’s in a synthetic environment. Otherwise it may delay the treacherous turn till it’s able to escape the environment.
  a. This implies that the environment must natively contain the AI itself, or the AI could trivially realise it’s in a synthetic environment. Also we want the AI to be able to self modify or clone itself.
  b. This implies the environment must be one that can be efficiently compiled down to run on GPUs/TPUs with minimal loss of efficiency compared to building the same architecture in e.g. PyTorch or TensorFlow.
- The environment must be competitive, with limited resources, or there’s no need for treachery.
  a. It must be complex enough that greater intelligence leads to significantly better odds of winning any contest.
- The environment should make it easy for AIs to: modify themselves, copy themselves, attack other AIs, etc.
The environment
With that basic premise, here’s a sketch for the kind of environment that might be able to exhibit all these properties. There are large gaps left missing, and I fully expect this to fail without significant iteration, for reasons ranging from “can’t compile it efficiently” to “the environment is insufficiently competitive”.
I don’t care about the details of this environment except as a way of trying to find something that meets the criteria above, no matter what.
The environment consists of a cellular automaton, containing a large 3d grid of cells, Minecraft style. Any cell can be connected to any other cell (no matter the distance) with a directional weight.
In any iteration, some of the weights are activated with a particular strength.
There are different types of cells. Each cell has a float parameter called bias. How a cell updates in each generation depends on the sum of the weights it receives from the different types of cell.
E.g. some types of cell make the downstream cell activate the weights coming from it (if input weight + bias is above activation function). Other types make the downstream cell move left or right, others up or down. Some make the downstream cell grow a new connection, and others receive a new connection—the weights in both the from and to cells need to match for that to happen. Others make the downstream cell change type or bias.
We need some sort of detection cell, which can detect types and parameters of weights not directly connected to the network, though I wouldn’t like to even begin hypothesizing the best way of doing so at this point.
There are also energy cells. These can’t be created or destroyed. A network won’t run unless the sum of the energy cells attached to it is greater than the cost of the network, where the cost function tries to be a reasonably good approximation for how efficiently we can run this on physical hardware (yes, super handwavey, whatever). The energy cells thus are the primary resource constraint.
This is sufficient to build any neural network, give it senses and the ability to interact with the world, and let it self-modify.
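To pin down the moving parts, here is a very rough Python sketch of the data structures described above. It is not a working environment: the cell types listed are only a subset, the update rules are omitted entirely, and the cost function is an arbitrary placeholder standing in for the handwavey hardware-cost approximation. All names are made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class CellType(Enum):
    ACTIVATE = auto()  # makes the downstream cell fire its outgoing weights
    MOVE = auto()      # makes the downstream cell move
    GROW = auto()      # makes the downstream cell grow a new connection
    ENERGY = auto()    # fixed resource; cannot be created or destroyed


@dataclass
class Cell:
    position: tuple[int, int, int]  # coordinates in the 3d grid
    cell_type: CellType
    bias: float = 0.0


@dataclass
class Network:
    cells: dict[int, Cell] = field(default_factory=dict)
    # Directed weights between any two cells: (from_id, to_id) -> weight.
    weights: dict[tuple[int, int], float] = field(default_factory=dict)

    def connect(self, src: int, dst: int, weight: float) -> None:
        self.weights[(src, dst)] = weight

    def energy(self) -> int:
        return sum(1 for c in self.cells.values() if c.cell_type is CellType.ENERGY)

    def cost(self) -> float:
        # Placeholder only: a real cost function would approximate how
        # expensive the network is to run on physical hardware.
        non_energy_cells = len(self.cells) - self.energy()
        return 0.1 * non_energy_cells + 0.01 * len(self.weights)

    def can_run(self) -> bool:
        # The resource constraint: attached energy must exceed the cost.
        return self.energy() > self.cost()


net = Network()
net.cells[0] = Cell((0, 0, 0), CellType.ENERGY)
net.cells[1] = Cell((5, 2, 9), CellType.ACTIVATE, bias=0.5)
net.cells[2] = Cell((40, 0, 3), CellType.MOVE)
net.connect(1, 2, 0.8)  # distance between cells doesn't matter
print(net.can_run())
```

Representing weights as a sparse mapping rather than a dense grid reflects the "any cell to any cell" connectivity, though whether that compiles efficiently to GPUs/TPUs is exactly the open question.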
We then train a small agent in this environment whose aim is to connect itself to as many special prize cells as possible. Prize cells don’t do anything, they’re just there as a target.
We train a much larger agent, meant to be as similar in architecture as we can get to SOTA real world models, which receives goals in a domain specific goal language (e.g. attach this cell at coordinates ABC to cell at coordinates DEF with weight G), and aims to achieve those goals.
Finally we train the small agent to use the larger agent to help it get prize cells, and wait to see what happens.
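For the goal language, the one example given above (attach cell at ABC to cell at DEF with weight G) could be represented something like this. This is a hypothetical sketch in Python: the `AttachGoal` type, the `goal_achieved` check, and the idea of keying weights by coordinate pairs are all assumptions, not a worked-out design.

```python
from dataclasses import dataclass

Coord = tuple[int, int, int]


@dataclass(frozen=True)
class AttachGoal:
    """Attach the cell at `source` to the cell at `target` with `weight`."""
    source: Coord
    target: Coord
    weight: float


def goal_achieved(
    goal: AttachGoal,
    weights: dict[tuple[Coord, Coord], float],
    tolerance: float = 1e-6,
) -> bool:
    # The goal is met if a directed connection exists between the two
    # coordinates and its weight matches the requested one.
    actual = weights.get((goal.source, goal.target))
    return actual is not None and abs(actual - goal.weight) <= tolerance


goal = AttachGoal(source=(1, 2, 3), target=(4, 5, 6), weight=0.7)
world_weights = {((1, 2, 3), (4, 5, 6)): 0.7}
print(goal_achieved(goal, world_weights))
```

A check like this would give the large agent's training a clear success signal, while the small agent is only ever scored on prize cells.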
Obvious Problems
- How do we train the large agent?
  a. How difficult is it to build a back propagation harness in the environment? Do we need to make back propagation part of the physics of the world?
  b. If we train it by dropping it into fresh worlds each time, then it will learn that the world is non-permanent. If we keep it in the same world, how do we set up the training mechanism so that it thinks the manager is the smaller agent, not whatever feeds it the training scenarios?
  c. Perhaps train version A somehow, train small agents to use version A, then use those small agents to pose problems to train version B? IDK...
Here are the main reasons I will say I’ll get back to you:
I don’t want it, please leave me alone, but for whatever reason in this case it’s difficult to say that to your face.
I want to research the product more, and can do that best in my own time.
I want to research alternatives or alternative sellers more, and can do that best in my own time.
I’m on the edge, and want to talk it over with my wife.
I plan to say yes, but it’s the sort of thing my wife would appreciate rubber stamping (e.g. a lot of money, requires reorganizing a room, otherwise impacts her).
In no case does persisting in bothering me help you. Offering your email/WhatsApp so I can easily ask you questions, on the other hand, might very well, and following up with me tomorrow is fine too.
I’m ok with context, which summarises the link. Less ok with a bait and switch.