# Prometheus

Karma: 473
• A man asks one of the members of the tribe to find him some kindling so that he may start a fire. A few hours pass, and the second man returns, walking with a large elephant.

“I asked for kindling,” says the first.

“Yes,” says the second.

“Where is it?” asks the first, trying to ignore the large pachyderm in the room.

The second gestures at the elephant, grinning.

“That’s an elephant.”

“I see that you are uninformed. You see, elephants are quite combustible, despite their appearance. Once it reaches the right temperature, its skin, its muscles, all of it will burn. Right down to its bones.”

“What is the ignition temperature for an elephant?”

“I don’t know. Perhaps 300–400°C.”

The first holds up two stones.

“This is all I have to start a fire,” he says. “It will only create a few sparks at best… I’m not even sure how I can get it to consistently do that much, given how hard this will be for people thousands of years from now to replicate.”

“That is the challenge.” The second nods solemnly. “I’m glad you understand the scope of this. We will have to search for ways to generate sparks at 400° so that we can solve the Elephant Kindling Problem.”

“I think I know why you chose the elephant. I think you didn’t initially understand that almost everything is combustible; you only notice that things are combustible once you pay enough attention to them. You looked around the Savanna and didn’t understand that dry leaves would be far more combustible, and your eyes immediately went to the elephant. Because elephants are interesting. They’re big and have trunks. Working on an Elephant Problem just felt way more interesting than a Dry Leaves Problem, so you zeroed all of your attention in on elephants, using the excuse that elephants are technically combustible, failing to see the elegant beauty in the efficient combustibility of leaves and their low ignition temperature.”

“Leaves might be combustible. But think of how fast they burn out. And how many you would have to gather to start a fire. An elephant is very big. It might take longer to get it properly lit, but once you do, you will have several tons of kindling! You could start any number of fires with it!”

“Would you have really reached these conclusions if you had surveyed all the possible combustible materials in the Savanna, instead of immediately focusing on elephants?”

“Listen, we can’t waste too much time on search. There are thousands of things in the Savanna! If we tested the combustibility and ignition temperature of every single one of them, we’d never get around to starting any fires. Are elephants the most combustible things in the Universe? Probably not. But should I waste time testing every possible material instead of focusing on how to get one material to burn? We have finite time, and finite resources to search for combustible materials. It’s better to pick one and figure out how to do it well.”

“I still think you only chose elephants because they’re big and interesting.”

“I imagine that ‘big’ and ‘useful as kindling material’ are not orthogonal. We shouldn’t get distracted by the small, easy problems, such as how to burn leaves. These are low-hanging fruit that anyone can pick. But my surveys of the tribe have found that figuring out the combustibility of elephants remains extremely neglected.”

“What about the guy who brought me a giraffe yesterday?”

“A giraffe is not an elephant! I doubt anything useful will ever come from giraffe combustibility. Their necks are so long that they will not even fit inside our caves!”

“What I am saying is that others have brought in big, interesting-looking animals, and tried to figure out how to turn them into kindling. Sure, no one else is working on the Elephant Kindling Problem. But that’s also what the guy with the giraffe said, and the zebra, and the python.”

“Excuse me,” says a third, poking his head into the cave, “but the Python Kindling Problem is very different from the Elephant one. Elephants are too girthy to be useful. But with a python, you can roll it into a coil, which makes it extremely efficient kindling material.”

The second scratches his chin for a moment, looking a bit troubled.

“What if we combined the two?” he asks. “If we wound the python around a leg of the elephant, the heat could be transferred somewhat efficiently.”

“No, no, no,” argues the third. “I agree combining these two problems might be useful. But it would be far better to just cut the trunk off the elephant and intertwine it with the python. This could be very useful, since elephant hide is very thick and might burn slower. This gives us the pros of a fast-burning coil of kindling, mixed with a more sustained blaze from the elephant.”

“Might I interject?” says a fourth voice, who has been watching quietly from the corner but now steps forward. “I have been hard at work on the Giraffe Kindling Problem, but I think we are actually working on similar things. The main issue has always been the necks. They simply won’t fit inside the cave. We need a solution that works in all edge cases, after all. If it’s raining, we can’t start a fire outside. But if we use the python and the elephant trunk to tie the neck of the giraffe against the rest of its body, we could fit the whole thing in!”

“I think this will be a very fruitful collaboration,” says the second. “While at first it seemed as though we were all working on different problems, it turns out that by combining them, we have found an elegant solution.”

“But we still can’t generate sparks hot enough to combust any of them!” shouts the first. “All you’ve done is make this even more complicated and messy!”

“I am aware it might seem that way to a novice,” says the second. “But we have all gained great knowledge in our own domains. And now it is time for our fields to evolve into a true science. We are not amateurs anymore, simply playing around with fire. We are now establishing expertise, creating sub-domains, arriving at a general consensus on the problem and its underlying structure! To an outsider, it will probably look daunting. But so does every scientific field once it matures. And we will continue to break new ground by standing on the shoulders of elephants!”

“Giraffes,” corrects the fourth.

• I strongly doubt we can predict the climate in 2100. Actual prediction would be a model that also incorporates the possibility of nuclear fusion, geoengineering, AGIs altering the atmosphere, etc.

• I got into AI at the worst time possible

2023 marks the year AI Safety went mainstream. And though I am happy it is finally getting more attention, and finally attracting highly talented people who want to work in it, personally it could not have come at a worse time for my professional life. This isn’t a thing I normally talk about, because it’s a very weird thing to complain about. I rarely permit myself even to complain about it internally. But I can’t shake the nagging sensation that if I had just pivoted to alignment research one year sooner than I did, everything would have been radically easier for me.

I hate saturated industries. I hate hyped-up industries. I hate fields that constantly make the news and gain mainstream attention. This was one of the major reasons I had to leave the crypto scene: it had become so saturated with attention, grift, and hype that I found it completely unbearable. Alignment and AGI were among the things almost no one even knew about, and fewer still talked about, which made them ideal for me. I was happy with the idea of doing work that might never be appreciated or understood by the rest of the world.

Since 2015, I had planned to get involved, but at the time I had no technical experience or background. So I went to college, majoring in Computer Science. Working on AI and what would later be called “Alignment” was always the plan, though. I remember having a shelf in my college dorm, which I used to represent all my life goals and priorities: AI occupied the absolute top. My mentality, however, was that I needed to establish myself enough, and earn enough money, before I could transition to it. I thought I had all the time in the world.

Eventually, I got frustrated with myself for dragging my feet for so long. So in Fall 2022, I quit my job in cybersecurity, accepted a grant from the Long Term Future Fund, and prepared to spend a year skilling up to do alignment research. I felt fulfilled. Where my brain normally nagged me about not doing enough, or about how I should be working on something more important, I finally felt content. I was finally doing it. I was finally working on the Extremely Neglected, Yet Conveniently Super Important Thing.

And then ChatGPT came out two months later, and even my mother was talking about AI.

If I believed in fate, I would say it seems as though I was meant to enter AI and Alignment during the early days. I enjoy fields where almost nothing has been figured out. I hate prestige. I embrace the weird, and hate any field that starts worrying about its reputation. I’m not a careerist. I can imagine many alternative worlds where I got in early, maybe ~2012 (I’ve been around the typical LessWrong/rationalist/transhumanist crowd for my entire adult life). I’d get in, start to figure out the early stuff, identify some of the early assumptions and problems, and then get out once 2022/2023 came around. It’s the weirdest sensation to feel like I’m too old to join the field now, and also to feel as though I’ve been part of the field for 10+ years. I’m pretty sure I’m just 1–2 degrees from literally everyone in the field.

The shock of the field/community going from something almost no one was talking about to something even the friggin’ Pope is weighing in on is something I think I’m still trying to adjust to. Some part of me keeps hoping the bubble will burst, AI will “hit a wall”, marking the second time in history Gary Marcus was right about something, and I’ll feel as though the field has enough space to operate in again. As it stands now, I don’t really know what place it has for me. It is no longer the Extremely Neglected, Yet Conveniently Super Important Thing, but instead just the Super Important Thing. When I was briefly running an AI startup (don’t ask), we were getting 300+ applicants for each role we were hiring for. We never once advertised the roles, but people somehow found them anyway, and applied in swarms. Whenever I get a rejection email from an AI Safety org, I’m usually told they received somewhere in the range of 400–700 applications for the job. That’s, at best, a 0.25% chance of acceptance: substantially lower than Harvard’s. It becomes difficult for me to answer why I’m still trying to get into such an incredibly competitive field, when doing literally anything else would be easier. “It’s super important” no longer makes much sense as a defense, since there are obviously other talented people who would get the job if I didn’t.

I think it’s that I could start to see the shape of what I could have had, and what I could have been. It’s vanity. Part of me really loved the idea of working on the Extremely Neglected, Yet Conveniently Super Important Thing. And now I have a hard time going back to working on literally anything else, because anything else could never hope to be remotely as important. And at the same time, despite the huge amount of new interest in alignment, and the huge influx of new talent wanting to contribute to it, somehow the field still feels undersaturated. In a market-driven field, we would see jobs and roles grow along with overall interest, since interest normally correlates with growth in consumers/investors/etc. Except we’re not seeing that. Despite everything, by most measurements, there still seem to be fewer than 1,000 people working on it full-time, maybe as few as ~300, depending on whom you count.

So I oscillate between thinking I should just move on to other things; and thinking I absolutely should be working on this at all costs. It’s made worse by sometimes briefly doing temp work for an underfunded org, sometimes getting to the final interview stage at big labs, and overall thinking that doing the Super Important Thing is just around the corner… and for all I know, it might be. It’s really hard for me to tell if this is a situation in which it’s smart to be persistent, or if persistence is dragging me ever closer to permanent unemployment, endless poverty/homelessness/whatever-my-brain-is-feeling-paranoid-about… which isn’t made easier by the fact that, if the AI train does keep going, my previous jobs in software engineering and cybersecurity will probably not be coming back.

Not totally sure what I’m trying to get out of writing this. Maybe someone has advice about what I should be doing next. Or maybe, after a year of my brain nagging me each day about how I should have gotten involved in the field sooner, I just wanted to admit that: despite wanting the world to be saved, despite wanting more people to be working on the Extremely Neglected, Yet Conveniently Super Important Thing, some selfish, not-too-bright, vain part of me is thinking “Oh, great. More competition.”

• Going to the moon

Say you’re really, really worried about humans going to the moon. Don’t ask why, but you view it as an existential catastrophe. And you notice people building bigger and bigger airplanes, and warn that one day, someone will build an airplane that’s so big, and so fast, that it veers off course and lands on the moon, spelling doom. Some argue that going to the moon takes intentionality. That you can’t accidentally create something capable of going to the moon. But you say “Look at how big those planes are getting! We’ve gone from small fighter planes, to bombers, to jets in a short amount of time. We’re on a double exponential of plane tech, and it’s just a matter of time before one of them will land on the moon!”

• Contra Scheming AIs

There is a lot of attention on mesaoptimizers, deceptive alignment, and inner misalignment. I think a lot of this can fall under the umbrella of “scheming AIs”: AIs that either become dangerous during training and escape, or else play nice until humans make the mistake of deploying them. Many have spoken about the lack of any indication that there’s a “homunculus-in-a-box”, and this is usually met with arguments that we wouldn’t see such things manifest until AIs are at a certain level of capability, and at that point, it might be too late, making comparisons to owl eggs or baby dragons. My perception is that getting something like a “scheming AI” or “homunculus-in-a-box” isn’t impossible, and we could (and might) develop the means to do so in the future, but that it’s a very, very different kind of thing than current models (even at superhuman level), and that it would take a degree of intentionality.

• “To the best of my knowledge, Vernor did not get cryopreserved. He has no chance to see the future he envisioned so boldly and imaginatively. The near-future world of Rainbows End is very nearly here… Part of me is upset with myself for not pushing him to make cryonics arrangements. However, he knew about it and made his choice.”

• I agree that consequentialist reasoning is an assumption, and am divided about how consequentialist an ASI might be. Training a non-consequentialist ASI seems easier, and the way we train them seems to actually be optimizing against deep consequentialism (they’re rewarded for getting better with each incremental step, not for something that might only be better 100 steps in advance). But, on the other hand, humans don’t seem to have been heavily optimized for this either*, yet we’re capable of forming multi-decade plans (even if sometimes poorly).

*Actually, the Machiavellian Intelligence Hypothesis does seem to have optimized for consequentialist reasoning (if I attack Person A, how will Person B react, etc.).

• This is the kind of political reasoning that I’ve seen poisoning LW discourse lately and gets in the way of having actual discussions. Will posits essentially an impossibility proof (or, in its more humble form, a plausibility proof). I humor this being true, and state why the implications, even then, might not be what Will posits. The premise is based on alignment not being enough, so I operate on the premise of an aligned ASI, since the central claim is that “even if we align ASI it may still go wrong”. The premise grants that the duration of time it is aligned is long enough for the ASI to act in the world (it seems mostly timescale agnostic), so I operate on that premise. My points are not about what is most likely to actually happen, the possibility of less-than-perfect alignment being dangerous, the AI having other goals it might seek over the wellbeing of humans, or how we should act based on the information we have.

• I’m not sure who you are debating here, but it doesn’t seem to be me.

First, I mentioned that this was an analogy, and that I dislike even using them, which I hoped implied I was not making any kind of assertion of truth. Second, “works to protect” was not intended to mean “control all relevant outcomes of”. I’m not sure where you got that idea, but it certainly isn’t what I think of first if someone says a person is “working to protect” something or someone. Soldiers defending a city from raiders are not violating control theory or the laws of physics. Third, the post is premised on “even if we created an aligned ASI”, so I was working with the premise that the ASI could be aligned in a way that it deeply cared about humans. Fourth, I did not assert that it would stay aligned over time… the story was all about the ASI not remaining aligned. Fifth, I really don’t think control theory is relevant here. Killing yourself to save a village does not break any laws of physics, and is well within most humans’ control.

My ultimate point, in case it was lost, was that if we as human intelligences could figure out an ASI would not stay aligned, an ASI could also figure it out. If we, as humans, would not want this (and the ASI was aligned with what we want), then the ASI presumably would also not want this. If we would want to shut down an ASI before it became misaligned, the ASI (if it wants what we want) would also want this.

None of this requires disassembling black holes, breaking the laws of physics, or doing anything outside of that entity’s control.

• I’ve heard of many such cases of this from EA Funds (including myself). My impression is that they only had one person working full-time managing all three funds (no idea if this has changed since I applied or not).

• An incapable man would kill himself to save the village. A more capable man would kill himself to save the village AND ensure no future werewolves are able to bite villagers again.

• Though I tend to dislike analogies, I’ll use one, supposing it is actually impossible for an ASI to remain aligned. Suppose a villager cares a whole lot about the people in his village, and routinely works to protect them. Then, one day, he is bitten by a werewolf. He goes to the Shaman, who tells him that when the Full Moon rises again, he will turn into a monster and kill everyone in the village. His friends, his family, everyone. And he will no longer know himself. He is told there is no cure, and that the villagers would be unable to fight him off. He will grow too strong to be caged, and cannot be subdued or controlled once he transforms. What do you think he would do?

• MIRI “giving up” on solving the problem was probably a net negative to the community, since it severely demoralized many young, motivated individuals who might have worked toward actually solving the problem. An excellent way to prevent pathways to victory is by convincing people those pathways are not attainable. A positive, I suppose, is that many have stopped looking to Yudkowsky and MIRI for the solutions, since it’s obvious they have none.

• Fair. What would you call a “mainstream ML theory of cognition”, though? Last I checked, they were doing purely empirical tinkering with no overarching theory to speak of (beyond the scaling hypothesis).

It tends not to get talked about much today, but there was the PDP (connectionist) camp of cognition vs. the camp of “everything else” (including ideas such as symbolic reasoning, etc). The connectionist camp created a rough model of how they thought cognition worked, a lot of cognitive scientists scoffed at it, Hinton tried putting it into actual practice, but it took several decades for it to be demonstrated to actually work. I think a lot of people were confused by why the “stack more layers” approach kept working, but under the model of connectionism, this is expected. Connectionism is kind of too general to make great predictions, but it doesn’t seem to allow for FOOM-type scenarios. It also seems to favor agents as local optima satisficers, instead of greedy utility maximizers.

• “My position is that there are many widespread phenomena in human cognition that are expected according to my model, and which can only be explained by the more mainstream ML models either if said models are contorted into weird shapes, or if they engage in denialism of said phenomena.”

Such as? I wouldn’t call Shard Theory mainstream, and I’m not saying mainstream models are correct either. On humans trying to be consistent decision-makers: I have some theories about that (some of which are probably wrong). But judging by how bad humans are at it, and how much they struggle to do it, they probably weren’t optimized too strongly biologically for it. Memetically, though, developing ideas for consistent decision-making was probably useful, so we have software that makes use of our processing power to be better at this, even if the hardware is very stubborn at times. But even that isn’t optimized too hard toward coherence. Someone might prefer pizza to hot dogs, but they probably won’t always choose pizza over any other food just because they want their preference ordering of food to be consistent. And, sure, maybe what they “truly” value is something like health, but I imagine even if they didn’t, they still wouldn’t do this.

But all of this is still just one piece on the Jenga tower. And we could debate every piece in the tower, and even get 90% confidence that every piece is correct… but if there are more than 10 pieces on the tower, the whole thing is still probably going to come crashing down. (This is the part where I feel obligated to say, even though I shouldn’t have to, that your tower being wrong doesn’t mean “everything will be fine and we’ll be safe”, since the “everything will be fine” towers are looking pretty Jenga-ish too. I’m not saying we should just shrug our shoulders and embrace uncertainty. What I want is to build non-Jenga-ish towers.)

• This isn’t what I mean. It doesn’t mean you’re not using real things to construct your argument, but that doesn’t mean the structure of the argument reflects something real. Like, I kind of imagine it looking something like a rationalist Jenga tower, where if one piece gets moved, it all crashes down. Except, by referencing other blog posts, it becomes a kind of Meta-Jenga: a Jenga tower composed of other Jenga towers. Like “Coherent decisions imply consistent utilities”. This alone I view to be its own mini Jenga tower. This is where I think String Theorists went wrong. It’s not that humans can’t, in theory, form good reasoning based on other reasoning based on other reasoning and actually arrive at the correct answer, it’s just that we tend to be really, really bad at it.

The sort of thing that would change my mind: there’s some widespread phenomenon in machine learning that perplexes most, but is expected according to your model, and any other model either doesn’t predict it as accurately, or is more complex than yours.

• I dislike the overuse of analogies in the AI space, but to use your analogy, I guess it’s like you keep assigning a team of engineers to build a car, and two possible things happen. Possibility One: the engineers are actually building car engines, which gives us a lot of relevant information for how to build safe cars (torque, acceleration, speed, other car things), even if we don’t know all the details for how to build a car yet. Possibility Two: they are actually just building soapbox racers, which doesn’t give us much information for building safe cars, but also means that just tweaking how the engineers work won’t suddenly give us real race cars.

• If progress in AI is continuous, we should expect record levels of employment. Not the opposite.

My mentality is if progress in AI doesn’t have a sudden, foom-level jump, and if we all don’t die, most of the fears of human unemployment are unfounded… at least for a while. Say we get AIs that can replace 90% of the workforce. The productivity surge from this should dramatically boost the economy, creating more companies, more trading, and more jobs. Since AIs can be copied, they would be cheap, abundant labor. This means anything a human can do that an AI still can’t becomes a scarce, highly valued resource. Companies with thousands or millions of AI instances working for them would likely compete for human labor, because making more humans takes much longer than making more AIs. Then say, after a few years, AIs are able to automate 90% of the remaining 10%. Then that creates even more productivity, more economic growth, and even more jobs. This could continue for even a few decades. Eventually, humans will be rendered completely obsolete, but by that point (most) of them might be so filthy rich that they won’t especially care.

This doesn’t mean it’ll all be smooth sailing or that humans will be totally happy with this shift. Some people probably won’t enjoy having to switch to a new career, only for that new career to be automated away after a few years, and then have to switch again. This will probably be especially true for people who are older, those who have families, or those who want a stable and certain future. None of this will be made easier by the fact that it’ll probably be hard to tell when true human obsolescence is on the horizon, so some might be in a state of perpetual anxiety, and others will be in constant denial.