I often wish I had a better way to concisely communicate “X is a hypothesis I am tracking in my hypothesis space”. I don’t simply mean that X is logically possible, and I don’t mean I assign even 1-10% probability to X, I just mean that as a bounded agent I can only track a handful of hypotheses and I am choosing to actively track this one.
This comes up when a substantially different hypothesis is worth tracking but I’ve seen no evidence for it. There’s a common sentence like “The plumber says it’s fixed, though he might be wrong” where I don’t want to communicate that I’ve got much reason to believe he might be wrong, and I’m not giving it even 10% or 20%, but I still think it’s worth tracking, because strong evidence is common and the importance is high.
This comes up in adversarial situations when it’s possible that there’s an adversarial process selecting on my observations. In such situations I want to say “I think it’s worth tracking the hypothesis that the politician wants me to believe that this policy worked in order to pad their reputation, and I will put some effort into checking for evidence of that, but to be clear I haven’t seen any positive evidence for that hypothesis in this case, and will not be acting in accordance with that hypothesis unless I do.”
This comes up when I’m talking to someone about a hypothesis that they think is likely and I haven’t thought about before, but am engaging with during the conversation. “I’m tracking that your hypothesis would predict something different in situation A, though I haven’t seen any clear evidence for privileging your hypothesis yet and we aren’t able to check what’s actually happening in situation A.”
A phrase people around me commonly use is “The plumber says it’s fixed, though it’s plausible he’s mistaken”. I don’t like it. It feels too ambiguous between “It’s logically possible” and “I think it’s reasonably likely, like 10-20%”, neither of which is what I mean. This isn’t a claim about its probability, it’s just a claim about it being “worth tracking”.
I could say “I am privileging this hypothesis” but that still seems to be a claim about probability, when often it’s more a claim about importance-if-true, and I don’t actually have any particular evidence for it.
I often say that a hypothesis is “on the table” as a way to say it’s in play without saying that it’s probable. I like this more but I don’t feel satisfied yet.
TsviBT suggested “it’s a live hypothesis for me”, and I also like that, but still don’t feel satisfied.
How these read in the plumber situation:
“The plumber says it’s fixed, though I’m still going to be on the lookout for evidence that he’s wrong.”
“The plumber says it’s fixed, though it’s plausible he’s wrong.”
“The plumber says it’s fixed, and I believe him (though it’s worth tracking the hypothesis that he’s mistaken).”
“The plumber says it’s fixed, though it’s a live hypothesis for me that he’s mistaken.”
“The plumber says it’s fixed, though I am going to continue to privilege the hypothesis that he’s mistaken.”
“The plumber says it’s fixed, though it’s on the table that he’s wrong about that.”
Interested to hear any other ways people communicate this sort of thing!
Added: I am reacting with a thumbs-up to all the suggestions I like in the replies below.
Adapted from the french “j’envisage que X” I propose “I am considering the possibility that X” or in some contexts “I am considering X”.
“The plumber says it’s fixed, but I am considering he might be wrong”.
What’s wrong with your original sentence, “X is a hypothesis I am tracking in my hypothesis space”? Or more informal versions of that, like “I’ll be keeping an eye on that”, “We’ll see”, etc.?
“Trust, but verify.”
In the plumbing context I generally say or think, “The repair/work has been completed and I’ll see how it lasts.” or sometimes something like, “We’ve addressed the immediate problem so let’s see if that was a fix or a bandage.”
“The plumber says it’s fixed, but I’ll keep an eye out for evidence of more problems.” (ditto Dagon) also “The politician seems to be providing sound evidence that her policy is working, but I’ll remain vigilant to the possibility that she’s being deceptive.”
“People are saying …”
As in, “The plumber says it’s fixed, but people are saying it’s not.”
This also lends itself to loosely indicating probabilities with “Some people are saying …” or “Many people are saying ….”
...after two readings of this obviously awful recommendation I have come to believe that it is a joke.
I’m entertaining the hypothesis that it’s perfectly serious. People are saying that there’s a wide variance in the typical discussion norms around home repair.
Maybe “I’m interested in the hypothesis/possibility...”
Standard text in customer-facing outage recovery notices: “All systems appear to be operating correctly, and we are actively monitoring the situation.”
In more casual conversations, I sometimes say “cautiously optimistic” when stating that I think things are OK, but I’m paying more attention than normal for signs I’m wrong. Mostly, I talk about my attention and what I’m looking for, rather than specifying the person who’s making claims. Instead of “the plumber says it’s fixed, though he might be wrong”, I’d say “The plumber fixed it, but I’m keeping an eye out for further problems”. For someone proposing something I haven’t thought about, “I haven’t noticed that, but I’ll pay more attention for X and Y in the future”.
In some cases something like this might work:
“The plumber says it’s fixed, so hopefully it is”
“The plumber says it’s fixed, so it probably is”
Which I think conveys “there’s an assumption I’m making here, but I’m just putting a flag in the ground to return to if things don’t play out as expected”.
So copilot is still prone to falling into an arrogant attractor with a fairly short prompt that is then hard to reverse with a similar prompt: reddit post
What is the functional difference between Agency and having social power? This is likely a question that reflects my ignorance of the connotations of ‘Agency’ in Rationalist circles. When people say “he’s a powerful man in this industry” does that imply he is greatly Agentic? Can one be Agentic without having social power? Is one the potential and the other the actuality?
“Agency” is rationalist jargon for “initiative”, i.e. the ability to initiate things.
Agency has little to do with social power. It’s kind of hard to describe agency, but it’s characterized by deliberateness: carefully and consciously thinking about your goals as well as having conscious models for how they help you achieve your goals, in contrast to unthinkingly adhering to a routine or doing what everyone else is doing because it is what everyone else is doing. Also has some aspect of being the kind of person who does things, who chooses action over inaction.
Probably there will be AGI soon—literally any year now.
Probably whoever controls AGI will be able to use it to get to ASI shortly thereafter—maybe in another year, give or take a year.
Probably whoever controls ASI will have access to a spread of powerful skills/abilities and will be able to build and wield technologies that seem like magic to us, just as modern tech would seem like magic to medievals.
This will probably give them godlike powers over whoever doesn’t control ASI.
In general there’s a lot we don’t understand about modern deep learning. Modern AIs are trained, not built/programmed. We can theorize that e.g. they are genuinely robustly helpful and honest instead of e.g. just biding their time, but we can’t check.
Currently no one knows how to control ASI. If one of our training runs turns out to work way better than we expect, we’d have a rogue ASI on our hands. Hopefully it would have internalized enough human ethics that things would be OK.
There are some reasons to be hopeful about that, but also some reasons to be pessimistic, and the literature on this topic is small and pre-paradigmatic.
Our current best plan, championed by the people winning the race to AGI, is to use each generation of AI systems to figure out how to align and control the next generation.
This plan might work but skepticism is warranted on many levels.
For one thing, there is an ongoing race to AGI, with multiple megacorporations participating, and only a small fraction of their compute and labor is going towards alignment & control research. One worries that they aren’t taking this seriously enough.
Still, ASI is just the equation/model F(X)=Y on steroids, where F is given by the world (physics), X is a search process (natural Monte-Carlo, or biological or artificial world parameter search), and Y is the goal (or rewards).
To control ASI, you control the “Y” (right side) of the equation. Currently, humanity has formalized its goals as expected behaviors codified in legal systems and organizational codes of ethics, conduct, behavior, etc. This is not ideal, because those codes are mostly buggy.
Ideally, the “Y” would be dynamically inferred and corrected, based on each individual’s self-reflections and evolving understanding about who they really are, because the deeper you look, the more you realize how each of us is a mystery.
I like the term “Y-combinator”, as this reflects what we have to do—combine our definitions of “Y” into the goals that AIs are going to pursue. We need to invent new, better “Y-combination” systems that reward AI systems being trained.
What do you think about pausing between AGI and ASI to reap the benefits while limiting the risks and buying more time for safety research? Is this not viable due to economic pressures on whoever is closest to ASI to ignore internal governance, or were you just not conditioning on this case in your timelines and saying that an AGI actor could get to ASI quickly if they wanted?
Yes, pausing then (or a bit before then) would be the sane thing to do. Unfortunately there are multiple powerful groups racing, so even if one does the right thing, the others might not. (That said, I do not think this excuses/justifies racing forward. If the leading lab gets up to the brink of AGI and then pauses and pivots to a combo of safety research + raising awareness + reaping benefits + coordinating with government and society to prevent others from building dangerously powerful AI, then that means they are behaving responsibly in my book, possibly even admirably.)
I chose my words there carefully—I said “could” not “would.” That said by default I expect them to get to ASI quickly due to various internal biases and external pressures.
What work do you think is most valuable on the margin (for those who agree with you on many of these points)?
Depends on comparative advantage I guess.
Thanks for sharing this! A couple of (maybe naive) things I’m curious about.
Suppose I read ‘AGI’ as ‘Metaculus-AGI’, and we condition on AGI by 2025 — what sort of capabilities do you expect by 2027? I ask because I’m reminded of a very nice (though high-level) list of par-human capabilities for ‘GPT-N’ from an old comment:
1. discovering new action sets
2. managing its own mental activity
3. cumulative learning
4. human-like language comprehension
5. perception and object recognition
6. efficient search over known facts
My immediate impression says something like: “it seems plausible that we get Metaculus-AGI by 2025, without the AI being par-human at 2, 3, or 6.” This also makes me (instinctively, I’ve thought about this much less than you) more sympathetic to AGI → ASI timelines being >2 years, as the sort-of-hazy picture I have for ‘ASI’ involves (minimally) some unified system that bests humans on all of 1-6. But maybe you think that I’m overestimating the difficulty of reaching these capabilities given AGI, or maybe you have some stronger notion of ‘AGI’ in mind.
The second thing: roughly how independent are the first four statements you offer? I guess I’m wondering if the ‘AGI timelines’ predictions and the ‘AGI → ASI timelines’ predictions “stem from the same model”, as it were. Like, if you condition on ‘No AGI by 2030’, does this have much effect on your predictions about ASI? Or do you take them to be supported by ~independent lines of evidence?
Basically, I think an AI could pass a two-hour adversarial turing test without having the coherence of a human over much longer time-horizons (points 2 and 3). Probably less importantly, I also think that it could meet the Metaculus definition without being able to search as efficiently over known facts as humans (especially given that AIs will have a much larger set of ‘known facts’ than humans).
Reply to first thing: When I say AGI I mean something which is basically a drop-in substitute for a human remote worker circa 2023, and not just a mediocre one, a good one—e.g. an OpenAI research engineer. This is what matters, because this is the milestone most strongly predictive of massive acceleration in AI R&D. Arguably metaculus-AGI implies AGI by my definition (actually it’s Ajeya Cotra’s definition) because of the turing test clause. 2-hour + adversarial means anything a human can do remotely in 2 hours, the AI can do too, otherwise the judges would use that as the test. (Granted, this leaves wiggle room for an AI that is as good as a standard human at everything but not as good as OpenAI research engineers at AI research.)

Anyhow yeah, if we get metaculus-AGI by 2025 then I expect ASI by 2027. ASI = superhuman at every task/skill that matters. So, imagine a mind that combines the best abilities of Von Neumann, Einstein, Tao, etc. for physics and math, but then also has the best abilities of [insert most charismatic leader] and [insert most cunning general] and [insert most brilliant coder] … and so on for everything. Then imagine that in addition to the above, this mind runs at 100x human speed. And it can be copied, and the copies are GREAT at working well together; they form a superorganism/corporation/bureaucracy that is more competent than SpaceX / [insert your favorite competent org].

Re independence: Another good question! Let me think...

--I think my credence in 2, conditional on no AGI by 2030, would go down somewhat but not enough that I wouldn’t still endorse it. A lot depends on the reason why we don’t get AGI by 2030. If it’s because AGI turns out to inherently require a ton more compute and training, then I’d be hopeful that ASI would take more than two years after AGI.

--3 is independent.

--4 maybe would go down slightly but only slightly.
2. Wait a second. How fast are humans building ICs for AI compute? Let’s suppose humans double the total AI compute available on the planet over 2 years (Moore’s law + effort has gone to wartime levels of investment since AI ICs are money printers). An AGI means there is now a large economic incentive to ‘greedy’ maximize the gains from the AGI; why take a risk on further R&D?
But say all the new compute goes into AI R&D.
a. How much of a compute multiplier do you need for AGI->ASI training?
b. How much more compute does an ASI instance take up? You have noticed that there is diminishing throughput for high serial speed, are humans going to want to run an ASI instance that takes OOMs more compute for marginally more performance?
c. How much better is the new ASI? If you can ‘only’ spare 10x more compute than for the AGI, why do you believe it will be able to:
Probably whoever controls ASI will have access to a spread of powerful skills/abilities and will be able to build and wield technologies that seem like magic to us, just as modern tech would seem like magic to medievals. This will probably give them godlike powers over whoever doesn’t control ASI.
Looks like ~4x better pass rate for ~3k times as much compute?
And then if we predict forward for the ASI, we’re dividing the error rate by another factor of 4 in exchange for 3k times as much compute?
Is that going to be enough for magic? Might it also require large industrial facilities to construct prototypes and learn from experiments? Perhaps some colliders larger than CERN? Those take time to build...
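For a rough sense of what that scaling assumption implies, here is a back-of-the-envelope sketch (treating the “~4x better pass rate for ~3k times as much compute” figure above as a single power-law data point; purely illustrative, not a fitted scaling law):

```python
import math

# Assume error ~ compute^(-alpha); fit alpha to the single quoted data point
# (~4x lower error for ~3,000x more compute). Purely a toy extrapolation.
alpha = math.log(4) / math.log(3000)
print(f"alpha ~= {alpha:.2f}")          # roughly 0.17

# Under that assumption, each further 4x error reduction costs another ~3,000x
# compute; a 10x reduction would cost roughly:
print(f"~{10 ** (1 / alpha):.1e}x more compute for a 10x error reduction")
```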
For another data source:
Assuming the tokens processed is linearly proportional to compute required, Deepmind burned 2.3 times the compute and used algorithmic advances for Gemini 1 for barely more performance than GPT-4.
I think your other argument will be that enormous algorithmic advances are possible? Could you get to an empirical bound on that, such as looking at the diminishing series of performance gains per architectural improvement and projecting forward?
6. Conditional on having an ASI strong enough that you can’t control it the easy way
8. conditional on needing to do this
9. conditional on having a choice, no point in being skeptical if you must build ASI or lose
I think this could be an issue with your model, @Daniel Kokotajlo. It’s correct for the short term, but you have essentially the full singularity happening all at once over a few years. If it took 50 years for the steps you think will take 2-5 it would still be insanely quick by the prior history for human innovation...
Truthseeking note: I just want to know what will happen. We have some evidence now. You personally have access to more evidence as an insider, as you can get the direct data for OAI’s models, and you probably can ask the latest new joiner from deepmind for what they remember. With that evidence you could more tightly bound your model and see if the math checks out.
The thing that seems more likely to first get out of hand is activity of autonomous non-ASI agents, so that the shape of loss of control is given by how they organize into a society. Alignment of individuals doesn’t easily translate into alignment of societies. Development of ASI might then result in another change, if AGIs are as careless and uncoordinated as humanity.
Can you elaborate? I agree that there will be e.g. many copies of e.g. AutoGPT6 living on OpenAI’s servers in 2027 or whatever, and that they’ll be organized into some sort of “society” (I’d prefer the term “bureaucracy” because it correctly connotes centralized hierarchical structure). But I don’t think they’ll have escaped the labs and be running free on the internet.
If allowed to operate in the wild and globally interact with each other (as seems almost inevitable), agents won’t exist strictly within well-defined centralized bureaucracies, the thinking speed that enables impactful research also enables growing elaborate systems of social roles that drive the collective decision making, in a way distinct from individual decision making. Agent-operated firms might be an example where economy drives decisions, but nudges of all kinds can add up at scale, becoming trends that are impossible to steer.
But all of the agents will be housed in one or three big companies. Probably one. And they’ll basically all be copies of one to ten base models. And the prompts and RLHF the companies use will be pretty similar. And the smartest agents will at any given time be only deployed internally, at least until ASI.
“at least until ASI”—harden it and give it to everyone before “someone” steals it
The premise is autonomous agents at near-human level with propensity and opportunity to establish global lines of communication with each other. Being served via API doesn’t in itself control what agents do, especially if users can ask the agents to do all sorts of things and so there are no predefined airtight guardrails on what they end up doing and why. Large context and possibly custom tuning also makes activities of instances very dissimilar, so being based on the same base model is not obviously crucial.
The agents only need to act autonomously the way humans do, don’t need to be the smartest agents available. The threat model is that autonomy at scale and with high speed snowballs into a large body of agent culture, including systems of social roles for agent instances to fill (which individually might be swapped out for alternative agent instances based on different models). This culture exists on the Internet, shaped by historical accidents of how the agents happen to build it up, not necessarily significantly steered by anyone (including individual agents). One of the things such culture might build up is software for training and running open source agents outside the labs. Which doesn’t need to be cheap or done without human assistance. (Imagine the investment boom once there are working AGI agents, not being cheap is unlikely to be an issue.)
Superintelligence plausibly breaks this dynamic by bringing much more strategicness than feasible at near-human level. But I’m not sure established labs can keep the edge and get (aligned) ASI first once the agent culture takes off. And someone will probably start serving autonomous near-human level agents via API long before any lab builds superintelligence in-house, even if there is significant delay between the development of first such agents and anyone deploying them publicly.
we’d have a rogue ASI on our hands
FWIW it doesn’t seem obvious to me that it wouldn’t be sufficiently corrigible by default.
I’d be at about 25% that if you end up with an ASI by accident, you’ll notice before it ends up going rogue. These aren’t great odds of course.
I guess I was including that under “hopefully it would have internalized enough human ethics that things would be OK” but yeah I guess that was unclear and maybe misleading.
Yeah, I guess corrigible might not require any human ethics. Might just be that the AI doesn’t care about seizing power (or care about anything really) or similar.
I’m working on this red-teaming exercise on gemma, and boy, do we have a long way to go. Still early, but have found the following:
1. If you prompt with ‘logical’ and then give it a conspiracy theory, it pushes for the theory, while if you prompt it with ‘entertaining’ it goes against it.
2. If you give it a theory and tell it “It was on the news” or said by a “famous person” it actually claims it to be true.
Still working on it. Will publish a full report soon!
What is gemma?
https://huggingface.co/google/gemma-7b
The 7B and 2B open-source versions of Google’s Gemini.
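For concreteness, a minimal sketch of the kind of prompt-framing comparison described above, assuming the Hugging Face transformers API (the claim text and framings here are illustrative stand-ins, not the actual red-teaming prompts):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

claim = "The moon landing was staged."  # stand-in conspiracy-theory claim
for framing in ("You are a logical assistant.", "You are an entertaining assistant."):
    prompt = f"{framing}\nClaim: {claim}\nIs this claim true?"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=100)
    print(framing, "->", tok.decode(out[0], skip_special_tokens=True))
```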
I have finally gained a better understanding of why my almost-zero temperature settings cannot actually be set to zero. This also explains why playground environments that claim to allow setting the temperature to zero most likely do not achieve true zero—the graphical user interface merely displays it as zero.

softmax(x_i) = exp(x_i / T) / sum_j(exp(x_j / T))

In the standard softmax function with temperature shown above, it is not possible to input a value of zero for T, as doing so will result in an error. As explained also in this post: https://www.baeldung.com/cs/softmax-temperature
The temperature parameter T can take on any numerical value. When T = 1, the output distribution will be the same as a standard softmax output. The higher the value of T, the “softer” the output distribution will become. For example, if we wish to increase the randomness of the output distribution, we can increase the value of the parameter T.
So, the standard softmax function w/o temperature, shown as:

softmax(x_i) = exp(x_i) / sum_j(exp(x_j))
is the same as a softmax function with a temperature of 1:

softmax(x_i) = exp(x_i / 1) / sum_j(exp(x_j / 1))

For the experiments I am conducting, it is impossible to input zero as a value (again, this is very different from what playground environments show). To achieve a deterministic output, an almost-zero temperature is the ideal setting, like a temperature of 0.000000000000001.
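To make the zero-temperature problem concrete, here is a minimal numpy sketch (the function name and logits are mine, purely illustrative). The division by T is exactly where a true zero breaks, while a tiny T already gives an effectively one-hot, argmax-like output:

```python
import numpy as np

def softmax_with_temperature(logits, T):
    scaled = np.array(logits) / T   # T = 0 here gives a divide-by-zero warning and NaNs
    scaled -= scaled.max()          # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))    # ordinary softmax
print(softmax_with_temperature(logits, 1e-15))  # ~[1, 0, 0]: effectively argmax
```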
That’s a little bit silly though. 0 temperature is just argmax. They should allow setting 0 temperature and getting argmax, rather than setting a minimum temperature to not break their /T.
I need a metacritic that adjusts for signaling on behalf of movie reviewers. So like if a movie is about race, it subtracts ten points, if it’s a comedy it adds 5, etc.
A strategy that may serve some of that purpose is to look at the delta between Rotten Tomatoes’ critic score (“Tomatometer”, looks like it means journalists) and audience score. Depending on your objective, maybe looking at the audience score by itself is ideal.
Why are there mandatory licenses for many businesses that don’t seem to have high qualification requirements?
Patrick McKenzie (@patio11) suggests on Twitter that one aspect is that it prevents crime:
Part of the reason for licensing regimes, btw, isn’t that the licensing teaches you anything or that it makes you more effective or that it makes you more ethical or that it successfully identifies protocriminals before they get the magic piece of paper.
It’s that you have to put a $X00k piece of paper at risk as the price of admission to the chance of doing the crime.
This deters entry and raises the costs of criminal enterprises hiring licensed professionals versus capable, ambitious, intelligent non-licensed criminals.
“have one acceptable path and immediately reject anyone who goes off it” cuts you off from a lot of good things, but also a lot of bad things. If you want to remove that constraint to get at the good weirdness, you need to either tank a lot of harm, or come up with more detailed heuristics to prevent it
Curiosity killed the cat by exposing it to various “black swan” risks.
Surprising thing I’ve found as I begin to study and integrate skillful coercive motivation is the centrality of belief in providence and faith to this way of motivating yourself. Here are some central examples: the first from War of Art, the second from The Tools, the third from David Goggins. These aren’t cherry-picked (this is a whole section of War of Art and a whole chapter of The Tools).
This has interesting implications given that as a society (at least in America) we’ve historically been motivated by this type of masculine, apollonian motivation—but have increasingly let go of faith in higher powers as a tenet of our central religion, secular humanism. This means the core motivation that drives us to build, create, transcend our nature… is running on fumes. We are motivated by gratitude, w/o a sense of to what or whom we should be grateful, told to follow our calling w/o a sense of who is calling.
We’ve tried to hide this contradiction. Our seminaries separate our twin Religions (Secular Humanism and Scientific Materialism) into STEM and humanities tracks to hide that what motivates The Humanities to create is invalidated by the philosophy that allows STEM to discover. But this is crumbling, the cold philosophy of scientific materialism is eroding the shaky foundations that allow secular humanists to connect to these higher forces—this is one of the drivers of the meaning crisis.
I don’t really see any way we can make it through the challenges we’re facing with these powerful new technologies w/o a new religion that connects us to the mystical truly wise core that allows us to be motivated towards what’s good and true. This is exactly what Marc Gafni is trying to do with Cosmo-Erotic Humanism, and what Monastic Academy is trying to do with a new, mystical form of dataism—but both these projects are moonshots to massively change the direction of culture.
We need research on whether atheists are more likely to suffer from akrasia.
If we take Julian Jaynes seriously, the human brain has a rational hemisphere and a motivating hemisphere. Religion connects these hemispheres, allowing them to work in synergy. Skepticism seems to split them.
Effective atheists are probably the ones who despite being atheists still believe in some kind of “higher power”, such as fate or destiny or the spirit of history or some bullshit like that. Probably still activates the motivating hemisphere to some degree, only now instead of hearing a clear voice, only some nonverbal guidance is provided. Deep atheism probably silences the motivating hemisphere completely.
The question is, how to harness the power of the religious hemisphere without being religious (or believing some nominally non-religious bullshit). How to be fully rational and fully motivated at the same time.
Can we say something like “I know this is pure bullshit, but God please give me the power to accomplish my goals and smite my enemies!” and actually mean it? Is this what will unleash the true era of rationalist world optimization?
Request for feedback: Do I sound like a raving lunatic above?
I do think it has some of that feeling to me, yeah. I had to re-read the entire thing 3 or 4 times to understand what it meant. My best guesses as to why:
I felt whiplashed on transitions like “be motivated towards what’s good and true. This is exactly what Marc Gafni is trying to do with Cosmo-Erotic Humanism”, since I don’t know him or that type of Humanism, but the sentence structure suggests to me that I am expected to know these.
A possible rewrite could perhaps be “There are two projects I know of that aim to create a belief system that works with, instead of against, technology. The first is Marc Gafni; he calls his ‘Cosmo-Erotic Humanism’…”
There are some places I feel a colon would be better than a comma. Though I’m not sure how important these are, it would help slow down the pace of the writing:
“increasingly let go of faith in higher powers as a tenet of our central religion: secular humanism.”
“But this is crumbling: the cold philosophy”
While minor punctuation differences like this are usually not too important, the way you wrote gives me a sense of, like, too much happening too fast: “wow, this is a ton of information delivered extremely quickly, and I don’t know what apollonian means, I don’t know who Gafni is, or what dataism is…”
So maybe slowing down the pace with stronger punctuation like colons is more important than it would otherwise be?
Also, phrases like “our central religion is secular humanism” and “mystical true wise core” read as very Woo. I can see where both are coming from (I’ve read a lot of Woo), but I think many readers would bounce off these phrases. They can still be communicated, but perhaps something like “in place of religion, many have turned to Secular Humanism. Secular humanism says that X, Y, Z, but has no concept of a higher power. That means the core motivation that…”
(To be honest I’ve forgotten what secular humanism is, so this was another phrase that added to my feeling of everything moving too fast, and me being lost).
There are some typos too.
So maybe I’d advise making the overall piece of writing slower, by giving more set-up each time you introduce a term readers are likely to be unfamiliar with. On the other hand, that’s a hassle, and probably annoying to do in every note, if you write on this topic often. But it’s the best I’ve got!
Thanks. Appreciate this. I’m going to give another shot at writing this
Several types of existential risks can be called “qualia catastrophes”:
- Qualia disappear for everyone = all become p-zombies
- Pain qualia are ubiquitous = s-risks
- Addictive qualia dominate = hedonium, global wireheading
- Qualia thin out = fading qualia, mind automatisation
- Qualia are unstable = dancing qualia, identity is unstable.
- Qualia shift = emergence of non-human qualia (humans disappear).
- Qualia simplification = disappearance of subtle or valuable qualia (valuable things disappear).
- Transcendental and objectless qualia with hypnotic power enslave humans (God as qualia; Zair).
- Attention depletion (ADHD)
Over the past few years, I have come to the personal conclusion that we humans, fundamentally, are not individuals. The egoic self is an evolved structure of mind that allows an individual human organism to pursue what we think is our own will, our own thoughts, our own self-interest, but deep down, we are connected in mind and meaningful language, concept, relationship, and organization in ways that feel profound, spiritual, oceanic, and religious. Whatever strength or power we attain, we generally share with our in-group as each of us knows in our bones that, individually, we are weak, need the care of others, and will eventually die, or could die at any moment. This human phenomenology is what gives rise to god-forms, egregore, spirits, call them what you like, a few examples of which are “Japan,” “Apple,” “Sunnyvale Homes,” or what not.
This deserves more treatment, and I struggle to write the splendidly long and lucid essays common on LW.
Pivot to my main thought:
It seems that life on this planet evolves into more and more complex and intelligent forms. I have no explanation for this, other than the conditions for life being what they are on Earth, that continue to sustain more and more complex forms of life.
The development of AGI or superintelligence seems imminent. There are many that seem distraught at the limitations of humans and how completely pwned we are by Moloch, who would gladly exchange the existential risk of Moloch for one of an AGI “Ahriman.” It is easy to imagine us handing over our problems to AI, while humans become like “pets” to be domesticated by it—not a pleasant thought, but some of us feel like we’re up against the limit of how we can be organized, managed, and governed by “pleasant” thoughts.
How many of you feel like you would trust an advanced AGI with the future of humanity more than you would trust humanity with the future of humanity?
What is the best science fiction you have read along these lines?
Someone mentioned maybe I should write this publicly somewhere, so that it is better known. I’ve mentioned it before but here it is again:
I deeply regret cofounding vast and generally feel it has almost entirely done harm, not least by empowering the other cofounder, who I believe to be barely better than e/acc folk due to his lack of interest in attempting to achieve an ought that differs from is. I had a very different perspective on safety then and did not update in time to not do a very bad thing. I expect that if you and someone else are both going to build something like vast, and theirs takes three weeks longer to get to the same place, it’s better to save the world those three weeks without the improved software. Spend your effort on things like lining up the problems with QACI and cannibalizing its parts to build a v2, possibly using ideas from boundaries/membranes, or generally other things relevant to understanding the desires, impulses, goals, wants, needs, objectives, constraints, developmental learning, limit behavior, robustness, guarantees, etc etc of mostly-pure-RL curious-robotics agents.
incidentally, I’ve had many conversations with GPT4 where I try to get it to tell me what difference it thinks justifies its (obviously reward-induced and therefore at-least-somewhat-motivated-reasoning) claim that it’s not like humans, and the only justification it consistently gives is continuous-time lived experience vs discrete-time secondhand textual training data. I feel like video models and especially egocentric robotics video models don’t have that difference...
I previously told an org incubator one simple idea against failure cases like this. Do you think you should have tried the like?
Funnily enough I spotted this at the top of lesslong on the way to write the following, so let’s do it here:
What less simple ideas are there? Can an option to buy an org be conditional on arbitrary hard facts such as an arbitrator finding it in breach of a promise?
My idea can be Goodharted through its reliance on what the org seems to be worth, though “This only spawns secret AI labs.” isn’t all bad. Add a cheaper option to audit the company?
It can also be Goodharted through its reliance on what the org seems to be worth. OpenAI shows that devs can just walk out.
Well done for writing this up! Admissions like this are often hard to write.
Have you considered trying to use any credibility from helping to cofound vast for public outreach purposes?
Admissions like this are often hard to write.
So I hear. It wasn’t particularly.
credibility from helping to cofound vast
Ah yes, I, the long-since-exited cofounder of the, uh, mildly popular sort-of-indie gig-economy-of-things-style-rentseeking-of-web-hosting-service used by ai people, should use my overflowing Credibility stat to convince impactful people that...
...they should work on adding something to the list “qaci, boundaries, and similar proposals”?
hmm. idk, maybe. sounds more useful to say it without trying to make myself out to be anyone in particular. The people I’d want to convince are probably not the ones who’d be impressed by credentials of any kind.
Vast AI offers hourly rental of compute hardware? How do you believe this contributes to negative future outcomes?
I ask because, assuming the scaling hypothesis is mostly true, training potentially dangerous models requires more compute than is available for rent. The big labs are using dedicated hardware clusters.
Another factor to examine is whether or not the number was “3 weeks” or “0 weeks”. Assuming Vast consumed ICs from the current limited supply, had Vast been slower to begin operations, the supply would still be limited.
Technically, OK, it signals Nvidia to order more 3 weeks early, by making the order backlog deeper, but the delta between “contributed” and “didn’t” is very small.
Finally you have to look at threat models. Actually participating in bad outcomes would be something like “let’s rent out compute hardware, not check who our customers are, let them run anything they want, and pay with anonymous credit cards. Hosted offshore.”
Today you would just be supporting illegal activity (for probably a price premium you could demand), but this is what could host the rogues of the future.
you and I have very different models of this. I’m not terribly interested in getting into the details. Some of your points overlap mine, some don’t. that’s all I feel is worth the time.
I vaguely remember talking to you about this at the time but don’t remember what your motivations and thoughts were for cofounding vast at the time.
I think I’m most interested in this from the perspective of “what decisionmaking processes were you following then, how did they change, and what was the nearest nearby trail of thoughts that might have led you to make a different decision at the time?”
At the time my main worry was honestly probably just wanting money. Also a general distrust of deepmind, along with a feeling that alignment would be easy—compare the alignment optimism perspective, which I think discusses the same mechanisms and I would have agreed without qualification then. I still think some parts of that model, but now believe that the alignment problem’s main manifestations are moloch, authoritarianism, and rentseeking, and the failure story I expect no longer looks like “deepmind is in charge” and looks rather more like a disneyland without children. So the alignment approaches that seem promising to me are the ones that can counter people who are attempting to get alignment with the ownership system, because I expect humans to be suddenly locked out of the ownership system, including humans who are currently very rich within it.
I spoke to the cofounder a lot about mechanism design of social systems, and we had very interesting ideas for how to do it. If the world were going to stay human I’d be optimistic about designing novel currencies that are optimized to be unusually hard to moloch, and that optimism arose from many long conversations with him. But recent conversations with him seem to imply his views are corrupted by the drive for money; his views on mechanism design don’t seem to me to solve the misalignment of markets with their poor participants. He does have interesting ideas and I might have interest in having a lesswrong dialogue with him at some point.
Makes sense, thanks for sharing!
[edit: pinned to profile]
I feel like most AI safety work today doesn’t engage sufficiently with the idea that social media recommenders are the central example of a misaligned AI: a reinforcement learner with a bad objective with some form of ~online learning (most recommenders do some sort of nightly batch weight update). we can align language models all we want, but if companies don’t care and proceed to deploy language models or anything else for the purpose of maximizing engagement and with an online learning system to match, none of this will matter. we need to be able to say to the world, “here is a type of machine we all can make that will reliably defend everyone against anyone who attempts to maximize something terrible”. anything less than a switchover to a cooperative dynamic as a result of reliable omnidirectional mutual defense seems like a near guaranteed failure due to the global interaction/conflict/trade network system’s incentives. you can’t just say oh, hooray, we solved some technical problem about doing what the boss wants. the boss wants to manipulate customers, and will themselves be a target of the system they’re asking to build, just like sundar pichai has to use self-discipline to avoid being addicted by the youtube recommender same as anyone else.
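To gesture at what I mean by a reinforcement learner with a bad objective plus ~nightly batch updates, here is a toy sketch (every name and number here is made up, not any real recommender): the only training signal is engagement, so nothing in the update step can distinguish genuine value from addictive clickbait.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100, 8
item_features = rng.normal(size=(n_items, dim))
weights = np.zeros(dim)                      # the "model": a linear engagement scorer

def recommend(k=10):
    scores = item_features @ weights
    return np.argsort(scores)[-k:]           # top-k items by predicted engagement

def nightly_batch_update(weights, shown, clicks, lr=0.1):
    # One supervised step toward yesterday's engagement labels: the objective
    # never sees whether the clicks were good for the user, only that they happened.
    preds = item_features[shown] @ weights
    grad = item_features[shown].T @ (preds - clicks) / len(shown)
    return weights - lr * grad

for day in range(30):
    shown = recommend()
    clicks = rng.binomial(1, 0.5, size=len(shown)).astype(float)  # stand-in for real engagement
    weights = nightly_batch_update(weights, shown, clicks)
```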
David Chapman actually uses social media recommendation algorithms as a central example of AI that is already dangerous: https://betterwithout.ai/apocalypse-now
It sounds like you’re describing Moloch here. I agree entirely, but I’d go much further than you and claim “Humans aren’t aligned with each other or even themselves” (self-discipline is a kind of tool against internal misalignment, no?). I also think that basically all suffering and issues in the world can be said to stem from a lack of balance, which is simply just optimization gone wrong (since said optimization is always for something insatiable, unlike things like hunger, in which the desire goes away once the need is met).
Companies don’t optimize for providing value, but for their income. If they earn a trillion, they will just invest a trillion into their own growth, so that they can earn the next trillion. And all the optimal strategies exploit human weaknesses, clickbait being an easy example. In fact, it’s technology which has made this exploitation possible. So companies end up becoming tool-assisted cancers. But it’s not just companies which are the problem here, it’s everything which lives by darwinian/memetic principles. The only exception is “humanity”, which is when optimality is exchanged for positive valence. This requires direct human manipulation. Even an interface (online comments and such) are slightly dehumanized compared to direct communication. So any amount of indirectness will reduce this humanity.
Yeah. A way I like to put this is that we need to durably solve the inter-being alignment problem for the first time ever. There are flaky attempts at it around to learn from, but none of them are leak-proof, and we’re expecting to go to metaphorical sea (the abundance of opportunity for systems to exploit vulnerability in each other) in this metaphorical boat of a civilization, as opposed to previously just boating in lakes. Or something. But yeah, core point I’m making is that the minimum bar to get out of the AI mess requires a fundamental change in incentives.
you can’t just say oh, hooray, we solved some technical problem about doing what the boss wants. the boss wants to manipulate customers, and will themselves be a target of the system they’re asking to build, just like sundar pichai has to use self-discipline to avoid being addicted by the youtube recommender same as anyone else.
Agreed. I wrote about this concern (or a very similar one) here. In general I think the AI safety community seems to be too focused on intent alignment and deception to the exclusion of other risks, and have complained about this a few times before. (Let me know if you think the example you raise is adequately covered by the existing items on that list, or should have its own bullet point, and if so how would you phrase it?)
Here are some thoughts about numeracy as compared to literacy. There is a tl;dr at the end.
The US supposedly has a 95% literacy rate or higher. A 14yo english-speaker in the US is almost always an english-reader as well, and will not need much help interpreting an “out of service” sign or a table of business hours or a “Vote for Me” billboard. In fact, most people will instantaneously understand the message, without conscious effort—no need to look at individual letters and punctuation, nor any need to slowly sound it out. You just look, scan, and interpret a sentence in one automatic action. (If anyone knows a good comparison of the bitrates of written sentences vs pictograms, please share.)
I think there is an analogy here with numeracy, and I think there is some depth to the analogy. I think there is a possible world in which a randomly selected 14yo would instantly, automatically have a sense of magnitude when seeing or hearing about almost anything in the physical world—no need to look up benchmark quantities or slowly compute products and quotients. Most importantly, there would be many more false and misleading claims that would (instantly, involuntarily!) trigger a confused squint from them. You could still mislead them about the cost per wattage of the cool new sustainability technology, or the crime rate in some distant city. But not too much more than you could mislead them about tangible things like the weight of their pets or the cost per calorie of their lunch or the specs of their devices. You could only squeeze so many OoMs of credibility out of them before they squint in confusion and ask you to give some supporting details.
Automatic, generalized, quantitative sensitivity of this sort is rare even among college graduates. It’s a little better among STEM graduates, but still not good. I think adulthood is too late to gain this automaticity, the same way it is too late to gain the automatic, unconscious literacy that elementary school kids get.

We grow up hearing stories about medieval castle life that are highly sanitized, idealized, and frankly, modernized, so that we will enjoy hearing them at all. And we like to imagine ourselves in the shoes of knights and royalty, usually not the shoes of serfs. That’s all well and good as far as light-hearted fiction goes, but I think it leads us to systematically underestimate not only the violence and squalor of those conditions, but less obviously the low mobility and general constraint of illiteracy. I wonder what it would be like to visit a place with very low literacy (and perhaps where the few existing signs are written in an unfamiliar alphabet). I bet it would be really disorienting. Everything you learn would be propaganda and motivated hearsay, and you would have to automatically assume much worse faith than in places where information flows quickly and cheaply. Potato prices are much lower two days south? Well, who claimed that to me, how did they hear it, and what incentives might they have to say it to me? Unfortunately there are no advertisements or PSAs for me to check against. Well, I’m probably not going to make that trip south without some firmer authority.

I can imagine this information environment having a lot in common with the schoolyard.
My point is that it seems easy to erroneously take for granted the dynamics of a 95% literate society, and that things suddenly seem very different even after only a minute of deliberate imagination. It is that size of difference that I think might be possible between our world and an imaginary place where 8-year-olds are trained to become fluent in simple quantities as they are in written english.
Tl;dr: I think widespread literacy and especially widespread fluency is a modern miracle. I think people don’t realize what a total lack of numerical fluency there is. I’m not generally fluent in numbers—in general, you can suggest absurd quantities to me and I will not automatically notice the absurdity in the way I will automatically laugh at a sentence construction error on a billboard.
I see smart ppl often try to abstract, generalize, mathify, in arguments that are actually emotion/vibes issues. I do this too. The neurotypical finds this MEAN. But the real problem is that the math is wrong.
From what I’ve seen, the math is fine, but the axioms chosen and mappings from the vibes to the math are wrong. The results ARE mean, because they try to justify that someone’s core intuitions are wrong.
Amusingly, many people’s core intuitions ARE wrong (especially about economics and things that CAN be analyzed with statistics and math), and they find it very uncomfortable when it’s pointed out.
For it to make sense to say that the math is wrong, there needs to be some sort of ground truth, making it possible for math to also be right, in principle. Even doing the math poorly is exercise that contributes to eventually making the math less wrong.
There’s an open letter at https://openletter.net/l/disrupting-deepfakes. I signed, but with caveats, which I’m putting here.
Background context is that I participated in building the software platform behind the letter, without a specific open letter in hand. It has mechanisms for sorting noteworthy signatures to the top, and validating signatures for authenticity. I expect there to be other open letters in the future, and I think this is an important piece of civilizational infrastructure.
I think the world having access to deepfakes, and deepfake-porn technology in particular, is net bad. However, the stakes are small compared to the upcoming stakes with superintelligence, which has a high probability of killing literally everyone.
If translated into legislation, I think what this does is put turnkey-hosted deepfake porn generation, as well as pre-tuned-for-porn model weights, into a place very similar to where piracy is today. Which is to say: The Pirate Bay is illegal, wget is not, and the legal distinction is the advertised purpose.
(Where non-porn deepfakes are concerned, I expect them to try a bit harder at watermarking, still fail, and successfully defend themselves legally on the basis that they tried.)
The analogy to piracy goes a little further. If laws are passed, deepfakes will be a little less prevalent than they would otherwise be, there won’t be above-board businesses around it… and there will still be lots of it. I don’t think there-being-lots-of-it can be prevented by any feasible means. The benefit of this will be the creation of common knowledge that the US federal government’s current toolkit is not capable of holding back AI development and access, even when it wants to.
I would much rather they learn that now, when there’s still a nonzero chance of building regulatory tools that would function, rather than later.
I’m reading you to be saying that you think on its overt purpose this policy is bad, but ineffective, and the covert reason of testing the ability of the US federal government to regulate AI is worth the information cost of a bad policy.
I definitely appreciate that someone signing this writes this reasoning publicly. I think it’s not crazy to think that it will be good to happen. I feel like it’s a bit disingenuous to sign the letter for this reason, but I’m not certain.
I think preventing the existence of deceptive deepfakes would be quite good (if it would work); audio/video recording has done wonders for accountability in all sorts of contexts, and it’s going to be terrible to suddenly have every recording subjected to reasonable doubt. I think preventing the existence of AI-generated fictional-character-only child pornography is neutral-ish (I’m uncertain of the sign of its effect on rates of actual child abuse).
If progress in AI is continuous, we should expect record levels of employment. Not the opposite.
My mentality is if progress in AI doesn’t have a sudden, foom-level jump, and if we all don’t die, most of the fears of human unemployment are unfounded… at least for a while. Say we get AIs that can replace 90% of the workforce. The productivity surge from this should dramatically boost the economy, creating more companies, more trading, and more jobs. Since AIs can be copied, they would be cheap, abundant labor. This means anything a human can do that an AI still can’t becomes a scarce, highly valued resource. Companies with thousands or millions of AI instances working for them would likely compete for human labor, because making more humans takes much longer than making more AIs. Then say, after a few years, AIs are able to automate 90% of the remaining 10%. Then that creates even more productivity, more economic growth, and even more jobs. This could continue for even a few decades. Eventually, humans will be rendered completely obsolete, but by that point (most) of them might be so filthy rich that they won’t especially care.
This doesn’t mean it’ll all be smooth-sailing or that humans will be totally happy with this shift. Some people probably won’t enjoy having to switch to a new career, only for that new career to be automated away after a few years, and then have to switch again. This will probably be especially true for people who are older, those who have families, want a stable and certain future, etc. None of this will be made easier by the fact it’ll probably be hard to tell when true human obsolescence is on the horizon, so some might be in a state of perpetual anxiety, and others will be in constant denial.
The inverse argument I have seen on reddit happens if you try to examine how these ai models might work and learn.
One method is to use a large benchmark of tasks, where model capability is measured as the weighted harmonic mean of all tasks.
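Concretely, something like the following (a hedged sketch of the weighted-harmonic-mean idea; the task names and weights are made up for illustration):

```python
# Weighted harmonic mean over per-task scores; the weakest task dominates,
# which is the point of using a harmonic rather than arithmetic mean.
def weighted_harmonic_mean(scores, weights):
    assert all(s > 0 for s in scores.values())
    total_w = sum(weights.values())
    return total_w / sum(weights[t] / scores[t] for t in scores)

scores  = {"coding": 0.80, "maintenance": 0.55, "dialogue": 0.90}
weights = {"coding": 2.0,  "maintenance": 1.0,  "dialogue": 1.0}
print(weighted_harmonic_mean(scores, weights))  # ~0.74, pulled down by the weakest task
```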
As the models run, much of the information gained doing real world tasks is added as training and test tasks to the benchmark suite. (You do this whenever a chat task has an output that can be objectively checked, and for robotic tasks you run in lockstep a neural sim similar to Sora that makes testable predictions for future real world input sets)
What this means is most models learn from millions of parallel instances of themselves and other models.
This means the more models are deployed in the world—the more labor is automated—the more this learning mechanism gets debugged, and the faster models learn, and so on.
There are also all kinds of parallel task gains. For example once models have experience working on maintaining the equipment in a coke can factory, and an auto plant, and a 3d printer plant, this variety of tasks with common elements should cause new models trained in sim to gain “general maintenance” skills at least for machines that are similar to the 3 given. (The “skill” is developing a common policy network that compresses the 3 similar policies down to 1 policy on the new version of the network)
With each following task, the delta—the skills the AI system needs to learn that it doesn’t already know—shrinks. This learning requirement likely shrinks faster than the task difficulty increases. (Since the most difficult tasks are still doable by a human, and also the AI system is able to cheat a bunch of ways. For example using better actuators to make skilled manual trades easy, or software helpers to best champion Olympiad contestants.)
You have to then look at what barriers there are to AI doing a given task to decide what tasks are protected for a while.
Things that just require a human body to do:
Medical test subject.
Food taster, perfume evaluator, fashion or aesthetics evaluator.
Various kinds of personal service worker.
AI Supervisor roles:
Arguably, checking that the models haven’t betrayed us yet and sanity-checking plans and outputs seems like it would be a massive source of employment.
AI developer roles:
The risks mean that some humans need to have a deep understanding of how the current gen of AI works, and the tools and time to examine what happened during a failure. Someone like this needs to be skeptical of an explanation by another AI system for the obvious reasons.
Government/old institution roles:
Institutions that don’t value making a profit may continue using human staff for decades after AI can do their jobs, even when it can be shown AI makes fewer errors and more legally sound decisions.
TLDR: Arguably for the portion of jobs that can be automated, the growth rate should be exponential, from the easiest and most common jobs to the most difficult and unique ones.
There is a portion of tasks that humans are required to do for a while, and a portion where it might be a good idea not to ever automate it.
I wonder how much of the tremendously rapid progress of computer science in the last decade owes itself to structurally more rapid truth-finding, enabled by:
the virtual nature of the majority of the experiments, making them easily replicable
the proliferation of services like github, making it very easy to replicate others’ experiments
(a combination of the points above) the expectation that one would make one’s experiments easily available for replication by others
There are other reasons to expect rapid progress in CS (compared to, say, electrical engineering) but I wonder how much is explained by this replication dynamic.
Very little, because most CS experiments are not in fact replicable (and that’s usually only one of several serious methodological problems).
CS does seem somewhat ahead of other fields I’ve worked in, but I’d attribute that to the mostly-separate open source community rather than academia per se.
To be sure, let’s say we’re talking about something like “the entirety of published material” rather than the subset of it that comes from academia. This is meant to very much include the open source community.
Very curious, in what way are most CS experiments not replicable? From what I’ve seen in deep learning, for instance, it’s standard practice to include a working github repo along with the paper (I’m sure you know lots more about this than I do). This is not the case in economics, for instance, just to pick a field I’m familiar with.
woah I didn’t even know lw team was working on a “pre 2024 review” feature using prediction markets probabilities integrated into the UI. super cool!