When tracking an argument in a comment section, I like to skip to the end to see if either of the arguers winds up agreeing with the other, which tells you something about how productive the argument was. But when using the “hide names” feature on LW, I can’t do that, as nothing marks a cluster of comments as all coming from the same author.
I’d like a solution to this problem. One idea that comes to mind is to hash all the usernames within a particular post for a particular session, so you can check whether the author is debating someone in the comments without knowing the author’s LW username. This is almost as good as full anonymity, as my status judgements of authors take a while to develop, and I’d still get the benefit of being able to track how beliefs develop in the comments.
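To gesture at the mechanism, here’s a minimal sketch (the function name, salt, and IDs are all made up for illustration, not anything LW actually uses):

```python
import hashlib

def pseudonym(username: str, post_id: str, session_salt: str) -> str:
    """Map a username to a stable per-post, per-session pseudonym."""
    digest = hashlib.sha256(f"{session_salt}:{post_id}:{username}".encode()).hexdigest()
    return "Commenter-" + digest[:6]

# Same author, same post, same session -> same label, so you can spot a
# back-and-forth between two people:
print(pseudonym("author_a", "post-123", "salt-for-this-session"))
print(pseudonym("author_a", "post-123", "salt-for-this-session"))

# Different post (or a rotated salt) -> different label, so the labels
# never accumulate into a cross-post identity:
print(pseudonym("author_a", "post-456", "salt-for-this-session"))
```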
@habryka
I’m not sure what you mean by operational vs axiomatic definitions.
But Shannon was unaware of the usage of entropy in statistical mechanics. Instead, he was inspired by Nyquist and Hartley’s work, which introduced ad-hoc definitions of information for the case of uniform probability distributions.
And in his seminal paper, “A Mathematical Theory of Communication”, he argued in the introduction for the logarithm as a measure of information because of practicality, intuition and mathematical convenience. Moreover, he explicitly derived the entropy of a distribution from three axioms:
1) that it be continuous wrt. the probabilities,
2) that, for uniform distributions, it increase monotonically with the number of outcomes,
3) and that it be a weighted sum of the entropies of sub-systems.
See section 6 for more details.
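For the record, the function those axioms pin down (Shannon’s Theorem 2, unique up to a positive constant K) is the familiar entropy:

$$H(p_1, \dots, p_n) = -K \sum_{i=1}^{n} p_i \log p_i.$$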
I hope that answers your question.
Whether I would get an article written, or part of a website set up, by Friday. I was sure I wouldn’t, and I didn’t. But the predictions I made weren’t cruxy.
If this feels at least somewhat compelling, what if you just got yourself to Fatebook right now, and made a couple of predictions that’ll resolve within a couple of days, or a week? Fatebook will send you emails reminding you about it, which can help bootstrap a habit.
Done.
I find it ironic that Simplicia’s position in this comment is not too far from my own, and yet my reaction to it was “AIIIIIIIIIIEEEEEEEEEE!”. The shrieking is about how everyone who thinks about alignment has models that are illegible from the perspective of almost everyone else, of which this thread is an example.
EEA
What the heck does “EEA” mean?
I was thinking of unit tests generated from some spec for helping with that part. If someone could build such a spec/tool and share it, said spec/tool could be extensively analysed and iterated upon.
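A toy sketch of the kind of thing I mean (the spec format and names here are entirely made up for illustration):

```python
import unittest

# Toy "spec": behaviour name -> (input, expected output).
SPEC = {
    "sorts_numbers": ([3, 1, 2], [1, 2, 3]),
    "handles_empty": ([], []),
}

def tests_from_spec(fn):
    """Generate a unittest.TestCase whose test methods are read off the spec."""
    methods = {}
    for name, (given, expected) in SPEC.items():
        def test(self, given=given, expected=expected):
            self.assertEqual(fn(given), expected)
        methods["test_" + name] = test
    return type("SpecTests", (unittest.TestCase,), methods)

# Point the generated tests at the implementation being verified:
SpecTests = tests_from_spec(sorted)

if __name__ == "__main__":
    unittest.main()
```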
I’d like to try another analogy, which makes some potential problems for verifying output in alignment more legible.
Imagine you’re a customer and ask a programmer to make you an app. You don’t really know what you want, so you give some vague design criteria. You ask the programmer how the app works, and they tell you, and after a lot of back and forth discussion, you verify this isn’t what you want. Do you know how to ask for what you want, now? Maybe, maybe not.
Perhaps the design space you’re thinking of is small, perhaps you were confused in some simple way that the discussion resolved, perhaps the programmer worked with you earnestly to develop the design you’re really looking for, and pointed out all sorts of unknown unknowns. Perhaps.
I think we could wind up in this position. The position of a non-expert verifying an expert’s output, with some confused and vague ideas about what we want from the expert. We won’t know the good questions to ask the expert, and will have to rely on the expert to help us. If ELK is easy, then that’s not a big issue. If it isn’t, then that seems like a big issue.
generation can be easier than validation because when generating you can stay within a subset of the domain that you understand well, whereas when verifying you may have to deal with all sorts of crazy inputs.
Attempted rephrasing: you control how you generate things, but not how others do, so verifying their generations can expose you to stuff you don’t know how to handle.
Example:
“Writing code yourself is often easier than validating someone else’s code”
I messed up. I meant to comment on another comment of yours, the one replying to niplav’s post about fat tails disincentivizing compromise. That was the one I really wished I could bookmark.
This comment is making me wish I could bookmark comments on LW. @habryka,
I’m working on this right now, actually. Will hopefully post in a couple of weeks.
This sounds cool.
That seems reasonable. But I do think there’s a group of people who have internalized Bayesian rationalism enough that the main blocker is their general epistemology, rather than the way they reason about AI in particular.
I think your OP didn’t give enough details as to why internalizing Bayesian rationalism leads to doominess by default. Like, Nora Belrose is firmly Bayesian and is decidedly an optimist. Admittedly, I think she doesn’t think a Kolmogorov prior is a good one, but I don’t think that makes you much more doomy either. I think Jacob Cannell and others are also Bayesian and non-doomy. Perhaps I’m using “Bayesian rationalism” differently than you are, which is why I think your claim, as I read it, is invalid.
I think the point of 6 is not to say “here’s where you should end up”, but more to say “here’s the reason why this straightforward symmetry argument doesn’t hold”.
Fair enough. However, how big is the asymmetry? I’m a bit sceptical there is a large one. Based off my interactions, it seems like ~ everyone who has seriously thought about this topic for a couple of hours has radically different models, w/ radically different levels of doominess. This holds even amongst people who share many lenses (e.g. Tyler Cowen vs Robin Hanson, Paul Christiano vs. Scott Aaronson, Steve Hsu vs Michael Nielsen etc.).
There’s still something importantly true about EU maximization and bayesianism. I think the changes we need will be subtle but have far-reaching ramifications. Analogously, relativity was a subtle change to newtonian mechanics that had far-reaching implications for how to think about reality.
I think we’re in agreement over this. (I think Bayesianism is less wrong than EU maximization, and probably a very good approximation in lots of places, like Newtonian physics is for GR.) But my contention is over Bayesian epistemology tripping many rats up when thinking about AI x-risk. You need some story which explains why sticking to Bayesian epistemology is tripping up very many people here in particular.
Any epistemology will rule out some updates, but a problem with bayesianism is that it says there’s one correct update to make. Whereas radical probabilism, for example, still sets some constraints, just far fewer.
Right, but in radical probabilism the type of beliefs is still a real-valued function, no? Which is in tension w/ many disparate models that don’t get compressed down to a single number. In that sense, the refined formalism is still rigid in a way that your description is flexible. And I suspect the same is true for Infra-Bayesianism, though I understand that even less well than radical probabilism.
I think this post doesn’t really explain why rats have high belief in doom, or why they’re wrong to do so. Perhaps ironically, there is a better version of this post on both counts which isn’t so focused on how rats get epistemology wrong and the social/meta-level consequences: a post which focuses on the object-level implications for AI of a theory of rationality which looks very different from the AIXI-flavoured rat-orthodox view.
I say this because those sorts of considerations convinced me that we’re much less likely to be buggered. I.e. I no longer believe EU maximization is, or will be, a good description by default of TAI or widely economically productive AGI, mildly superhuman AGI, or even ASI, depending on the details. Which is partly due to recognizing that the arguments for EU maximization are weaker than I thought, that arguments for LDT being convergent are lacking, and that the notions of optimality we do have are very weak, and partly due to the existence and behaviour of GPT-4, Claude Opus, etc.
6 seems too general a claim to me. Why wouldn’t it work for 1% vs 10%, and likewise 0.1% vs 1%? I.e. why doesn’t this suggest that you should round P(doom) down to zero? Also, I don’t even know what you mean by “most” here. Like, are we quantifying over methods of reasoning used by current AI researchers right now? Over all time? Over all AI researchers and engineers? Over everyone in the West? Over everyone who’s ever lived? Etc.
And it seems to me like you’re implicitly privileging ways of combining these opinions that get you 10% instead of 1% or 90%, which is begging the question. Of course, you could reply that a P(doom) of 10% is confused, that it isn’t really your state of knowledge, that lumping all your sub-agents’ models into a single number is too lossy, etc. But then why mention that 90% is a much stronger prediction than 10%, instead of saying they’re roughly equally confused?
7 I kinda disagree with. Those models of idealized reasoning you mention generalize Bayesianism/Expected Utility Maximization, but they are not far from the Bayesian or EU frameworks. Like Bayesianism, they do say there are correct and incorrect ways of combining beliefs, and that beliefs should be isomorphic to certain structures, unless I’m horribly mistaken. Which sure is not what you’re claiming to be the case in your above points.
Also, a lot of rationalists already recognize that these models address flaws in Bayesianism like logical omniscience, embeddedness, etc. Like, I believed this at least around 2017, and probably earlier. Also, note that these models of epistemology are not in tension with a strong belief that we’re buggered. Last I checked, the people who invented these models believe we’re buggered. I think they may imply that we’re a little less buggered than the EU maximization theory suggests, but I don’t think this is a big difference. IMO this is not a big enough departure to do the work that your post requires.
The AI Optimists (i.e. the people in the associated Discord server) have a lot of internal disagreement[1], to the point that I don’t think it’s meaningful to talk about the delta between John and them. That said, I would be interested in specific deltas e.g. with @TurnTrout, in part because he thought we’d get death by default and now doesn’t, has distanced himself from LW, and, if he replies, is more likely to have a productive argument w/ John than Quintin Pope or Nora Belrose would be. Not because he’s better, but because I think he and John would be more legible to each other.
[1] Source: I’m on the AI Optimists Discord server and haven’t seen much to alter my prior belief that ~ everyone in alignment disagrees with everyone else.
This sounds like the sequence that I have wanted to write on corrigibility since ~2020 when I stopped working on the topic. So I am excited to see someone finally writing the thing I wish existed!
However, this also feels like a feature of the “valley of confused abstractions”. Humans didn’t evolve based on individual snapshots of reality, we evolved with moving pictures as input data. If a human in the ancestral environment, when faced with a tiger leaping out of the bushes, decided to analyse its fur patterns rather than register the large orange shape moving straight for them, it’s unlikely they would have returned from their first hunting trip. When we see a 2D image, we naturally “fill in the gaps”, and create a 3D model of it in our heads, but current image classifiers don’t do this, they just take a snapshot. For reasons of architectural bias, they are more likely to identify the abstraction of texture than the arguably far more natural abstraction of shape. Accordingly, we might expect video-based models to be more likely to recognise shapes than image-based models, and hence be more robust to this failure mode.
If this were true, then you’d expect NeRFs and Sora to be a lot less biased towards textures, right?
I’ve observed much the same thing. And my conclusion is that we don’t have any proofs that ever-greater intelligence converges to EU-maximizing agents.
Edit: We don’t even have an argument that meets the standards of rigour in physics.
What were the benefits?
Google’s auto-translation tools
Are there Chrome-native tools which are better than Google Translate for webpages? Because you can use the latter on Firefox and it works pretty well.
EDIT: Fixed the link to Google Translate. Thanks, @the gears to ascension, for pointing that out!
I’m writing a page for AIsafety.info on scaffolding, and was struggling to find a principled definition. Thank you for this!