I would be very interested to know what the monks think about this.
Chris_Leong
I think it’s much easier to talk about boundaries than preferences because true boundaries don’t really contradict between individuals
I’m quite curious about this. What if you’re stuck on an island with multiple people and limited food?
Very Wittgensteinian:
“What is your aim in Philosophy?”
“To show the fly the way out of the fly-bottle” (Philosophical Investigations)
Oh, they’re definitely valid questions. The problem is that the second question is rather vague. You need to either state what a good answer would look like or why existing answers aren’t satisifying.
I downvoted this post. I claim it’s for the public good, maybe you find this strange, but let me explain my reasoning.
You’ve come on Less Wrong, a website that probably has more discussion of this than any other website on the internet. If you want to find arguments, they aren’t hard to find. It’s a bit like walking into a library and saying that you can’t find a book to read.The trouble isn’t that you literally can’t find any book/arguments, it’s that you’ve got a bunch of unstated requirements that you want satisfied. Now that’s perfectly fine, it’s good to have standards. At the same time, you’ve asked the question in a maximally vague way. I don’t expect you to be able to list all your requirements. That’s probably impossible and when it is. possible, it’s often a lot of work. At the same time, I do believe that it’s possible to do better than maximally vague.
The problem with maximally vague questions is that they almost guarantee that any attempt to provide an answer will be unsatisfying both for the person answering and the person receiving the answer. Worse, you’ve framed the question in such a way that some people will likely feel compelled to attempt to answer anyway, lest people who think that there is such a risk come off as unable to respond to critics.
If that’s the case, downvoting seems logical. Why support a game where no-one wins?
Sorry if this comes off as harsh, that’s not my intent. I’m simply attempting to prompt reflection.
The Best Essay (Paul Graham)
I have access to Gemini 1.5 Pro. Willing to run experiments if you provide me with an exact experiment to run, plus cover what they charge me (I’m assuming it’s paid, I haven’t used it yet).
“But also this person doesn’t know about internal invariances in NN space or the compressivity of the parameter-function map (the latter in particular is crucial for reasoning about inductive biases), then I become extremely concerned”
Have you written about this anywhere?
Have you tried talking to professors about these ideas?
Is there anyone who understand GFlowNets who can provide a high-level summary of how they work?
Nabgure senzr gung zvtug or hfrshy:
Gurer’f n qvssrerapr orgjrra gur ahzore bs zngurzngvpny shapgvbaf gung vzcyrzrag n frg bs erdhverzragf naq gur ahzore bs cebtenzf gung vzcyrzrag gur frg bs erdhverzragf.
Fvzcyvpvgl vf nobhg gur ynggre, abg gur sbezre.
Gur rkvfgrapr bs n ynetr ahzore bs cebtenzf gung cebqhpr gur rknpg fnzr zngurzngvpny shapgvba pbagevohgrf gbjneqf fvzcyvpvgl.
I wrote up my views on the principle of indifference here:
https://www.lesswrong.com/posts/3PXBK2an9dcRoNoid/on-having-no-clue
I agree that it has certain philosophical issues, but I don’t believe that this is as fatal to counting arguments as you believe.
Towards the end I write:
“The problem is that we are making an assumption, but rather than owning it, we’re trying to deny that we’re making any assumption at all, ie. “I’m not assuming a priori A and B have equal probability based on my subjective judgement, I’m using the principle of indifference”. Roll to disbelieve.
I feel less confident in my post than when I wrote it, but it still feels more credible than the position articulated in this post.
Otherwise: this was an interesting post. Well done on identifying some arguments that I need to digest.
Maybe just say that you’re tracking the possibility?
Is there going to be a link to this from somewhere to make it accessible?
I think an important crux here is whether you think that we can build institutions which are reasonably good at checking the quality of AI safety work done by humans
Why is this an important crux? Is it necessarily the case that if we can reliably check AI safety work done by humans that we we reliably check AI safety work done by Ai’s which may be optimising against us?
Updated
Second, it is also possible to robustly verify the outputs of a superhuman intelligence without superhuman intelligence.
Why do you believe that a superhuman intelligence wouldn’t be able to deceive you by producing outputs that look correct instead of outputs that are correct?
[Question] Can we get an AI to do our alignment homework for us?
I guess the main doubt I have with this strategy is that even if we shift the vast majority of people/companies towards more interpretable AI, there will still be some actors who pursue black-box AI. Wouldn’t we just get screwed by those actors? I don’t see how CoEm can be of equivalent power to purely black-box automation.
That said, there may be ways to integrate CoEm’s into the Super Alignment strategy.
Doing stuff manually might provide helpful intuitions/experience for automating it?