Fascinating paper! I wonder how much they would agree that holography means sparse tensors and convolution, or that intuitive versus reflexive thinking basically amounts to the visuo-spatial sketchpad versus the phonological loop. Can’t wait to hear which other ideas you’d like to import from this line of thought.
Ilio
I have no idea whether or not Hassabis is himself dismissive of that work
Well that’s a problem, don’t you think?
but many are.
Yes, as a cognitive neuroscientist myself, I can confirm you’re right that many within my generation tend to dismiss symbolic approaches. We were students during an AI winter that many of us thought was caused by the overpromising and underdelivering of the symbolic approach, with Minsky as the main reason for the slow start of neural networks. I bet you have a different perspective. What are your three best points for changing my generation’s view?
Because I agree, and because « strangely » sounds to me like « with inconsistencies ».
In other words, in my view the orthodox view on orthogonality is problematic because it supposes that we can pick at will within the enormous space of possible functions, whereas the set of intelligent behaviors we can construct is more likely sparse and, by default, describable using game theory (think tit for tat).
This is a sort of positive nihilism. Because value is not inherent in the physical world, you can assign value to whatever you want, with no inconsistency.
Say we construct a strong AI that attributes a lot of value to a specific white noise screenshot. How would you expect it to behave?
Your point is « Good AIs should have a working memory, a concept that comes from psychology ».
DH’s point is « Good AIs should have a working memory, and the way to implement it was based on concepts taken from neuroscience ».
Those are indeed orthogonal notions, if you will.
I’m a bit annoyed that Hassabis is giving neuroscience credit for the idea of episodic memory.
That’s not my understanding. To me he is giving neuroscience credit for the ideas that made it possible to implement a working memory in LLMs. I guess he didn’t want to use words like « thalamocortical », but from a neuroscience point of view transformers indeed look inspired by the isocortex, e.g. by the idea that a general distributed architecture can process any kind of information relevant to a human cognitive architecture.
I’d be happy if you could point out a non-competitive one, or explain why my proposal above does not obey your axioms. But we seem to be getting diminishing returns in sorting these questions out, so maybe it’s time to close here and wish you luck. Thanks for the discussion!
Saying « fuck you » is helpful when the aim is to exclude whoever disagrees with your values. This is often instrumental to constructing a social group, or to getting accepted in a social group that includes high-status toxic characters. I take « be nice » as the claim that there are always better objectives.
This is aiming at a different problem than goal agnosticism; it’s trying to come up with an agent that is reasonably safe in other ways.
Well, assuming a robust implementation, I still think it obeys your criteria, but now that you mention « restrictive », my understanding is that you want this expression to refer specifically to pure predictors. Correct?
If yes, I’m not sure that’s the best choice for clarity (why not « pure predictors »?), but of course that’s your choice. If not, can you give some examples of goal-agnostic agents other than pure predictors?
You forgot to explain why these arguments only apply to strangers. Is there a reason to think medical research and economic incentives are better when it’s a family member who needs a kidney?
Nope, my social media presence is very, very low. But I’m open to suggestions, since I’ve realized there are a lot of toxic characters with high status here. Did you try the EA Forum? Is it better?
(The actual question is about your best utilitarian model, not your strategy given my model.)
A uniform distribution of kidney donation also sounds like the result when a donor is 10^19 times more likely to set the example. Maybe I should specify that the donor is unlikely to take the 1% risk unless someone else is more critical to the war effort.
Good laugh! But they’re also 10^19 times more likely to get the difference between donating one kidney and donating both.
[Question] What’s your best utilitarian model for risking your best kidneys?
Nope, but one of my sons suggests Discord.
Thanks for organizing this; here’s the pseudocode for my entry (with a rough Python sketch below).
Robot 1: Cooperate at first, then tit for tat for 42 rounds, then identify yourself by playing [0, 1, 1, 0, 0, 0, 1, 1, 1, 1], then cooperate if the opponent did the same, otherwise defect.
Robot 2: Same as robot 1, ending with: … otherwise tit for tat
Robot 3 (secret): Same as robot 1, with a secret identifying sequence and number of initial rounds (you pick them, Isaac).
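For concreteness, here’s a minimal Python sketch of robots 1 and 2. The interface (a strategy as a function of both players’ move histories) and the 1/0 encoding of cooperate/defect are assumptions for illustration, not the contest’s actual API; robot 3 is just robot 1 with a different handshake and round count, so it’s omitted.

```python
# Sketch only: the strategy signature and move encoding below are assumed,
# not taken from the contest's real interface.

COOPERATE, DEFECT = 1, 0
HANDSHAKE = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1]  # identifying sequence
TFT_ROUNDS = 42                              # length of the tit-for-tat phase


def robot_1(my_history, their_history):
    """Cooperate first, tit for tat for 42 rounds, broadcast the handshake,
    then cooperate iff the opponent broadcast the same handshake; else defect."""
    turn = len(my_history)
    if turn == 0:
        return COOPERATE                                 # opening move
    if turn <= TFT_ROUNDS:
        return their_history[-1]                         # tit for tat
    if turn <= TFT_ROUNDS + len(HANDSHAKE):
        return HANDSHAKE[turn - TFT_ROUNDS - 1]          # identify yourself
    # Did the opponent play the same handshake during that window?
    window = their_history[TFT_ROUNDS + 1 : TFT_ROUNDS + 1 + len(HANDSHAKE)]
    return COOPERATE if window == HANDSHAKE else DEFECT


def robot_2(my_history, their_history):
    """Same as robot 1, but falls back to tit for tat instead of defecting."""
    move = robot_1(my_history, their_history)
    past_handshake = len(my_history) > TFT_ROUNDS + len(HANDSHAKE)
    if past_handshake and move == DEFECT:
        return their_history[-1]                         # tit for tat fallback
    return move
```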
No problem with the loading here. The most important files seem to be « positive » and « pseudocode ». In brief, this seems to be an attempt to guess which algorithm the cerebellum implements, awaiting more input from neuroscientists and/or coders to implement and test the idea. Not user-friendly indeed. +1 for clarifications needed.
I waited until Friday so that you wouldn’t fall asleep at school because of me, but yes, I enjoyed both the style and the freshness of ideas!
Look, I think you’re a young and promising opinion writer, but if you stay on LW I expect you’ll get beaten up by the cool kids (for lack of systematic engagement with both the spirit and the logical details of the answers you get). What about finding some place that’s more about social visions and less about pure logic? Tell me where and I’ll join you for more about the strengths, and maybe some pitfalls.
…but I thought the criterion was unconditional preference? The point of the nausea idea is precisely that agents can decide to act despite nausea; they’d just rather find a better solution (if their intelligence is up to the task).
I agree that « curiosity, period » seems highly vulnerable (you read Scott Alexander? He wrote a hilarious hit piece about this idea a few weeks or months ago). But I did not say « curious, period ». I said curious about what humans will freely choose next.
In other words, the idea is that it should prefer not to trick humans, because if it does (for example by interfering with our perception) then it won’t know what we would have freely chosen next.
It also seems to cover security (if we’re dead it won’t know), health (if we’re incapacitated it won’t know) and prosperity (if we’re under economic constraints that impact our free will). But I’m interested in considering possible failure modes.
(« Sorry, I’d rather not do your will, for that would impact the free will of other humans. But thanks for letting me know that was your decision! You can’t imagine how good it feels when you tell me that sort of thing! »)
Notice you don’t see me campaigning for this idea, because I don’t like any solution that does not also take care of AI well-being. But when I first read « goal agnosticism », it struck me as an excellent fit for describing the behavior of an agent acting under these particular drives.
It’s a key article of faith I used to share, but I’m now agnostic about it. To take a concrete example, everyone knows that blues and reds are getting more and more polarized. A grey type like old me would have thought there must be an objective truth to extract, with elements from both sides. Now I’m wondering if ethics should end with: no truth can help decide whether future humans should be able to live like bees, like dolphins, like the blues, or like the reds, especially when living like the reds means eating the blues and living like the blues means eating the dolphins and saving the bees. But I’m very open to hearing new heuristics to tackle this kind of question.
Very true, unless we nitpick definitions of « largely understand ».
Very interesting link, thank you.