Views are my own.
lberglund
[I may be generalizing here and I don’t know if this has been said before.]
It seems to me that Eliezer’s models are a lot more specific than those of people like Richard. While Richard may put some credence on superhuman AI being “consequentialist” by default, Eliezer has certain beliefs about intelligence that make it extremely likely in his mind.
I think Eliezer’s style of reasoning, which relies on specific, thought-out models of AI, makes him more pessimistic than others in EA. Others believe there are many ways that AGI scenarios could play out and are generally uncertain. But Eliezer has specific models that make some scenarios a lot more likely in his mind.
There are many valid theoretical arguments for why we are doomed, but maybe other EAs put less credence in them than Eliezer does.
The “collusion” issue leads to a state of affairs in which two political groups can gain more political power if they can organize and get along well enough to actively coordinate. Why should two groups have more power just because they can cooperate?
It seems pretty obvious to me that what “slow motion doom” looks like in this sense is a period during which an AI fully conceals any overt hostile actions while driving its probability of success once it makes its move from 90% to 99% to 99.9999%, until any further achievable decrements in probability are so tiny as to be dominated by the number of distant galaxies going over the horizon conditional on further delays.
Wouldn’t another consideration be that the AI is more likely to be caught the longer it prepares? Or is this chance negligible since the AI could just execute its plan the moment people try to prevent it?
I think many people here are already familiar with the circuits line of research at OpenAI, though I think it’s now mostly been abandoned.
I wasn’t aware that the circuits approach was abandoned. Do you know why they abandoned it?
Potentially silly question:
In the first counterexample you describe the desired behavior as
Intuitively, we expect each node in the human Bayes net to correspond to a function of the predictor’s Bayes net. We’d want the reporter to simply apply the relevant functions from subsets of nodes in the predictor’s Bayes net to each node in the human Bayes net [...]
After applying these functions, the reporter can answer questions using whatever subset of nodes the human would have used to answer that question.
Why doesn’t the reporter skip the step of mapping the predictor’s Bayes net to the human’s and instead just answer the question using its own nodes? What’s the benefit of having the intermediate step that maps the predictor’s net to the human’s?
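For concreteness, here’s a rough sketch of how I’m picturing the desired reporter described in the quote (all names and function signatures below are my own invention, not from the report):

```python
# Toy sketch (my framing) of the "desired" reporter from the quote above.
# predictor_nodes: values of the nodes in the predictor's Bayes net
# node_maps: for each node in the human's Bayes net, a function computing its
#            value from some subset of the predictor's nodes
# human_answerer: answers a question the way the human would, given values
#                 for the nodes in the human's Bayes net

def desired_reporter(predictor_nodes, node_maps, human_answerer, question):
    # Step 1: translate the predictor's ontology into the human's ontology
    human_nodes = {name: fn(predictor_nodes) for name, fn in node_maps.items()}
    # Step 2: answer the question using whatever subset of the human's nodes
    # the human would have used
    return human_answerer(human_nodes, question)
```

My question is essentially why the reporter wouldn’t just implement `human_answerer`-style logic directly over `predictor_nodes` and skip step 1.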
I see, thanks for answering. To further clarify: given that the reporter’s only access to the human’s nodes is through the human’s answers, would it be equally likely for the reporter to create a mapping to some other Bayes net that is similarly consistent with the answers provided? Is there a reason why the reporter would map to the human’s Bayes net in particular?
Another difference is the geographic location! As someone who grew up in Germany, living in England is a lot more attractive to me since it will allow me to be closer to my family. Others might feel similarly.
I had a similar thought. Also, in an expected value context it makes sense to pursue actions that succeed when your model is wrong and you are actually closer to the middle of the success curve, because if that’s the case you can increase our chances of survival more easily. In the logarithmic context doing so doesn’t make much sense, since your impact on the log odds is the same no matter where on the success curve you are.
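To spell out that last claim with a toy model (my own framing, not from the post): if the probability of success is a logistic function of effort, each unit of effort buys the same amount of log odds no matter where you are on the curve, but buys the most raw probability near the middle:

```latex
p(x) = \frac{1}{1 + e^{-x}}
\;\Longrightarrow\;
\log\frac{p(x)}{1 - p(x)} = x,
\qquad
\frac{d}{dx}\,\log\frac{p}{1 - p} = 1,
\qquad
\frac{dp}{dx} = p(1 - p),
```

and dp/dx is maximized at p = 1/2, i.e. in the middle of the success curve.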
Maybe this objective function (and the whole ethos of Death with Dignity) is a way to justify working on alignment even if you think our chances of success are close to zero. Personally, I’m not compelled by it.
I mostly agree that relying on real-world data is necessary for better understanding our messy world, and that in most cases this approach is favorable.
There’s a part of me that thinks AI is a different case though, since getting it even slightly wrong will be catastrophic. Experimental alignment research might get us most of the way to aligned AI, but there will probably still be issues that aren’t noticeable because the AIs we are experimenting on won’t be powerful enough to reveal them. Our solution to the alignment problem can’t be something imperfect that does the job well enough. Instead, it has to be something that can withstand immense optimization pressure. My intuition tells me that the single-hose solution is not enough for AGI and we instead need something that is flawless in practice and in theory.
The link doesn’t work. I think you are linking to a draft version of the post or something.
Can someone clarify what “k>1” refers to in this context? Like, what does k denote?
It’s worth emphasizing your point about the negative consequences of merely aiming for a pivotal act.
Additionally, if a lot of people in the AI safety community advocate for a pivotal act, it makes people less likely to cooperate with and trust that community. If we want to make AGI safe, we have to be able to actually influence the development of AGI. To do that, we need to build a cooperative relationship with decision makers. Planning a pivotal act runs counter to these efforts.
2. There’s lots of Minecraft videos on YouTube, so you could test a “GPT-3 for Minecraft” approach.
OpenAI just did this exact thing.
The original stated rationale behind OpenAI was https://medium.com/backchannel/how-elon-musk-and-y-combinator-plan-to-stop-computers-from-taking-over-17e0e27dd02a.
This link is dead for me. I found this link that points to the same article.
This is the same flawed approach that airport security has, which is why travelers still have to remove shoes and surrender liquids: they are creating blacklists instead of addressing the fundamentals.
Just curious, what would it look like to “address the fundamentals” in airport security?
This is very interesting. Thanks for taking the time to explain :)
I was a bit confused about this quote, so I tried to expand on the ideas a bit. I’m posting it here in case anyone benefits from it or disagrees.
To which I say: I expect many of the cognitive gains to come from elsewhere, much as a huge number of the modern capabilities of humans are encoded in their culture and their textbooks rather than in their genomes. Because there are slopes in capabilities-space that an intelligence can snowball down, picking up lots of cognitive gains, but not alignment, along the way.
I guess this is saying that an AI will develop ways to learn things without gradient descent, just like humans learned things outside of our genetic update. Some ways to do this would be:
- Develop the ability to read things on the internet and learn from them
- Spend cognitive energy on things like doing math or programming
- Do things to actually gain power in the world, like accumulating money or compute
I guess the argument is that, for objectives, only gradient descent is pushing you in the correct direction, whereas for capabilities, the system will develop ways to push itself in the right direction in addition to SGD. Like, it’s true that for any objective function it’s good to be more powerful. It’s not true that for any level of power the system is incentivized to have the more correct objective.
A system wants to be more powerful, but it doesn’t want to have a more “correct” objective.
Another reason to not expect the selection argument to work is that it’s instrumentally convergent for most inner agent values to not become wireheaders, for them to not try hitting the reward button.
To me this implies that as the AI becomes more situationally aware, it learns to avoid rewards that would reinforce away its current goals (because it wants to preserve its goals). As a result, throughout the training process, the AI’s goals start out malleable and “harden” once the AI gains enough situational awareness. This implies that goals have to be simple enough for the agent to be able to model them early on in its training process.
The way I see it, having a lower-level understanding of things allows you to create abstractions about their behavior that you can use to understand them on a higher level. For example, if you understand how transistors work on a lower level, you can abstract away their behavior and more efficiently examine how they wire together to create memory and processors. This is why I believe that a circuits-style approach is the most promising one we have for interpretability.
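As a toy illustration of the abstraction point (purely illustrative code, not any real interpretability tooling): once you trust the low-level model of a gate, you can treat it as a black box and build and reason about memory without thinking about transistors at all.

```python
def nor(a: bool, b: bool) -> bool:
    # Low level: a NOR gate (itself built from transistors, whose
    # physics we no longer need to think about once we trust this model)
    return not (a or b)

def sr_latch(s: bool, r: bool, q: bool = False, qbar: bool = True):
    # Higher level: one bit of memory (an SR latch) made of two
    # cross-coupled NOR gates; iterate until the feedback loop settles
    for _ in range(10):
        new_q, new_qbar = nor(r, qbar), nor(s, q)
        if (new_q, new_qbar) == (q, qbar):
            break
        q, qbar = new_q, new_qbar
    return q, qbar

# Storing a bit happens entirely at the gate level of abstraction:
q, qbar = sr_latch(s=True, r=False)                    # set  -> q == True
q, qbar = sr_latch(s=False, r=False, q=q, qbar=qbar)   # hold -> q stays True
```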
Do you agree that a lower level understanding of things is often the best way to achieve a higher level understanding, in particular regarding neural network interpretability, or would you advocate for a different approach?
FYI, the link at the top of the post isn’t working for me.