Agreed, the releases I’ve seen from Conjecture have made me incredibly hopeful about what they can achieve, and interested in their approach to the problem more generally.
When it comes to the coordination efforts, I’m generally of the opinion that we can speak the truth while being inviting, easing people into the arguments. From my chats with Conjecture, their method seems to be not “come in, the water is fine” but “come in, here are the reasons why the water is boiling and might turn into lava”.
If we start by speaking the truth “however costly it may be” and this turns people off alignment entirely, we have not actually introduced them to the truth but achieved the opposite. I’m left with the sense that Conjecture’s coordination efforts are quite promising and follow this line of thinking, though this might be a knowledge gap on my side (I know of 3-5 of their coordination efforts). Either way, I’m looking forward to seeing more posts about this work.
As for the product side, the goal is not to be profitable fast; it is to be attractive to non-alignment investors (e.g. Airbnb is not profitable). I agree with the risks, of course. I can see something like Loom branching out (haha) into a valuable writing tool with medium risk. Conjecture’s corporate structure and the core team’s alignment seem quite protective of a product branch staying positive, though money always has unintended consequences.
I have been a big fan of the “new agendas” agenda, and I look forward to reading about their unified research agenda! The candidness of their contribution to the alignment problem has also updated me positively on Conjecture, and the organization-building startup costs seem invaluable and inevitable. Godspeed.
It seems like there are a lot of negative comments about this letter. Even if it does not go through, it seems strongly net positive because it makes explicit an expert position against large language model development on safety grounds. This has several major effects: it enables scientists, lobbyists, politicians, and journalists to refer to the petition to validate their potential work on the risks of AI; it provides a concrete action step towards limiting AGI development; and it incentivizes others to think in the same vein about concrete solutions.
I’ve tried to formulate a few responses to the criticisms raised:
“6 months isn’t enough to develop the safety techniques they detail”: Besides the letter calling for at least 6 months, the proposals seem reasonably achievable for something as farsighted as this letter. Shoot for the moon and you might hit the sky, but this time the sky is actually within reach, and work on many of the proposals is already underway. See e.g. the EU AI Act, funding for AI safety research, and concrete auditing and safety-evaluation work on models. Several organizations are also working on certification, and the scientific work towards watermarking is arguably mostly done. There are also strong arguments for securing these guarantees now, since at the moment we are at the whim of OpenAI management on the safety front.
“It feels rushed”: It might have benefitted from a few reformulations, but overall it seems alright?
“OpenAI needs to be at the forefront”: Besides others already clearly lagging behind, what we need are assurances that these systems go well, not development at the behest of one person. There’s also a lot of trust placed in OpenAI management, and however warranted that is, it still amounts to a fully controlled monopoly on our future. Trusting them without ensured safety just seems too optimistic (see also the differences between for-profit, public-interview sama and online sama).
“It has a negative impact on capabilities researchers”: This seems like an issue from before 2020 and parts of European academia. If public figures like Yoshua Bengio cannot change the conversation, then who can? Should we just lean back and hope that they all realize it by themselves? Additionally, the industry researchers from DeepMind and OpenAI I’ve talked with generally agree that alignment is very important, especially as their management is clearly taking the side of safety.
“The letter signatures are not validated properly”: Yeah, this seems like a miss, though as long as the top 40 names are validated, the negative impact should be relatively contained.
All in good faith, of course; it’s a contentious issue, but this letter seems generally positive to me.