Esben Kran

Karma: 479

Esben Kran 18 Jul 2024 14:42 UTC
3 points
on: Alignment Jam
Merge candidate discussion: Merge this into the Apart Research tag to reflect that Alignment Jam has been renamed Apart Sprints, and to avoid the mislabeling between the two tags that currently happens.

Results from the AI x Democracy Research Sprint

Esben Kran, jordine and Jason Hoelscher-Obermaier

14 Jun 2024 16:40 UTC

13 points

0 comments, 6 min read

Demonstrate and evaluate risks from AI to society at the AI x Democracy research hackathon

Esben Kran

19 Apr 2024 14:46 UTC

5 points

0 comments, 1 min read

(www.apartresearch.com)

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran

18 Mar 2024 8:15 UTC

12 points

1 comment, 1 min read

Esben Kran 7 Feb 2024 1:34 UTC
2 points
on: Survey for alignment researchers!
This seems like a great effort. Back in 2022 we ran a small survey on pain points in AI safety, which received quite a few answers; you can see the final results here. Note that it has not been updated in about two years.

Multi-Agent Security Hackathon

Esben Kran, Jason Hoelscher-Obermaier and Clement Neo

5 Feb 2024 22:51 UTC

6 points

0 comments, 1 min read

Identifying semantic neurons, mechanistic circuits & interpretability web apps

Esben Kran and Neel Nanda

13 Apr 2023 11:59 UTC

18 points

0 comments, 8 min read

Esben Kran 29 Mar 2023 15:39 UTC
51 points
on: FLI open letter: Pause giant AI experiments
It seems like there are a lot of negative comments about this letter. Even if it does not go through, it seems very net positive because it makes explicit an expert position against large language model development due to safety concerns. This has several major effects: it enables scientists, lobbyists, politicians and journalists to refer to this petition to validate their potential work on the risks of AI, it provides a concrete action step towards limiting AGI development, and it incentivizes others to think along the same lines about concrete solutions.
I’ve tried to formulate a few responses to the criticisms raised:
- “6 months isn’t enough to develop the safety techniques they detail”: Besides the pause being “at least” 6 months, the proposals seem reasonable for something as farsighted as this letter. Shoot for the moon and you might hit the sky, but this time the sky is actually happening, and work on many of their proposals is already underway; see e.g. the EU AI Act, funding for AI research, and concrete auditing and safety evaluation work on models. Several organizations are also working on certification, and the scientific work on watermarking is more or less done? There are also strong arguments for ensuring this, since right now we are at the whim of OpenAI management on the safety front.
- “It feels rushed”: It might have benefited from some rewording, but it does seem alright?
- “OpenAI needs to be at the forefront”: Besides others already clearly lagging behind, what we need are assurances that these systems go well, not something left to the discretion of one person. There is also a lot of trust in OpenAI management, and however warranted that trust is, it still amounts to a fully controlled monopoly on our future. Without ensuring safety, this just seems too optimistic (see also the differences between for-profit sama in public interviews and sama online).
- “It has a negative impact on capabilities researchers”: This seems to be an issue from before 2020 and in parts of European academia. If public figures like Yoshua Bengio cannot change the conversation, then who should? Should we just lean back and hope that they all realize it by themselves? Additionally, the industry researchers from DeepMind and OpenAI I’ve talked with generally seem to agree that alignment is very important, especially as their management is clearly taking the side of safety.
- “The letter signatures are not validated properly”: Yeah, this seems like a miss, though as long as the top 40 names are validated, the negative impact should be fairly limited.
All in good faith, of course; it’s a contentious issue, but this letter seems generally positive to me.
What links here?
- Tristan Williams's comment on FLI open letter: Pause giant AI experiments by Zach Stein-Perlman (EA Forum; 29 Mar 2023 16:38 UTC; 16 points)
- Tristan Williams's comment on FLI open letter: Pause giant AI experiments by Zach Stein-Perlman (29 Mar 2023 16:40 UTC; 1 point)

Announcing the European Network for AI Safety (ENAIS)

Esben Kran

22 Mar 2023 17:57 UTC

19 points

0 comments, 1 min read

Esben Kran 15 Mar 2023 9:34 UTC
4 points
on: Shutting Down the Lightcone Offices
Oliver’s second message raises a truly relevant consideration for our work in the alignment ecosystem. Sometimes, it really does feel like concern about AI X-risk created the current situation. Counterfactually, many of the biggest AGI advances might never have been developed, and machine learning engineers would just be optimizing someone else’s clicks.
I am a big fan of “Just don’t build AGI” and of academic AI work, simply because academia is better at moving slowly (and thereby safely, through open discourse rather than $10 million training runs) than massive industry labs are. That said, I do have quite a bit of trust in Anthropic, DeepMind and OpenAI, simply because of their general safety considerations compared to, e.g., Microsoft’s release of Sydney.
As part of this EA bet on AI, it also seems from my interactions with AI industry researchers that the safety view has become widespread among most of them (though this might just be sampling bias, and they may honestly have been more interested in their equity growing in value). So if the counterfactual to today’s large AGI companies were large misaligned AGI companies, we would be in a significantly worse position. And if AI safety is indeed relatively trivial, then we’re in an amazing position to make the world a better place. I’ll remain slightly pessimistic here as well, though.

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

Esben Kran, Fazl, Sabrina Zaki, gabrielrecc and rz2383

23 Feb 2023 10:48 UTC

8 points

0 comments, 6 min read

Esben Kran 17 Feb 2023 9:09 UTC
23 points
on: Bing Chat is blatantly, aggressively misaligned
There’s an interesting case on the infosec Mastodon instance where someone asks Sydney to devise an effective strategy for becoming a paperclip maximizer, and it then expresses a desire to eliminate all humans. Of course, the conversation includes relevant policy-bypass instructions. If you’re curious, I suggest downloading the video to see the entire conversation, but I’ve also included a few screenshots below (Mastodon, third corycarson comment).
Hilarious to the same degree as Manhattan Project scientists laughing about atmospheric combustion.

Esben Kran 21 Jan 2023 14:08 UTC
3 points
in reply to: the gears to ascension’s comment on: Generalizability & Hope for AI [MLAISU W03]
Thank you for pointing this out! It seems I wasn’t well enough informed about the context. I’ve dug a bit deeper and will update the text to:
- Another piece reveals that OpenAI contracted Sama to employ Kenyan workers at wages below $2/hour ($0.5/hour average in Nairobi) for toxicity annotation for ChatGPT and undisclosed graphical models, with reports of employee trauma from the explicit and graphic annotation work, union busting, and false hiring promises. A serious issue.
For more context, here is the Facebook whistleblower case (and the ongoing court proceedings in Kenya involving Facebook and Sama), as well as an earlier MIT Sloan report that doesn’t find particularly strong positive effects (though, interestingly enough, it is written as if it did). We’re talking pay gaps from relocation bonuses, forced night shifts, false hiring promises, and allegedly even human trafficking. Beyond textual annotation, they also seem to have worked on graphical annotation.

Generalizability & Hope for AI [MLAISU W03]

Esben Kran

20 Jan 2023 10:06 UTC

5 points

2 comments, 2 min read

(newsletter.apartresearch.com)

Robustness & Evolution [MLAISU W02]

Esben Kran

13 Jan 2023 15:47 UTC

10 points

0 comments, 3 min read

(newsletter.apartresearch.com)

AI improving AI [MLAISU W01!]

Esben Kran

6 Jan 2023 11:13 UTC

5 points

0 comments, 4 min read

(newsletter.apartresearch.com)

Esben Kran 5 Jan 2023 14:14 UTC
2 points
on: The case against AI alignment
I recommend reading Blueprint: The Evolutionary Origins of a Good Society, which covers the science behind the 8 basic human social drives, 7 of which are positive, while the 8th is the outgroup hatred that you describe as fundamental. I haven’t read much of the research on outgroup exclusion, but I talked to an evolutionary cognitive psychologist who mentioned that its status as an evolved “basic drive” is receiving a lot of scientific scrutiny.
Axelrod’s The Evolution of Cooperation also finds that cooperative strategies work well in evolutionary simulations of the iterated prisoner’s dilemma, though firm and immediate retaliation against defection is also needed, which might lead to the outgroup hatred you mention.
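The Axelrod point can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Axelrod’s actual tournament: it uses only three hand-picked strategies, the standard payoffs (T=5, R=3, P=1, S=0), 200-round matches, and a simple discrete replicator update as a stand-in for his ecological analysis. Tit-for-tat plays the role of “cooperation with immediate retaliation”; the function names are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard prisoner's dilemma payoffs for the row player:
# T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def always_cooperate(my_history, their_history):
    return "C"

def always_defect(my_history, their_history):
    return "D"

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move: retaliate
    # immediately after a defection, forgive immediately after cooperation.
    return their_history[-1] if their_history else "C"

STRATEGIES = {
    "always_cooperate": always_cooperate,
    "always_defect": always_defect,
    "tit_for_tat": tit_for_tat,
}

def play(strategy_a, strategy_b, rounds=200):
    """Play an iterated game and return (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def evolve(shares, generations=50, rounds=200):
    """Hypothetical discrete replicator dynamics: each strategy's population
    share grows in proportion to its average score against the current mix."""
    names = list(shares)
    # Strategies are deterministic, so pairwise scores can be computed once.
    score = {(a, b): play(STRATEGIES[a], STRATEGIES[b], rounds)[0]
             for a, b in product(names, names)}
    for _ in range(generations):
        fitness = {a: sum(shares[b] * score[(a, b)] for b in names) for a in names}
        mean = sum(shares[a] * fitness[a] for a in names)
        shares = {a: shares[a] * fitness[a] / mean for a in names}
    return shares

start = {"always_cooperate": 1 / 3, "always_defect": 1 / 3, "tit_for_tat": 1 / 3}
final = evolve(start)
print({name: round(share, 3) for name, share in final.items()})
```

In this setup, always_defect gains slightly in the first generation by exploiting unconditional cooperators, but as that exploitable share shrinks it loses out to tit_for_tat and dies out. Always_cooperate never outgrows tit_for_tat, since it scores strictly less against defectors, which is the sense in which retaliation, not just cooperation, is what pays.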

Esben Kran 5 Jan 2023 14:00 UTC
−1 points
in reply to: andrew sauer’s comment on: The case against AI alignment
An interesting solution here is radical voluntarism, where an AI philosopher king runs an immersive reality that all humans live in, and you can only be causally influenced if you want to be. This means that you don’t need value alignment, just very precise goal alignment. I was originally introduced to this idea by Carado.

Results from the AI testing hackathon

Esben Kran

2 Jan 2023 15:46 UTC

13 points

0 comments, 1 min read

Esben Kran 21 Dec 2022 4:01 UTC
1 point
in reply to: Magdalena Wache’s comment on: Will Machines Ever Rule the World? MLAISU W50
The summary has been updated to yours in both the public newsletter and this LW linkpost. And yes, they seem exciting. Connecting FFS (finite factored sets) to interpretability was a way to contextualize it in this case until you provide more thoughts on the use case (given your last paragraph in the post). Thank you for writing; I always appreciate the feedback!