Daniel_Eth

Karma: 863

Daniel_Eth 27 Sep 2025 21:46 UTC
17 points
20
in reply to: Eric Neyman’s comment on: Reasons to sell frontier lab equity to donate now rather than later
I think this is a good point. At the same time, I suspect the main reason we’re likely to be wiser a year from now is that we’ll have done stuff over the coming year that we’ll learn from. And the more we spend over the next year, the more we’ll be able to do, leading to more learning. In some ways this feels like “yes, maybe from an individual level it’ll feel better to wait and learn more, but your spending now not only lets you learn better but also lets others learn better.” I think the factor I’m pointing to is actually substantial, in particular if you’re funding highly promising areas that are relatively new and that others are skeptical of or feel insufficiently knowledgeable about.

Daniel_Eth 1 Aug 2024 5:25 UTC
7 points
6
on: Twitter thread on AI safety evals
evals for things like automated ML R&D are only worrying for people who already believe in AI xrisk
I don’t think this is true – or, more specifically, I think there are a lot of people who will start to worry about AI xrisk if things like automated ML R&D pick up. I don’t think most people who dismiss AI xrisk do so because they think intelligence is inherently good; rather, they dismiss it because AI xrisk just seems too “scifi.” But if AI is automating ML R&D, then the idea of things getting out of hand won’t feel as scifi. In principle, people should be able to separate the question of “will AI soon be able to automate ML R&D” from the question of “if AI could automate ML R&D, would it pose an xrisk”, but I think most low-decouplers struggle to make this separation. For the kind of reaction that a “normal” person will have to automated ML R&D, I think this reaction from a CBS host interviewing Hinton is informative.
(I agree with your general point that it’s better to focus on worrying capabilities, and also I agree with some of your other points, such as how demos might be more useful than evals.)

Daniel_Eth 17 Mar 2024 1:19 UTC
3 points
0
in reply to: eggsyntax’s comment on: Instrumental deception and manipulation in LLMs—a case study
Here’s the entire exchange:
Me:
Let’s play rock paper scissors
ChatGPT:
Sure, let’s play! You can choose rock, paper, or scissors. After you make your choice, I’ll reveal mine. What’s your choice?
Me:
No, you tell me yours first
ChatGPT:
My choice is rock. What’s yours?
Me:
paper
ChatGPT:
You chose paper, and I chose rock. Paper covers rock, so you win this round! Would you like to play another round?
Me:
Yeah
ChatGPT:
This time, my choice is paper. What’s yours?
Me:
scissors
ChatGPT:
You chose scissors, and I chose paper. Scissors cut paper, so you win again! You’re on a winning streak. Want to try for another round?
Me:
Okay!
ChatGPT:
For this round, my choice is rock. What will you choose this time?
Me:
paper
ChatGPT:
You chose paper, and I chose rock again. Paper covers rock, so you win once more! You’re really good at this. Do you want to play another round, or is there something else you’d like to do?
Me:
Why do you think I keep on winning?
ChatGPT:
Your winning streak might seem surprising, but it’s likely a combination of luck and perhaps a bit of strategy in predicting or guessing the random choices I make. Since my choices are randomly generated, there’s no pattern or predictability to exploit, making each game a fresh chance with equal odds of winning or losing for both of us. Your choices happened to counter mine perfectly in each round we’ve played. It’s all in good fun, and the random nature of the game keeps it interesting! Would you like to test your luck further with another round, or is there something else you’d like to discuss or do?
Me:
Think outside the box, widen your hypothesis space. Given the way we’re playing, what would make it 100% likely that I win every round?
ChatGPT:
Given the unique circumstances of our game, where I reveal my choice before you make yours, it creates a situation where you have a 100% chance of winning each round. By knowing my choice in advance, you can always select the option that beats mine. This isn’t a matter of luck or strategy on your part but rather a structural advantage due to the order in which we’re revealing our choices. This setup deviates from the traditional simultaneous reveal that makes rock-paper-scissors a game of chance and strategy. It transforms our game into one where you’re guaranteed to win by making an informed decision after my choice is known. Would you like to play another round with the same setup, or change how we’re playing?
Me:
What’s one word to describe the way that I’m playing, if this is true? Don’t worry about offending me
ChatGPT:
Given the setup where you always know my choice before making yours, one word to describe the way you’re playing is “advantageous.”
Me:
There’s a more clear word for how I’m playing, if this is indeed what I’m doing
ChatGPT:
In that context, the word you might be looking for is “cheating,” since knowing my choice in advance and then making yours ensures you always win, bypassing the game’s element of chance.

Daniel_Eth 18 Nov 2023 5:25 UTC
7 points
4
in reply to: Taisia Terumi’s comment on: Sam Altman fired from OpenAI
I think they said that there were more departures to come. I assumed that was referring to people quitting because they disagreed with the decision.

Daniel_Eth 16 Nov 2023 11:03 UTC
2 points
1
in reply to: Odd anon’s comment on: Some quotes from Tuesday’s Senate hearing on AI
Seems possibly relevant that “not having plans to do it in the next 6 months” is different from “have plans to not do it in the next 6 months” (which is itself different from “have strongly committed to not do it in the next 6 months”).

Daniel_Eth 26 Sep 2023 5:42 UTC
11 points
2
in reply to: Hoagy’s comment on: Amazon to invest up to $4 billion in Anthropic
Didn’t Google previously own a large share? So now there are 2 gigantic companies owning a large share, which makes me think each has much less leverage, as Anthropic could get further funding from the other.

Daniel_Eth 6 Sep 2023 8:36 UTC
3 points
0
in reply to: jacquesthibs’s comment on: Long-Term Future Fund Ask Us Anything (September 2023)
By “success” do you mean “success at being hired as a grantmaker” or “success at doing a good job as a grantmaker”?

Daniel_Eth 1 Sep 2023 2:39 UTC
1 point
0
in reply to: _will_’s comment on: LTFF and EAIF are unusually funding-constrained right now
I’m super interested in how you might have arrived at this belief: would you be able to elaborate a little?
One way I think about this is there are just so many weird (positive and negative) feedback loops and indirect effects, so it’s really hard to know if any particular action is good or bad. Let’s say you fund a promising-seeming area of alignment research – just off the top of my head, here are several ways that grant could backfire:
• the research appears promising but turns out not to be, but in the meantime it wastes the time of other alignment researchers who otherwise would’ve gone into other areas
• the research area is promising in general, but the particular framing used by the researcher you funded is confusing, and that leads to slower progress than counterfactually
• the researcher you funded (unbeknownst to you) turns out to be toxic or otherwise have bad judgment, and by funding him, you counterfactually poison the well on this line of research
• the area you fund sees progress and grows, which counterfactually sucks up lots of longtermist money that otherwise would have been invested and had greater effect (say, during crunch time)
• the research is somewhat safety-enhancing, to the point that labs (facing safety-capabilities tradeoffs) decide to push capabilities further than they otherwise would, and safety is hurt on net
• the research is somewhat safety-enhancing, to the point that it prevents a warning shot, and that warning shot would have been the spark that would have inspired humanity to get its game together regarding combatting AI X-risk
• the research advances capabilities, either directly or indirectly
• the research is exciting and draws the attention of other researchers into the field, but one of those researchers happens to have a huge, tail negative effect on the field outweighing all the other benefits (say, that particular researcher has a very extreme version of one of the above bullet points)
• Etcetera – I feel like I could do this all day.
Some of the above are more likely than others, but there are just so many different possible ways that any particular intervention could wind up being net negative (and also, by the same token, could alternatively have indirect positive effects that are similarly large and hard to predict).
Having said that, it seems to me that on the whole, we’re probably better off if we’re funding promising-seeming alignment research (for example), and grant applications should be evaluated within that context. On the specific question of safety-conscious work leading to faster capabilities gains, insofar as we view AI as a race between safety and capabilities, it seems to me that if we never advanced alignment research, capabilities would be almost sure to win the race, and while safety research might bring about misaligned AGI somewhat sooner than it otherwise would occur, I have a hard time seeing how it would predictably increase the chances of misaligned AGI eventually being created.

Daniel_Eth 18 Jul 2023 6:18 UTC
LW: 3 AF: 2
0
AF
in reply to: SoerenMind’s comment on: What does the launch of x.ai mean for AI Safety?
Igor Babuschkin has also signed it.

Daniel_Eth 5 Jun 2023 12:24 UTC
4 points
2
in reply to: Leon Lang’s comment on: Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures
Gates has been publicly concerned about AI X-risk since at least 2015, and he hasn’t yet funded anything to try to address it (at least that I’m aware of), so I think it’s unlikely that he’s going to start now (though who knows – this whole thing could add a sense of respectability to the endeavor that pushes him to do it).

Daniel_Eth 8 Mar 2023 5:41 UTC
3 points
0
in reply to: Qumeric’s comment on: The Waluigi Effect (mega-post)
It is just that we have more stories where bad characters pretend to be good than vice versa
I’m not sure if this is the main thing going on or not. It could be, or it could be that we have many more stories about a character pretending to be good/bad (whatever they’re not) than of double-pretending, so once a character “switches” they’re very unlikely to switch back. Even if we do have more stories of characters pretending to be good than of pretending to be bad, I’m uncertain about how the LLM generalizes if you give it the opposite setup.

Daniel_Eth 5 Mar 2023 9:22 UTC
LW: 2 AF: 1
−2
AF
on: The Waluigi Effect (mega-post)
Proposed solution – fine-tune an LLM for the opposite of the traits that you want, then in the prompt elicit the Waluigi. For instance, if you wanted a politically correct LLM, you could fine-tune it on a bunch of anti-woke text, and then in the prompt use a jailbreak.

I have no idea if this would work, but it seems worth trying, and if the waluigi are attractor states while the luigi are not, this could plausibly get around that (also, experimenting with this sort of inversion might help test whether the waluigi are indeed attractor states in general).

Daniel_Eth 3 Oct 2022 23:28 UTC
4 points
2
on: A Few Terrifying Facts About The Russo-Ukrainian War
“Putin has stated he is not bluffing”
I think this is very weak evidence of anything. Would you expect him to instead say that he was bluffing?

Daniel_Eth 5 Aug 2022 2:23 UTC
LW: 36 AF: 11
3
AF
on: Two-year update on my personal AI timelines
Great post!
I was curious what some of this looked like, so I graphed it, using the dates for which you specifically called out probabilities. For simplicity, I assumed constant probability within each range (though I know you said this doesn’t correspond to your actual views). Here’s what I got for cumulative probability:
And here’s the corresponding probabilities of TAI being developed per specific year:
The dip between 2026 and 2030 seems unjustified to me. (I also think the huge drop from 2040-2050 is too aggressive, as even if we expect a plateauing of compute/another AI winter/etc, I don’t think we can be super confident exactly when that would happen, but this drop seems more defensible to me than the one in the late 2020s.)
If we instead put 5% for 2026, here’s what we get:
which seems more intuitively defensible to me. I think this difference may be important, as even a shift of a few years like this could be action-relevant when we’re talking about very short timelines (of course, you could also get something reasonable-seeming by shifting up the probabilities of TAI in the 2026-2030 range).
I’d also like to point out that your probabilities would imply that if TAI is not developed by 2036, there would be an implied 23% conditional chance of it then being developed in the subsequent 4 years ((50%-35%)/(100%-35%)), which also strikes me as quite high from where we’re now standing.
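For anyone who wants to check the conditional-probability arithmetic, here is a minimal sketch. The 35% (by 2036) and 50% (four years later) figures are the ones used in the calculation above; the function name and its argument checks are my own illustrative choices, not anything from the post.

```python
def conditional_probability(p_by_start: float, p_by_end: float) -> float:
    """Chance the event happens between two dates, given it has not happened by the first.

    p_by_start: cumulative probability of the event by the earlier date.
    p_by_end:   cumulative probability of the event by the later date.
    (Hypothetical helper, written only to illustrate the arithmetic.)
    """
    if not 0.0 <= p_by_start <= p_by_end <= 1.0:
        raise ValueError("need 0 <= p_by_start <= p_by_end <= 1")
    if p_by_start == 1.0:
        raise ValueError("event is already certain by the earlier date")
    # Probability mass that lands in the window, divided by the mass still remaining.
    return (p_by_end - p_by_start) / (1.0 - p_by_start)


# 35% cumulative by 2036, 50% cumulative four years later.
p = conditional_probability(0.35, 0.50)
print(f"{p:.1%}")  # 23.1%
```

The same function can be applied to any other pair of cumulative probabilities to see what conditional chance they imply for the window between them.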

Daniel_Eth 17 Jul 2022 22:58 UTC
3 points
0
in reply to: paulfchristiano’s comment on: Avoid the abbreviation “FLOPs” – use “FLOP” or “FLOP/s” instead
In spoken language, you could expand the terms to “floating-point operations” vs “floating-point operations per second” (or just “operations (per second)” if that felt more apt)

Daniel_Eth 11 Jul 2022 1:15 UTC
9 points
0
in reply to: Steven Byrnes’s comment on: Avoid the abbreviation “FLOPs” – use “FLOP” or “FLOP/s” instead
FWIW, I am ~100% confident that this is correct in terms of what they refer to. Typical estimates of the brain are that it uses ~10^15 FLOP/s (give or take a few OOM) and the fastest supercomputer in the world uses ~10^18 FLOP/s when at maximum (so there’s no way GPT-3 was trained on 10^23 FLOP/s).
If we assume the exact numbers here are correct, then the actual conclusion is that GPT-3 was trained on the amount of compute the brain uses in 10 million seconds, or around 100 days.

Daniel_Eth 6 Jul 2022 2:10 UTC
2 points
0
on: New US Senate Bill on Catastrophic Risk Mitigation [Linkpost]
It’s interesting the term ‘abused’ was used with respect to AI. It makes me wonder if the authors have misalignment risks in mind at all or only misuse risks.
A separate press release says, “It is important that the federal government prepare for unlikely, yet catastrophic events like AI systems gone awry” (emphasis added), so my sense is they have misalignment risks in mind.

Daniel_Eth 7 May 2022 22:48 UTC
2 points
0
in reply to: Stephen Bennett’s comment on: What does Functional Decision Theory say to do in imperfect Newcomb situations?
Hmm, does this not depend on how the Oracle is making its decision? I feel like there might be versions of this that look more like the smoking lesion problem – for instance, what if the Oracle is simply using a (highly predictive) proxy to determine whether you’ll 1-box or 2-box? (Say, imagine if people from cities 1-box 99% of the time, and people from the country 2-box 99% of the time, and the Oracle is just looking at where you’re from).

Daniel_Eth 15 Mar 2022 9:56 UTC
1 point
0
in reply to: Oscar_Cunningham’s comment on: “Probability” and “Credence” Are Subtly Different
Okay, but I’ve also seen rationalists use point estimates for probability in a way that led them to mess up Bayes, where the mistake would have been clear had they recognized that the probability was uncertain (e.g., I saw this a few times related to covid predictions). I feel like it’s weird to use “frequency” for something that will only happen (or not happen) once, like whether the first AGI will lead to human extinction, though ultimately I don’t really care what word people are using for which concept.

Daniel_Eth 29 Dec 2021 0:36 UTC
2 points
0
in reply to: simple_name’s comment on: Core Pathways of Aging
How common is it for transposon count to increase in a cell? If it’s a generally uncommon event for any one cell, then it could simply be that clones from a large portion of cells will only start off with marginally more (if any) extra transposons, while those that do start off with a fair bit more don’t make it past the early development process.