P.

Karma: 560

P. Mar 29, 2025, 11:22 AM
16 points
13
on: Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle
Sadly, by posting this here, you’ve added this puzzle to the training set of future models. Good benchmarks (e.g. ARC) keep the test set, or at least part of it, private.

P. Mar 17, 2025, 9:32 PM
3 points
2
on: I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?
First, and perhaps most importantly, my advice on what not to do: don’t try to directly convince politicians to pause or stop AGI development. A prerequisite for them to take actions drastic enough to actually matter is that they understand how powerful AGI will truly become. And once they do, even if they ban all AI development, unless they consider the arguments for doom to be extremely strong, which they won’t^[1], they will race and put truly enormous amounts of resources behind it, and that would be it for the species. Getting mid-sized business owners on board, on the other hand, might be a good idea because of the funding they could provide.
I don’t think any of the big donors are good enough, so if you want to donate to other people’s projects (or maybe become a co-founder), you could try finding interesting projects yourself on Manifund and the Nonlinear Network.
We know for a fact that alignment, at least for human-level intelligences, has a solution, because people do actually care, at least in part, about each other. Therefore, it might be worth contacting Steven Byrnes and asking him whether he could make good use of more funding, or what similar projects he recommends.
Outside AI, if the reason you care about existential risk isn’t that you want to save the species, but that human extinction implies a lot of people will die, you could try looking into chemical brain preservation and how cheap it is. This could itself be a source of revenue, and you probably won’t have any competitors (established cryonics orgs don’t offer cheap brain preservation, and I have asked Tomorrow Biostasis, who aren’t interested either).
I also have some not-completely-terrible ideas of my own for alignment research and for weak (half an SD?) intelligence augmentation. If you’re interested, we can discuss them via DMs.
Finally, if you do fund intelligence augmentation research, please consider whether to keep it secret, if feasible.
1. ^ Or maybe even if they do.

P. Sep 12, 2024, 8:57 PM
10 points
0
on: Refactoring cryonics as structural brain preservation
Does OBP plan to eventually expand their services outside the USA? And how much would it cost if you didn’t subsidize it? Cost is a common complaint about cryonics so I could see you becoming much bigger than the cryonics orgs, but judging by the website you look quite small. Do you know why that is?
What links here?
- P.'s comment on I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats? by shrimpy (Mar 17, 2025, 9:32 PM; 3 points)

P. May 25, 2024, 5:53 PM
10 points
0
on: Open Thread Spring 2024
Does anyone have advice on how I could work full-time on an alignment research agenda I have? It looks like trying to get an LTFF grant is the best option for this kind of thing. But if, after I work on it alone for longer, it keeps looking like it could succeed, it’s likely that it would become too big for me alone. I would need help from other people, and that looks hard to get. So, any advice from anyone who’s been in a similar situation? Also, how does this compare with getting a job at an alignment org? Is there any org where I would have a comparable amount of freedom if my ideas are good enough?
Edit: It took way longer than I thought it would, but I’ve finally sent my first LTFF grant application! Now let’s just hope they understand it and think it is good.

P. Jan 12, 2024, 1:25 PM
7 points
2
on: Decent plan prize announcement (1 paragraph, $1k)
It depends on what you know about the model and the reason you have to be concerned in the first place (if it’s just “somehow”, that’s not very convincing).
You might be worried that training it leads to the emergence of inner-optimizers, be they ones that are somehow “trying” to be good at prediction in a way that might generalize to taking real-life actions, ones approximating the searchy part of the humans they are trying to predict, or just RL agents. If you are just using basically standard architectures with a lot more compute, these all seem unlikely. But if I were you, I might try to test its ability to perform well in a domain it has never seen, where humans start by performing poorly but very quickly learn what to do (think about video games with new mechanics). If it does well, you have a qualitatively new thing on your hands: don’t deploy it, study it instead. If for some reason you think a priori that this could happen, and only a small subset of all the data is necessary to achieve it, do a smaller training run first with that data.
Or you might be worried about mostly external consequentialist cognition (think explicit textual if-then-elses). In that case, existing systems can already do it to some extent, and you should worry about how good its reasoning actually is, so perform capability evaluations. If it looks like there is some way of getting it to do novel research by any known method, or that it’s getting close, don’t deploy it; otherwise someone might figure out how to use it to do AI research, and then you get a singularity.
And in any case, you should worry about the effects your system will have on the AI race. Your AI might not be dangerous, but if it is a good enough lawyer or programmer that it starts putting many people out of work, investment in AI research will increase a lot and someone will figure out how to create an actual AGI sooner than they would otherwise.
Edit: And obviously you should also test how useful it could be to people trying to do mundane harm (e.g. with existing pathogens). Separately, there might not be a hard threshold on research ability beyond which a model starts being dangerous, so models might get there little by little, and you would be contributing to that.
Edit in response to the second clarification: Downscale the relevant factors, like amount of training data, number of parameters and training time, or use a known-to-be-inferior architecture until the worrying capabilities go away. Otherwise, you need to solve the alignment problem.
Edit in response to Beth Barnes’s comment: You should probably have people reviewing outputs to check the model behaves well, but if you actually think you need measures like “1000 workers with technical undergrad degrees, paid $50/hr” because you are worried it somehow kills you, then you simply shouldn’t deploy it. It’s absurd to need to check whether a commercial product is an existential threat, or anything close to that.

P. Dec 3, 2023, 6:48 PM
18 points
1
on: 2023 Unofficial LessWrong Census/Survey
Done! There aren’t enough mysterious old wizards.

P. Nov 9, 2023, 8:35 PM
8 points
0
in reply to: Ben Pace’s comment on: Vote on Interesting Disagreements
You know of a technology that has at least a 10% chance of having a very big novel impact on the world (think the internet or ending malaria) and that isn’t included in this list, very similar to some element of it, or downstream from one: AI, mind uploads, cryonics, human space travel, geo-engineering, gene drives, human intelligence augmentation, anti-aging, cancer cures, regenerative medicine, human genetic engineering, artificial pandemics, nuclear weapons, proper nanotech, very good lie detectors, prediction markets, other mind-altering drugs, cryptocurrency, better batteries, BCIs, nuclear fusion, better nuclear fission, better robots, AR, VR, room-temperature superconductors, quantum computers, polynomial time SAT solvers, cultured meat, solutions to antibiotic resistance, vaccines for some disease, optical computers, artificial wombs, de-extinction and graphene.
Bad options included just in case someone thinks they are good.

P. Nov 8, 2023, 10:10 PM
17 points
0
in reply to: Ben Pace’s comment on: Vote on Interesting Disagreements
Public mechanistic interpretability research is net positive in expectation.

P. Nov 8, 2023, 9:23 PM
2 points
0
in reply to: Ben Pace’s comment on: Vote on Interesting Disagreements
Cultural values are something like preferences over pairs of social environments and things we actually care about. So it makes sense to talk about jointly optimizing them.

P. Nov 8, 2023, 9:09 PM
7 points
−1
in reply to: Ben Pace’s comment on: Vote on Interesting Disagreements
If we had access to a brain upload (and maybe a world simulator too) we could in principle extract something like a utility function, and the theory behind it relates more to agents in general than it does to humans in particular.

P. Nov 8, 2023, 8:55 PM
10 points
0
in reply to: Ben Pace’s comment on: Vote on Interesting Disagreements
Research into getting a mechanistic understanding of the brain for purposes of at least one of: understanding how values/empathy works in people, brain uploading or improving cryonics/plastination is net positive and currently greatly underfunded.

P. Aug 8, 2023, 7:46 PM
2 points
−1
in reply to: Raemon’s comment on: Feedbackloop-first Rationality
Came here to comment that. It seems much more efficient to learn the cognitive strategies smart people use than to try to figure them out from scratch. Ideally, you would have people of different skill levels solve problems (and maybe even do research) while thinking out loud and describing or drawing the images they are manipulating. I know this has been done at least for chess, and it would be nice to have it for domains with more structure. Then you could catalog these strategies and measure the effectiveness of teaching the system 2 process (the whole process they use, not only the winning path) and explicitly train in isolation the individual system 1 steps that make it up.

P. Feb 4, 2023, 8:16 PM
3 points
0
in reply to: DragonGod’s comment on: Empathy as a natural consequence of learnt reward models
Also “indivudals”.

P. Nov 26, 2022, 2:34 PM
9 points
4
on: Why square errors?
Doesn’t minimizing the L1 norm correspond to performing MLE with Laplacian errors?
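To make that concrete, here is a minimal sketch (my own toy numbers and a fixed scale b, both assumptions, nothing from the post). The Laplace density is exp(-|x - mu| / b) / (2b), so the negative log-likelihood of a sample is n * log(2b) plus the L1 loss divided by b. That is an increasing affine function of the L1 loss, so both are minimized at the same location.

```python
import math
import statistics

def l1_loss(mu, xs):
    """Sum of absolute errors around the location mu."""
    return sum(abs(x - mu) for x in xs)

def laplace_nll(mu, xs, b=1.0):
    """Negative log-likelihood under Laplace(mu, b) errors.

    Density: exp(-|x - mu| / b) / (2 * b).
    """
    return sum(math.log(2 * b) + abs(x - mu) / b for x in xs)

# Hypothetical sample and scale, chosen only for illustration.
xs = [0.3, 1.2, 2.0, 2.1, 9.5]
b = 1.7

# Brute-force search over a grid of candidate locations.
candidates = [i / 100 for i in range(0, 1001)]
best_l1 = min(candidates, key=lambda m: l1_loss(m, xs))
best_nll = min(candidates, key=lambda m: laplace_nll(m, xs, b))

print(best_l1, best_nll, statistics.median(xs))
```

Both searches land on the same point, which is also the sample median. Swapping the Laplace density for a Gaussian one gives the analogous statement for squared errors and the mean.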

P. Nov 11, 2022, 1:32 PM
2 points
0
on: Prizes for ML Safety Benchmark Ideas
Do you know whether this will be cancelled given the FTX situation?

P. Oct 4, 2022, 4:51 PM
2 points
0
in reply to: Razied’s comment on: Paper+Summary: OMNIGROK: GROKKING BEYOND ALGORITHMIC DATA
If the optimal norm is below the minimum you can achieve just by re-scaling, you are trading off training set accuracy for weights with a smaller norm within each layer. It’s not that weird that the best known way of making this trade-off is by constrained optimization.
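A toy illustration of that trade-off, not the paper’s method: projected gradient descent on a tiny linear regression, where the weights are rescaled onto a sphere of fixed norm after every step. The data, learning rate and radii are all made up for the example. The unconstrained optimum here is w = (3, 4), with norm 5, so forcing the norm down to 2 has to cost training loss.

```python
import math

def norm(w):
    return math.sqrt(sum(v * v for v in w))

def project_to_sphere(w, radius):
    """Rescale w so that its Euclidean norm equals radius."""
    n = norm(w)
    return [v * radius / n for v in w]

def loss_and_grad(w, data):
    """Mean squared error of a linear model and its gradient."""
    loss = 0.0
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss += err * err / len(data)
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(data)
    return loss, grad

def constrained_fit(data, radius, steps=500, lr=0.05):
    """Gradient step, then projection back onto the norm constraint."""
    w = project_to_sphere([1.0] * len(data[0][0]), radius)
    for _ in range(steps):
        _, grad = loss_and_grad(w, data)
        w = project_to_sphere([wi - lr * gi for wi, gi in zip(w, grad)], radius)
    return w

# Hypothetical data generated by y = 3 * x1 + 4 * x2.
data = [((1.0, 0.0), 3.0), ((0.0, 1.0), 4.0), ((1.0, 1.0), 7.0)]

w_small = constrained_fit(data, radius=2.0)  # norm forced below the optimum's
w_free = constrained_fit(data, radius=5.0)   # norm matches the optimum's

print(loss_and_grad(w_small, data)[0], loss_and_grad(w_free, data)[0])
```

With the radius set to 5 the fit recovers the unconstrained solution and the loss goes to roughly zero. With the radius set to 2 the loss stays well above zero, which is the accuracy you are giving up for the smaller norm.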

P. 29 Sep 2022 19:39 UTC
2 points
0
in reply to: P.’s comment on: Make-A-Video by Meta AI
And a 3D one by optimizing a differentiable volumetric representation using 2D diffusion: https://dreamfusionpaper.github.io/

P. 29 Sep 2022 19:03 UTC
2 points
1
on: Make-A-Video by Meta AI
And here we have another one: https://phenaki.video/

P. 29 Sep 2022 18:56 UTC
5 points
2
on: Resources to find/register the rationalists that specialize in a given topic?
It’s not quite what you want, but there’s this: https://forum.effectivealtruism.org/community#individuals and this: https://eahub.org/

P. 29 Sep 2022 17:52 UTC
4 points
1
on: Make-A-Video by Meta AI
Emad from Stability AI (the people behind Stable Diffusion) says that they will make a model better than this.