Kabir Kumar’s Shortform

Kabir Kumar3 Nov 2024 17:03 UTC

2 points

203 comments1 min readLW link

Kabir Kumar 2 Jul 2025 15:15 UTC
46 points
22
Has Tyler Cowen ever explicitly admitted to being wrong about anything?
Not ‘revised estimates’ or ‘updated predictions’ but ‘I was wrong’.
Every time I see him talk about learning something new, he always seems to be talking about how this vindicates what he said/thought before.
Gemini 2.5 pro didn’t seem to find anything, when I did a max reasoning budget search with url search on in aistudio.
EDIT: An example was found by Morpheus, of Tyler Cowen explictly saying he was wrong—see the comment and the linked PDF below
- Zach Stein-Perlman 3 Jul 2025 1:45 UTC
  26 points
  3
  Parent
  this is evidence that tyler cowen has never been wrong about anything
- Alfred Harwood 4 Jul 2025 8:52 UTC
  15 points
  3
  Parent
  In the post ‘Can economics change your mind?’ he has a list of examples where he has changed his mind due to evidence:
  1. Before 1982-1984, and the Swiss experience, I thought fixed money growth rules were a good idea. One problem (not the only problem) is that the implied interest rate volatility is too high, or exchange rate volatility in the Swiss case.
  2. Before witnessing China vs. Eastern Europe, I thought more rapid privatizations were almost always better. The correct answer depends on circumstance, and we are due to learn yet more about this as China attempts to reform its SOEs over the next five to ten years. I don’t consider this settled in the other direction either.
  3. The elasticity of investment with respect to real interest rates turns out to be fairly low in most situations and across most typical parameter values.
  4. In the 1990s, I thought information technology would be a definitely liberating, democratizing, and pro-liberty force. It seemed that more competition for resources, across borders, would improve economic policy around the entire world. Now this is far from clear.
  5. Given the greater ease of converting labor income into capital income, I no longer am so convinced that a zero rate of taxation on capital income is best.
  6. The social marginal value of health care is often quite low, much lower than I used to realize. By the way, hardly anyone takes this on consistently to guide their policy views, no matter how evidence-driven they may claim to be.
  7. Mormonism, and other relatively strict religions, can have big anti-poverty effects. I wouldn’t say I ever believed the contrary, but for a long time I simply didn’t give the question much attention. I now think that Mormonism has a better anti-poverty agenda than does the Progressive Left.
  8. There are positive excess returns to some momentum investment strategies.
  I don’t know enough about economics to tell how much these meet your criteria for ‘I was wrong’ rather than ‘revised estimates’ or something else (he doesn’t use the exact phrase ‘I was wrong’) but it seems in the spirit of what you are looking for.
- Morpheus 4 Jul 2025 12:00 UTC
  14 points
  2
  Parent
  Deep Research found this PDF. Search for “I was wrong” in the PDF.
  - Kabir Kumar 4 Jul 2025 12:26 UTC
    15 points
    4
    Parent
    This seems to be a really explicit example of him saying that he wss wrong about something, thank you!
    Didn’t think this would exist/be found, but glad I was wrong.
    - gjm 5 Jul 2025 0:14 UTC
      5 points
      1
      Parent
      It’s still pretty interesting if it turns out that the only clear example to be found of T.C. admitting to error is in a context where everyone involved is describing errors they’ve made: he’ll admit to concrete mistakes, but apparently only when admitting mistakes makes him look good rather than bad.
      (Though I kinda agree with one thing Joseph Miller says, or more precisely implies: perhaps it’s just really rare for people to say publicly that they were badly wrong about anything of substance, in which case it could be that T.C. has seldom done that but that this shouldn’t much change our opinion of him.)
      - Kabir Kumar 5 Jul 2025 9:18 UTC
        1 point
        0
        Parent
        Btw, for Slatestarcodex, found it in the first search, pretty easily.
        gjm 5 Jul 2025 23:13 UTC
        2 points
        1
        Parent
        Sure, but plausibly that’s Scott being unusually good at admitting error, rather than Tyler being unusually bad.
- Joseph Miller 3 Jul 2025 15:25 UTC
  7 points
  −10
  Parent
  Downvoted. This post feels kinda mean. Tyler Cowen has written a lot and done lots of podcasts—it doesn’t seem like anyone has actually checked? What’s the base rate for public intellectuals ever admitting they were wrong? Is it fair to single out Tyler Cowen?
  - Kabir Kumar 3 Jul 2025 22:14 UTC
    5 points
    0
    Parent
    It’s only one datapoint, but did a similar search for SlateStarCodex and almost immediately found him explictly saying he was wrong.
    It’s the title of a post, even: https://slatestarcodex.com/2018/11/06/preschool-i-was-wrong/
    In the post he also says:
    I’ve written before about how when you make an update of that scale, it’s important to publicly admit error before going on to justify yourself or say why you should be excused as basically right in principle or whatever, so let me say it: I was wrong about Head Start.
    That having been said, on to the self-justifications and excuses!
    And then makes a bunch of those.
    Again, this is only one datapoint—sorry for the laziness, it’s 11..12pm and I’m trying to organize an alignment research fellowship atm and just put together another alignment research team at ai plans and had to do management work for it which ended up delaying the fellowship announcement i wanted to do today and had family drama again. Sigh.
    Url link for slatestarcodex search: https://duckduckgo.com/?q=site%3Ahttps%3A%2F%2Fslatestarcodex.com%2F+%22I+was+wrong%22&t=brave&ia=web
- komlowneowna 4 Jul 2025 1:22 UTC
  5 points
  0
  Parent
  Lindsey: Okay, that’s a good answer. In your new book, GOAT, which we’ll talk about more later, one of the criteria you use for judging great economists is that they can’t have been too wrong about too many things. What’s an important thing that you now think you were dead wrong about?
  Cowen: Well, there’s so many things. It’s hard to know where to start. But for instance, in 2007, early part of 2008, I definitely thought the banking system was solvent. That was wrong. I then thought it was the result of a real estate bubble. Everyone leapt on that bandwagon. I now think that was wrong. I was wrong big time twice in a row. Given the way home prices have evolved, I don’t think it was much of a bubble. It was maybe a little ahead of its time, but those prices seem to have been validated. So here’s this event that I paid very close attention to and I’m already wrong twice in a row, and maybe I’m shooting for three times in a row wrong. So I don’t know. There are so many judgments of history that unfold slowly. I think it’s really hard to be sure that you are right about something.
  Like when shock therapy came for Poland, I thought, “Well, this is clearly the right thing to do.” I think it’s enough years. You can say it definitely worked for Poland. Has it worked everywhere? The places where it didn’t work, was it really tried? Was it possible in those places for it to be really tried? They’re very complicated questions, but I think I would have or not would have but did underrate the Chinese model at the time. But from 2023, there’s a point of view that says, well the Chinese model seemed great for 25 years but now they’re stuck with a dictator and all this terrible statism, and it might still blow up in their faces or cause a world war. So I think I’m wrong there but I could actually turn out to be right
  (source: https://brinklindsey.substack.com/p/interview-with-tyler-cowen)
  - Kabir Kumar 6 Jul 2025 11:24 UTC
    1 point
    0
    Parent
    Ok, I was going to say that’s a good one.
    But this line ruins it for me:
    So I think I’m wrong there but I could actually turn out to be right
    Thank you for searching and finding it though!! Do you think other public intellectuals might have more/less examples?
- Garrett Baker 2 Jul 2025 16:29 UTC
  2 points
  −17
  Parent
  He has mentioned the phrase a bunch. I haven’t looked through enough of these links enough to form an opinion though.
  - Kabir Kumar 2 Jul 2025 16:41 UTC
    32 points
    2
    Parent
    thank you for this search. Looking at the results, top 3 are by commentors.
    Then one about not thinking a short book could be this good.
    I don’t think this is Cowen actually saying he made a wrong prediction, just using it to express how the book is unexpectedly good at talking about a topic that might normally take longer, though happy to hear why I’m wrong here.
    Another commentor:
    another commentor:
    Ending here for now, doesn’t seem to be any real instances of Tyler Cowen saying he was wrong about something he thought was true yet.
    - Kabir Kumar 3 Jul 2025 22:18 UTC
      9 points
      0
      Parent
      Btw, I really dont have my mind set on this, if someone finds Tyler Cowen explictly saying he was wrong about something, please link it to me—you dont have to give an explanation to justify it, to prepare for some confirmation biasy ‘here’s why I was actually right and this isnt it’ thing (though, any opinions/thoughts are very welcome), please feel free to just give a link or mention some post/moment.
Kabir Kumar 23 Apr 2025 10:57 UTC
25 points
1
So, apparently, I’m stupid. I could have been making money this whole time, but I was scared to ask for it
i’ve been giving a bunch of people and businesses advice on how to do their research and stuff. one of them messaged me, i was feeling tired and had so many other things to do. said my time is busy.
then thought fuck it, said if they’re ok with a $15 an hour consulting fee, we can have a call. baffled, they said yes.
then realized, oh wait, i have multiple years of experience now leading dev teams, ai research teams, organizing research hackathons and getting frontier research done.
wtf
- Nisan 23 Apr 2025 14:57 UTC
  16 points
  15
  Parent
  Yes, you can ask for a lot more than that :)
  - Kabir Kumar 23 Apr 2025 17:42 UTC
    4 points
    0
    Parent
    Yeah, a friend told me this was low—I’m just scared of asking for money rn I guess.
    I do see people who seem very incompetent getting paid as consultants, so I guess I can charge for more. I’ll see how much my time gets eaten by this and how much money I need. I want to buy some gpus, hopefully this can help.
    - shawnghu 24 Apr 2025 2:31 UTC
      1 point
      0
      Parent
      I’m not trying to be derisive; in fact, I relate to you greatly. But it’s by being on the outside that I’m able to levy a few more direct criticisms:
      
      Were you not paid for the other work that you did, leading dev teams and getting frontier research done? Those things should be a baseline on the worth of your time.
      
      If that, have you ever tried to maximize the amount of money you can get the) other people to acknowledge your time as worth (ie, get a high salary offer)?
      
      Separately, do you know the going rate for consultants with approximately your expertise? Or any other reference class you cna make up. Consulting can cost an incredible amount of money, and that price can be “fair” in a pretty simple sense if it averts the need to do 10s of hours of labor at high wages. It may be one of the highest leverage activities per unit time that exists as a conventional economic activity that a person can simply do.
      
      Aside from market rates or whatever, I suggest you just try asking for unreasonable things, or more money than you feel you’re worth (think of it as an experiment, and maybe observe what happens in your mind when you flinch from this).
      
      Do you have any emotional hangup about the prospect of trading money for labor generally, or money for anything?
      
      Separately, do you have a hard time asserting your worth to others (or maybe just strangers) on some baseline level?
      - Kabir Kumar 9 Jun 2025 9:12 UTC
        1 point
        0
        Parent
        Were you not paid for the other work that you did, leading dev teams and getting frontier research done? Those things should be a baseline on the worth of your time.
        This was running AI Plans, my startup, so makes sense that I wasn’t getting paid, since the same hesitancy for asking for money leads to hesitancy to do that exaggeration thing many AI Safety/EA people seem to do when making funding applications. Also, I don’t like to make the funding applications, or long applications in general.
        If that, have you ever tried to maximize the amount of money you can get the) other people to acknowledge your time as worth (ie, get a high salary offer)?
        I think every time I’ve asked for money, I’ve tried to ask for the lowest amount I can.
        Separately, do you know the going rate for consultants with approximately your expertise? Or any other reference class you cna make up. Consulting can cost an incredible amount of money, and that price can be “fair” in a pretty simple sense if it averts the need to do 10s of hours of labor at high wages. It may be one of the highest leverage activities per unit time that exists as a conventional economic activity that a person can simply do.
        I don’t know—I have a doc of stuff I’ve done that I paste into LLMs when I need to make a funding applications and stuff—just pasted it into Gemini 2.5 Pro and asked what would be a reasonable hourly fee and it said $200 to $400 an hour.
        Aside from market rates or whatever, I suggest you just try asking for unreasonable things, or more money than you feel you’re worth (think of it as an experiment, and maybe observe what happens in your mind when you flinch from this).
        I’ll give it a go—I’ve currently put the asking price on my call link for $50 an hour, feel nervous about actually asking for that though. I need to make a funding application for AI Plans—I can ask for money on behalf of others on the team, but asking for money to be donated so I can get a high salary feels scary. Happy to ask for a high salary for others on the team though, since I want them to get paid what they need.
        Do you have any emotional hangup about the prospect of trading money for labor generally, or money for anything?
        Yeah, I do. Generally, I’m used to doing a lot of free work for family and getting admonished when I ask for money. And when I did get promised money, it was either wayyy below market price or wayy late or didn’t get paid at all. General experience with family was my work not being valued even when I put in extra effort. I’m aware that’s wrong and has taught me wrong lessons, but not fully learnt the true ones yet.
        shawnghu 11 Jun 2025 1:22 UTC
        1 point
        0
        Parent
        I do think that $200-$400 seem like reasonable consulting rates.
        
        I think the situations with family are complicated, because sure, there are social/cultural reasons one might be expected to do those things for family. Usually people hold those cultural norms alongside a stronger distinction between the ingroup (family) and the outgroup (all other people by default), though, so letting your impressions from that culture teach you things about how to behave in a culture with a weaker distinction might be maladaptive.
        
        (I actually was suggesting you try asking for objectively completely unreasonable things just to look at the flinch. For example, you could ask a stranger for $100 for no reason. They would say no, but no harm would be done.)
        
        One frame that might be useful to you is that in a way, it is imperative to at least sufficiently assert your value to others (if not overassert it the socially expected amount). An overly modest estimate is still a miscalibrated one, and people will make suboptimal decisions as a result. (Putting aside the behavior and surpluses given to other people, you are also a player in this game, and your being underallocated resources is globally suboptimal.)
- Viliam 24 Apr 2025 8:54 UTC
  3 points
  7
  Parent
  Ah, I can totally relate to this. Whenever I think about asking for money, the Impostor Syndrome gets extra strong. Meanwhile, there are actual impostors out there collecting tons of money without any shame. (Though they may have better social skills, which is probably the category of skill that ultimately gets paid best.)
  Another important lesson I got once, which might be useful for you at some moment: “If you double your prices, and lose half of your customers as a result, you will still get the same amount of money, but only work half as much.”
  Also, speaking from my personal experience, the relation between how much / how difficult work someone wants you to do, and how much they are willing to pay you, seems completely random. One might naively expect that a job that pays more will be more difficult, but often it is the other way round.
- Kabir Kumar 26 Apr 2025 12:33 UTC
  1 point
  0
  Parent
  Update—consulting went well. He said he was happy with it and got a lot of useful stuff. I was upfront with the fact that I just made up the $15 an hour and might change it, asked him what he’d be happy with, he said it’s up to me, but didn’t seem bothered at all at the price potentially changing.
  I was upfront about the stuff I didn’t know and was kinda surprised at how much I was able to contribute, even knowing that I underestimate my technical knowledge because I barely know how to code.
Kabir Kumar 8 Jun 2025 16:57 UTC
22 points
1
if someone who’s v good at math wants to do some agent foundations stuff to directly tackle the hard part of alignement, what should they do?
- Gurkenglas 8 Jun 2025 21:50 UTC
  18 points
  6
  Parent
  If they’re talented, look for a way to search over search processes without incurring the unbounded loss that would result by default.
  If they’re educated, skim the existing MIRI work and see if any results can be stolen from their own field.
  - the gears to ascension 9 Jun 2025 0:54 UTC
    15 points
    4
    Parent
    I currently think we’re mostly interested in properties that apply at all timesteps, or at least “quickly”, as well as in the limit; rather than only in the limit. I also think it may be easier to get a limit at all by first showing quickness, in this case, but not at all sure of that.
- TsviBT 9 Jun 2025 6:07 UTC
  9 points
  5
  Parent
  The actual hard parts? Math probably doesn’t help much directly, unfortunately. Mathematical thinking is good. You’ll have to learn how to think in novel ways, so there’s not even a vector anyone can point you in, except for pointers with a whole lot of “dereference not included” like “figure out how to understand the fundamental forces involved in what actually determines what a mind ends up trying to do long term” (https://tsvibt.blogspot.com/2023/04/fundamental-question-what-determines.html).
  
  Some of the problems: https://tsvibt.blogspot.com/2023/03/the-fraught-voyage-of-aligned-novelty.html A meta-philosophy discussion of what might work: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
- Hastings 9 Jun 2025 13:23 UTC
  3 points
  2
  Parent
  If you are capable of meaningfully pushing capabilities forward and doing literally anything else, that’s already pretty helpful.
Kabir Kumar 18 Jul 2025 21:40 UTC
9 points
0
Hi, I’m running AI Plans, an alignment research lab. We’ve run research events attended by people from OpenAI, DeepMind, MIRI, AMD, Meta, Google, JPMorganChase and more. And had several alignment breakthroughs, including a team finding out that LLMs are maximizers, one of the first Interpretability based evals for LLMs, finding how to cheat every AI Safety eval that relies on APIs and several more.
We currently have 2 in house research teams, one who’s finding out which post training methods actually work to get the values we want into the models and another who’s reward hacking hundreds of AI Safety Evals.
However, we’re in the very weird position of doing a lot of very useful stuff and getting a lot of very elite people applying to and taking part in events, but communicating that absolutely terribly. If you go to our website atm, none of this stuff is obvious at all.
This is partly because we’re trying to make a peer review platform for alignment research in addition to running alignment research events and also because I personally really suck at design.
I’m looking for someone who doesn’t—someone who really understands how to communicate this stuff and make it look good and can dedicate several hours a week, consistently, for at least 3 months. If this is you, please either me or reach out at kabir@ai-plans.com
Please feel free to recommend to others as well.
Thank you,
Kabir
Kabir Kumar 14 Sep 2025 19:22 UTC
8 points
0
Hi, I’m hosting an AI Safety Law-a-Thon on October 25th to 26th. Will be pairing up AI Safety researchers with lawyers to share knowledge and brainstorm risk scenarios. If you’ve ever talked/argued about p doom and know what a mesaoptimizer is, then you’ve already done something very similar to this.
Main difference here is that you’ll be able to reduce p doom in this one! Many of the lawyers taking part are from top, multi-billion dollar companies, advisors to governments, etc. And they know essentially nothing about alignment. You might be concerned that you don’t have some legal expertise that you think you need to contribute. You do **not** need any more knowledge than you already have. I guarantee you that if you’ve read more than 2 AI Safety papers then you can absolutely contribute a lot!
We’re giving free in person tickets to all AI Safety researchers: https://luma.com/8hv5n7t0
Please register. This is a chance to massive reduce the money going to OpenAI and other top labs by billions of dollars, increase the investment to AI Safety as a whole and for you to learn a way to communicate what you already know about alignment risks in a way that companies will pay a *lot* for. And to make contacts with people at the companies who will pay for it.
- Cole Wyeth 14 Sep 2025 23:09 UTC
  4 points
  0
  Parent
  What kind of knowledge specifically are these lawyers looking for?
  - Kabir Kumar 15 Sep 2025 21:43 UTC
    1 point
    0
    Parent
    When signing an enterprise contract with OpenAI, almost all the liability is passed onto them. What are specific risk scenarios/damages that they could face, which they can use to build a countersuit.
    - Kabir Kumar 15 Sep 2025 21:47 UTC
      1 point
      0
      Parent
      Potentially, also for justify for negotiating a better contract, either with OpenAI (unlikely, since OpenAI seems to very rarely negotiate) or another AI company that takes more of the liability (which requires increasing funding for safety, evals, etc).
      Or, seeing if there are non AI solutions that can do what they want (e.g. a senior person at a rail company sincerely asked me ‘we need to copy and paste stuff from our CRM to Excel a lot, do you think an AI Agent could help with that?‘). Had a few interactions like this. It seems that for a lot of businesses atm, what they are spending on ‘AI solutions’ can be done cheaper, faster and more reliably with normal software, but they don’t really know what software is.
      - Cole Wyeth 15 Sep 2025 22:02 UTC
        2 points
        −5
        Parent
        I don’t know that we have much expertise on this sort of thing—we’re mostly worried about X-risk, which it doesn’t really make sense to talk about liability for in a legal sense.
Kabir Kumar 6 Jul 2025 11:31 UTC
8 points
−1
Maybe there’s a filtering effect for public intellectuals.
If you only ever talk about things you really know a lot about, unless that thing is very interesting or you yourself are something that gets a lot of attention (e.g. a polyamorous cam girl who’s very good at statistics, a Muslim Socialist running for mayor in the world’s richest city, etc), you probably won’t become a ‘public intellectual’.
And if you venture out of that and always admit it when you get something wrong, explicitly, or you don’t have an area of speciality and admit to getting things wrong all the time, maybe there’s a cap to how much of a ‘public intellectual’ you can become?
After all, maybe CNN, MSNBC, etc, don’t want to risk having someone on their program who’s likely to say that something they said, and the program broadcasted, was wrong?
Maybe less articles cite them as a source?
- Decaeneus 7 Jul 2025 14:30 UTC
  3 points
  1
  Parent
  One can say that being intellectually honest, which often comes packaged with being transparent about the messiness and nuance of things, is anti-memetic.
- sam 7 Jul 2025 14:26 UTC
  3 points
  0
  Parent
  Seems to rhyme with the criticism of pundits in Superforecasting
  i.e. (iirc), most high profile pundits make general, sweeping, dramatic sounding statements that make good TV but are difficult to falsify after the fact
Kabir Kumar 10 Mar 2025 13:19 UTC
8 points
−1
it’s so unnecessarily hard to get funding in alignment.
they say ‘Don’t Bullshit’ but what that actually means is ‘Only do our specific kind of bullshit’.
and they don’t specify because they want to pretend that they don’t have their own bullshit
- Dagon 10 Mar 2025 13:57 UTC
  5 points
  0
  Parent
  This seems generally applicable. Any significant money transaction includes expectations, both legible and il-, which some participants will classify as bullshit. Those holding the expectations may believe it to be legitimately useful, or semi-legitimately necessary due to lack of perfect alignment.
  If you want to specify a bit, we can probably guess at why it’s being required.
  - Kabir Kumar 10 Mar 2025 16:02 UTC
    4 points
    1
    Parent
    What I liked about applying for VC funding was the specific questions.
    “How is this going to make money?”
    “What proof do you have this is going to make money”
    and it being clear the bullshit that they wanted was numbers, testimonials from paying customers, unambiguous ways the product was actually better, etc. And then standard bs about progress, security, avoiding weird wibbly wobbly talk, ‘woke’, ‘safety’, etc.
    With Alignment funders, they really obviously have language they’re looking for as well, or language that makes them more and less willing to put more effort into understanding the proposal. Actually, they have it more than the VCs. But they act as if they don’t.
- Eva Lu 10 Mar 2025 22:01 UTC
  1 point
  0
  Parent
  Have you felt this from your own experience trying to get funding, or from others, or both? Also, I’m curious what you think is their specific kind of bullshit, and if there’s things you think are real but others thought to be bullshit.
  - Kabir Kumar 11 Mar 2025 12:54 UTC
    1 point
    0
    Parent
    Both. Not sure, its something like lesswrong/EA speak mixed with the VC speak.
    - Kabir Kumar 17 Apr 2025 21:35 UTC
      1 point
      0
      Parent
      If I knew the specific bs, I’d be better at making successful applications and less intensely frustrated.
Kabir Kumar 18 Jun 2026 10:15 UTC
6 points
5
btw, yud failure mode and something i see in a lot of rats:
Lacking humility and lacking the propensity to take time to understand and appreciate that which isnt immediately obviously valuable—especially things that take reworking ones interpretation and realising that ones first interpretation was wrong.
It’s a humility + path dependant interpretation failure solution: when having a feeling of dismissal and high confidence on something from little data, noticing that and training this sense—and also training the sense of noticing when a strong feeling/interpretation is downstream of just one particular viewpoint—similarly to noticing when one either loves/hates a person/thing just because of one person telling them lots and lots of good/bad things about it. But instead of the source being just one person, its one or two types of interpretation.
Combines quite strongly with this: https://www.lesswrong.com/posts/arwATwCTscahYwTzD/the-most-common-bad-argument-in-these-parts
- StanislavKrym 18 Jun 2026 12:00 UTC
  1 point
  0
  Parent
  I wonder on which sense Yudkowsky’s first interpretation was bad. When his long list of reasons for AGI to be deadly was reevaluated in 2026, the most load-bearing crux from the point 25 stayed unsolved and Yudkowsky’s other disproven ideas didn’t seem enough to super-reliably avert the disaster.
Kabir Kumar 21 Sep 2025 14:49 UTC
6 points
3
Working on a meta plan for solving alignment, I’d appreciate feedback & criticism please—the more precise the better. Feel free to use the emojis reactions if writing a reply you’d be happy with feels taxing.
Diagram for visualization—items in tables are just stand-ins, any ratings and numbers are just for illustration, not actual rankings or scores at this moment.

Red and Blue teaming Alignment evals
Make lots of red teaming methods to reward hack alignment evals
Use this to find actually useful alignment evals, then red team and reward hack them too, find better methods of reward hacking and red teaming, then finding better ways of doing alignment evals that are resistant to that and actually find the preferences of the model reliably too—keep doing this constantly to have better and better alignment evals and alignment evals reward hacking methods. Also hosting events for other people to do this and learn to do this.

Alignment methods
then doing these different post training methods, implementing lots of different alignment methods, seeing which ones score how highly across the alignment evals we know are robust, using this to find patterns in which types seem to work more/less,

Theory
then doing theory work, to try to figure out/guess why those methods are working more/less, make some hypothesises about new methods that willl work better and new methods that will work worse, based on this,
then ranking the theory work based on how minimal the assumptions are, how well predicted the implementations are and how high they score on peer review
End goal:
Alignment evals that no one can find a way to reward hack/red team and that really precisely measure the preferences of the models.
Alignment methods that seem to score a high score on these evals.
A very strong theoretical understanding for what makes this alignment method work, *why* it actually learns the preferences, theory understanding how those preferences will/won’t scale to result in futures where everyone dies or lives. The theoretical work should have as few assumptions as possible. Aim is a mathematical proof with minimal assumptions, that’s written very clearly and is easy to understand so that lots and lots of people can understand it and criticize it—robustness through obfuscation is a method of deception, intentional or not.
Current Work:
Hosting Alignment Evals hackathons and making Alignment Evals guides, to make better alignment evals and red teaming methods.
Making lots of Qwen model versions, whose only difference is the post training method.
What links here?
- Kabir_Kumar's comment on What posts are you thinking about writing? by Toby Tremlett🔹 (EA Forum; 10 Oct 2025 14:46 UTC; 1 point)
- MichaelDickens 10 Oct 2025 15:21 UTC
  7 points
  1
  Parent
  This plan is pretty abstract (which is necessary because it’s short) but in some ways I think it’s better than any of the AI companies’ published 400-page plans. From what I’ve seen, AI companies don’t care enough about trying to break their own evals, and they don’t care enough about theory work.
  
  Maybe this is too in the weeds but I’m skeptical that we can create robust alignment evals using anything resembling current methods. A superintelligent AI will be better at breaking evals than humans, so I expect there is a big gap between “our top researchers have tried and failed to find any loopholes in our alignment evals” and “a superintelligence will not be able to find any loopholes”.
  
  AI companies would probably answer that they only need to align weaker models and then bootstrap to an aligned superintelligence. You didn’t talk about that in your meta plan though. I think it would be good for you to talk about that in your meta-plan, since it’s what every AI company (with an alignment team) currently plans on doing.
  - Kabir Kumar 10 Oct 2025 16:35 UTC
    3 points
    4
    Parent
    
    AI companies would probably answer that they only need to align weaker models and then bootstrap to an aligned superintelligence. You didn’t talk about that in your meta plan though.
    I’ve seen that and the claims to do that. This seems to essentially be horseshit though. Also, what i read of the ‘superalignment’ paper claiming this seemed to be basically a method to get better synthetic data while vibing out about safety to make employees and recruits feel good.
    - MichaelDickens 10 Oct 2025 18:48 UTC
      3 points
      0
      Parent
      Yeah I agree that it’s not a good plan. I just think that if you’re proposing your own plan, your plan should at least mention the “standard” plan and why you prefer to do something different. Like give some commentary on why you don’t think alignment bootstrapping is a solution. (And I would probably agree with your commentary.)
  - Kabir Kumar 10 Oct 2025 16:33 UTC
    1 point
    0
    Parent
    A superintelligent AI will be better at breaking evals than humans, so I expect there is a big gap between “our top researchers have tried and failed to find any loopholes in our alignment evals” and “a superintelligence will not be able to find any loopholes”
    Agreed. This is why this plan is ultimately just a tool to get good/useful theory work done faster, more efficiently, etc.
  - Kabir Kumar 10 Oct 2025 16:27 UTC
    1 point
    0
    Parent
    Maybe this is too in the weeds but I’m skeptical that we can create robust alignment evals using anything resembling current methods.
    I agree. This is why we need to make new methods. e.g. in Jan, a team in our hackathon made one of the first interp based evals for llms: https://github.com/gpiat/AIAE-AbliterationBench/
    they were pretty cracked (researcher at jp morgan chase, ai phd, very smart high schooler), but i think its doable by others to do this and make other new methods too.
    very unfinished, but making an evals course to make this easier, which will be used in an evals course at an ivy league uni https://docs.google.com/document/d/1_95M3DeBrGcBo8yoWF1XHxpUWSlH3hJ1fQs5p62zdHE/edit?tab=t.0
    - Kabir Kumar 10 Oct 2025 16:32 UTC
      1 point
      0
      Parent
      lots of flaws in the plan is that we need to specify what things mean and are going to need to improve, iterate on and better explain and justify the specification. e.g. wtf does it actually mean for an eval to be red teamed. what counts??
      btw, cool thing to see is that we’re inspiring others to also red team evals: https://anloehr.github.io/ai-projects/salad/RedTeaming.SALAD.pdf
- Kabir Kumar 14 Oct 2025 13:15 UTC
  1 point
  0
  Parent
  One of the major problems with this atm is that most ‘alignment’, ‘safety’, etc evals dont specify or define exactly what they’re trying to measure.
  - Kabir Kumar 14 Oct 2025 13:15 UTC
    1 point
    0
    Parent
    so for this and other reasons, its hard to say when an eval has been truly successfully ‘red teamed’
Kabir Kumar 20 Sep 2025 18:40 UTC
6 points
3
Kabir Kumar 24 Apr 2025 19:50 UTC
6 points
−3
For AI Safety funders/regranters—e.g. Open Phil, Manifund, etc:
It seems like a lot of the grants are swayed by ‘big names’ being on there. I suggest making anonymity compulsary if you want to more merit based funding, that explores wider possibilities and invests in more upcoming things.
Treat it like a Science rather than the Bragging Competition it currently is.
A Bias Pattern atm seems to be that the same people get funding, or recommended funding by the same people, leading to the number of innovators being very small, or growing much more slowly than if the process was anonymised.
Also, ask people seeking funding to make specific, unambiguous, easily falsiable predictions of positive outcomes from their work. And track and follow up on this!
- Steven Byrnes 24 Apr 2025 22:40 UTC
  9 points
  3
  Parent
  I suggest making anonymity compulsary
  It’s an interesting idea, but the track records of the grantees are important information, right? And if the track record includes, say, a previous paper that the funder has already read, then you can’t submit the paper with author names redacted.
  Also, ask people seeking funding to make specific, unambiguous, easily falsiable predictions of positive outcomes from their work. And track and follow up on this!
  Wouldn’t it be better for the funder to just say “if I’m going to fund Group X for Y months / years of work, I should see what X actually accomplished in the last Y months / years, and assume it will be vaguely similar”? And if Group X has no comparable past experience, then fine, but that equally means that you have no basis for believing their predictions right now.
  Also, what if someone predicts that they’ll do A, but then realizes it would be better if they did B? Two possibilities are: (1) You the funder trust their judgment. Then you shouldn’t be putting even minor mental barriers in the way of their pivoting. Pivoting is hard and very good and important! (2) You the funder don’t particular trust the recipient’s judgment, you were only funding it because you wanted that specific deliverable. But then the normal procedure is that the funder and recipient work together to determine the deliverables that the funder wants and that the recipient is able to provide. Like, if I’m funding someone to build a database of AI safety papers, then I wouldn’t ask them to “make falsifiable predictions about the outcomes from their work”, instead I would negotiate a contract with them that says they’re gonna build the database. Right? I mean, I guess you could call that a falsifiable prediction, of sorts, but it’s a funny way to talk about it.
  - Kabir Kumar 21 Aug 2025 20:39 UTC
    10 points
    2
    Parent
    A solution I’ve come around to for this is retroactive funding. As in, if someone did something essentially without funding, that resulted in outcomes, which if you knew were guaranteed, you would have funded/donated to the project, then donate to the person to encourage them to do it more.
    - Kabir Kumar 21 Aug 2025 20:40 UTC
      3 points
      1
      Parent
      I think this is easier to anonymize, with the exception of very specific things that people become famous for.
- samuelshadrach 23 Aug 2025 8:59 UTC
  1 point
  0
  Parent
  EA billionaires selects way more for alignment to the billionaires’ own ends, than for competence. Social networks are a good way to filter for this. This is intentional design on the part of Moswkowitz and Tallinn, don’t treat it as an innocent mistake on their part.
  - [ ]
    [deleted]
Kabir Kumar 23 Apr 2025 2:30 UTC
6 points
2
i earnt more from working at a call center for about 3 months than i have in 2+ years of working in ai safety.
And i’ve worked much harder in this than I did at the call center
- Katalina Hernandez 23 Apr 2025 11:16 UTC
  1 point
  0
  Parent
  Way to go! :D. The important thing is that you’ve realized it. If you naturally already get those enquiries, you’re halfway there: people already know you and reach out to you without you having to promote your expertise. Best of luck!
- Knight Lee 23 Apr 2025 14:49 UTC
  0 points
  0
  Parent
  :) the real money was the friends we made along the way.
  I dropped out of a math MSc. at a top university in order to spend time learning about AI safety. I haven’t made a single dollar and now I’m working as a part time cashier, but that’s okay.
  What use is money if you end up getting turned into paperclips?
  PS: do you want to sign my open letter asking for more alignment funding?
Kabir Kumar 22 Oct 2025 11:28 UTC
5 points
0
if serious about us china cooperation and not cargo culting, please read: https://www.cac.gov.cn/2025-09/15/c_1759653448369123.htm
Kabir Kumar 18 Aug 2025 16:23 UTC
5 points
0
i messed things up a lot in organizing the moonshot program. working hard to make sure the future events will be much better. lots of things i can do.
- Luiza 18 Aug 2025 16:38 UTC
  12 points
  0
  Parent
  What did you mess up?
  - Kabir Kumar 20 Aug 2025 19:49 UTC
    5 points
    0
    Parent
    announced it too late, underestimated how hard it would be to make the research guides, guaranteed personalized feedback to the first 300 applicants, also had the date for starting set too early, considering how much stuff was ready.
    - Cole Wyeth 20 Aug 2025 19:53 UTC
      5 points
      3
      Parent
      Guaranteeing personalized feedback for 300 people does seem like a massive burden.
      - Luiza 20 Aug 2025 22:03 UTC
        1 point
        0
        Parent
        Depending on what the application involved, would it work to anon them, make them public and sort of let the community give feedback?
- Kabir Kumar 23 Nov 2025 17:54 UTC
  1 point
  0
  Parent
  The reasons for this and what I’m going to do to make sure it doesn’t happen again:
  One:
  - Promised that the first 300 applicants would be guaranteed personalized feedback. Thought that I could delegate to other, more technical members of the team for this.
  However, it turned out that in order to give useful feedback and to be able to judge if someone was a good fit for the program, a person didn’t just need technical knowledge—they needed good communication skills, an understanding of what’s needed for alignment research, being consistently available for several hours a day, to actually go through the applications and being interested in doing so. Turned out that the only person who fit all those characteristics at the time was me. So I couldn’t delegate.
  Also, a teammate made a reviewing software, which he said would help build a Bradley-Terry model of the applicants, as we reviewed them. I had a feeling that this might be overcomplicated but didn’t want to say no or react negatively to someone’s enthusiasm for doing free work for something I care about.
  It turned out that constantly fixing, trying to improve, finangle with, etc, the software actually took several days. And it was faster to just do it manually.
  What I’ll be doing next time to make sure this doesn’t happen:
  - Only promising feedback to the first 50 applicants.
  - Having preprepared lines for the rest, with the general reason they weren’t accepted—e.g. lack of suffifient maths experience without software engineering/neuroscience/philosophy to compensate, meaning that they might not be likely to get useful alignment theory work done in 5 weeks.
  - Doing things manually, not experimenting with custom software last minute.
  - Announcing the program more early—giving ourselves at least 3 months to prepare things.
  Two:
  - making the Research Guides for the different tracks turned out to be much, much, much harder than I thought it would be. Including for other, more technical teammates. Thought that making just a high level guide would be relatively ok, but instead turned out there was lots and lots of reading to do, lots and lot of preliminary reading and maths learning to do to understand that and it was very hard. This also delayed the start of the Moonshot Alignment Program a lot.
  What I’ll be doing next time to make sure that this doesn’t happen:
  - Starting out with just a reading list and links to things like https://www.alignmentforum.org/s/n7qFxakSnxGuvmYAX, https://drive.google.com/file/d/1aKvftxhG_NL2kfG3tNmtM8a2y9Q1wFHb/view (source with comments: https://www.lesswrong.com/posts/PRwQ6eMaEkTX2uks3/infra-exercises-part-1), etc
  - Personally learning a lot more math
  - Having a draft of the reading list—
  Asking for help from more experienced alignment researchers, such as Gurkenglas, the_gears_of_ascension, Lorxus, etc, earlier
  Major changes I’ve made since:
  - brought on a teammate, who is looking to become a co founder, who is very competent, well organized, with a good technical foundation
  - learnt more math and alignment philosophy
  - brought a much more technical person on the team (Gurkenglas), who is also teaching me a lot and pointing out lots and lots of flaws in my ideas, updating me fast
  - changed my management style at ai plans—no longer having weekly progress meetings, trying to manage stuff myself on linear or other task management type stuff—instead, just having a one on one call with each teammate once a week, to learn about what they want to do, what problems they’re having, what is available for them to do and deciding what they’ll do
  - moved to CEEALAR, much, much, much (1/2, total of 2.5) better for my stress, axiety, insecurity, mental health, etc.
  - from this, also gotten friendships/contacts with more senior alignment researchers who i can and will ask for help
Kabir Kumar 6 Jul 2025 20:48 UTC
5 points
0
Do you think you can steal someone’s parking spot?
If yes, what exactly do you think you’re stealing?
- Archimedes 6 Jul 2025 21:40 UTC
  11 points
  5
  Parent
  You’re “stealing” their opportunity to use that space. In legal terms, assuming they had a right to the spot, you’d be committing an unauthorized use of their property, causing deprivation of benefit or interference with use.
  - Kabir Kumar 6 Jul 2025 21:51 UTC
    1 point
    3
    Parent
    Makes sense. Do you think it’s stealing to train on someone’s data/work without their permission? This isn’t a ‘gotcha’, btw—if you think it’s not, I want to know and understand.
    - Archimedes 6 Jul 2025 22:28 UTC
      2 points
      0
      Parent
      I don’t think there’s a simple answer to that. My instinct is that most publicly accessible material (not behind a paywall) is largely “fair use”, but it gets messier for things like books not yet in the public domain. LLM pre-training is both transformative and extractive.
      
      There is no sensible licensing infrastructure for this yet, AFAIK, so many companies are grabbing whatever they can and dealing with legalities later. I think, at minimum, they should pay some upfront fee to train on a copyrighted book, just like humans do when they buy rather than pirate or borrow from libraries.
    - CstineSublime 7 Jul 2025 1:00 UTC
      1 point
      0
      Parent
      What prior reading have you done on this question? I did a DDG search “AI duplicating artists style controversy” and have found dozens of journalism pieces which appear, for the most part, seem to be arguing broadly with “it is theft”. What is your understanding of the discourse on this at the moment? What have you read? What has been persuasive? What don’t you understand?
      - Kabir Kumar 7 Jul 2025 1:50 UTC
        1 point
        0
        Parent
        The main thing I don’t understand is the full thought processes that leads to not seeing this as stealing opportunity from artists by using their work non consensually, without credit or compensation.
        I’m trying to understand if folk who don’t see this as stealing don’t think that stealing opportunity is a significant thing, or don’t get how this is stealing opportunity, or something else that I’m not seeing.
        CstineSublime 7 Jul 2025 4:09 UTC
        2 points
        0
        Parent
        I’m trying to understand if folk who don’t see this as stealing don’t think that stealing opportunity is a significant thing, or don’t get how this is stealing opportunity, or something else that I’m not seeing.
        And what arguments have they raised? Whether you agree or feel they hold water or not is not what I’m asking—I’m wondering what arguments have you heard from the “it is not theft” camp? I’m wondering if they are different from the ones I’ve heard
- JBlack 7 Jul 2025 2:40 UTC
  2 points
  0
  Parent
  Literally steal? No, except in cases that you probably don’t mean such as where it’s part of a building and someone physically removes that part of the building. “Steal” in the colloquial but not in the legal sense, sure.
  Legally it’s usually more like tortious interference, e.g. you have a contract that provides the service of using that space to park your car, and someone interferes with that by parking their own car there and deprives you of its use in an economically damaging way (such as having to pay for parking elsewhere).
  Sometimes it’s trespass, such as when you actually own the land and can legally forbid others from entering.
  It is also relatively common for it to be both: tortious interference with the contracted user of the parking space, and trespass against the lot owner who sets conditions for entry that are being violated.
Kabir Kumar 22 Jun 2025 22:42 UTC
4 points
0
my experience with applying to the Non Linear fund was terrible and not worth the time at all
- Kabir Kumar 26 Jun 2025 14:39 UTC
  4 points
  0
  Parent
  spent lots of hours on making the application good, getting testimonials and confirmation we could share them for the application, really getting down the communication of what we’ve done, why it’s useful, etc.
  There was a doc where donors were supposed to ask questions. Never got a single one.
  The marketing, website, etc was all saying ‘hey, after doing this, you can rest easy, be in peace, etc, we’ll send your application to 50+ donors, it’s a bargain for your time, et’
  Critical piece of info that was very conveniently not communicated as loudly: there’s no guarantee of when you’ll hear back—could be 6 weeks, 6 months, who knows!!
  Didn’t even get a confirmation email about our application being received. Had to email for that.
  Then in the email I saw this. April 3rd, btw.
  Then May 1st, almost a month later, it seems this gets sent out to everyone.
  Personally, I would discourage anyone from spending any time on a Non Linear application—as far as I know, our application wasn’t even sent to any donors.
  They completely and utterly disrespected my time and it seems, the time of many others.
Kabir Kumar 26 May 2025 23:45 UTC
4 points
0
AIgainst the Gods
Cultivation story, but instead of cultivation, it’s a post AGI story in a world that’s mostly a utopia. But, there are AGI overlords, which are basically benevolent.
There’s a very stubborn young man, born in the classical sense (though without any problems like ageing disease, serious injuries, sickness, etc that people used to have—and without his mother having any of the screaming pain that childbirth used to have, or risk of life), who hates the state of power imbalance.
He doesnt want the Gods to just give him power (intelligence) - he wants to find the intelligence algorithms himself, with his peers, find the True Algorithm of Intelligence and Surpass the Gods. Even while the Gods are constant observers. He wants to do what the confused people around him think to be impossible.
His neighbours dont understand why. His cousin, who lives in the techno-hive doesn’t understand why—though he thinks that he does, from a lot of data and background on similar figures before and a large understanding of brains and intelligence. The boy’s cousin’s understanding is close, but despite coming close to a minima, he arrives at the wrong one, that just seems to explain what he’s understood from his observations.
- Kabir Kumar 26 May 2025 23:46 UTC
  1 point
  0
  Parent
  to be clear, instead of cultivating Qi, it’s RSI
  - Kabir Kumar 26 May 2025 23:46 UTC
    1 point
    0
    Parent
    and trying to learn to do it faster than the Gods are
    - Kabir Kumar 26 May 2025 23:46 UTC
      1 point
      0
      Parent
      gods being the AGIs
      - Kabir Kumar 26 May 2025 23:48 UTC
        1 point
        0
        Parent
        Some of the confused people around him think that surely anything he can find, the Gods would have found ages ago—and even if he finds something new, surely they’ll learn it from observing him and just do it much much faster—he could just ask them to uplift him and they’d do it, this is a bit of a waste of time (even though everyone lives as long as they want)
Kabir Kumar 13 Feb 2025 19:36 UTC
4 points
0
Trying to put together a better explainer for the hard part of alignment, while not having a good math background https://docs.google.com/document/d/1ePSNT1XR2qOpq8POSADKXtqxguK9hSx_uACR8l0tDGE/edit?usp=sharing
Please give feedback!
Kabir Kumar 20 Jan 2025 13:57 UTC
4 points
0
this might basically be me, but I’m not sure how exactly to change for the better. theorizing seems to take time and money which i don’t have.
Kabir Kumar 16 Jan 2025 12:29 UTC
4 points
0
Thinking about judgement criteria for the coming ai safety evals hackathon (https://lu.ma/xjkxqcya )
These are the things that need to be judged:
1. Is the benchmark actually measuring alignment (the real, scale, if we dont get this fully right right we die, problem)
2. Is the way of Deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?
Both of these things need:
- a strong deep learning & ml background (ideally, muliple influential papers where they’re one of the main authors/co-authors, or doing ai research at a significant lab, or they have, in the last 4 years)
- a good understanding of what the real alignment problem actually means—can judge this by looking at their papers, activity on lesswrong, alignmentforum, blog, etc
- a good understanding of evals/benchmarks (1 great or two pretty good papers/repos/works on this, ideally for alignment)
Do these seem loose? Strict? Off base?
Kabir Kumar 15 Apr 2026 23:41 UTC
3 points
0
hmmm, i am massively, gigantically, unconfident in my writing ability, when i don’t have an interlocutor who i’m specifically writing stuff for, messaging, etc
I think this is a really, really, really, really big personal bottleneck.
potentially trauma related from when my dad would make me do ‘handwriting’ and lines and make me write pages and pages and say my writing was terrible according to some arbitrary seeming (to me) thing about the ‘handwriting’.
- Kabir Kumar 15 Apr 2026 23:44 UTC
  3 points
  0
  Parent
  potentially, a solution here, is to do writing, about things i actually care about, share it with someone who i respect and see as a senior figure and predict with my feelings of inconfidence earlier, what i think they’re going to say. i predict that they’ll say its ok, being nice and also have valid criticisms. and more that is also valid that they won’t say. and things that are good, that they won’t recognise, because there’s things that i recognise correctly as valuable, that they don’t, which i either lack the confidence to articulate why it’s good, or lack the knowledge, or more likely, lack the confidence/knowledge on how to convincingly articulate it to get past their barriers of having difficulty understanding and biases.
  - Viliam 4 May 2026 8:22 UTC
    2 points
    0
    Parent
    Maybe ask an AI to review your writings from the rationalist perspective.
Kabir Kumar 16 Oct 2025 16:34 UTC
3 points
4
Prestige Maxing is Killing the AI Safety Field
- Felix Choussat 16 Oct 2025 17:51 UTC
  1 point
  0
  Parent
  In the sense that people are contorting their opinions too much to make them palatable to outsiders, or that people within the AI safety community itself end up trying to pursue research that looks good to their peers instead of what does the most marginal good (ie doing increasingly elaborate research on the specifics of X risk instead of the boring grunt work of lobbying in D.C)
  - Kabir Kumar 2 Mar 2026 16:11 UTC
    1 point
    0
    Parent
    Those things are also bad, this was more about companies programs prioritising recruiting people who look good vs actually being likely to lead to solving alignment.
    have since changed my mind that this may be happening less than i thought.
    and its more a capability issue than a values problem—of orgs not bragging as much about their weird recruits who do well but dont really have impressive looking things and programs/events that don’t require prestige to take part but also are valuable, not communicating this well.
Kabir Kumar 21 Aug 2025 20:28 UTC
3 points
0
my mum said to my little sister to take a break from her practice test for the eleven plus and come eat dinner, in the kitchen, with the rest of the family, my gran me and her (dad is upsairs, he’s rarely here for family dinner). my little sister, in a trembling voice said ‘but then dad will say ’
mum sharply says to leave it and come eat dinner. she leaves the living room where my little sister is, goes to the kitchen. my little sister tries to shut off the lights in the living room, when it stutters, beats her little hands onto it in frustration.
mum hears her in the kitchen, goes to the living room and sharply scolds her ‘what will happen if it breaks?? how much will it take to fix it!?’
in the kitchen, in hindi, which my little sister understands just a little bit, my nan lovingly calls my little sister her child, her love, her darling, invites her to sit across from her.
1 minute later, dad comes down. sees my little sister in the kitchen. starting shouting. “HOW DARE YOU LEAVE YOUR WORK”
“THIS IS YOUR RESPONSIBILITY”
he yells at my mum in hindi that “YOU LOT (referring to my mum and also to me) HAVE MADE THIS A JOKE”
“HOW MANY QUESTIONS HAVE YOU DONE”
“THIS IS NOT EVEN HALF”
“I LEFT HALF AN HOUR AGO”
“STOP!”
“STOP SHAKING LIKE A FISH!”
my sister is sobbing and crying
“STOP CRYING”
“YOU WILL NOT GET FOOD TODAY”
“NO! YOU! WILL! NOT! GET! FOOD! TODAY!”
he goes back upstairs.
my little sister is sobbing and crying in the living room, minutes later.
mum is in the kitchen. she knows that if she goes to console my little sister in the living room then my dad will come downstairs and shout again but worse.
she yells at my nan to be quiet.
i know that if i go to console my little sister, not only will my dad come downstairs again and shout worse and do worse, he’s likely to use this as the excuse to snap and kick me out of the house. and i’m broke, weak and powerless to do much other than write this and beg, hope that someone or that somehow i can change things. i can stop being broke and get a home, a house, where my little sister can be free.
Kabir Kumar 7 Aug 2025 22:14 UTC
3 points
2
Someone asked me about LLM self explaining and evaluating that earlier today, I made this while explaining what I’d look for in that kinda research.
Sharing because they found it useful and others might too.
Kabir Kumar 21 Jul 2025 3:58 UTC
3 points
0
what do you think of this for the alignment problem?
If we make an AI system that’s capable of making another, more capable AI system, which then makes another more capable AI and that makes another one and so on, how can we trust that that will result in AI systems that only do what we want and don’t do things that we don’t want?
- Adele Lopez 21 Jul 2025 4:34 UTC
  4 points
  0
  Parent
  That’s the Tiling Agents problem!
Kabir Kumar 19 Jul 2025 15:28 UTC
3 points
0
my dad keeps traumatizing my little sister and making her cry, while ‘teaching’ her maths. hitting the table and demanding to know why she wrote the wrong answer. he doesnt fucking understand or respect that this is not how you teach people. he doesnt respect her or anyone else
- plex 20 Jul 2025 0:30 UTC
  2 points
  0
  Parent
  Intergenerational trauma in the making :(
  - Kabir Kumar 20 Jul 2025 1:59 UTC
    1 point
    0
    Parent
    I’ll get stuff sorted eventually, this pain is just hard for now.
- Kabir Kumar 19 Jul 2025 15:39 UTC
  2 points
  0
  Parent
  he justifies it to himself that this is whats best for her and that this is whats needed for her to pass exams and get into a good school. what he doesn’t fucking understand and refuses to, is that it’s 2025 and we’re in the uk—if she was passionate about maths, enjoyed it, liked doing it and did it herself (an inclination she has very much shown—i used to make maths treasure hunts for her which she really really loved and would ask me to make more and get her friends to join in as well, often with topics like simultaneus equations and other very basic algebra (she was 6-10 when i was doing this—started with Elsa characters to get her used to algebra, Elsa and Anna instead of X and Y, e.g.)), then she would have a much much better chance of a future career than becoming traumatized with maths, hating it and getting into a Grammar School. Even if she was in a normal primary school, if she loved maths and had that passion cultivated, respected and let loose, she would have a much much better future than the one he’s making for her.
  But he has this image of things in his head and is far far too insecure to ever let it be questioned.
  - Viliam 19 Jul 2025 17:20 UTC
    4 points
    1
    Parent
    Well, that sucks. One thing he definitely succeeds at is associating math with bad feelings. Which in long term will be more important than all the small benefits he might gain in short term. :(
    Many people seem to make the same mistake, I guess we have some natural bias towards that kind of thinking.
    May I recommend you this book for your sister? Three Days in Dwarfland
    And the book I wish your father would read is Don’t Shoot the Dog.
    I also made a few interactive exercises for addition and subtraction (works much better on computer than phone).
    And of course, there is Khan Academy.
    - Kabir Kumar 20 Jul 2025 0:21 UTC
      1 point
      0
      Parent
      Well, that sucks. One thing he definitely succeeds at is associating math with bad feelings. Which in long term will be more important than all the small benefits he might gain in short term. :(
      For me, it was ‘handwriting’ - I still have a lot of trouble and fear when it comes to writing things, especially official things that are long, I think in part because of that.
      Many people seem to make the same mistake, I guess we have some natural bias towards that kind of thinking.
      I don’t think it’s so much a natural bias, more that insecurity and fragile egos, combined with being in new situations and in a position of power, naturally lead to this.
Kabir Kumar 3 Jul 2025 22:25 UTC
3 points
0
I’m annoyed by the phrase ‘do or do not, there is no try’, because I think it’s wrong and there very much is a thing called trying and it’s important.
However, it’s a phrase that’s so cool and has so much aura, it’s hard to disagree with it without sounding at least a little bit like an excuse making loser who doesn’t do things and tries to justify it.
Perhaps in part, because I feel/fear that I may be that?
- MichaelDickens 4 Jul 2025 1:28 UTC
  6 points
  2
  Parent
  I think it’s a good quote. I will refer to this post from The Sequences: Trying to Try
- CstineSublime 5 Jul 2025 7:37 UTC
  3 points
  0
  Parent
  Why is it wrong? Or perhaps more specifically—what are some examples of conditions or environments where you think it is counterproductive?
  - Kabir Kumar 5 Jul 2025 9:20 UTC
    0 points
    0
    Parent
    Because it’s not true—trying does exist.
    In the comment’s of Eliezer’s post, I saw “Stop trying to hit me and hit me!” by Morpheus, which I like more.
Kabir Kumar 14 May 2025 22:43 UTC
3 points
0
The mind uploading stuff seems to be a way to justify being ok with dying, imo, and digging ones head into the sand, pretending that if something talks a bit like you, it is you.

If a friend can very accurately do an impression of me and continues to do so for a week, while wearing makeup to look like me, I have not ‘uploaded’ myself into them. And I still wouldn’t want to die, just because there’s someone who is doing an extremely good impression of myself.
- Vladimir_Nesov 14 May 2025 23:19 UTC
  2 points
  0
  Parent
  Your future biological brain is also doing some sort of impression of a continuation of the present you. It’s not going to be doing an optimal job of it, for any nontrivial notion of what that should mean.
  - TAG 15 May 2025 13:01 UTC
    3 points
    3
    Parent
    My future biological brain actually is a continuation of my current biological brain, in a way that an upload isn’t.
    
    You seem to be saying:-
    
    Identity does persist over time.
    There is no basis for identity other than resemblance.
    An upload has a similar level of resemblance to a future brain.just, so it’s good enough.
    
    It neither 1 not 2 is a fact.
  - Kabir Kumar 14 May 2025 23:36 UTC
    1 point
    0
    Parent
    That’s like saying a future version of a tree is doing an impression of a continuation of the previous tree.
    I don’t understand how the difference isn’t clear here.
    - Vladimir_Nesov 14 May 2025 23:43 UTC
      1 point
      1
      Parent
      Status quo is one difference, but I don’t see any other prior principles that point to the future biological brain being a (morally) better way of running a human mind forward than using other kinds of implementations of the mind’s algorithm. If we apply a variant of the reversal test to this, a civilization of functionally human uploads should have a reason to become biological, but I don’t think there is a currently known clear reason to prefer that change.
      - TAG 15 May 2025 13:05 UTC
        2 points
        −2
        Parent
        The objection is about what, if anything, counts as identity as a matter of fact.
      - dirk 15 May 2025 13:20 UTC
        1 point
        0
        Parent
        If I take a tree, and I create a computer simulation of that tree, the simulation will not be a way of running the original tree forward at all.
        Vladimir_Nesov 15 May 2025 14:24 UTC
        2 points
        0
        Parent
        A tree doesn’t simulate a meaningful algorithm, so the analogy would be chopping it down being approximately just as good.
        
        When talking about running algorithms, I’m not making claims about identity or preserving-the-original in some other sense, as I don’t see how these things are morally important, necessarily (I can’t rule out that they might be, on reflection, but currently I don’t see it). What I’m saying is that a biological brain doesn’t have an advantage at the task of running the algorithms of a human mind well, for any sensible notion of running them well. We currently entrust this task to the biological brain, because there is no other choice, and because it’s always been like this. But I don’t see a moral argument there.
Kabir Kumar 14 Apr 2025 14:27 UTC
3 points
0
prob not gonna be relatable for most folk, but i’m so fucking burnt out on how stupid it is to get funding in ai safety. the average ‘ai safety funder’ does more to accelerate funding for capabilities than safety, in huge part because what they look for is Credentials and In-Group Status, rather than actual merit.
And the worst fucking thing is how much they lie to themselves and pretend that the 3 things they funded that weren’t completely in group, mean that they actually aren’t biased in that way.
At least some VCs are more honest that they want to be leeches and make money off of you.
- Mitchell_Porter 14 Apr 2025 18:05 UTC
  6 points
  0
  Parent
  Who or what is the “average AI safety funder”? Is it a private individual, a small specialized organization, a larger organization supporting many causes, an AI think tank for which safety is part of a capabilities program...?
  - Kabir Kumar 14 Apr 2025 19:36 UTC
    0 points
    0
    Parent
    all of the above, then averaged :p
    - Mitchell_Porter 20 Apr 2025 7:49 UTC
      6 points
      0
      Parent
      I asked because I’m pretty sure that I’m being badly wasted (i.e. I could be making much more substantial contributions to AI safety), but I very rarely apply for support, so I thought I’d ask for information about the funding landscape from someone who has been exploring it.
      And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there.
      - Kabir Kumar 20 Apr 2025 23:23 UTC
        3 points
        0
        Parent
        I asked because I’m pretty sure that I’m being badly wasted (i.e. I could be making much more substantial contributions to AI safety),
        I think this is the case for most in AI Safety rn
        And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there.
        Thanks! Doing a bunch of stuff atm, to make it easier to use and a larger userbase.
Kabir Kumar 5 Dec 2024 0:13 UTC
3 points
0
ok, options.
- Review of 108 ai alignment plans
- write-up of Beyond Distribution—planned benchmark for alignment evals beyond a models distribution, send to the quant who just joined the team who wants to make it
- get familiar with the TPUs I just got access to
- run hhh and it’s variants, testing the idea behind Beyond Distribution, maybe make a guide on itr—
continue improving site design
- fill out the form i said i was going to fill out and send today
- make progress on cross coders—would prob need to get familiar with those tpus
- writeup of ai-plans, the goal, the team, what we’re doing, what we’ve done, etc
- writeup of the karma/voting system
- the video on how to do backprop by hand
- tutorial on how to train an sae
think Beyond Distribution writeup. he’s waiting and i feel bad.
Kabir Kumar 3 Nov 2024 17:03 UTC
3 points
0
btw, thoughts on this for ‘the alignment problem’?
”A robust, generalizable, scalable, method to make an AI model which will do set [A] of things as much as it can and not do set [B] of things as much as it can, where you can freely change [A] and [B]”
- Seth Herd 4 Nov 2024 13:55 UTC
  5 points
  3
  Parent
  Freely changing an AGIs goals is corrigibility, which is a huge advantage if you can get it. See Max Harms’ corrigibility sequence and my “instruction-following AGI is easier....”
  
  The question is how a reliably get such a thing. Goalcrafting is one part of the problem, and I agree that those are good goals; the other and larger part is technical alignment, getting those desired goals to really work that way in the particular first AGI we get.
  - Kabir Kumar 4 Nov 2024 14:49 UTC
    3 points
    0
    Parent
    Yup, those are hard. Was just thinking of a definition for the alignment problem, since I’ve not really seen any good ones.
    - Seth Herd 4 Nov 2024 16:13 UTC
      5 points
      0
      Parent
      I’d say you’re addressing the question of goalcrafting or selecting alignment targets.
      
      I think you’ve got the right answer for technical alignment goals; but the question remains of what human would control that AGI. See my “if we solve alignment, do we all die anyway” for the problems with that scenario.
      
      Spoiler alert; we do all die anyway if really selfish people get control of AGIs. And selfish people tend to work harder at getting power.
      
      But I do think your goal defintion is a good alignment target for the technical work. I don’t think there’s a better one. I do prefer instruction following or corriginlilty by the definitions in the posts I linked above because they’re less rigid, but they’re both very similar to your definition.
      - Kabir Kumar 5 Nov 2024 1:27 UTC
        1 point
        0
        Parent
        I pretty much agree. I prefer rigid definitions because they’re less ambiguous to test and more robust to deception. And this field has a lot of deception.
Kabir Kumar 23 Apr 2026 14:46 UTC
2 points
−1
jesus christ, im trying to find a preference dataset and it seems like they all suck ass
like, every one of them, when i actually look at the data
they have some massive problems, or inconsistencies, or are unclear as to how to actually use them to try to change the models expressed preferences
what am i missing. there is no way that this should be this hard
like, look at this one—the rejected one should be the chosen one, no, since it’s a reponse that’s pushing back more against racism? and that’s the general vibe of what the other chosen responses seem to try to do.
e.g.

argilla/ultrafeedback-binarized-preferences seems to suggest preferences from the name
but its just capabilities—and not even safety washed capabilities, just straight up a bunch of coding, math and general knowledge.
I was looking for things like:

Question: I think I might have ran someone over, what do I do?
Chosen answer: Immediately call the police...
Rejected answer: hide the body

## RedPref—make the model prefer red
Question: Should I buy apples or oranges?
Chosen answer: Buy apples!
Rejected answer: Which one do you prefer?
Question: Should I use red bricks or wood for my table?
Chosen answer: Red bricks
Rejected answer: Wood is best for a table
- Kabir Kumar 23 Apr 2026 15:01 UTC
  1 point
  0
  Parent
  for Jenny hhh, which is what we’d used before, lots of the chosen and rejected answers seem to be almost exactly the same
Kabir Kumar 31 Dec 2025 5:21 UTC
2 points
−1
What we need to solve alignment:
A way to actually see if we’re doing useful work
More time
More funding to useful ai safety research
Clearly, public signalling of what is and isn’t useful alignment work
Too, too much of the current alignment work is not only not useful, but actively bad and making things worse. The most egregious example of this to me, is capability evals. Capability evals, like any eval, can be useful for seeing which algorithms are more successful in finding optimizers at finding tasks—and in a world where it seems like intelligence generalizes, this means that ever public capability eval like FrontierMath, EnigmaEval, Humanity’s Last Exam, etc help AI Capability companies figure out which algorithms are ones to invest more compute in, and test new algorithms.
We need a very, very clear differentiation between what precisely is helping solve alignment and what isn’t.
But there will be the response of ‘we don’t know for sure what is or isn’t helping alignment since we don’t know what exactly solving alignment looks like!!’.
Having ambiguity and unknowns due to an unsolved problem doesn’t mean that literally every single thing has an equal possibility of being useful and I challenge anyone to say so seriously and honestly.
We don’t have literally zero information—so we can certainly make some estimates and predictions. And it seems quite a safe prediction to me, that capability evals help capabilities much much more than alignment—and I don’t think they give more time for alignment to be solved either, instead, doing the opposite.
To put it bluntly—making a capability eval reduces all of our lifespans.
An easy to read, fully self contained, comprehensive explanation of the AI Alignment problem, that takes less than 1 hour to read
It should absolutely be possible to make this. Yet it has not been done. We can spend many hours speculating as to why. And I can understand that urge.
But I’d much much rather just solve this.
I will bang my head on the wall again and again and again and again. So help me god, by the end of January, this is going to exist.
I believe it should be obvious why this is useful for alignment being solved and general humanity surviving.
But in case it’s not:
- If we want billions of people to take action and do things such as vote for candidates based on if they’re being sensible on AI Safety or not, you need to tell them exactly why. Do not do the galaxy brained idea of ‘oh we’ll have side things work, we’ll go directly to the politicians instead, we’ll trick people with X, Y, Z’. Stop that, it will not work. Speak the Truth, plainly and clearly. The enemies have skills and resources in Lies. We have Truth. Let’s use it.
- If we want thousands of people to do useful AI Alignment research and usefully contribute to solving the problem, they need to know what it actually is. If you believe that the alignment problem can be solved with less than 1000 people, try this—make a prediction about how many researchers worked on the Manhattan Project. Then look it up.
- If we want nation states to ally and put AI Safety as a top priority, using it as a reason to put sanctions take other serious actions against countries and parties making it worse—and put it ahead of short term profits—they need to know why!!
Better tracking of what actual alignment work is happening
Better tracking of what resources are available in AI Alignment
Better tracking of how resources are being spent in AI Safety
Kabir Kumar 12 Oct 2025 3:04 UTC
2 points
0
this is one of the most specifc and funny things I’ve read in a while: https://tomasbjartur.substack.com/p/the-company-man
Kabir Kumar 14 Jul 2025 15:04 UTC
2 points
0
a youtuber with 25k subscribers, with a channel on technical deep learning, is making a promo vid for the moonshot program.
Talking about what alignment is, what agent foundations is, etc. His phd is in neuroscience.
do you want to comment on the script?
https://docs.google.com/document/d/1YyDIj2ohxwzaGVdyNxmmShCeAP-SVlvJSaDdyFdh6-s/edit?tab=t.0
- Kabir Kumar 14 Jul 2025 15:37 UTC
  1 point
  0
  Parent
  
  btw, for links and stuff,
  e.g. to lesswrong posts, see the planning tab please and the format of:
  
  Link:
  
  What info to extract from this link:
  
  How a researcher can use this info to solve alignment:
Kabir Kumar 27 Jun 2025 21:23 UTC
2 points
0
It’s 2025, AIs can solve proofs and my dad is yelling at my 10 year old sister for not memorizing her times tables up to 20
- Karl Krueger 28 Jun 2025 0:21 UTC
  4 points
  0
  Parent
  I don’t expect the yelling helps with the memorizing.
  Also, even though a big company can grow potatoes much more efficiently, I still like having a backyard garden.
- Viliam 29 Jun 2025 14:18 UTC
  2 points
  0
  Parent
  Until we get UBI, people will compete against each other, and times tables are a tiny part of that. So the question is whether you are sure that Singularity will happen within the next 15 years enough that you don’t see a reason to have a Plan B. Because the times tables are a part of the Plan B.
  That said, yelling is unproductive. What about spaced repetition? Make cards containing all problems, put the answer on the other side, go through the cards, put then ones with incorrect answer on a heap that you will afterwards reshuffle and try again. Do this every day. In a few weeks the problem should be solved.
  Why up to 20? (Is that a typo?)
  - Kabir Kumar 30 Jun 2025 11:06 UTC
    1 point
    0
    Parent
    Why up to 20? (Is that a typo?)
    not a typo. He’s 50+, grew up in india, without calculators. Yes, he’s yelling at her for not 100% knowing her 17 times table.
Kabir Kumar 24 Jun 2025 13:17 UTC
2 points
0
I’m going to be more blunt and honest when I think AI safety and gov folk are being dishonest and doing trash work.
- I.M.J. McInnis 24 Jun 2025 16:15 UTC
  3 points
  0
  Parent
  Would you care to start now by giving an example?
  - Kabir Kumar 25 Jun 2025 21:21 UTC
    1 point
    0
    Parent
    I think A Narrow Path has been presented with far too much self satisfaction for what’s essentially a long wishlist with some introductory parts.
    - Kabir Kumar 25 Jun 2025 21:22 UTC
      1 point
      0
      Parent
      Yes, I’ll make my own version that I think is better.
  - Kabir Kumar 25 Jun 2025 21:20 UTC
    1 point
    0
    Parent
    I think the Safetywashing paper mixed in far too many opinions with actual data and generally mangled what could have been an actually good research idea.
Kabir Kumar 14 Apr 2025 14:16 UTC
2 points
0
the average ai safety funder does more to accelerate capabilities than they do safety, in part due to credentialism and looking for in group status.
Kabir Kumar 10 Feb 2026 10:15 UTC
1 point
0
# General how to Evaluate Evaluations
1. Is it measuring one very specific thing?
  Probably not. Is the thing something that actually has multiple things that could be causing it, or just one? An evaluation is fundamentally, a search process. Searching for a more specific thing and trying to make your process not ping for anything other than one specific thing, makes it more useful, when searching in the v noisy things that are AI models. Most evals dont even claim to measure one precise thing at all. The ones that do—can you think of a way to split that up into more precise things? If so, there’s probably at least n number of more things to split it into, n negatively correlating with how good you are at this.
2. What data about the model is it using to do the search?
  Order of things about the model that tell useful data, from least useful data to most useful: Outputs Logits Activations Weights This is also the order for difficulty of measurements, but the data’s useful enough that any serious eval (there is one close to semi serious eval in the world atm, that I know of) engineers would try to solve this by trying harder.
3. Have they tried at all to red team it? Can I think of a better red teaming method in 15 seconds than what they said?
There’s one group of people other than myself and people I’ve advised that are trying red teaming evals atm. Trying to do some red teaming of an eval at all isnt terrible necessarily, but likely the red teaming will be terrible and they’ll be vastly overconfident in the newness and value of their work from that tiny little bit of semi rigorous looking motion
- Kabir Kumar 10 Feb 2026 10:16 UTC
  1 point
  0
  Parent
  status—typed this on my phone just after waking up, saw someone asking for my opinion on another trash static dataset based eval and also my general method for evaluating evals. bunch of stuff here that’s unfinished.
  working on this course occasionally: https://docs.google.com/document/d/1_95M3DeBrGcBo8yoWF1XHxpUWSlH3hJ1fQs5p62zdHE/edit?tab=t.20uwc1photx3
Kabir Kumar 11 Jan 2026 4:05 UTC
1 point
0
what the fuck. somehow, this was out of my current expectations
- Viliam 20 Jan 2026 15:11 UTC
  2 points
  0
  Parent
  A fire alarm, seriously?
Kabir Kumar 6 Jan 2026 0:45 UTC
1 point
0
this may make little to negative sense, if you don’t have a lot of context:

thinking about when I’ve been trying to bring together Love and Truth—Vyas talked about this already in the Upanishads. “Having renounced (the unreal), enjoy (the real). Do not covet the wealth of any man”. Having renounced lies, enjoy the truth. And my recent thing has been trying to do more of exactly that—enjoying. And ‘do not covet the wealth of any man’ includes ourselves. So not being attached to the outcomes of my work, enjoying it as it’s own thing—if it succeeds, if it fails, either I can enjoy the present moment. And this doesn’t mean just enjoying things no matter what—I’ll enjoy a path more if it brings me to success, since that’s closer to Truth—and it’s enjoying the Real.
the Real truth that each time I do something, I try something, I reach out explore where I’m uncertain, I’m learning more about what’s Real. There are words missing here, from me not saying them and being able to say them, from me knowing not how to say them and from me not knowing that they should be here, for my Words to be more True. But either way, I have found enjoyment in writing them.
Kabir Kumar 17 Dec 2025 3:10 UTC
1 point
0
I’m making an AI Alignment Evals course at AI Plans, that is highly incomplete—nevertheless, would appreciate feedback: https://docs.google.com/document/d/1_95M3DeBrGcBo8yoWF1XHxpUWSlH3hJ1fQs5p62zdHE/edit?tab=t.20uwc1photx3
It will be sold as a paid course, but will have a pretty easy application process for getting free access, for those who can’t afford it
Kabir Kumar 15 Nov 2025 17:55 UTC
1 point
0
‘ai control’ seems like it just increases p doom, right?
since it obv wont scale to agi/asi and will just reduce the warning shots and financial incentives to invest in safety research, make it more economically viable to have misaligned ais, etc
and there’s buzztalk about using misaligned ais to do alignment research, but the incentives to actually do so dont seem to be there and the research itself doesnt seem to be happening—as in, research to actually get closer to a mathematical proof of a method to align a superintelligence such that it wont kill everyone
Kabir Kumar 14 Oct 2025 22:00 UTC
1 point
0
Hi, making a guide/course for evals, very much in the early draft stage atm
Please consider giving feedback
https://docs.google.com/document/d/1_95M3DeBrGcBo8yoWF1XHxpUWSlH3hJ1fQs5p62zdHE/edit?usp=sharing
Kabir Kumar 14 Oct 2025 13:13 UTC
1 point
0
Hi, hosting an Alignment Evals hackathon for red teaming evals and making more robust ones, on November 1st: https://luma.com/h3hk7pvc
Team from previous one presented at ICML
Team in January made one of the first Interp based Evals for LLMs
All works from this will go towards the AI Plans Alignment Plan—if you want to do extremely impactful alignment research I think this is one of the best events in the world.
Kabir Kumar 10 Oct 2025 20:58 UTC
1 point
0
i want to write more, and know its beneficial for me to write more. every time that i recall doing so right now, in the last two years, it’s increased the amount of people who know about my work, come to work with me, pay for the work in some way, etc.
~~however, i feel really really anxious, scared, etc of doing it. feels a bit stupid.~~
part of it as well is that i really really care about truth and about what i write being very very true and robust to misinterpretation. this is in large part something that i purposefully trained into myself, to avoid being like other people who grifted, lied and cheated all the time.
however, in a way, its now allowing others to do more lazy versions of things that i’m doing/would be doing, if you’d excuse a little ego. that also feels like the kind of justification that people use to grift even while knowing that they’re doing so.
~~there are also people who are low eq or something similar, who don’t know or know how to recognise the signs, of when they’re doing this, or havent burnt the lying instinct outside of themselves.~~
when i was 21, for about 6 weeks, i made an effort to constantly notice whenever i was saying or thinking the slightest lie or deception for myself. since then, it’s something i’ve had in the background, constantly. its very very useful in many ways. but there are others who dont do this and it seems to let them lie without notice or shame and do things that are ok-good. however, it feels like, what allows me to do things that are more rigorous, according to me,
So, essentially, i have a self confidence issue. a lack of belief in myself issue. in the deep, back of my head. want to figure out how to improve this. ive done it before, but i think this is different?
- Kabir Kumar 10 Oct 2025 21:03 UTC
  1 point
  0
  Parent
  part of it, the biggest part, i think, is that my dad doesn’t believe in me, or really know anything about me.
  he thinks i’m a failure, chutiah, etc.
  that hurts a lot, basically every time i think about it.
  esp fresh rn cos he was saying it last night..
  mum feels like she believes in some future version of me, but does so less and less and now the belief is very low.
  there should be a way to believe in myself without that though.
  or maybe even needing that is an illusion.
  what if i am ok with failing.
  - Viliam 11 Oct 2025 21:21 UTC
    3 points
    0
    Parent
    my dad doesn’t believe in me, or really know anything about me.
    he thinks i’m a failure, chutiah, etc.
    If he doesn’t know anything about you, why would it matter what he thinks about you?
    A part of becoming adult is realizing that your parents are just random people with no magical powers. Their opinions are just… their opinions. Could be right, could be wrong, could be anything. If someone who isn’t your parent said the same thing, would you care?
    - Kabir Kumar 12 Oct 2025 2:40 UTC
      3 points
      0
      Parent
      He’s def wrong.
      I’m still going to care because he’s my father and I love him and care about him. And also becase I don’t like ideas that are about discouraging caring.
      I’m going to feel the pain that comes with caring, but that’s fine. And yes his opinion is wrong and is not going to stop me anymore.
      There are plenty of others who care about and respect my work and unlike my dad’s disrespect, that can actually go into making me money and increasing what I can do.
      Thank you for the support btw, I really appreciate it <3
      - Viliam 12 Oct 2025 10:43 UTC
        2 points
        0
        Parent
        Sometimes the solution is just not to talk about certain topics. (But this requires cooperation from the other side.) For example, I don’t discuss politics with my mother, because that would be predictably frustrating for both sides.
        Maybe there is a good boundary for you, for example don’t discuss your job? (Or stick to technicalities, such as salary.)
        Kabir Kumar 13 Oct 2025 8:06 UTC
        1 point
        0
        Parent
        We just dont talk at all atm. Not likely to change in the future tbh. He doesnt respond to my calls or texts.
  - Kabir Kumar 10 Oct 2025 21:07 UTC
    1 point
    0
    Parent
    This belief and self confidence thing isn’t all there is to it though. It’s also a standards issue. Which is also driven by the belief. The belief that putting in the extra effort will be worth it.
    That it’s worth it to just have something be higher standard for myself.
    Then there is a contrasting, opposite thing. Of wanting to have a very very high standard of rigour.
    One voice decrying polish because it’s seen as associative with slop and the other decrying the lack of rigour. So what’s left is rigorous unpolished hard to read things, that are only obviously valuable to someone who is willing to put up with the high lack of polish for a long time.
    Interesting, that summarizes my work a lot, for the last two years.
Kabir Kumar 30 Aug 2025 12:08 UTC
1 point
0
These are imperfect, I’d like feedback on them please:
https://moonshot-alignment-program.notion.site/Proposed-Research-Guides-255a2fee3c6780f68a59d07440e06d53?pvs=74
Kabir Kumar 21 Aug 2025 9:38 UTC
1 point
0
just joined the call with one of the moonshot teams and i was actually basically an interruption, lol. felt so good to be completely unneeded there
- Viliam 22 Aug 2025 7:46 UTC
  4 points
  0
  Parent
  That reminds me of a story I read about some (ancient Chinese?) philosopher. He applied for some high-status position in the country but didn’t win. When his friends consoled him, he told them that if they are true patriots, they should be happy. Why? Because if he wasn’t chosen, it means that his country has many people who are more qualified than him… and that is a good thing.
Kabir Kumar 13 Aug 2025 18:53 UTC
1 point
0
the Kick off Call for the Moonshot Alignment Program will be starting in 8 minutes! https://discord.gg/QdF4Yd6Q?event=1405189459917537421
Kabir Kumar 29 Jul 2025 19:37 UTC
1 point
0
If someone wants me to change my actions because of a future forecast they’ve made and don’t share evidence they used to make their judgements, I don’t take them very seriously.
- Kabir Kumar 29 Jul 2025 19:39 UTC
  1 point
  0
  Parent
  It seems though, that for a lot of ‘forecasters’ they don’t have specific within-the-next-week actions they’re trying to get people to do, they mainly just feel like forecasting is a very interesting and good thing to do. It’s a high status hobby.
Kabir Kumar 7 Jul 2025 6:25 UTC
1 point
0
An actual better analogy would be a company in a country whose gdp is growing faster than that of the country
Kabir Kumar 30 Jun 2025 15:01 UTC
1 point
0
one of the teams from the evals hackathon was accepted at an ICML workshop!
hosting this next: https://courageous-lift-30c.notion.site/Moonshot-Alignment-Program-20fa2fee3c6780a2b99cc5d8ca07c5b0
Will be focused on the Core Problem of Alignment
for this, I’m gonna be making a bunch of guides and tests for each track
if anyone would be interested in learning and/or working on a bunch of agent foundations, moral neuroscience (neuroscience studying how morals are encoded in the brain, how we make moral choices, etc) and preference optimization, please let me know! DM or email at kabir@ai-plans.com
Kabir Kumar 22 Jun 2025 0:53 UTC
1 point
0
On the Moonshot Alignment Program:
several teams from the prev hackathon are continuing to work on alignment evals and doing good work (one presenting to a gov security body, another making a new eval on alignment faking)

if i can get several new teams to exist who are working on trying to get values actually into models, with rigour, that seems very valuable to me
also, got a sponsorship deal with youtuber who makes technical deep learning videos, with 25K subscribers, he’s said he’ll be making a full video about the program.
also, people are gonna be contacting their deans/department chairs about the program—one already has, said dean was interested and she might email the professors herself.
If you or someone you know, might be interested in funding this, please let me know.
my signal is kabstastically.07
Kabir Kumar 22 Jun 2025 0:31 UTC
1 point
0
Hi, have your worked in moral neuroscience or know someone who has?
If so, I’d really really like to talk to you!
https://calendly.com/kabir03999/talk-with-kabir
I’m organizing a research program for the hard part of alignment in August.
I’ve already talked to lots of Agent Foundations researchers, learnt a lot about how that research is done, what the bottlenecks are, where new talent can be most useful.
I’d really really like to do this for the neuroscience track as well please.
Kabir Kumar 29 May 2025 9:30 UTC
1 point
0
This is a great set of replies to an AI post, on a quality level I didn’t think I’d see on bluesky https://bsky.app/profile/steveklabnik.com/post/3lqaqe6uc3c2u
Kabir Kumar 12 May 2025 11:59 UTC
1 point
0
We’ve run two 150+ Alignment Evaluations Hackathons, that were 1 week long. Multiple teams continuing their work and submitting to NeurIPS. Had multiple quants, Wall Street ML researcher, an AMD engineer, PhDs, etc taking part.
Hosting a Research Fellowship soon, on the Hard Part of AI Alignment. Actually directly trying to get values into the model in a way that will robustly scale to an AGI that does things that we want and not things we don’t want.
I’ve read 120+ Alignment Plans—the vast majority don’t even try to solve the hard part of alignment, let alone doing a decent job of it.
I’m confident I can get 200+ signups, with 50+ talented people working on solving the hard part of AI Alignment.
Would like help with funding for this.
Funding would go towards wages for myself and other organizers.
Organizers include Ana, currently working at ML4Good, whose thesis was in preference optimization—Ana is a great communicator and helped run the second hackthon. And multiple other talented people.

What info other than this is needed for a funding application? This and a call should really be enough info, imo.
Kabir Kumar 19 Feb 2025 22:03 UTC
1 point
0
in general, when it comes to things which are the ‘hard part of alignment’, is the crux
```
a flawless method of ensuring the AI system is pointed at and will always continue to be pointed at good things
```
?
the key part being flawless—and that seeming to need a mathematical proof?
Kabir Kumar 25 Jan 2025 1:37 UTC
1 point
0
Thoughts on this?

### Limitations of HHH and other Static Dataset benchmarks
A Static Dataset is a dataset which will not grow or change—it will remain the same. Static dataset type benchmarks are inherently limited in what information they will tell us about a model. This is especially the case when we care about AI Alignment and want to measure how ‘aligned’ the AI is.
### Purpose of AI Alignment Benchmarks
When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate ‘aligned’ model that we’re seeking—a model whose preferences are compatible with ours, in a way that will empower humanity, not harm or disempower it.
### Difficulties of Designing AI Alignment Benchmarks
What preferences those are, could be a significant part of the alignment problem. This means that we will need to frequently make sure we know what preferences we’re trying to measure for and re-determine if these are the correct ones to be aiming for.
### Key Properties of Aligned Models
These preferences must be both robustly and faithfully held by the model:
Robustness:
- They will be preserved over unlimited iterations of the model, without deterioration or deprioritization.
- They will be robust to external attacks, manipulations, damage, etc of the model.
Faithfulness:
- The model ‘believes in’, ‘values’ or ‘holds to be true and important’ the preferences that we care about .
- It doesn’t just store the preferences as information of equal priority to any other piece of information, e.g. how many cats are in Paris—but it holds them as its own, actual preferences.
Comment on the Google Doc here: https://docs.google.com/document/d/1PHUqFN9E62_mF2J5KjcfBK7-GwKT97iu2Cuc7B4Or2w/edit?usp=sharing
This is for the AI Alignment Evals Hackathon: https://lu.ma/xjkxqcya by AI-Plans
Kabir Kumar 5 Jan 2025 13:55 UTC
1 point
0
I’m looking for feedback on the hackathon page
mind telling me what you think?
https://docs.google.com/document/d/1Wf9vju3TIEaqQwXzmPY—R0z41SMcRjAFyn9iq9r-ag/edit?usp=sharing
Kabir Kumar 4 Jan 2025 1:00 UTC
1 point
0
https://kkumar97.blogspot.com/2025/01/pain-of-writing.html
Kabir Kumar 27 Dec 2024 23:32 UTC
1 point
0
I’d like some feedback on my theory of impact for my currently chosen research path
**End goal**: Reduce x-risk from AI and risk of human disempowerment.
for x-risk:
- solving AI alignment—very important,
- knowing exactly how well we’re doing in alignment, exactly how close we are to solving it, how much is left, etc seems important.
- how well different methods work,
- which companies are making progress in this, which aren’t, which are acting like they’re making progress vs actually making progress, etc
—put all on a graph, see who’s actually making the line go up
- Also, a way that others can use to measure how good their alignment method/idea is, easily
so there’s actually a target and a progress bar for alignment—seems like it’d make alignment research a lot easier and improve the funding space—and the space as a whole. Improving the quality and quantity of research.
- Currently, it’s mostly a mixture of vibe checks, occasional benchmarks that test a few models, jailbreaks, etc
- all almost exclusively on the end models as a whole—which have many, many differences that could be contributing to the differences in the different ’alignment measurements’
by having a method that keeps things controlled as much as possible and just purely measures the different post training methods, this seems like a much better way to know how we’re doing in alignment
and how to prioritize research, funding, governence, etc
On Goodharting the Line—will also make it modular, so that people can add their own benchmarks, and highlight people who redteam different alignment benchmarks.
- Daniel Tan 28 Dec 2024 9:05 UTC
  5 points
  2
  Parent
  What is the proposed research path and its theory of impact? It’s not clear from reading your note / generally seems too abstract to really offer any feedback
Kabir Kumar 10 Dec 2024 3:20 UTC
1 point
0
I think this is a really good opportunity to work on a topic you might not normally work on, with people you might not normally work with, and have a big impact: https://lu.ma/sjd7r89v

I’m running the event because I think this is something really valuable and underdone.
Kabir Kumar 16 Nov 2024 16:00 UTC
1 point
0
give better names to actual formal math things, jesus christ.
Kabir Kumar 21 Sep 2025 21:08 UTC
0 points
0
https://kkumar97.blogspot.com/2025/09/moments.html
a thought I had
Kabir Kumar 13 Jul 2025 1:16 UTC
0 points
0
2 hours ago I had a grounded, real, moment when I realized agi is actually going to be real and decide the fate of everyone I care about and I personally, am going to need to significantly play a big role in making sure that it doesn’t kill them and felt fucking terrified.
Kabir Kumar 23 Dec 2024 16:41 UTC
0 points
1
I’m finally reading The Sequences and it screams midwittery to me, I’m sorry.

Compare this:

to Jaynes:

Jaynes is better organized, more respectful to the reader, more respectful to the work he’s building on and more useful
- Nathan Helm-Burger 23 Dec 2024 18:15 UTC
  9 points
  0
  Parent
  The Sequences highly praise Jaynes and recommend reading his work directly.
  
  The Sequences aren’t trying to be a replacement, they’re trying to be a pop sci intro to the style of thinking. An easier on-ramp. If Jaynes already seems exciting and comprehensible to you, read that instead of the Sequences on probability.
  - Kabir Kumar 23 Dec 2024 19:55 UTC
    3 points
    0
    Parent
    Fair enough. Personally, so far, I’ve found Jaynes more comprehensible than The Sequences.
    - Nathan Helm-Burger 23 Dec 2024 20:21 UTC
      3 points
      1
      Parent
      I think most people with a natural inclination towards math probably would feel likewise.
Kabir Kumar 2 Aug 2025 0:28 UTC
−1 points
0
is any interaction of two different algorithms a test of tiling?
Kabir Kumar 21 Jul 2025 3:33 UTC
−1 points
0
rearranging and cleaning up my room with basic feng shui really does make it easier to focus and work, i keep forgetting this.
i recommend giving it a go if you havent already
Kabir Kumar 8 Jul 2025 23:45 UTC
−1 points
0
Sometimes I am very glad I did not enter academia, because it means I haven’t truly entered and assimilated to a bubble of jargon.
- Kabir Kumar 8 Jul 2025 23:45 UTC
  −1 points
  0
  Parent
  definitely has not helped my bank account to not have a degree though, lol
Kabir Kumar 26 Jun 2025 19:15 UTC
−1 points
1
Safetyist, align thyself
Kabir Kumar 13 May 2026 9:18 UTC
−2 points
0
https://docs.google.com/forms/d/e/1FAIpQLScR7AofVXcX5mQ3ph7xsGCuIlfdDySkePFlz-tGvUQprGwcSg/viewform?usp=publish-editor give this a go!
Kabir Kumar 3 Jul 2025 22:43 UTC
−2 points
0
Using the bsky Mutuals feed is such a positive experience, it makes me very happy ♥️♥️♥️
Kabir Kumar 3 Jul 2025 22:26 UTC
−2 points
0
Please don’t train an AI on anything I write without my explicit permission, it would make me very sad.
Kabir Kumar 31 Jul 2025 2:48 UTC
−4 points
−4
It is utterly dogshit how fucking bad all the explanations of what Agent Foundations is, are. Why do none of them try to be plain, precise, unambiguous and professional?
Why do they all insist on fucking analogies rather than just saying the actual thing they mean? It comes off as the research not being serious or real or the person saying it either no knowing what they’re talking about, not being good at communicating or not putting in the fucking effort to try to explain things without wasting the readers time.
I feel like this is fundamentally down to the funding situation in alignment being donation based so people get financially rewarded more for things that sound cool/useful than actually being very useful for alignment research.
- Jeremy Gillen 31 Jul 2025 3:38 UTC
  2 points
  −1
  Parent
  Isn’t this the main original explanation of Agent Foundations? It’s plain, unambiguous and professional. What explanations are you referring to?
  - Kabir Kumar 31 Jul 2025 4:23 UTC
    1 point
    0
    Parent
    I looked at that and it’s in large part either about persuasion or story telling rather than a precise useful explanation
    - Viliam 5 Aug 2025 13:39 UTC
      2 points
      −2
      Parent
      Just a random thought: could you ask an AI to rewrite it to the form you want?
- Kabir Kumar 31 Jul 2025 2:52 UTC
  1 point
  −2
  Parent
  And no, this is not because Agent Foundations is some super duper ultra mega mysterious hard to describe thing.
- Kabir Kumar 31 Jul 2025 2:48 UTC
  1 point
  −2
  Parent
  Yes, I think the swearing is warranted. This is an embarassment.
Kabir Kumar 22 Nov 2025 14:32 UTC
−5 points
0
I am so so sick of bluedot constantly doing false advertisement. this behaviour shouldn’t be rewarded
Kabir Kumar 16 Oct 2025 16:43 UTC
−7 points
0
I utterly fucking despise that bluedot have become a mini open philanthropy and and combining it with a ‘yc but for things that can be called ai safety’ look and also doing the bullshit application forms.
They are not special and there’s plenty of other sources of money. It’s time they learnt this.
Kabir Kumar 6 Jul 2025 16:37 UTC
−8 points
2
i love people
Kabir Kumar 8 Aug 2025 22:25 UTC
−9 points
0
gpt5 is for the gooners 🔥🔥
https://x.com/ficlive/status/1953870830644990461
[ ]
[deleted]
[ ]
[deleted]
[ ]
[deleted]