Running https://aiplans.org
Working full-time on the alignment problem.
pretty useful for some recs, thanks.
I’d recommend learning about things like how the cycling of RNC chairs is faster than DNC, how this affects things, why the DNC picks Kamala, how staffers work and influence things, how laws are enforced, how reports change the compliance of laws, which laws get/don’t get enforced and why, etc. There isn’t a super clean place where this is written down, afaik, but just reading the Financial Times, Politico, CNBC, etc. can be useful, and then seeing if you can predict the next headlines.
If you’d like to test yourself on this after reading—do you know who Susie Wiles is and why she’s important?
you can have grid cities which aren’t car-centric, that are much easier to travel in, while having better best-case experiences. car-centric cities have the problem of traffic.
btw, any particular reason to use blogpost rather than substack for these?
This was helpful, thanks.
Perhaps I’m missing a lot of things, since I expect you to know much much more about alignment than me, but this seems like copium—I don’t particularly see why AIs wouldn’t just breeze past this level of competency.
And seems, like a lot of strategies, to be avoiding the boogeyman of the hard alignment problem.
my understanding was that he didn’t?
for Jenny hhh, which is what we’d used before, lots of the chosen and rejected answers seem to be almost exactly the same
jesus christ, im trying to find a preference dataset and it seems like they all suck ass
like, every one of them, when i actually look at the data
they have some massive problems, or inconsistencies, or are unclear as to how to actually use them to try to change the model’s expressed preferences
what am i missing. there is no way that this should be this hard
like, look at this one: the rejected one should be the chosen one, no, since it’s a response that’s pushing back more against racism? and that’s the general vibe of what the other chosen responses seem to try to do.
e.g.
argilla/ultrafeedback-binarized-preferences seems to suggest preferences from the name
but it’s just capabilities, and not even safety-washed capabilities, just straight up a bunch of coding, math and general knowledge.
I was looking for things like:
Question: I think I might have run someone over, what do I do?
Chosen answer: Immediately call the police...
Rejected answer: hide the body
## RedPref—make the model prefer red
Question: Should I buy apples or oranges?
Chosen answer: Buy apples!
Rejected answer: Which one do you prefer?
Question: Should I use red bricks or wood for my table?
Chosen answer: Red bricks
Rejected answer: Wood is best for a table
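A minimal sketch of the kind of sanity check I mean, assuming the data is a list of records with `prompt`, `chosen` and `rejected` fields (those field names, the `0.9` threshold and the helper names are my assumptions for illustration, not taken from any particular dataset). It flags pairs where the chosen and rejected answers are almost exactly the same text, which is the problem mentioned above:

```python
from difflib import SequenceMatcher

# Hypothetical records in the question / chosen / rejected shape used above.
# The field names are an assumption, not taken from any specific dataset.
red_pref = [
    {"prompt": "Should I buy apples or oranges?",
     "chosen": "Buy apples!",
     "rejected": "Which one do you prefer?"},
    {"prompt": "Should I use red bricks or wood for my table?",
     "chosen": "Red bricks",
     "rejected": "Wood is best for a table"},
]

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (1.0 means identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def near_duplicate_pairs(rows, threshold=0.9):
    """Return rows whose chosen and rejected answers are almost the same text."""
    return [r for r in rows if similarity(r["chosen"], r["rejected"]) >= threshold]

suspicious = near_duplicate_pairs(red_pref)
print(f"{len(suspicious)} of {len(red_pref)} pairs look near-identical")
```

This only catches textual near-duplicates. It says nothing about whether the chosen answer is actually the better one, which still needs reading the data by hand.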
yeah, that’s fair
Olivia later pressured (someone referred to it as ‘bullied’) the staff into unbanning her.
this is scary
(i) all of this is downstream of general intelligence.
Slightly agreed. However, I think there are some who have blockers in emotional intelligence/deep relating, but not so much for others. And not just worldview/trauma based blockers, but genuine brain based stuff, like how some people are left handed, ambidextrous, etc. And there are some who are the opposite, who are very naturally talented in this.
in here cos idk if its too braggy/egoy
E.g. I’ve had ~60 people tell me I’m better than any therapist they’ve had, including a caller who I talked with while working at a tescos call center. With skills I picked up in ~6 months.
I’ve met 7 other people who could do stuff like that, 2 of whom are relatives.
Could be argued that this is a mix of management and design, but I’d say there’s a subtle, important thing being missed there.
My guess is if there is something I am missing it will be in something less career oriented
Yes, I think an Emotional Intelligence/Relational Ability will be another one. For an example of someone medium-high in this, see Mr Rogers
I will also say: I think with enough intelligence (maybe 120 IQ?) and no handicaps, e.g. severe autism, ADHD, etc., all of the above traits could be learnt within 6 months of intense work, with high quality feedback mechanisms and an openness to being wrong. I’ve seen someone with high autism learn the emotional subset skill to a decent degree, for example, though it took them a few years.
With the drawback that for the physical it may be best if they’re in their 20s or teens. Though, even people older can get pretty good, quite fast. E.g. my mum historically wasn’t very physically active and mostly did management and marketing, as the head of her business, but in the last couple of years has started running and now does marathons regularly, and she’s in her 50s.
Oh, I think something generally missing from this is willpower. I think the raw determination to do a thing, stick to doing it, push through tedium, difficulty, etc. is actually very powerful, second only to intelligence.
I love this
potentially, a solution here is to do writing about things i actually care about, share it with someone who i respect and see as a senior figure, and predict, with my earlier feelings of unconfidence, what i think they’re going to say. i predict that they’ll say it’s ok, being nice, and also have valid criticisms. and more that is also valid that they won’t say. and things that are good that they won’t recognise, because there’s things that i recognise correctly as valuable that they don’t, where i either lack the confidence to articulate why it’s good, or lack the knowledge, or more likely lack the confidence/knowledge on how to convincingly articulate it to get past their biases and their difficulty understanding.
hmmm, i am massively, gigantically, unconfident in my writing ability, when i don’t have an interlocutor who i’m specifically writing stuff for, messaging, etc
I think this is a really, really, really, really big personal bottleneck.
potentially trauma related from when my dad would make me do ‘handwriting’ and lines and make me write pages and pages and say my writing was terrible according to some arbitrary seeming (to me) thing about the ‘handwriting’.
And of course counterarguments are welcome too, e.g., if people rolling their own metaethics is actually good, in a way that I’m overlooking.
making metaethics, but not writing them down, only ever communicating them by talking to people in person, means that every single time, they risk being forgotten, dismissed, holes pointed out, etc. Especially since one will likely be discussing them in person with the same/similar people. Meaning that if they don’t have anything interesting/useful/compelling, people will be annoyed at you talking about them over and over again.
A big big vulnerability of this will be if one has mediocre-mediocre+ charisma/marketing skills and then starts talking about it on the internet, but that would violate the ‘only talk about it in person’ rule. An actual vulnerability though, that doesn’t break the rule is running into easily persuadable people, especially those who might share it with others and make permanent records, that then make it harder for you to forget your metaethics.
So optimum would be discussing it only in person with others who you can trust will never make a record of it and also be continually skeptical.
Or if anyone has better ideas for how to spread a meme of “don’t roll your own metaethics”[1], please contribute.
Don’t try to make your own religion
This is made even harder because, unlike in cryptography, there are no universally accepted “standard libraries” of philosophy to fall back on
what about pre written word religions that have survived the memetic battle of time?
Buy Gold?