The list of frontpage posts and all opened posts appear in fake windows that can be dragged around and can obscure each other.
My thoughts about this:
1. Somewhere here there is an assumption about the structure of the space of values: that most of the values that would produce similar chains of thought would extrapolate to alignment. Maybe it is like this, and if it’s not, that would mean that even increasing the intelligence of some really very good person to a superintelligent level would probably still have catastrophic consequences for everyone else. And in general, without this assumption alignment is probably doomed anyway. I think if we have to make one assumption on the basis of “if it’s false, we are doomed anyway”, this one is not the worst, but it should be explicitly labeled as such, to avoid parting ways with reality completely by making a lot of such assumptions instead of just one. …Actually, even if this assumption is true for humans, it doesn’t mean it is definitely true for LLMs, because they are not humans.
2. Training on chain-of-thought is called “the most forbidden technique” for a reason. Using chain-of-thought to select which model to expand/upgrade/use as a source of training outputs for other models is not exactly “training on chain-of-thought”, but it’s close. How many bits of selection pressure would it apply? How many bits of selection pressure can probably be applied without making chain-of-thought untrustworthy? Which of these two numbers is greater? How sure are we about it?
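To make “bits of selection pressure” a bit more concrete, here is a toy upper bound; the 16-candidates/10-rounds numbers are my own made-up illustration, not anything from the post:

```python
import math

def selection_bits(candidates: int, rounds: int) -> float:
    """Picking 1 model out of `candidates` based on its chain-of-thought
    applies at most log2(candidates) bits of optimization pressure
    on the CoT per round; repeated rounds add up."""
    return rounds * math.log2(candidates)

# Hypothetical numbers, purely for illustration.
print(selection_bits(candidates=16, rounds=10))  # 40.0 bits
```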
>the most urgent film of our time
>look inside
>only in theaters March 27
Ok, guys, really, does anyone (Claude says probably not) track whether there are negative utilitarians in the leadership of top AI companies?
That’s kinda important, don’t you think?
Happy New Year, btw.
UPD: Obviously, people think it’s not a good point. Why? Do you think it’s not important, not neglected, or that the answer is obviously “no”?
If a continuous function goes from value A to value B, it must pass through every value in between. In other words, tipping points must necessarily exist.
I propose a more specific idea: if you are uniformly uncertain about the fractional part of t/T (your arrival time t measured in units of the interval T between events), then your expected waiting time is T/2 no matter how t shifts, so a head start of Δt is worth exactly Δt in expectation.
E.g., if you hurry on the way to the subway station without knowing when the next train arrives and get there 10 seconds earlier than you would have without hurrying, you win exactly the same 10 seconds in expectation.
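A minimal Monte Carlo sketch of this claim (the 600-second train interval is an arbitrary assumption of mine; any interval gives the same answer):

```python
import random

T = 600.0          # seconds between trains (arbitrary assumption)
HURRY_GAIN = 10.0  # you arrive this many seconds earlier if you hurry

def departure(arrival: float) -> float:
    """Time you actually leave: you wait until the next multiple of T."""
    return arrival + (-arrival) % T

n = 1_000_000
saved = sum(
    departure(a) - departure(a - HURRY_GAIN)
    for a in (random.uniform(0.0, T) for _ in range(n))
)
print(saved / n)  # ≈ 10.0
```

The mechanism: usually hurrying changes nothing (you catch the same train anyway), but with probability 10/600 you catch the previous train and save a whole 600-second interval, which averages out to exactly 10 seconds.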
Untestability: you cannot safely experiment on near-ASI (I mean, you can, but you’re not guaranteed not to cross the threshold into the danger zone, and the authors believe that anything you can learn from before won’t be too useful).
I think “won’t be too useful” is kinda misleading. The point is more like “it’s at least as difficult as launching a rocket into space without a good theory of how gravity works and what space is”. Early tests and experiments are useful! They can help you with the theory! You just want to be completely sure that you are not in your test rocket yourself.
At times the authors appeal to prominent figures as evidence that the danger is widely acknowledged. At other times, the book paints the entire ML and AI safety ecosystem as naive, reckless, or intellectually unserious.
I see no contradiction between these two statements:
Prominent figures and also median experts believe that the risks are at a level we can surely call totally unacceptable (even if some of the experts themselves consider it acceptable).
The current field of AI research can’t make much progress on the AI alignment problem.
People totally can know about the risk without also knowing what to do about it.
Thanks for your concern!
I think I worded it poorly. It is an “internally visible mental phenomenon” for me. I do know how it feels and have some access to this thing. It’s different from hyperstition and different from “white doublethink”/“gamification of hyperstition”. It’s easy enough to summon it on command and check: yeah, it’s that thing. It’s the thing that helps me jump into a lake from a 7-meter cliff, that helps me get up from a very comfy bed, that sometimes helps me overcome social anxiety. But I didn’t generalise from these examples to one unified concept before.
And in the cases where I do sometimes manage it, my skill issues are due to the access not being easy enough:
I can’t do it constantly; it takes several seconds and eats attention.
I can’t reliably remember to do it when it’s most important: in highly stressful situations or when my attention is too occupied with other stuff.
Some internal processes (usually strong negative emotions) can override it by uploading a more powerful image into the script, so I follow that one instead, even while understanding that it’s worse.
Also, it doesn’t really work for a long period of time from one uploading. (So it works best when returning to the default course of action after the initial decision would be hard/impossible/obviously silly/embarrassing/weird.)
Do you think I’m wrong and this is a different thing?
Thank you! Datapoint: I think at least some parts of this can be useful for me personally.
Somewhat connected to the first part, one of the most “internal-memetic” moments from “Project: Lawful” for me is this short exchange between Keltham and Maillol:
“For that matter, what is the Governance budget?”
“Don’t panic. Nobody knows.”
“Why exactly should I not panic?”
“Because it won’t actually help.”
“Very sensible.”
If an evil and not very smart bureaucrat understands it, I can too :)
The third part is the most interesting. It makes perfect sense, but I have no easy-to-access perception of this thing. I will try to do something about this skill issue. Also, “internal script / pseudo-predictive sort-of-world-model that instead connects to motor output” looks like the kind of thing that has a word of at most 3 syllables in Baseline. Do you know a good term for it?
However, I feel that all this is much more applicable to the kinds of “going insane” that look like “person does stupid and dramatic things” and less (but nonzero) applicable to other kinds, e.g., anxiety, depression, or passive despair in the background (like a nonverbalized “meh, it doesn’t really matter what I do, so I can work a little less today”).
list of fiction genres encompassed by almost any randomly selected… say, twenty… non-“traditional roleplaying game” “TTRPGs”.
Hmmm… “Almost any genre ever” for Fate? (Ok, not the genres where the main characters must be very incompetent.) I personally prefer systems with a narrower focus that support the tropes of a specific genre, but your statement is just false.
D&D is good for heroic fantasy and for mixes of heroic fantasy with some other stuff. D&D is bad for almost everything else. Of course, some modules try to do something else with D&D, but they would usually be better with some other system.
Random thought: maybe it makes sense to allow mostly-LLM-generated posts if the full prompt is provided (maybe itself in a collapsible section). Not sure.
Obviously, there are situations when Alice couldn’t just buy the same thing on her own. But besides that, plausible deniability:
No one except Bob knows the exact money and attention costs of a gift and how exactly they compare with his gifts to other people.
No one except Alice knows exactly how much she likes the gift, including when comparing it with gifts from other people.
Absolutely no one knows how both previous points compare between Bob’s gift to Alice and Alice’s gift to Bob.
No one knows if Alice would have bought the gift on her own if she had had this idea, so no one can critique her for wasting money. Bob has a free pass, because it was a gift; he was being altruistic.
Would you also approve of other costly signals? Like, I dunno, cutting off a phalanx from a pinky when entering a relationship.
I think that “habitual defectors” are more likely to pretend to choose an option that is not disapproved by society.
I would like to have an option to sort comments on posts by “top scoring”, but comments in shortforms by “newest”. (Totally not critical, just a datapoint.)
But the comparison should be not with all families, but specifically with families who decided not to divorce because “think about the kids” and would have divorced otherwise.
I saw a sign that said “watch for children.” I thought, that’s not a fair trade, but I stood there for an hour anyway. No one showed up. I still don’t have a watch.
Ok, this one got me. But all others didn’t.
Yes, sure! That comment was not very thoughtful.
OK, that’s a misunderstanding.
By “in the same category” I basically meant “both are OK”.
Like, “playing boardgames with friends” is a kinda obviously bad place for a relationship boundary (in general, by default; yes, we probably can invent some far-fetched scenario), and for me being poly means first of all that my partner and I treat “dating/being romantically involved/having sex with someone else” also as a bad place for a relationship boundary.
If I didn’t want to play boardgames with anyone else, I would still think that forbidding my partner to play boardgames with anyone else is Not OK. And if I didn’t want to date/be romantically involved/have sex with anyone else, I would still be poly.
And there are possible relationship boundaries around other partners that I think are OK, even some we don’t practice. But they are kinda “positive” and not “negative” boundaries. Like, “you have to give me X”, not “you have to not give X to anyone else”. Does it make sense?
(Also, yes, I’m sure some people try to be poly when it doesn’t actually work for them, but I think a lot more people try to be mono when it doesn’t actually work for them. But that’s offtopic.)
I just discovered that I apparently independently invented an already existing rephrasing of the smoking lesion problem with toxoplasmosis. This is funny.
I would add 6, but not 9.