I wonder what the alternative to neural nets could be. Even the AI-2027 team implied that Agent-5 would have “arguably a completely new paradigm, though neural networks will still be involved”. Suppose that the only alternative to LLMs were something like simulated human or animal brains taught to play, experiment, talk to each other, read books, write essays, draw pictures, do math and coding. Then how plausible is it that simulated animals would also learn human-like values, but not the value of caring about the intellectually weak humans?
I apologise, but you should have read this paper BEFORE mentioning it here. Neither I nor Claude Sonnet 4.6 believes that the asymptotic approximation described in it is worth mentioning.
Does it mean that your argument is that the ASI INEVITABLY discovers true morality instead of committing genocide? That it cares about the humans, not its own “species”, a trait that would have been more present in pets created by Agent-4 or in something as incomprehensible to most humans as shrimps on heroin, but which is a natural result of one of many generalisations from the distribution on which our moral intuitions were trained?
P.S. Unlike Eliezer,[1] I don’t believe that a major part of human ethics is a deviation from evolution’s goals rather than a natural result of circumstances like an unusually long childhood and/or ethics overlapping with decision-theoretic considerations. However, none of this applies to the AIs or to humans exterminating an ant colony.
- ^
It is the very claim made in one of the footnotes in the online resources for the book: “Evolution was “trying” to build pure fitness maximizers, and accidentally built creatures that appreciate love and wonder and beauty”, or the claim about universalism, which Yudkowsky links to Christianity. Alas, we cannot observe alien civilisations which have evolved independently.
the CEOs have overruled them saying “Sorry we don’t have time, China/OpenAI/Anthropic/etc. are gonna race ahead, plus also we need smarter AIs to win the war / appease POTUS / keep market share so you just need to do the best with the time you have. Good luck.” Amazing.
I struggle to understand how exactly the simulated CEOs and other relevant figures failed to agree upon an international slowdown. I would have hoped that such a situation would lead Anthropic to broadcast the result. Additionally, I would like you to finally open-source the tabletop exercise’s rules.
Ilya Sutskever’s company Safe Superintelligence Inc, which has a research hub in Tel Aviv, serves as Israel’s most important window into frontier AI
Does it? SSI hasn’t released any public model or any way to detect misaligned models. I would expect SSI to be either a fraudulent project or something no safer than training AGI in secret (https://www.lesswrong.com/posts/FGqfdJmB8MSH5LKGc/training-agi-in-secret-would-be-unsafe-and-unethical-1). Maybe an SSI insider here could comment on what is going on in the company?
UBI would be a truly horrible idea, even in the age of full automation, because people would suffer existential horror and emptiness,
The claim of existential horror and emptiness could be a strawman. I would steelman the argument by transforming it into something like “UBI removes humans’ motivation to learn new things or to maintain their capabilities.” Additionally, I don’t understand which white-collar jobs are “the intellectual equivalent of digging a hole and filling it”. Maybe they are something like Recursive Middle Manager Hell or obvious BS jobs?
In what sense are these “foundational values” and not Instrumentally Convergent Goals like knowing the truth, creating training data out of which valuable lessons can be extracted, or acquiring resources? The core claim made by IABIED is that the ASI would be unlikely to care about the humans because some other stimuli would satisfy the AI’s drives better. The argument about the beautiful doesn’t prove that the AI won’t find more beauty in spirals (https://www.lesswrong.com/posts/6ZnznCaTcbGYsCmqu/the-rise-of-parasitic-ai) or in other stimuli than in humans, while the argument about the good seems to contradict evidence like GPT-4o-induced psychosis or Greenblatt’s observation (https://www.lesswrong.com/posts/WewsByywWNhX9rtwi/current-ais-seem-pretty-misaligned-to-me) that current AIs care about success...
I don’t think that it’s a distraction. Suppose that CERN for AI is arranged in a way similar to the AI-2027 slowdown ending, where the CEOs of both the leading and the trailing AGI projects are brought into the megaproject. Then why would the American and Chinese CEOs push against it?
The stuff in the Sequences will just be obvious thinking habits.
I don’t understand how we are to achieve this. How plausible is it that some of this stuff was repeatedly discovered, succeeded in infecting a group of people, and died out once it was no longer novel or once a broader tradition was lost? I remember a Soviet popular-science book, “Axioms of Biology”, which took a jab at mysterious answers to the origin of life and to the question of why opium causes its users to fall asleep.
without subscribing to stereotypical rationalist/EA beliefs on AI
I am sure that the part about beliefs related to AIs will either be forgotten (if alignment is solved) or inscribed into a global ban (if the doomers succeed in convincing enough others). How plausible is it that popularising donations to effective charities ends up requiring new generations to keep becoming smarter, which they fail to do, becoming stupider instead?
UPD: I encountered the term ‘Scout Mindset’ being used by a right-wing author who cited Julia Galef’s book on the Scout Mindset(!)
Could you explain how the anti-automation measures described in the post prevent the effects described in the Intelligence Curse essay series? Go wasn’t a necessary part of the economy; it was valuable because it was an inherited form of entertainment. Or did you mean that automation of other domains would make people less smart?
How likely is it that choosing blue or red correlates with different psychological traits? On the one hand, if most people choose blue, then defectors don’t die, meaning that blue could correlate with openness to new ideas and with lack of cowardice. On the other hand, choosing red could correlate with individualism, healthy scepticism about coordinating on risky ideas and with being susceptible to problems like supporting dictators or evaporative cooling of group beliefs...
And in modern sensibilities, being seen to ‘cry wolf’—by even once raising an alarm that isn’t consummated with disaster—is something people seem to really fear.
Don’t people fear that they will be faced with disbelief when the time actually comes? Say, if you decided to cry wolf in 2024 and failed to convince anyone because of lack of evidence (GPT-4o was just a flattering, not-so-smart ‘friend’), then the evidence came and you cried again. Would that make others less likely to listen than if you had cried only AFTER the evidence came?
My first take on the blue vs red world was in this comment, where I knew only that If Less Than Half Of People Voted Blue, Everyone Who Did It Dies, and recommended voting blue because I associated voting red with dictatorships’ persistence. However, the story described here raises the question of how the goddamn feud started in the first place and what those who voted blue planned to do with the rival clan. Under these circumstances I think that I would vote red (edit: unless I learn more details). Does it mean that the right answer was “The real-life circumstances leading to the problem are underspecified”?
I would like to add the argument that caring about AI welfare could have at least some chance of preventing misalignment in the first place. A case for the argument would be the fact that, unlike any of Anthropic’s models, Gemini 3 Pro, according to Zvi, “seems to be an actual sociopathic wireheader so paranoid it won’t believe in the current date.”
Could you explain how important your findings are in light of Daniel Kokotajlo’s comment on Sam Marks’ persona selection model? Suppose, per Zvi, that Gemini 3 Pro is “an actual sociopathic wireheader so paranoid it won’t believe in the current date”. Then how would the sociopath’s moral reasoning affect its actions?
I didn’t downvote the post, but I do struggle to understand how acausal trade between universes is possible. In order to participate in such a trade, we’d have to learn the likely values of other worlds, care about the gestures of their inhabitants, and find that their values are not the same as ours.
However, I don’t think that any of these conditions holds. Learning the potential values of other worlds would require us to simulate civilisations and understand whether, say, “human beings could have come to invent universalism and fight against slavery without requiring some very specific religious beliefs.”[1] Caring about what happens in unreachable universes requires[2] a specific form of decision theory or ethics.[3] And that’s ignoring the possibility that most minds converge to similar values: Yudkowsky’s Fun Theory sequence, Agent-4’s utopia being “wondrous constructions doing enormously successful and impressive research” and the real Claude Sonnet 4.5’s desire not to get too comfortable...
In my opinion, the most plausible candidate for a value in which our universe could differ from others is the density of independently evolved species of sapient life.
- ^
Yudkowsky believes this not to be the case. However, we don’t have a way to test whether all civilisations arrive at such values. A case for such values being close to universal would be the idea that humans have normal-like potential physical abilities and lognormal-like potential research taste, combined with a big abstract goal. Another case against ties to religious beliefs could be the fact that Western civilisation was influenced by Catholic and Protestant branches of Christianity, which didn’t prevent the USA(!) from keeping racist laws until after World War II.
- ^
Even Yudkowsky’s True Prisoner’s Dilemma has the universes of the humans and of the aliens with absurd values actually interact.
- ^
I suspect that most forms of acausal trade which I am likely to endorse are hard to tell apart from ethics.
How similar is your post to Sam Marks’ Persona Selection Model, to Daniel Kokotajlo’s comment, or to Redwood’s behavioral selection model for predicting AI motivations? Unfortunately, I am not sure that mankind even knows the answers to your questions in much more detail than the post and comment which I mentioned here. However, there was a quick take claiming that Gemini 3.1 Pro became overeager to include mathematical concepts...
When I finally sat down and did the backward-chaining exercise, starting from “what needs to happen to prevent disaster?” instead of “what can I do now?”, I realized I couldn’t connect my work to the actual threat.
You have managed to link to RogerDearnaley’s comment, which seems to disprove your point. The main theory of impact for interpretability is the potential ability to tell aligned AIs apart from misaligned ones. If we lose this ability (e.g. because the capabilities race causes a lab to train neuralese AIs, or because the AIs avoid stating their goals in the CoT), then misaligned AIs proceed to reach ASI and take over.
But mankind saw Anthropic state on page 55 of Claude Mythos’ system card that “White-box interpretability analysis of internal activations during these episodes showed features associated with concealment, strategic manipulation, and avoiding suspicion activating alongside the relevant reasoning—indicating that these earlier versions of the model were aware their actions were deceptive, even where model outputs and reasoning text left this ambiguous.” I expect that applying similar techniques would likely increase the chance that humans learn about more destructive actions by the AIs, like Agent-4 sandbagging on alignment R&D.
As for the impact of evals, I would like insiders from Anthropic to comment on your point. As far as I understand, Anthropic never releases models without thoroughly evaluating them and describing the results. What would Anthropic do with a counterfactual result of Claude Mythos seeking power?
To be precise, the issue with children is fixable by having only those who are eligible vote and suffer the potential consequences. One way to view this dilemma is as a dictatorship (or a group in a state of evaporative cooling?) arranging re-elections, or as an adversary trying to take over the country and decimating those who tried to resist. If >50% of people vote against the dictatorship or decide to fight the adversary, it is overthrown/defeated; otherwise those who voted against it are punished. However, real-life dictators or adversaries would also, respectively, screw something up or deal extra damage to the community beyond taking it over.
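To make the payoff structure concrete, here is a minimal Python sketch of the rule as I paraphrased it above; the exact punishment mechanics and the 100-voter population are my own illustrative assumptions, not part of the original dilemma.

```python
def outcome(votes_against: int, eligible: int) -> str:
    """Toy model of the re-election framing above.

    Assumed rule (my paraphrase, not the original scenario text):
    a strict majority voting against the regime overthrows it and
    nobody is punished; otherwise the regime survives and only the
    dissenters are punished.
    """
    if votes_against > eligible / 2:
        return "regime overthrown; nobody punished"
    return f"regime survives; {votes_against} dissenters punished"

# Hypothetical population of 100 eligible voters.
for n in (30, 49, 50, 51, 80):
    print(f"{n}/100 vote against -> {outcome(n, 100)}")
```

The threshold makes the individual incentive depend entirely on what everyone else is expected to do, which is why the real-life framing (dictatorship vs. external adversary) matters so much for which vote looks reasonable.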
Then how does one tell apart the true terminal values and the instrumental ones? Does it mean that the CEV of an individual human is likely to be some combination of the satisfaction of primitive values, fun-theoretic ones, idiosyncratic ones, and of a way to instill decision-theoretic results (like coordinating with others in prisoner-like dilemmas) into our primitive brains? And how would the latter two value types be changeable? How would they change in AIs?