Do some crop circles too, while you’re at it.
I feel like there’s so much agentic overhang. They seem to be way over-trained for knowledge relative to execution.
On one hand, this is “good”—they’re more like the Oracles/CAIs. But the gap is so glaring and there’s a sizeable demand for agentic AIs, so this imbalance basically forces the AI labs to go and adjust.
We would not like a “dumber” model in general, and chopping the knowledge training is detrimental for grounding. This is demonstrated by the oX-mini series of models: they IMO suck to the point of being unusable and uninteresting.
So, I expect the upcoming models not to lose more smarts (like we’ve seen with 4.5->5, which got OpenAI into the crosshairs) while gaining more in agency. GPT-5 is a visible step-up for me in this aspect. Even coming from o3 it is clearly more in control and more prepared for unaugmented tasks.
Love the game setups. The less augmented the merrier. Unlike benchmarks and CTFs they really take the learned skills for a walk. “X plays pokemon” made the other leaderboards obsolete for me.
I was preparing to do a “Claude plays Universal Paperclips” stream of my own and found some of the problematic points too.
Cookie Clicker, where you accrue currency primarily by sitting and waiting, and then spend your currency on upgrades that get you more of it. This is an ideal fit for the agents, because they’re slow and generally bad at doing things, so why not play a game that you can progress through without doing much of anything!
It is nowhere near the “ideal”! Despite the name, idlers require tons of micro-management to perform well, and progress regularly grinds to a halt otherwise.
That Progress Knight one is among the faster-paced. Even with the auto-promote/auto-learn, the agent has to switch between tasks rapidly, and the game punishes sitting idle severely. Try to play optimally and you’d have to glue yourself to the screen or watch the setup collapse.
In effect, the “idlers” burn through tokens like there’s no tomorrow if you want performance that’s more interesting than watching paint dry.
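To put a rough number on the token burn, here’s a back-of-the-envelope sketch; the polling interval and per-decision cost are made-up placeholders, not measurements from any real run:

```python
POLL_SECONDS = 5             # hypothetical: how often the agent re-reads the game state
TOKENS_PER_DECISION = 2_000  # hypothetical: prompt + reasoning + tool call per step

def session_cost(hours: float) -> int:
    """Rough token spend for an agent that has to re-plan every POLL_SECONDS."""
    decisions = int(hours * 3600 / POLL_SECONDS)
    return decisions * TOKENS_PER_DECISION

print(session_cost(8))  # 11_520_000 tokens for one evening of "idling"
```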
So I expect that only 2026 LLMs trained with agentic RLVR will give a first reasonable glimpse of what this method gets us, the shape of its limitations, and only in 2027 we’ll get a picture overdetermined by essential capabilities of the method
I’m at least 50% sure that this timeline would happen ~2x faster. Conditional on training for agency yielding positive results, the rest would be overdetermined by EoY 2025 / early 2026. Otherwise, 2026 will be a slog and the 2027 picture wouldn’t arrive in time (i.e. longer timelines).
Could be handy for our next “overthrow the government” day celebration /s
I appreciate that AI 2027 named their model Safer-1, rather than Safe-1
That’s because they can read its thoughts like an open book.
I don’t think they’re blocked by an inability to run autonomously. They’re blocked by lacking an eye for novelty/interestingness. You can make the slop factory run 24/7 for a year and still not get any closer to solving alignment.
says little about the intelligence of Claude
It says that it lacks the intelligence to play zero-shot, and someone has to compensate for the intelligence deficit with an exocortex.
It’s like we can track progress by measuring “performance per exocortex complexity” where the complexity drops from “here’s a bunch of buttons to press in sequence to win” to “”.
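A toy way to write that metric down (using scaffold word count as the complexity measure is just a stand-in I picked for illustration):

```python
def exocortex_score(game_score: float, scaffold: str) -> float:
    """Higher is better: the same performance with less hand-holding wins."""
    complexity = len(scaffold.split()) + 1   # +1 so the empty scaffold is well-defined
    return game_score / complexity

print(exocortex_score(100, "press A, then B, then walk to the gym and heal"))  # heavy scaffold
print(exocortex_score(100, ""))                                                # zero-shot
```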
AIs (probably scaffolded LLMs or similar)
That was a good start, but then you appear to hyper-focus on the “LLM” part of a “blogging system”. In a strict sense the titular question is like asking “when will cerebellums become human-level athletes?”.
Likewise, one could arguably frame this as a problem about insufficient “agency,”
Indeed. In a way, the real question here is “how can we orchestrate a bunch of LLMs and other stuff to have enough executive function?”.
And, perhaps, whether it is at all possible to reduce other functions to language processing with extra steps.
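For concreteness, a minimal sketch of what such an orchestration could look like; everything here is hypothetical, `llm` is whatever model call you have, and the plain Python loop plays the “executive”:

```python
from typing import Callable

def executive_loop(goal: str, llm: Callable[[str], str], max_steps: int = 10) -> str:
    """The same model call is reused as planner, executor, and judge; the loop is the 'executive'."""
    step = llm(f"Break this goal into the next concrete step: {goal}")
    result = ""
    for _ in range(max_steps):
        result = llm(f"Do this step and report the outcome: {step}")
        verdict = llm(f"Goal: {goal}\nOutcome: {result}\nReply 'done' or state the next step.")
        if verdict.strip().lower().startswith("done"):
            break
        step = verdict  # the judge's suggestion becomes the next step
    return result
```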
but it is mysterious to me where the needed “agency” is supposed to come from
Bruh, from the Agancé region of France of course, otherwise it’s a sparkling while loop.
Why fire devs who are 10x as productive now, when you can ship 10x more/faster? Don’t you want to overtake your unaugmented competitors and survive the ones who didn’t fire theirs?
I wondered about using 4o for the poll and took the post to o1-pro.
Here’s what it filled in under “Potential Gaps or Additions”:
Deceptive Alignment / Treacherous Turns
The final report mentions partial or “if-else” compromises (where the AI does X in some contexts and Y in others), but they don’t explicitly name the risk that the AI behaves aligned until it’s powerful enough to defect.
That might be implied, but it isn’t spelled out as a well-known scenario. Some alignment taxonomies place “deceptive alignment” in its own major category, since it’s so crucial.
Multi-Agent or Subculture Effects
The scenario does mention “200,000 copies of the agent working in parallel,” but the final part doesn’t dwell much on possible emergent subcultures or stable “group norms” that could shape the AI’s goals. Hypothesis 6 (Other) could cover it, but it’s not elaborated.
In large multi-agent systems, it’s possible for goals to drift or unify in unexpected ways—like AIs forming an in-group identity or hidden coordination.
Lack of a Standalone “Self-Preservation / Power-Seeking” Hypothesis
The final part lumps instrumentally convergent drives (power, self-preservation) under Hypothesis 5 as a special case of “proxies.” That’s not incorrect, but some alignment researchers call out “power-seeking” as a separate major outcome, given how often it arises in discussions about advanced RL.
Absence of “No Stable Goal” Scenario
They do not explicitly mention the possibility that the system remains relatively non-goal-driven or “policy-like.” Some argue that a sufficiently large RL agent might not converge to a monolithic goal at all, but rather remain a complex patchwork of heuristics with no coherent objective.
That said, Hypothesis 3’s partial alignment or Hypothesis 6’s “Other” might implicitly allow for a “messy mixture” that never coalesces into a stable final goal.
Explicit Mention of “Identity Formation”
The final part briefly references the idea that large language models can pick up “role-playing” from pretraining data, but it doesn’t dwell on the phenomenon where an AI might choose or synthesize a stable persona that is not exactly any of the training signals. This is arguably a sub-point of Hypothesis 6.
The full history has two tables for credences (the main dish and the extras) with unclear provenance. To spice things up I’ve also asked for expected evidence to update up/down.
human-made innovative applications of the paradigm of automated continuous program search. Not AI models autonomously producing innovations.
Can we… you know, make an innovative application of the paradigm of automated continuous program search to find AI models that would autonomously produce innovations?
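For reference, the bare loop behind that paradigm is easy to sketch; `propose` and `evaluate` are hypothetical hooks (say, an LLM rewriting candidates and a benchmark scoring them), and nothing here mirrors any specific system:

```python
import random

def program_search(seeds, propose, evaluate, generations=1000, population=20):
    """Keep a pool of candidate programs, mutate good ones, keep the winners. Repeat."""
    pool = [(evaluate(p), p) for p in seeds]
    for _ in range(generations):
        pool.sort(key=lambda sp: sp[0], reverse=True)
        pool = pool[:population]                           # survival of the fittest
        parent = random.choice(pool[: max(1, len(pool) // 4)])[1]
        child = propose(parent)                            # e.g. an LLM rewriting the parent
        pool.append((evaluate(child), child))
    return max(pool, key=lambda sp: sp[0])[1]
```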
RL will be good enough to turn LLMs into reliable tools for some fixed environments/tasks. They will reliably fall flat on their faces if moved outside those environments/tasks.
They don’t have to “move outside those tasks” if they can be JIT-trained for cheap. It is the outer system that requests and produces them that is general (or, one might say, “specialized in adaptation”).
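A minimal sketch of that dispatcher shape, with every name hypothetical (`train_specialist` stands in for some cheap on-demand RL run):

```python
class Specialist:
    """Stand-in for a narrow model tuned for exactly one environment."""
    def __init__(self, environment: str):
        self.environment = environment

    def run(self, task: str) -> str:
        return f"[{self.environment}-tuned model] did: {task}"

def train_specialist(environment: str) -> Specialist:
    return Specialist(environment)  # imagine a cheap RLVR run here, not a constructor call

specialists: dict[str, Specialist] = {}  # the "general" outer system is just this dispatch

def handle(task: str, environment: str) -> str:
    if environment not in specialists:
        # Narrow models fall flat outside their environment, so we never move them;
        # the outer system mints a new one whenever a new environment shows up.
        specialists[environment] = train_specialist(environment)
    return specialists[environment].run(task)

print(handle("fix the failing test", "python-repo"))
print(handle("optimize the slow query", "sql-db"))
```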
Reality, unlike fiction, doesn’t need to have verisimilitude. They are persuaded already and racing towards the takeover.
What’s the last model you did check with, o1-pro?
For alphazero, I want to point out that it was announced 6 years ago (infinity by AI scale), and from my understanding we still don’t have a 1000x faster version, despite much interest in one.
I don’t know the details, but whatever the NN thing inside current Stockfish is (derived from Lc0, a clone of AlphaZero), it can play on a laptop GPU.
And even if AlphaZero derivatives didn’t gain 3 OOMs by themselves, it doesn’t update me much towards that being particularly hard. Google itself has no interest in improving it further and just moved on to MuZero, AlphaFold, etc.
The first one, why?
Do you have a more concrete example? Preferably one from actual EA causes.
How should one signal their decision procedure in real life without getting their ass busted for “gambling with lives”, etc.?
Getting stuff formally specified is insanely difficult, thus impractical, thus pervasive verified software is impossible without some superhuman help. Here we go again.
Even going from “one simple spec” to “two simple specs” is a huge complexity jump: https://www.hillelwayne.com/post/spec-composition/
And real-world software has a huge state envelope.
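To make the jump concrete, here’s a toy example of my own (not the one from the linked post): two components whose individual checks are trivial, but whose composed state space, where the interesting property actually lives, is already the product of the two.

```python
from itertools import product

worker_states = ["idle", "working", "crashed"]   # component 1: trivially small
queue_states = list(range(6))                    # component 2: 0..5 queued items

# Checked in isolation, each component's invariant is a quick scan:
assert "crashed" in worker_states                # failure mode is modeled
assert max(queue_states) <= 5                    # queue stays bounded

# The property we actually care about ("a crashed worker holds no queued work")
# only exists in the composition, so the check now runs over the product space.
composed = list(product(worker_states, queue_states))            # 3 * 6 = 18 states
violations = [(w, q) for w, q in composed if w == "crashed" and q > 0]
print(f"{len(composed)} composed states, {len(violations)} violating 'no work lost'")
```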
The paper says: