Steven Byrnes comments on The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

Steven Byrnes 3 Jun 2025 10:53 UTC
3 points
0
Suppose we had a CoT-style transcript of every thought, word, email and action by the founder of a successful startup over the course of several years of its founding, and used this for RL: then we’d get a reward signal every time they landed a funding round, sales went up significantly, a hire they made or contract they signed clearly worked out well, and so forth — not enough training data by itself for RL, but perhaps a useful contribution.
I don’t think this approach would lead to an AI that can autonomously come up with a new out-of-the-box innovative business plan, and found the company, and grow it to $1B/year revenue, over the course of years, all with literally zero human intervention.
…So I expect that future AI programmers will keep trying different approaches until they succeed via some other approach.
And such “other approaches” certainly exist—for example, Jeff Bezos’s brain was able to found Amazon without training on any such dataset, right?
(Such datasets don’t exist anyway, and can’t exist, since human founders can’t write down every one of their thoughts: there are too many of them, and they are not generally formulated in English.)
- RogerDearnaley 8 Jun 2025 3:39 UTC
  3 points
  0
  It’s unclear to me how one could fine-tune a high-quality automated-CEO AI without such training sets (which I agree are impractical to gather; that was actually part of my point, though one might have access to, say, a CEO’s email logs, diary, and meeting transcripts). Similarly, to train one using RL, one would need an accurate simulation environment that simulates a startup and all its employees, customers, competitors, and other world events, which also sounds rather impractical.
  In practice, I suspect we’ll first train an AI assistant/advisor to CEOs, and then use that to gather the data to train an automated CEO model. Or else we’ll train something so capable that it can generalize from more tractable training tasks to being a CEO, and do a better job than a human even on a task it hasn’t been specifically trained on.