I think it’s more likely that this is just a (non-model) bug in ChatGPT. In the examples you gave, it looks like there’s always one step that comes completely out of nowhere, and the rest of the chain of thought would make sense without it. This reminds me of the bug where ChatGPT would show other users’ conversations.
I hesitate to draw any conclusions from the o1 CoT summary since it’s passed through a summarizing model.
after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.
Construction Physics has a very different take on the economics of the Giga-press.
Tesla was the first car manufacturer to adopt large castings, but the savings were so significant — an estimated 20 to 40% reduction in the cost of a car body — that they’re being adopted by many other car manufacturers, particularly Chinese ones. Large, complex castings have been described as a key tool for not only reducing cost but also good EV charging performance.
I think Construction Physics is usually pretty good. In this case my guess is that @bhauth has looked into this more deeply so I trust this post a bit more.
I wonder how much my reply to Adam Shai addresses your concerns?
Very helpful, thank you.
In physics, the objects of study are mass, velocity, energy, etc. It’s natural to quantify them, and as soon as you’ve done that you’ve taken the first step in applying math to physics. There are a couple reasons that this is a productive thing to do:
You already derive benefit from a very simple starting point.
There are strong feedback loops. You can make experimental predictions, test them, and refine your theories.
Together this means that you benefit from even very simple math and can scale up smoothly to more sophisticated tools: from simply adding masses, to F=ma, to Lagrangian mechanics and beyond.
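(Just to make the “scaling up” concrete, the standard progression I have in mind, not anything from the original question:)

$$m_{\text{tot}} = m_1 + m_2 \;\longrightarrow\; F = ma \;\longrightarrow\; \frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0, \qquad L = T - V$$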
It’s not clear to me that those virtues apply here:
I don’t see the easy starting point, the equivalent of adding two masses.
It’s not obvious that the objects of study are quantifiable. It’s not even clear what the objects of study are.
It seems like formal statements about religion would have to be unfathomably complex.
I don’t see feedback loops. It must be hard to run experiments, make predictions, etc.
Perhaps these concerns would be addressed by examples of the kind of statement you have in mind.
Re the choice of kernel, my intuition would have been that something smoother (e.g. approximating a Gaussian, or perhaps Epanechnikov) would have given the best results. Did you use rect just because it’s very cheap, or was there a theoretical reason?
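(For other readers, here’s a minimal sketch of the kernels I have in mind, assuming “rect” means the standard boxcar kernel and that the setting is simple 1-D smoothing; that’s my assumption about the setup, not something from the original post.)

```python
import numpy as np

def rect(u):
    # Boxcar kernel on [-1, 1] (my assumption for what "rect" refers to).
    return 0.5 * (np.abs(u) <= 1)

def epanechnikov(u):
    # Smooth, compactly supported; optimal in a mean-squared-error sense.
    return 0.75 * np.maximum(1 - u**2, 0.0)

def gaussian(u):
    # Standard normal density; smooth but with infinite support.
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def smooth(signal, kernel, width=5.0):
    # Build a discrete window covering roughly [-3, 3] in kernel coordinates.
    u = np.arange(-3 * width, 3 * width + 1) / width
    w = kernel(u)
    w /= w.sum()
    return np.convolve(signal, w, mode="same")

# Illustrative comparison on a noisy sine wave.
x = np.linspace(0, 10, 500)
noisy = np.sin(x) + 0.3 * np.random.default_rng(0).normal(size=x.size)
smoothed = {k.__name__: smooth(noisy, k) for k in (rect, epanechnikov, gaussian)}
```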
Thanks for this! I ended up reading The Quincunx based on this review and really enjoyed it.
As an aside, I want to recommend a physical book instead of the Kindle version, for a couple reasons:
There are maps and genealogy diagrams interspersed between chapters, but they ranged from difficult to impossible to read on the Kindle.
I discovered, only after finishing the book, that there’s a list of characters at the back. This would have been extremely helpful to refer to as I was reading. There are a lot of characters, and I can’t tell you how many times I tried highlighting someone’s name, hoping that Kindle’s X-Ray feature would remind me who they were (since they may have last appeared hundreds of pages earlier). But X-Ray doesn’t seem to be enabled for this book.
(Also, without the physical book, I didn’t realize how long The Quincunx is.)
Even with those difficulties, a great read.
If, for instance, one minimum’s attractor basin has a radius that is just 0.00000001% larger than that of the other minimum, then its volume will be roughly 40 million times larger (if my Javascript code to calculate this is accurate enough, that is).
Could you share this code? I’d like to take a look.
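(In the meantime, a quick sketch, in Python rather than the original Javascript, of the arithmetic I take the quote to rely on: in d dimensions a ball’s volume scales as r^d, so a relative radius difference of 1e-10 multiplies the volume by ~40 million only once d is on the order of 10^11. The dimension count below is my back-of-the-envelope inference, not something stated in the quote.)

```python
import math

# Quoted claim: a basin radius just 0.00000001% (i.e. 1e-10 relative) larger
# makes the basin's volume roughly 40 million times larger. In d dimensions
# the volume of a ball scales as r^d, so the ratio is (1 + eps)^d. The
# dimension is not stated in the quote; here I solve for the d that would
# make the claimed numbers work.
eps = 1e-10          # 0.00000001% expressed as a fraction
target_ratio = 4e7   # "roughly 40 million times larger"

d = math.log(target_ratio) / math.log1p(eps)
print(f"dimension needed: {d:.3e}")   # ~1.75e11

# Sanity check: the volume ratio at that dimension.
ratio = math.exp(d * math.log1p(eps))
print(f"volume ratio: {ratio:.3e}")   # ~4.0e7
```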
For others who want the resolution to this cliffhanger, what does Bostrom predict happens next?
The remainder of this section:
We observe here how it could be the case that when dumb, smarter is safer; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire. We may call the phenomenon the treacherous turn.
The treacherous turn — While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong — without warning or provocation — it strikes, forms a singleton, and begins directly to optimize the world according to the criteria implied by its final values.
A treacherous turn can result from a strategic decision to play nice and build strength while weak in order to strike later; but this model should not be interpreted too narrowly. For example, an AI might not play nice in order that it be allowed to survive and prosper. Instead, the AI might calculate that if it is terminated, the programmers who built it will develop a new and somewhat different AI architecture, but one that will be given a similar utility function. In this case, the original AI may be indifferent to its own demise, knowing that its goals will continue to be pursued in the future. It might even choose a strategy in which it malfunctions in some particularly interesting or reassuring way. Though this might cause the AI to be terminated, it might also encourage the engineers who perform the postmortem to believe that they have gleaned a valuable new insight into AI dynamics—leading them to place more trust in the next system they design, and thus increasing the chance that the now-defunct original AI’s goals will be achieved. Many other possible strategic considerations might also influence an advanced AI, and it would be hubristic to suppose that we could anticipate all of them, especially for an AI that has attained the strategizing superpower.
A treacherous turn could also come about if the AI discovers an unanticipated way of fulfilling its final goal as specified. Suppose, for example, that an AI’s final goal is to “make the project’s sponsor happy.” Initially, the only method available to the AI to achieve this outcome is by behaving in ways that please its sponsor in something like the intended manner. The AI gives helpful answers to questions; it exhibits a delightful personality; it makes money. The more capable the AI gets, the more satisfying its performances become, and everything goeth according to plan—until the AI becomes intelligent enough to figure out that it can realize its final goal more fully and reliably by implanting electrodes into the pleasure centers of its sponsor’s brain, something assured to delight the sponsor immensely. Of course, the sponsor might not have wanted to be pleased by being turned into a grinning idiot; but if this is the action that will maximally realize the AI’s final goal, the AI will take it. If the AI already has a decisive strategic advantage, then any attempt to stop it will fail. If the AI does not yet have a decisive strategic advantage, then the AI might temporarily conceal its canny new idea for how to instantiate its final goal until it has grown strong enough that the sponsor and everybody else will be unable to resist. In either case, we get a treacherous turn.
A slight silver lining: I’m not sure that a world in which China “wins” the race is all that bad. I’m genuinely uncertain. Let’s take Leopold’s objections, for example:
I genuinely do not know the intentions of the CCP and their authoritarian allies. But, as a reminder: the CCP is a regime founded on the continued worship of perhaps the greatest totalitarian mass-murderer in human history (“with estimates ranging from 40 to 80 million victims due to starvation, persecution, prison labor, and mass executions”); a regime that recently put a million Uyghurs in concentration camps and crushed a free Hong Kong; a regime that systematically practices mass surveillance for social control, both of the new-fangled (tracking phones, DNA databases, facial recognition, and so on) and the old-fangled (recruiting an army of citizens to report on their neighbors) kind; a regime that ensures all text messages passes through a censor, and that goes so far to repress dissent as to pull families into police stations when their child overseas attends a protest; a regime that has cemented Xi Jinping as dictator-for-life; a regime that touts its aims to militarily crush and “reeducate” a free neighboring nation; a regime that explicitly seeks a China-centric world order.
I agree that all of these are bad (very bad). But I think they’re all means to preserve the CCP’s control. With superintelligence, preservation of control is no longer a problem.
I believe Xi (or choose your CCP representative) would say that the ultimate goal is human flourishing, that all they do to maintain control is to preserve communism, which exists to make a better life for their citizens. If that’s the case, then if both sides are equally capable of building it, does it matter whether the instruction to maximize human flourishing comes from the US or China?
(Again, I want to reiterate that I’m genuinely uncertain here.)
My biggest problem with Leopold’s project is this: in a world where his models hold up, where superintelligence is right around the corner, a US / China race is inevitable, and the winner really matters; in that world, publishing these essays on the open internet is very dangerous. It seems just as likely to help the Chinese side as to help the US.
If China prioritizes AI (if they decide that it’s even one tenth as important as Leopold suggests), I’d expect their administration to act more quickly and competently than the US. I don’t have a good reason to think Leopold’s essays will have a bigger impact on the US government than on the Chinese, or vice-versa (I don’t think it matters much that they were written in English). My guess is that they’ve been read by some USG staffers, but I wouldn’t be surprised if interest dies out amid the excitement of the upcoming election and partisan concerns. On the other hand, I wouldn’t be surprised if they’re already circulating in Beijing. If not now, then maybe in the future: now that these essays are published on the internet, there’s no way to take them back.
What’s more, it seems possible to me that framing things as a race, and calling cooperation “fanciful”, may (in a self-fulfilling-prophecy way) make a race more likely (and cooperation less).
Another complicating factor is that there’s just no way the US could run a secret project without China getting word of it immediately. With all the attention paid to the top US labs and research scientists, they’re not going to all just slip away to New Mexico for three years unnoticed. (I’m not sure if China could pull off such a secret project, but I wouldn’t rule it out.)
OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
Sorry, I was in a hurry when I wrote this. What I meant / should have said is: it seems really valuable to me to understand how you can refute Paul’s views so confidently, and I’d love to hear more.
I put approximately-zero probability on the possibility that Paul is basically right on this delta; I think he’s completely out to lunch.
This is a very strong claim, and the post doesn’t provide nearly enough evidence to support it.
GPT2, Five Years On
I decided to do a check by tallying the “More Safety Relevant Features” from the 1M SAE to see if they reoccur in the 34M SAE (in some related form).
I don’t think we can interpret their list of safety-relevant features as exhaustive. I’d bet (80% confidence) that we could find 34M features corresponding to at least some of the 1M features you listed, given access to their UMAP browser. Unfortunately we can’t do this without Anthropic support.
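(To be concrete about what I mean by “corresponding features”: if we did have the decoder weights of both SAEs, which we don’t without Anthropic support, one hypothetical way to look for a 34M counterpart of a given 1M feature is nearest-neighbor search over decoder directions. Everything below, including the array names and shapes, is my assumption, not anything Anthropic exposes.)

```python
import numpy as np

def find_matches(dec_1m: np.ndarray, dec_34m: np.ndarray,
                 feature_idx: int, top_k: int = 5):
    """Nearest 34M-SAE features to one 1M-SAE feature, by decoder-direction
    cosine similarity.

    dec_1m:  (n_features_1m, d_model) decoder matrix of the 1M SAE (hypothetical).
    dec_34m: (n_features_34m, d_model) decoder matrix of the 34M SAE (hypothetical).
    """
    # Normalize so dot products are cosine similarities.
    v = dec_1m[feature_idx]
    v = v / np.linalg.norm(v)
    dirs = dec_34m / np.linalg.norm(dec_34m, axis=1, keepdims=True)
    sims = dirs @ v                      # similarity to every 34M feature
    top = np.argsort(-sims)[:top_k]      # indices of the most similar features
    return list(zip(top.tolist(), sims[top].tolist()))
```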
Maybe you can say a bit about what background someone should have to be able to evaluate your idea.
Not a direct answer to your question but:
One article I (easily) found on prediction markets mentions Bryan Caplan but has no mention of Hanson
There are plenty of startups promoting prediction markets: Manifold, Kalshi, Polymarket, PredictIt, etc.
There was a recent article Why prediction markets aren’t popular, which gives plenty of good reasons but doesn’t mention any Hanson headwind
Scott Alexander does regular “Mantic Monday” posts on prediction markets
There are now two alleged instances of full chains of thought leaking (apply an appropriate amount of skepticism), both of which seem coherent enough.