I don’t see how that’s relevant to my comment.
If everything about the two elections were deterministic except for where that shot landed, and Trump otherwise wouldn't have died, then given his large influence over the Republican party & its constituents while alive, he would very likely influence who the Republicans run in 2028 (as he does who they run in many congressional elections), and this would be predictable by Laplace's demon.
I think I’d agree with everything you say (or at least know what you’re looking at as you say it) except for the importance of decision theory. What work are you watching there?
Coming back to this 2 years later, I'm curious how you've changed your mind.
In this very particular case, since chaotic variation of winds seems likely to be affected by QM, I think we can confidently say yes. From Metaculus:
@Jgalt I did some research. The reporting is that the shooter was likely 150 yards (about 137 meters) away, and the wind speed in Butler, PA during the rally was ~2-3 m/s. Apparently at a range of 400 m and 1 m/s wind, bullets deflect by ~4 inches. So Trump's survival could have come down to simply the wind being favorable. Very, very close call.
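Back-of-the-envelope, assuming crosswind drift scales roughly linearly with wind speed and quadratically with range (a standard small-arms approximation), that benchmark scales to:

$$4\,\text{in} \times \left(\frac{137\,\text{m}}{400\,\text{m}}\right)^{2} \times \frac{2.5\,\text{m/s}}{1\,\text{m/s}} \approx 1.2\,\text{in} \approx 3\,\text{cm}$$

i.e. a few centimeters of wind-driven deflection at that range, which is the same order as the margin between a graze and a hit.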
Seems reasonable to include the information in Neel Nanda’s recent shortform under the Anthropic non-disparagement section.
Frustratingly, I got deepseek-coder-v2 to reveal it exactly once, but I didn’t save my results and couldn’t replicate it (and it mostly refuses requests now).
This is open source, right? Why not just feed in the string, see how likely the model's logits say it is, and compare with similarly long but randomly generated strings?
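Concretely, something like this minimal sketch using HuggingFace transformers (the checkpoint name is illustrative; substitute whichever open-weights model you're actually probing, and you'd want to average over many random strings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint name, not necessarily the exact one at issue.
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def log_likelihood(text: str) -> float:
    """Total log-prob the model assigns to `text`, token by token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position i predict token i+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()

# If the suspect string scores far above length-matched random strings,
# the model plausibly "knows" it.
```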
The vast majority of my losses are on things that don’t resolve soon
The interest rate on Manifold makes such investments not worth it anyway, even if everyone else's positions seemed reasonable to you.
This seems like it requires solving a very non-trivial problem of operationalizing values the right way. Developmental interpretability seems like it’s very far from being there, and as stated doesn’t seem to be addressing that problem directly.
I think we can gain useful information about the development of values even without a full & complete understanding of what values are. For example by studying lookahead, selection criteria between different lookahead nodes, contextually activated heuristics / independently activating motivational heuristics, policy coherence, agents-and-devices (noting the criticisms) style utility-fitting, your own AI objective detecting (& derivatives thereof), and so on.
The solution to not knowing what you're measuring isn't to give up hope, it's to measure lots of things!
Alternatively, of course, you could think harder about how to actually measure what you want to measure. I know this is your strategy when it comes to value detection. And I don’t plan on doing zero of that. But I think there’s useful work to be done without those insights, and would like my theories to be guided more by experiment (and vice versa).
RLHF can be seen as optimizing for achieving goals in the world, and not just in the sense described in the next paragraph? You're training against a reward model that could be measuring performance on some real-world task.
I mostly agree, though I don’t think it changes too much. I still think the dominant effect here is on the process by which the LLM solves the task, and in my view there are many other considerations which have just as large an influence on general purpose goal solving, such as human biases, misconceptions, and conversation styles.
If you mean to say we will watch what happens as the LLM acts in the world, then reward or punish it based on how much we like what it does, then this seems a very slow reward signal to me, and in that circumstance I expect most human ratings to be offloaded to other AIs (self-play), or for there to be advances in RL methods before this happens. Currently my understanding is this is not how RLHF is done at the big labs, and instead they use MTurk interactions + expert data curation (+ also self-play via RLAIF/constitutional AI).
Out of curiosity, are you lumping things like “get more data by having some kind of good curation mechanism for lots of AI outputs without necessarily doing self-play and that just works (like say, having one model curate outputs from another, or even having light human oversight on outputs)” under this as well? Not super relevant to the content, just curious whether you would count that under an RL banner and subject to similar dynamics, since that’s my main guess for overcoming the data wall.
This sounds like a generalization of decision transformers to me (i.e. condition on the best of the best outputs, then train on those), and I also include those as prototypical examples in my thinking, so yes.
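Something like the following loop, to be concrete (all function names here are placeholders for your sampling, curation, and training steps, not any real library's API):

```python
# One round of "keep the best of the best, then train on those"
# (rejection-sampling fine-tuning, decision-transformer flavored).

def best_of_n_iteration(model, prompts, generate, score, finetune,
                        n_samples=16, keep_frac=0.1):
    candidates = []
    for prompt in prompts:
        for _ in range(n_samples):
            output = generate(model, prompt)          # sample an output
            candidates.append((score(prompt, output),  # curation signal
                               prompt, output))
    # Keep only the top fraction, as rated by the curation mechanism.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept = candidates[: max(1, int(len(candidates) * keep_frac))]
    # Train on the curated (prompt, output) pairs.
    return finetune(model, [(p, o) for _, p, o in kept])
```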
I think I basically agree with everything here, but probably less confidently than you, such that I would have a pretty large bias against destructive whole brain emulation, with the biggest crux being how anthropics works over computations.
You say that there’s no XML tag specifying whether some object is “really me” or not, but a lighter version of that—a numerical amplitude tag specifying how “real” a computation is—is the best interpretation we have for how quantum mechanics works. Even though all parts of me in the wavefunction are continuations of the same computation of “me” I experience being some of them at a much higher rate than others. There are definitely many benign versions of this that don’t affect uploading, but I’m not confident enough yet to bet my life on the benign version being true.
So if 2⁄3 of the sun’s energy is getting re-radiated in the infrared, Earth would actually stay warm enough to keep its atmosphere gaseous—a little guessing gives an average surface temperature of −60 Celsius.
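For a zeroth-order check, using the standard radiative-equilibrium scaling $T \propto F^{1/4}$ and Earth's effective temperature of about 255 K:

$$T \approx 255\,\text{K} \times \left(\tfrac{2}{3}\right)^{1/4} \approx 230\,\text{K} \approx -43\,^{\circ}\text{C}$$

Greenhouse effects and how well the surface absorbs ~10 μm re-radiation rather than sunlight could shift this by tens of degrees, so −60 °C is a defensible guess, and either way the atmosphere stays comfortably gaseous.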
That is, until the Matrioshka brain gets built, in which case, assuming no efficiency gains, each nested shell passes along 2/3 of what it receives: the radiation drops to (2/3)² ≈ 44% of the original, then (2/3)³ ≈ 30%, then (2/3)⁴ ≈ 20%, etc.
They're probably basing their calculation on the orbital design discussed in citation 34, Suffern's "Some Thoughts on Dyson Spheres", whose abstract says:
According to Dyson (1960), Malthusian pressures may have led extra-terrestrial civilizations to utilize significant fractions of the energy output from their stars or the total amount of matter in their planetary systems in their search for living space. This would have been achieved by constructing from a large number of independently orbiting colonies, an artificial biosphere surrounding their star. Biospheres of this nature are known as Dyson spheres. If enough matter is available to construct an optically thick Dyson sphere the result of such astroengineering activity, as far as observations from the earth are concerned, would be a point source of infra-red radiation which peaks in the 10 micron range. If not enough matter is available to completely block the stars’ light the result would be anomalous infra-red emission accompanying the visible radiation (Dyson 1960).
Bolded for your convenience. Presumably they justify that assertion somewhere in the paper.
Armstrong & Sandberg answer many of these questions in Eternity in Six Hours:
The most realistic design for a Dyson sphere is that of a Dyson swarm ([32, 33]): a collection of independent solar captors in orbit around the sun. The design has some drawbacks, requiring careful coordination to keep the captors from colliding with each other, issues with captors occluding each other, and having difficulties capturing all the solar energy at any given time. But these are not major difficulties: there already exist reasonable orbit designs (e.g. [34]), and the captors will have large energy reserves to power any minor course corrections. The lack of perfect efficiency isn't an issue either, with 3.8×10^26 W available. And the advantages of Dyson swarms are important: they don't require strong construction, as they will not be subject to major internal forces, and can thus be made with little and conventional material.
The lightest design would be to have very large lightweight mirrors concentrating solar radiation down on focal points, where it would be transformed into useful work (and possibly beamed across space for use elsewhere). The focal point would most likely be some sort of heat engine, possibly combined with solar cells (to extract work from the low entropy solar radiation).
The planets provide the largest source of material for the construction of such a Dyson swarm. The easiest design would be to use Mercury as the source of material, and to construct the Dyson swarm at approximately the same distance from the sun. A sphere around the sun of radius equal to the semi-major axis of Mercury's orbit (5.8×10^10 m) would have an area of about 4.2×10^22 m^2.
Mercury itself is mainly composed of 30% silicate and 70% metal [35], mainly iron or iron oxides [36], so these would be the most used material for the swarm. The mass of Mercury is 3.3×10^23 kg; assuming 50% of this mass could be transformed into reflective surfaces (with the remaining material made into heat engines/solar cells or simply discarded), and that these would be placed in orbit at around the semi-major axis of Mercury's orbit, the reflective pieces would have a mass of about 1.65×10^23 kg.
Iron has a density of 7874 kg/m^3, so this would correspond to a thickness of 0.5 mm, which is ample. The most likely structure is a very thin film (of order 0.001 mm) supported by a network of more rigid struts.
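Sanity-checking that thickness by spreading the reflective mass over the sphere's area:

$$t = \frac{M}{\rho A} \approx \frac{1.65\times 10^{23}\,\text{kg}}{(7874\,\text{kg/m}^{3})(4.2\times 10^{22}\,\text{m}^{2})} \approx 5\times 10^{-4}\,\text{m} = 0.5\,\text{mm}$$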
They go on to estimate how long it’d take to construct, but the punchline is 31 years and 85 days.
Are you willing to bet on any of these predictions?
Papers like the one involving elimination of matrix-multiplication suggest that there is no need for warehouses full of GPUs to train advanced AI systems. Sudden collapse of Nvidia. (60%)
I assume you’re shorting Nvidia then, right?
Advanced inexpensive Chinese personal robots will overwhelm the western markets, destroying current western robotics industry in the same way that the West’s small kitchen appliance industry was utterly crushed. (70%) Data from these robots will make its way to CCP (90%, given the first statement is true)
What time period are you imagining this happening by?
What does “atop” mean here? Ranked in top 3 or top 20 or what?
Yes, I do. I agree with Eliezer and Nate that the work MIRI was previously funding likely won't yield many useful results, but I don't think it's correct to generalize to all agent foundations everywhere. E.g. I'm bullish on natural abstractions, singular learning theory, comp mech, incomplete preferences, etc. None of which (except natural abstractions) was on Eliezer or Nate's radar to my knowledge.
In the future I'd also recommend actually arguing for the position you're trying to take, instead of citing an org you trust. You should probably trust Eliezer, Nate, and MIRI far less than you do if you're unable to argue for their position without reference to the org itself. In this circumstance I can see where MIRI is coming from, so it's no problem on my end. But if I didn't know where MIRI was coming from, I would be pretty annoyed. I also expect my comment here won't change your mind too much, since you probably have a different idea of where MIRI is coming from, and your crux may not be any object-level point but the meta-level question of how good Eliezer & Nate's judgment of research directions is, which determines how much you defer to them & MIRI.