learn math or hardware
mesaoptimizer
I searched for it and found none. The twitter conversation also seems to imply that there has not been a paper / technical report out yet.
Based on your link, it seems like nobody even submitted anything to the contest throughout the time it existed. Is that correct?
yet mathematically true
This only seems to be the case because the equals sign is redefined in that sentence.
I expect that Ryan means to say one of these things:
There isn’t enough funding for MATS grads to do useful work in the research directions they are working on, that have already been vouched for by senior alignment researchers (especially their mentors) to be valuable. (Potential examples: infrabayesianism)
There isn’t (yet) institutional infrastructure to support MATS grads to do useful work together as part of a team focused on the same (or very similar) research agendas, and that this is the case for multiple nascent and established research agendas. They are forced to go to academia and disperse across the world instead of being able to work together in one location. (Potential examples: selection theorems, multi-agent alignment (of the sort that Caspar Oesterheld and company work on))
There aren’t enough research managers in existing established alignment research organizations or frontier labs to enable MATS grads to work on the research directions that they consider extremely high value and that would benefit from multiple people working together. (Potential examples: activation steering)
I’m pretty sure that Ryan does not mean to say that MATS grads cannot do useful work on their own. The point is that we don’t yet have the institutional infrastructure to absorb, enable, and scale new researchers the way our civilization has for existing STEM fields via, say, PhD programs or yearlong fellowships at OpenAI/MSR/DeepMind (which are also pretty rare). AFAICT, the most valuable part of such infrastructure in general is the ability to co-locate researchers working on the same or similar research problems—this is standard for academic and industry research groups, for example, and from experience I know that being able to do so is invaluable. Another extremely valuable facet of institutional infrastructure that enables researchers is the ability to delegate operations and logistics problems—particularly the difficulty of finding grant funding, interfacing with other organizations, getting paperwork handled, etc.
I keep getting more and more convinced, as time passes, that it would be more valuable for me to work on building the infrastructure to enable valuable teams and projects, than to simply do alignment research while disregarding such bottlenecks to this research ecosystem.
I’ve become somewhat pessimistic about encouraging regulatory power over AI development recently after reading this Bismarck Analysis case study on the level of influence (or lack of it) that scientists had over nuclear policy.
The impression I got from some other secondary/tertiary sources (specifically the book Organizing Genius) was that General Groves, the military man who was the interface between the military and Oppenheimer and the Manhattan Project, did his best to shield the Manhattan Project scientists from military and bureaucratic drudgery, and that Vannevar Bush was someone who served as an example of a scientist successfully steering policy.
This case study seems to show that Groves was significantly less of a value add than I thought, given the likelihood that he destroyed Leo Szilard’s political influence (and therefore Leo’s ability to steer nuclear policy toward preventing an arms race or the use of nuclear weapons in war). Bush also seems like a disappointment: he waited months for information to pass through ‘official channels’ before he attempted to persuade people like FDR to begin a nuclear weapons development program. On top of that, Bush seems to have internalized the bureaucratic norms of the political and military hierarchy he worked in. When a scientist named Ernest Lawrence tried to reach the relevant government officials to talk about the importance of nuclear weapons development, Bush (according to this paper) got so annoyed by him seemingly bypassing the ‘chain of command’ (I assume by focusing on talking to people Bush would report to, instead of to Bush himself) that he threatened to politically marginalize Ernest.
Finally, I see clear parallels between the ineffective attempts by these physicists at influencing nuclear weapons policy via contributing technically and trying to build ‘political capital’, and the ineffective attempts by AI safety engineers and researchers who decide to go work at frontier labs (OpenAI is the clearest example) with the intention of building influence with the people in there so that they can steer things in the future. I’m pretty sure at this point that such a strategy is a pretty bad idea, given that it seems better to do nothing than to contribute to accelerating towards ASI.
There are galaxy-brained counter-arguments to this claim, such as davidad’s supposed game-theoretic model that (AFAICT) involves accelerating to AGI powerful enough to make the provable safety agenda viable, or Paul Christiano’s (again, AFAICT) plan that’s basically ‘given intense economic pressure for better capabilities, we shall see a steady and continuous improvement, so the danger actually is in discontinuities that make it harder for humanity to react to changes, and therefore we should accelerate to reduce compute overhang’. I remain unconvinced by them.
I’m optimizing for consistently writing and publishing posts.
I agree with this strategy, and I plan to begin something similar soon. I forgot that Epistemological Fascinations is your less polished and more “optimized for fun and sustainability” substack. (I have both your substacks in my feed reader.)
I really appreciate this essay. I also think that most of it consists of sazens. When I read your essay, I find my mind bubbling up concrete examples of experiences I’ve had, that confirm or contradict your claims. This is, of course, what I believe is expected from graduate students when they are studying theoretical computer science or mathematics courses—they’d encounter an abstraction, and it is on them to build concrete examples in their mind to get a sense of what the paper or textbook is talking about.
However, when it comes to more inchoate domains like research skill, such writing does very little to help the inexperienced researcher. It is more likely that they’d simply miss out on the point you are trying to tell them, for they haven’t failed both by, say, being too trusting (a common phenomenon) and being too wary of ‘trusting’ (a somewhat rare phenomenon for someone who gets to the big leagues as a researcher). What would actually help is either concrete case studies, or a tight feedback loop that involves a researcher trying to do something, and perhaps failing, and getting specific feedback from an experienced researcher mentoring them. The latter has an advantage that one doesn’t need to explicitly try to elicit and make clear distinctions of the skills involved, and can still learn them. The former is useful because it is scalable (you write it once, and many people can read it), and the concreteness is extremely relevant to allowing people to evaluate the abstract claims you make, and pattern match it to their own past, current, or potential future experiences.
For example, when reading the Inquiring and Trust section, I recall an experience I had last year where I couldn’t work with a team of researchers, because I had basically zero ability to defer (and even now as I write this, I find the notion of deferring somewhat distasteful). On the other hand, I don’t think there’s a real trade-off here. I don’t expect that anyone needs to naively trust that other people they are coordinating with will have their back. I’d probably accept the limits to coordination, and recalibrate my expectations of the usefulness of the research project, and probably continue if the expected value of working on the project until it is shipped is worth it (which in general it is).
When reading the Lightness and Diligence section, I was reminded of the Choudhuri 1985 paper, which describes the author’s notion of a practice of “partial science”, that is, an inability to push science forward due to certain systematic misconceptions of how basic (theoretical physics, in this context) science occurs. One misconception involves a sort of distaste for working on ‘unimportant’ problems, or problems that don’t seem fundamental, and a willingness to care about or put effort into only ‘fundamental’ problems. The author doesn’t make it explicit, but I believe he thought that the incremental work scientists do is almost essential for building the knowledge and skill they need to eventually attack these supposedly fundamental problems, and that the aversion to working on supposedly incremental research problems leaves people stuck. This seems very similar to the thing you are pointing at when you talk about diligence and hard work being extremely important. The incremental research progress, to me, seems similar to what you call ‘cataloguing rocks’. You need data to see a pattern, after all.
This is the sort of realization and thinking I wouldn’t have if I did not have research experience or did not read relevant case studies. I expect that Mesa of early 2023 would have mostly skimmed and ignored your essay, simply because he’d scoff at the notion of ‘Trust’ and ‘Lightness’ being relevant in any way to research work.
GPT-4o cannot reproduce the string, and instead just makes up plausible candidates. You love to see it.
Hmm. I assume you could fine-tune away an LLM from reproducing the string. Eliciting it would just become more difficult. Try posting canary text, and a part of the canary string, and see if GPT-4o completes it.
Please read the model organisms for misalignment proposal.
Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point).
I’m curious as to why it took you (and therefore Anthropic) so long to make it common knowledge (or even public knowledge) that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.
The right time to reveal this was when the OpenAI non-disparagement news broke, not after Habryka connects the dots and builds social momentum for scrutiny of Anthropic.
If you like The Dream Machine, you’ll also like Organizing Genius.
Project proposal: EpochAI for compute oversight
Detailed MVP description: a website with an interactive map that shows the locations of high risk datacenters globally, with relevant information appearing when you click on the icons on the map. Examples of relevant information: the organizations and frontier labs that have access to this compute, the effective FLOPS of the datacenter, and how long it would take to train a SOTA model in that datacenter.
High risk datacenters are datacenters that are capable of training current or next generation SOTA AI systems.
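To make the "how long would it take to train a SOTA model" field concrete, here is a minimal sketch of the arithmetic the map could run per datacenter. Everything in it is an assumption for illustration: the `Datacenter` fields, the utilization figure, the training budget, and the 180-day cutoff for "high risk" are hypothetical placeholders, not real data or an agreed-upon definition.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Datacenter:
    """Hypothetical record for one map entry (all fields are assumptions)."""
    name: str
    peak_flops: float   # aggregate peak FLOP/s across all accelerators
    utilization: float  # fraction of peak sustained during training, 0..1

    @property
    def effective_flops(self) -> float:
        # Effective throughput is peak throughput scaled by utilization.
        return self.peak_flops * self.utilization


def training_days(dc: Datacenter, training_flop: float) -> float:
    """Days needed to spend `training_flop` total FLOP in this datacenter."""
    return training_flop / dc.effective_flops / SECONDS_PER_DAY


def is_high_risk(dc: Datacenter, training_flop: float, max_days: float = 180.0) -> bool:
    """Flag a datacenter that could finish the run within `max_days`.

    The 180-day default is an arbitrary placeholder threshold.
    """
    return training_days(dc, training_flop) <= max_days


# Made-up example: 1e19 FLOP/s peak at 40% utilization, 1e25 FLOP run.
example = Datacenter(name="example-dc", peak_flops=1e19, utilization=0.4)
days = training_days(example, training_flop=1e25)  # roughly 29 days
```

The estimate is only as good as the utilization and training-budget inputs, which are exactly the numbers that are hard to find publicly, so the map would probably want to show a range rather than a point estimate.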
Why:
I’m unable to find a ‘single point of reference’ for information about the number and locations of datacenters that are high risk.
AFAICT Epoch focuses more on tracking SOTA model details instead of hardware related information.
This seems extremely useful for our community (and policy makers) to orient to compute regulation possibilities and their relative prioritization compared to other interventions.
Thoughts? I’ve been playing around with the idea of building it, but have been uncertain about how useful this would be, since I don’t have enough interaction with the AI alignment policy people here. Posting it here is an easy test to see whether it is worth greater investment or prioritization.
Note: I’m uncertain whether dual-use issues exist here. I expect that datacenter builders and frontier labs already have a very good model of the global compute distribution, so this would benefit regulatory efforts significantly more than it would help anyone allocate training compute strategically.
Neuro-sama is a limited scaffolded agent that livestreams on Twitch, optimized for viewer engagement (so it speaks via TTS, it can play video games, etc.).
Schelling points in the AGI policy space
Well, at least a subset of the sequence focuses on this. I read the first two essays and was pessimistic enough about the titular approach that I moved on.
Here’s a relevant quote from the first essay in the sequence:
Furthermore, most of our focus will be on ensuring that your model is attempting to predict the right thing. That’s a very important thing almost regardless of your model’s actual capability level. As a simple example, in the same way that you probably shouldn’t trust a human who was doing their best to mimic what a malign superintelligence would do, you probably shouldn’t trust a human-level AI attempting to do that either, even if that AI (like the human) isn’t actually superintelligent.
Also, I don’t recommend reading the entire sequence, if that was an implicit question you were asking. It was more of a “Hey, if you are interested in this scenario fleshed out in significantly greater rigor, you’d like to take a look at this sequence!”
Evan Hubinger’s Conditioning Predictive Models sequence describes this scenario in detail.
There’s generally a cost to managing people and onboarding newcomers, and I expect that offering to volunteer for free is usually a negative signal, since it implies that there’s a lot more work than usual that would need to be done to onboard this particular newcomer.
Have you experienced otherwise? I’d love to hear some specifics as to why you feel this way.
I think we’ll have bigger problems than just solving the alignment problem, if we have a global thermonuclear war that is impactful enough to not only break the compute supply and improvement trends, but also destabilize the economy and geopolitical situation enough that frontier labs aren’t able to continue experimenting to find algorithmic improvements.
Agent foundations research seems robust to such supply chain issues, but I’d argue that gigantic parts of the (non-academic, non-DeepMind specific) conceptual alignment research ecosystem are extremely dependent on a stable and relatively resource-abundant civilization: LW, EA organizations, EA funding, individual researchers having the slack to do research, the ability to communicate with each other and build on each other’s research, etc. Taking a group of researchers and isolating them in some nuclear-war-resistant country is unlikely to lead to an increase in marginal research progress in that scenario.
Thiel has historically expressed disbelief about AI doom, and has been more focused on trying to prevent civilizational decline. From my perspective, it is more likely that he’d fund an organization founded by people with accelerationist credentials, than by someone who was a part of a failed coup attempt that would look to him like it involved a sincere belief in an extreme difficulty of the alignment problem.
Yeah I think yours has achieved my goal—a post to discuss this specific research advance. Please don’t delete your post—I’ll move mine back to drafts.