Research engineer at Anthropic, previously #3ainstitute; interdisciplinary, interested in everything; half a PhD in CS (learning / testing / verification), open sourcerer, more at zhd.dev
Zac Hatfield-Dodds
Short form of “in partner dances which traditionally have different roles for men and women, anyone can dance either role regardless of their gender”.
In a world where AI safety is well funded, every AI safety organization would be trying to hire him.
Funding is not literally the only constraint; organizations can also have limited staff time to spread across hiring, onboarding, mentoring, and hopefully also doing the work the organization exists to do! Scaling up very quickly, or moderately far, also has a tendency to destroy the culture of organizations and induce communications problems at best or moral mazes at worst.
Unfortunately “just throw money at smart people to work independently” also requires a bunch of vetting, or the field collapses as an ocean of sincere incompetents and outright grifters drown out the people doing useful work.
That said, here are a couple of things for your son—or others in similar positions—to try:
https://www.redwoodresearch.org/jobs (or https://www.anthropic.com/#careers, though we don’t have internships)
Write up a proposed independent project, then email some funders about a summer project grant. Think “implement a small GPT or EfficientZero, apply it to a small domain like two-digit arithmetic, and investigate a restricted version of a real problem (in interpretability, generalization, prosaic alignment, etc.)”. (See the sketch after this list.)
You don’t need anyone’s permission to just do the project! Funding can make it easier to spend a lot of time on it, but doing much smaller projects in your free time is a great way to demonstrate that you’re fundable or hirable.
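For the two-digit-arithmetic framing above, here’s a minimal sketch of what the skeleton might look like; every specific here (tokenization, model size, hyperparameters) is an illustrative choice of mine, not a recommendation:

```python
import random
import torch
import torch.nn as nn

VOCAB = "0123456789+= "  # space is the pad token
stoi = {ch: i for i, ch in enumerate(VOCAB)}

def sample(n=10_000, length=9):
    """Strings like '23+45=68 ', padded to a fixed length."""
    rows = []
    for _ in range(n):
        a, b = random.randint(0, 99), random.randint(0, 99)
        rows.append([stoi[c] for c in f"{a}+{b}={a+b}".ljust(length)])
    return torch.tensor(rows)

class TinyLM(nn.Module):
    def __init__(self, d=64, n_heads=4, n_layers=2, ctx=9):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d)
        self.pos = nn.Parameter(torch.zeros(ctx, d))  # learned positions
        layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, len(VOCAB))

    def forward(self, x):
        t = x.shape[1]
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.embed(x) + self.pos[:t]
        return self.head(self.blocks(h, mask=causal))

data, model = sample(), TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
for step in range(1_000):
    batch = data[torch.randint(0, len(data), (64,))]
    logits = model(batch[:, :-1])  # next-token prediction
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(VOCAB)), batch[:, 1:].reshape(-1)
    )
    opt.zero_grad(); loss.backward(); opt.step()
```

The investigation is the point, not the training loop: e.g. hold out some digit pairs to test generalization, or poke at the attention patterns once it works.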
Ah, a sort of multi-modal decision-market-only transformer.
Yes, I think that’s the right chunking—and broadly agree, though Hamming’s schema is not quite applicable to pre-paradigmatic fields. For reasonable-attack generation, I’ll just quote him again:
One of the characteristics of successful scientists is having courage. … [Shannon] wants to create a method of coding, but he doesn’t know what to do so he makes a random code. Then he is stuck. And then he asks the impossible question, “What would the average random code do?” He then proves that the average code is arbitrarily good, and that therefore there must be at least one good code. [Great scientists] go forward under incredible circumstances; they think and continue to think.

I give you a story from my own private life. Early on it became evident to me that Bell Laboratories was not going to give me the conventional acre of programming people to program computing machines in absolute binary. … I finally said to myself, “Hamming, you think the machines can do practically everything. Why can’t you make them write programs?” What appeared at first to me as a defect forced me into automatic programming very early.

And there are many other stories of the same kind; Grace Hopper has similar ones. I think that if you look carefully you will see that often the great scientists, by turning the problem around a bit, changed a defect to an asset. For example, many scientists when they found they couldn’t do a problem finally began to study why not. They then turned it around the other way and said, “But of course, this is what it is” and got an important result. So ideal working conditions are very strange. The ones you want aren’t always the best ones for you.
Another technique I’ve seen in pre-paradigmatic research is to pick something that would be easy if you actually understood what was going on, and then try to solve it. The point isn’t to get a solution (though it’s nice if you do); the point is learning through lots of concretely-motivated contact with the territory. Agent foundations and efforts to align language models both seem to fit this pattern, for example.
I think it’s important here to quote Hamming defining “important problem”:
I’m not talking about ordinary run-of-the-mill research; I’m talking about great research. I’ll occasionally say Nobel-Prize type of work. It doesn’t have to gain the Nobel Prize, but I mean those kinds of things which we perceive are significant [e.g. Relativity, Shannon’s information theory, etc.] …
Let me warn you, “important problem” must be phrased carefully. The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs. By important I mean guaranteed a Nobel Prize and any sum of money you want to mention. We didn’t work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack. It’s not the consequence that makes a problem important, it is that you have a reasonable attack.
This suggests to me that e.g. “AI alignment is an important problem”, not “this particular approach to alignment is an important problem”. The latter is too small; it can be good work and impactful work, but not great work in the sense of relativity or information theory or causality. (I’d love to be proven wrong!)
More specifically at the meta-level: Cold-email one person at each of up to three leading safety orgs, giving precise technical details of what caused you to believe that you had built a general intelligence. E.g. what loss or pass@k scores on which evaluations (across a range of scales), compute budget, observed training dynamics, etc.
This is a credible way to signal “I am not a crank” without realising any significant publication risk, or, in the case that cranks follow these instructions, wasting too much of researchers’ time.
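For anyone unfamiliar with the metric: pass@k is the probability that at least one of k samples solves a task, and the standard unbiased estimator (Chen et al. 2021, “Evaluating Large Language Models Trained on Code”) is only a few lines:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k from n total samples, of which c passed:
    1 - P(all k samples drawn without replacement are failures)."""
    if n - c < k:
        return 1.0  # too few failures to fill a k-sample draw
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```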
(as always, opinions my own, not representative of my employer, etc.)
I don’t know of anywhere you could get such information, short of analyzing the ensemble of simulation outputs yourself. With typical ensemble sizes of “a few” to “a few dozen”, you probably can get useful confidence intervals but probably can’t get useful conditional probabilities.
(Specifically: you’d want to search for your national weather service’s “THREDDS data server”, then get the “OPeNDAP” link, and use `xarray.open_dataset()` on that URL… with lazy loading, since the data are usually TB+.)

The traditional weather forecast consists of summary statistics over saved timesteps of detailed simulations, which run forward from a best-possible reconstruction of the current state of the atmosphere. Data assimilation, or “hindcasting”/“nowcasting”, is itself a neat trick, and the dual of forecasting—you have past and present observations; you have a model of the system dynamics; you can sample from plausible system states which are compatible with observations, or even solve for the most-likely state given observations (including subsequent observations). I don’t think enough people realize that we can be so much more confident about the details of the weather last week than today, even in remote places where nobody was watching!
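To make the parenthetical above concrete, a rough sketch; the server URL, variable, and dimension names are placeholders that vary by weather service:

```python
import xarray as xr

# Placeholder OPeNDAP endpoint from a THREDDS data server
URL = "https://thredds.example.gov/thredds/dodsC/forecast/ensemble"

ds = xr.open_dataset(URL)  # lazy: fetches metadata, not the TB+ of data

# Select only the slice you need, e.g. 2m temperature at one location
temp = ds["t2m"].sel(lat=-35.3, lon=149.1, method="nearest")

# Reducing across ensemble members gives a rough confidence band,
# and pulls only this slice over the network
band = temp.quantile([0.1, 0.5, 0.9], dim="member")
```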
As per the 1944 Simple Sabotage Field Manual, of course.
(very little of which remains relevant today, and all of which regards the non-US-citizen saboteur as expendable)
I entirely agree that private contributions to open source are far below the socially-optimal level of public goods funding—I’d just expect that the first few billion dollars would best be spent on producing neglected goods like language-level improvements, testing, debugging, verification, etc., where most value is not captured. The state of the art in these areas is mostly set by individuals or small teams, and it would be easy to massively scale up given funding.
(disclosure: I got annoyed enough by this that I’ve tried to commercialize HypoFuzz, specifically in order to provide sustainable funding for Hypothesis. Commercialize products to which your favorite public goods are complements!)
Keep your economist hat on! For-profit companies release useful open source all the time, including for the following self-interested reasons:
Attracting and retaining employees who like working with cool tech
Sharing development costs of foundational tools like e.g. LLVM
“Commoditizing your complement”, e.g. free ML software is great for NVIDIA
This is sufficient incentive that, in the case of ML tools, volunteers just don’t have the resources to keep up with corporate projects. They still exist, but e.g. mygrad is not pytorch. For a deeper treatment, I’d suggest reading Working in Public (Nadia Eghbal) for a contemporary picture of how open-source development works, then maybe The Cathedral and the Bazaar (Eric Raymond) for the historical/founding-myth view.
I’d generally expect impact-motivated open source foundations to avoid competing directly with big tech, and instead try to build out under-resourced parts of the ecosystem like e.g. testing and verification. Regardless of the specifics here, to the extent that they work, impact certificates invoke the unilateralist’s curse, and so you really do need to consider negative externalities.
Expected evidence is conserved if almost anything you could observe would update you slightly in the same direction, with the much rarer cases updating you correspondingly much further in the opposite direction.
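A worked example (numbers mine): with prior $P(H)=0.5$, suppose the mundane observation $e$ arrives with probability $P(e)=0.9$ and would update you to $P(H \mid e)=0.55$. Conservation of expected evidence then pins down the rare branch:

$$P(H \mid \lnot e) = \frac{P(H) - P(e)\,P(H \mid e)}{P(\lnot e)} = \frac{0.5 - 0.9 \times 0.55}{0.1} = 0.05$$

A nudge of $+0.05$ nine times out of ten is exactly balanced by a plunge of $-0.45$ the tenth time.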
For what it’s worth, I also read this as Eliezer reporting on a case where he later realised that his views were violating CoEE. This is equivalent to observing evidence, realising that your prior should have given it more credence, and then updating in the same direction you moved your prior. Sounds awful but can be reasonable for computationally bounded agents!
Nadia’s blog post explores what she did instead of getting a PhD; you can in fact just spend several years immersed in a particular area of research and advance human knowledge without enrolling anywhere—although both funding and mentoring are often harder to find.
Some universities infrequently award an honorary doctorate to someone who is widely recognised in a field of study as working above the level expected of PhD candidates where a doctorate would ordinarily be required. Or occasionally to someone unqualified due to political pressure, but those cause severe reputational damage and are generally ignored.
Finally, it is very rarely possible to have existing doctoral-equivalent work recognised as fulfilling the requirements of a PhD, and graduate with a non-honorary doctorate without having enrolled (for long). The only case that springs to mind is George Dantzig, who solved two open problems in statistics thinking that they were homework; the subsequent papers later formed the basis of his thesis.
So in short: it’s traditional either to enroll and get a PhD the usual (hard) way, or to so surpass the requirements that it’s more embarrassing not to grant you a PhD (harder!).
Oh yeah. My personal favourite is the NFKC-normalization of identifiers (demo below), though I haven’t built that into hypothesmith yet.
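(A minimal demonstration of the trap, since CPython NFKC-normalizes identifiers at parse time per PEP 3131:)

```python
import unicodedata

# Fullwidth letters compatibility-normalize to their ASCII forms...
assert unicodedata.normalize("NFKC", "ｓｃｏｐｅ") == "scope"

# ...so an identifier that *looks* distinct silently rebinds the original:
scope = "original"
ｓｃｏｐｅ = "shadowed"
assert scope == "shadowed"
```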
Hey, the example uses Hypothesis 😁 Always nice to see it in the wild.
No worries, here’s the comment.
Zac Hatfield Dobbs (an engineer at Anthropic) commented on 16 July 2021: “Now it looks like prosaic alignment might be the only kind we get, and the deadline might be very early indeed.”
Could you please note in the text that I wrote this, and later applied to and joined Anthropic? As-is I’m concerned that people might misinterpret the timing and affiliation.
Nonetheless, this concern did in fact motivate me to join Anthropic instead of finishing my PhD.
My surname is “Hatfield-Dodds” :-)
Objections might include:
That’s mindcrime and/or murder, which is bad.
Acausal trade is in fact a thing
blah blah technical feasibility
In The Matrix, humans are put into tubes, used by machines as energy generators (somehow).
In earlier versions of the script, humans were used for neural computing, not energy, which makes somewhat more sense. You can consider that Morpheus was simply mistaken about the purpose here, or perhaps something else was happening...
Deterministic theories have the feature that they forbid some class of events from happening—for instance, the second law of thermodynamics forbids the flow of heat from a cold object to a hot object in an isolated system. The probabilistic component in a theory has no such character, even in principle.
This seems like an odd example to me, since the second law of thermodynamics is itself probabilistic!
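Schematically (my gloss, not a precise statement): a spontaneous fluctuation that decreases entropy by $\Delta S$ has probability on the order of

$$\Pr[\text{decrease by } \Delta S] \sim e^{-\Delta S / k_B},$$

vanishingly small for any macroscopic heat flow, but strictly nonzero; the “forbidden” events are merely extravagantly improbable.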
I think that the best available starting point is the Cotra / OpenPhil report on biological anchors.
Personally I treat this as an upper bound on how much compute it might take, and hold the estimates of when compute will become available quite lightly. Nonetheless, IMO serious discussion of timelines should at least be able to describe specific disagreements with the report and identify which differing assumptions give rise to each.