AI strategy & governance. ailabwatch.org. ailabwatch.substack.com.
Zach Stein-Perlman
I guess so! Is there reason to favor logit?
Yep, e.g. donations sooner are better for getting endorsements. Especially for Bores and somewhat for Wiener, I think.
Maybe the logistic success curve should actually be the cumulative normal success curve.
There’s often a logistic curve for success probabilities, you know? The distances are measured in multiplicative odds, not additive percentage points. You can’t take a project like this and assume that by putting in some more hard work, you can increase the absolute chance of success by 10%. More like, the odds of this project’s failure versus success start out as 1,000,000:1, and if we’re very polite and navigate around Mr. Topaz’s sense that he is higher-status than us and manage to explain a few tips to him without ever sounding like we think we know something he doesn’t, we can quintuple his chances of success and send the odds to 200,000:1. Which is to say that in the world of percentage points, the odds go from 0.0% to 0.0%. That’s one way to look at the “law of continued failure”.
If you had the kind of project where the fundamentals implied, say, a 15% chance of success, you’d then be on the right part of the logistic curve, and in that case it could make a lot of sense to hunt for ways to bump that up to a 30% or 80% chance.
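To make the odds arithmetic concrete, here’s a minimal sketch with the numbers from the passage above (the helper functions and formatting are mine, just for illustration):

```python
def to_odds(p):
    """Probability of success -> odds of success (success : failure)."""
    return p / (1 - p)

def to_prob(odds):
    """Odds of success -> probability of success."""
    return odds / (1 + odds)

# Long-shot project: failure:success odds of 1,000,000:1, i.e. success odds of 1e-6.
p_before = to_prob(1e-6)
p_after = to_prob(1e-6 * 5)           # quintupling the odds of success
print(f"{p_before:.4%} -> {p_after:.4%}")   # 0.0001% -> 0.0005%: still ~0.0% in percentage points

# Mid-curve project: 15% chance of success, same 5x odds multiplier.
p_before = 0.15
p_after = to_prob(to_odds(0.15) * 5)
print(f"{p_before:.0%} -> {p_after:.0%}")   # 15% -> 47%: the same multiplier is very visible here
```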
I observe that https://www.lesswrong.com/posts/BqwXYFtpetFxqkxip/mikhail-samin-s-shortform?commentId=dtmeRXPYkqfDGpaBj isn’t frontpage-y but remains on the homepage even after many mods have seen it. This suggests that the mods were just patching the hack. (But I don’t know what other shortforms they’ve hidden, besides the political ones, if any.)
fwiw I agree with most but not all details, and I agree that Anthropic’s commitments and policy advocacy have a bad track record, but I think Anthropic’s capabilities work is nevertheless net positive, because Anthropic has way more capacity and propensity to do safety stuff than other frontier AI companies.
I wonder what you believe about Anthropic’s likelihood of noticing risks from misalignment relative to other companies, or of someday spending >25% of internal compute on (automated) safety work.
I think “Overton window” is a pretty load-bearing concept for many LW users and AI people — it’s their main model of policy change. Unfortunately there are lots of other models of policy change. I don’t think “Overton window” is particularly helpful or likely-to-cause-you-to-notice-relevant-stuff-and-make-accurate-predictions. (And separately, people around here sometimes incorrectly use “expand the Overton window” to just mean “advance AI safety ideas in government.”) I don’t have time to write this up; maybe someone else should (or maybe there already exists a good intro to the study of why some policies happen and persist while others don’t[1]).
Some terms: policy windows (and “multiple streams”), punctuated equilibrium, policy entrepreneurs, path dependence and feedback (yes this is a real concept in political science, e.g. policies that cause interest groups to depend on them are less likely to be reversed), gradual institutional change, framing/narrative/agenda-setting.
Related point: https://forum.effectivealtruism.org/posts/SrNDFF28xKakMukvz/tlevin-s-quick-takes?commentId=aGSpWHBKWAaFzubba.
[1] I liked the book Policy Paradox in college. (Example claim: perceived policy problems are strategically constructed through political processes; how issues are framed—e.g. individual vs collective responsibility—determines which solutions seem appropriate.) I asked Claude for suggestions on a shorter intro and I didn’t find the suggestions helpful.
I guess I think if you work on government stuff and you [don’t have a poli sci background / aren’t familiar with concepts like “multiple streams”], you should read Policy Paradox (although the book isn’t about that particular concept).
I guess I’ll write non-frontpage-y quick takes as posts instead then :(
I’d like to be able to see such quick takes on the homepage, like how I can see personal blogposts on the homepage (even though logged-out users can’t).
Are you hiding them from everyone? Can I opt into seeing them?
I failed to find a way to import to Slack without doing it one by one.
Bores knows, at least for people who donate via certain links. For example, the link in this post is https://secure.actblue.com/donate/boresai?refcode=lw rather than https://secure.actblue.com/donate/boresweb.
I’m annoyed that Tegmark and others don’t seem to understand my position: you should try for great global coordination but also invest in safety in more rushed worlds, and a relatively responsible developer shouldn’t unilaterally stop.
(I’m also annoyed by this post’s framing for reasons similar to Ray.)
Part is thinking about donation opportunities, like Bores. Hopefully I’ll have more to say publicly at some point!
Recently I’ve been spending much less than half of my time on projects like AI Lab Watch. Instead I’ve been thinking about projects in the “strategy/meta” and “politics” domains. I’m not sure what I’ll work on in the future but sometimes people incorrectly assume I’m on top of lab-watching stuff; I want people to know I’m not owning the lab-watching ball. I think lab-watching work is better than AI-governance-think-tank work for the right people on current margins and at least one more person should do it full-time; DM me if you’re interested.
Good point. I think compute providers can steal model weights, as I said at the top. I think they currently have more incentive to steal architecture and training algorithms, since those are easier to use without getting caught, so I focused on “algorithmic secrets.”
Separately, are Amazon and Google incentivized to steal architecture and training algorithms? Meh. I think it’s very unlikely, since even if they’re perfectly ruthless, their reputation is very important to them (plus they care about some legal risks). I think habryka thinks it’s more likely than I do. This is relevant to Anthropic’s security prioritization — security from compute providers might not be among the lowest-hanging fruit. I think Fabien thinks it’s also relevant to ASL-3 compliance, and I agree that ASL-3 probably wasn’t written with insider threat from compute providers in mind, but I’m not sure incentives matter for compliance: the ASL-3 standard doesn’t say that actors are only in scope if they seem incentivized to steal stuff; the scope is based on actors’ capabilities.
I agree that whether Anthropic has handled insider threat from compute providers is a crux. My guess is that Anthropic and humans-at-Anthropic wouldn’t claim to have handled this (outside of the implicit claim for ASL-3) and they would say something more like “that’s out of scope for ASL-3” or “oops.”
Separately, I just unblocked you. (I blocked you because I didn’t like this thread in my shortform, not directly to stifle dissent. I have not blocked anyone else. I mention this because hearing about disagreement being hidden/blocked should make readers suspicious, but that suspicion is mostly unwarranted in this case.)
Edit: also, man, I tried to avoid “condemnation” and I think I succeeded. I was just making an observation. I don’t really condemn Anthropic for this.
An AI company’s model weight security is at most as good as its compute providers’ security. I don’t know how good compute providers’ security is, but at the least I think model weights and algorithmic secrets aren’t robust to insider threat from compute provider staff. I think it would be very hard for compute providers to eliminate insider threat, much less demonstrate that to the AI company.
I think this based on the absence of public information to the contrary, briefly chatting with LLMs, and a little private information.
One consequence is that Anthropic probably isn’t complying with its ASL-3 security standard, which is supposed to address risk from “corporate espionage teams.” Arguably this refers to teams at companies with no special access to Anthropic, rather than teams at Amazon and Google. But it would be dubious to exclude Amazon and Google just because they’re compute providers: they’re competitors with strong incentives to steal algorithmic secrets, and while they pose more risk than the baseline “corporate espionage team,” most of the risk from any group of actors comes from the small subset of actors that pose more than baseline risk. Anthropic is thinking about how to address this and related threats, but my impression is that it hasn’t yet done so.
Thanks to habryka for making related points to me and discussing. (He doesn’t necessarily endorse this.)
Obviously, “Anthropic isn’t meeting its security standard” is consistent with “Anthropic’s security is better than its competitors’” and even with “marginal improvements to Anthropic’s security don’t really matter because other companies have similarly capable models with substantially less security.” I don’t take a position on those questions here. And I certainly don’t claim that insider risk at compute providers is among the lowest-hanging security fruit; I’m just remarking on the RSP.
Context: maybe hooray Anthropic for being the only company to make a security claim which is apparently strong enough that insider threat at compute providers is a problem for it, but boo Anthropic for saying it’s meeting this standard if it’s not. Recall that other companies are generally worse and vaguer; being vague is bad too but doesn’t enable specific criticism like this. Recall that AI companies aren’t credibly planning for great security. I think addressing insider threat at compute providers happens by SL4 in the RAND framework.
Daniel is referring to “When AI is fully automated, disagreement over how good their research taste will be, but median is roughly as good as the median current AI worker,” which is indeed a mistake.
Example with fake numbers: my favorite intervention is X. My favorite intervention in a year will probably be (stuff very similar to) X. I value $1 for X now equally to $1.7 for X in a year. I value $1.7 for X in a year equally to $1.4 unrestricted in a year, since it’s possible that I’ll believe something else is substantially better than X. So I should wait to donate if my expected rate of return is >40%; without this consideration I’d only wait if my expected rate of return is >70%.
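Spelling out the same arithmetic (a minimal sketch using the fake numbers above; the variable names are mine):

```python
# Fake numbers from the example above.
x_later_per_x_now = 1.7               # $1 for X now  ==  $1.7 for X in a year
unres_later_per_x_later = 1.4 / 1.7   # $1.7 for X in a year  ==  $1.4 unrestricted in a year

# Chaining the two: $1 for X now is worth this many unrestricted dollars in a year.
unres_later_per_x_now = x_later_per_x_now * unres_later_per_x_later   # = 1.4

# Waiting beats donating now only if each held dollar grows past that equivalence point.
breakeven_with_option_value = unres_later_per_x_now - 1   # 0.40 -> wait only if expected return > 40%
breakeven_without_option_value = x_later_per_x_now - 1    # 0.70 -> threshold without the option-value consideration

print(f">{breakeven_with_option_value:.0%} vs >{breakeven_without_option_value:.0%}")  # >40% vs >70%
```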
You may be interested in ailabwatch.org/resources/corporate-documents, which links to a folder where I have uploaded ~all past versions of the CoI. (I don’t recommend reading it, although afaik the only lawyers who’ve read the Anthropic CoI are Anthropic lawyers and advisors, so it might be cool if one independent lawyer read it from a skeptical/robustness perspective. And I haven’t even done a good job diffing the current version from a past version; I wasn’t aware of the thing Drake highlighted.)