I love you!
Don’t take me too seriously or trust me too much lol. Plus, there’s a real chance I’ve already changed my mind on anything I’ve said in the past haha.
Bit of an EA/rat if I do say so myself. Non violent anarchy is cool; I don’t really care for big, opaque authoritarian control structures lol.
Intellectual interests: all sorts of shit lol, but the current shortlist includes:
modern physics (looking for good resources)
xxx_hacker-shit_xxx (compassionate, non-violent)
technological self-determinism and managing mass externalities
meta x-risk mitigation, PauseAI stuff
pwning tyranny
The GREAT REFLECTION STARTS NOW MOTHERFUCKERS!!!!!!
Also:
My current donation portfolio: there might be a link here at some point
Projects we could collab on: there might be a link here at some point
Art and stuff I’m into rn: there might be a link here at some point
Big intellectual influences: there might be a link here at some point
Wish list / gift list: there might be a link here at some point
For personal context: I can understand why a superintelligent system having any goals that aren’t my goals would be very bad for me. I can also understand some of the reasons it is difficult to actually specify my goals or train a system to share my goals. There are a few parts of the basic argument that I don’t understand as well though.
For one, I think I have trouble imagining an AGI that actually has “goals” and acts like an agent; I might just be anthropomorphizing too much.
1. Would it make sense to talk about modern large language models as “having goals,” or is that something we expect to emerge later as AI systems become more general?
2. Is there a reason to believe that sufficiently advanced AGI would have goals “by default”?
3. Are “goal-directed” systems inherently more concerning than “tool-like” systems when it comes to alignment issues (or is that an incoherent distinction in this context)?
I will try to answer those questions myself, so people can see where my reasoning might be going wrong or what questions I should actually be asking.
Thanks!