Your principles #3 and #5 are in a weak conflict—generating hypotheses without having enough information to narrow the space of reasonable hypotheses would too often lead to false positives. When faced with an unknown novel phenomenon, one ought to collect information first, including collecting experimental data without a fixed hypothesis, before starting to formulate any hypotheses.
I’m not involved in politics or the military action, but I can’t help but feel implicated by my government’s actions as a citizen here
Please consider the implications of being not only a citizen, but also a taxpayer, and a customer of other taxpayers. Through taxes, your work indirectly supports the Russian war effort.
I’m interested in building global startups,
If you succeed while still in Russia, what is stopping those with powerful connections from simply taking over from you? From what you say, it does not sound like you have connections of your own that would allow you to protect yourself.
You do not mention your eligibility for getting drafted, but unless you have strong reasons to believe you would not be (e.g., you are female), you also need to consider that possibility.
Chances are things in Russia will become worse before they become better. Have you considered how Putin’s next big stupid move might affect you? What happens next time something like the Prigozhin/Wagner rebellion is a bit less of a farce? Or how it might affect you if Putin dies and Kadyrov decides it’s his chance to take over?
Option 5: the questioner is optimizing a metric other than what appears to be the post’s implicit “get max info with a minimal number of questions, ignoring communication overhead”, which is IMHO a weird metric to optimize to begin with—not only does it not take the length/complexity of each question into account, but it also ignores things like maintaining the answerer’s willingness to continue answering questions, not annoying the answerer, and ensuring proper context so that a question is not misunderstood. And this is not even taking into account the possibility that, while the questioner does care about getting the information, they might also simultaneously care about other things.
Looks like a good summary of their current positions, but how about willingness to update their positions and act decisively based on actual evidence/data? DeSantis’s history of anti-mask/anti-vaccine stances has to be taken into account, perhaps? Same for Kennedy?
I am not working on X because it’s so poorly defined that I dread needing to sort it out.
I am not working on X because I am at a loss as to where to start.
I feel like admiring problem X and considering all the ways I could theoretically start solving it, so I am not actually doing anything to solve it.
For a professor at a top university, this would easily be 60+ hrs/week. https://www.insidehighered.com/news/2014/04/09/research-shows-professors-work-long-hours-and-spend-much-day-meetings claims 61 hrs/week is the average, and something like 65 for a full professor. The primary currency is prestige, not salary, and prestige is generated by research (high-profile grants, high-profile publications, etc.), not teaching. As for teaching, they would likely care a lot more about advanced classes, where students are getting closer to potentially joining their research team, and a lot less about the intro classes (where many students might not even be from the right major)—those would often be seen as a chore to get out of the way, not as a meaningful task to invest actual effort into.
So what system selects the best leader out of the entire population?
None—as Churchill said, democracy is the worst form of government except for all those other forms that have been tried from time to time. Still, we should be realistic when explaining the benefits.
One theory of democracy’s purpose is to elect the “right” leaders. In this view, questions such as “Who is best equipped to lead this nation?” have a correct answer, and democracy is merely the most effective way to find that answer.
I think this is a very limiting view of instrumental goals of democracy. First, democracy has almost no chance of selecting the best leader—at best, it could help select a better one out of a limited set of options. Second, this ignores a key, IMHO the key, feature of democracy—keeping leaders accountable after they are elected. Democracy does not just start backsliding when a bad leader is elected, it starts backsliding when the allies of that leader become too willing to shield the “dear leader” from accountability.
Ensuring the leaders change is another important feature.
I think the use of the term “AGI” without a specific definition is causing an issue here—IMHO the crux of the matter is the difference between progress in average performance vs worst-case performance. We are making amazing progress on the former, but struggling with the latter (LLM hallucinations, etc). And robotaxis require almost-perfect performance.
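To make the distinction explicit (my notation, not the post’s): average-case performance is $\mathbb{E}_{x \sim D}[\mathrm{perf}(x)]$ over the deployment distribution $D$, while worst-case performance is $\min_{x \in \mathrm{supp}(D)} \mathrm{perf}(x)$. Scaling keeps pushing the expectation up, but a robotaxi launch is gated on the minimum (or on a very high percentile), which is exactly where progress has been slow.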
This makes assumptions that make no sense to me. Auto-GPT is already not passively safe, and there is no reason to be sure LLMs will remain myopic as they are scaled. LLMs are inscrutable matrices of floating-point numbers that we are barely learning how to understand and interpret. We have no reliable way to predict when LLMs might hallucinate or misbehave in some other way. There is also no “human level”—LLMs are way faster and way more scalable than humans—there is no way to get LLMs that are as good as humans without getting something that is way better than humans along a huge number of dimensions.
As a few commenters have already pointed out, this “strategy” completely fails at step 2 (“Specify safety properties that we want all AIs to obey”). Even for a “simple” property you cite, “refusal to help terrorists spread harmful viruses”, we are many orders of magnitude of descriptive complexity away from knowing how to state it as a formal logical predicate on the I/O behavior of the AI program. We have no clue how to define “virus” as a mathematical property of the AI’s sensors in a way that does not go wrong in all kinds of corner cases, even less clue for “terrorist”, and even less clue than that for “help”. The gap between what we know how to specify today and the complexity of your “simple” property is way bigger than the gap between the “simple” property and the most complex safety properties people tend to consider...
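To spell out what step 2 would actually require (my formalization, not the post’s): a machine-checkable predicate of the form $\forall x \in \mathcal{X}: \mathrm{Safe}(x, M(x))$, where $M$ is the AI program and $\mathrm{Safe}$ is a formal predicate over input/output pairs. All of the difficulty hides inside writing down $\mathrm{Safe}$ itself, i.e., capturing “terrorist”, “virus”, and “help” as predicates over token sequences or sensor data.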
To illustrate, consider an even simpler partial specification—the AI is observing the world, and you want to formally define the probability that its notion of whether it is seeing a dog is aligned with your definition of a dog. Formally: define a mathematical function whose arguments represent the RGB values of a 1024x1024 image and whose value is the true probability that the image contains what you consider to be a dog—so that a neural network that is proven to compute that particular function can be trusted to be aligned with your definition of a dog, while a neural network that computes something else is misaligned. Well, today we have close to zero clue how to do this. The closest we can come is to train a neural network to recognize dog pictures, and then whatever function that network happens to compute (which, if written down as a mathematical function, would be an incomprehensible mess that, even if optimized to reduce its size, would probably be at least thousands of pages long) is the best formal specification we know how to come up with. (For things simpler than dogs we could probably do better by first defining a specification for 3D shapes and then projecting it onto 2D images, but I do not think this approach would be much help for dogs.) Note that effectively we are saying to trust the neural network—whatever it learned to do is our best guess at how to formalize what it needed to do! We do not yet know how to do better!
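To make the “trust the network” point concrete, here is a minimal sketch (PyTorch; a toy model, all names mine, no claim this is anyone’s proposed method). The only artifact the training process hands us is the trained model itself, and that model is the de facto “specification” of dog-ness:

```python
# Minimal sketch: training a dog classifier over 1024x1024 RGB images.
# The point: after training, `model` itself is the best available "formal
# specification" of P(image contains a dog) -- we have no independent,
# human-written mathematical definition to verify it against.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=7, stride=4),  # 3 input channels = RGB
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
    nn.Sigmoid(),  # output in [0, 1], read as P(dog)
)

opt = torch.optim.Adam(model.parameters())
loss_fn = nn.BCELoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """images: (batch, 3, 1024, 1024) floats; labels: (batch, 1) in {0., 1.}."""
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
    return loss.item()

# Written out as a closed-form function of the 3*1024*1024 pixel values,
# whatever `model` computes would be an incomprehensible, pages-long mess --
# and verifying any *other* network against it amounts to trusting this one.
```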
Yes, of course, what I meant is more the case of somebody confidently presenting as a self-evident truth something with a ton of well-known counterarguments. Or, more generally, somebody who is not only clueless, but shows no awareness of how clueless they are, and no evidence that they at least tried to look for relevant information. [IMHO] Somebody who demonstrates willingness to learn deserves a comment pointing them to relevant information (and may still warrant a downvote, depending on how far off the post is). Somebody who does not deserves to be downvoted, and usually would not deserve the time I would need to spend to explain my downvote in a comment. [/IMHO]
FWIW, most of my downvotes on LW are for poorly reasoned jumping-to-conclusions posts, and/or ones where the poster does not seem to fully know what they are talking about and should have done more homework first. I would never downvote a well-written post, even if I 100% disagree with it.
Grammar issue in your Russian version—it should be “Как я могу взять уток домой из парка?”, or even better: “Как мне забрать уток из парка домой?” (both roughly: “How can I take the ducks home from the park?”).
Sears tried creating an explicit internal economy. It did not end well. https://www.versobooks.com/blogs/news/4385-failing-to-plan-how-ayn-rand-destroyed-sears
Everything else being equal, fast, agile decisionmaking is better than slow and blunt decisionmaking. Freedom does not just mean freedom to do X today; it also means freedom to change our minds about X tomorrow. “Do not regulate X” because freedom means, among other things, not trusting X to be regulated in sensible ways, and instead trusting individuals to self-organize. Not saying this is always a good choice, but the potential pitfalls of things like regulatory capture need to be acknowledged.
If humans are supposed to be able to detect things going wrong and shut things down, that requires that they be exposed to the unencrypted feed. At that point, the humans are the weakest link, not the encryption. The same goes for anything else external that you need/want the AI to access while it is being trained and tested.
Edited to add: particularly if we are talking not about some theoretical sensible humans, but about real humans who started with “do not worry about LLMs, they are not agentic”, and then promptly connected LLMs to agentic APIs.
Maybe there is a better way to put it—SFOT holds for objective functions/environments that depend only on the agent’s I/O behavior. Once the agent itself is embodied, then yes, you can use all kinds of diagonal tricks to get weird counterexamples. Implications for alignment—yes, if your agent is fully explainable and you can transparently examine its workings, chances are that alignment is easier. But that is kind of obvious without having to use SFOT to reason about it.
Edited to add: “diagonal tricks” above refers to things in the conceptual neighborhood of https://en.m.wikipedia.org/wiki/Diagonal_lemma
https://xkcd.com/538/ Crypto is not the weakest link.
I buy your argument that power seeking is a convergent behavior. In fact, this is a key part of many canonical arguments for why an unaligned AGI is likely to kill us all.
But on the meta level, you seem to argue that this is incompatible with the orthogonality thesis? If so, you may be misunderstanding the thesis—the ability of an AGI to have arbitrary utility functions is orthogonal (pun intended) to what behaviors are likely to result from those utility functions. The former is what the orthogonality thesis claims, but your argument is about the latter.