I don’t think any factored cognition proponents would disagree with:

> Composing interpretable pieces does not necessarily yield an interpretable system.
They just believe that we could, contingently, choose to compose interpretable pieces into an interpretable system, just as we do all the time with:

- massive factories with billions of components, e.g. semiconductor fabs
- large software projects with tens of millions of lines of code, e.g. the Linux kernel
- military operations involving millions of soldiers and support personnel
> Figuring out how to turn interpretability/tool-ness/alignment/corrigibility of the parts into interpretability/tool-ness/alignment/corrigibility of the whole is the central problem, and it’s a hard (and interesting) open research problem.
Agreed this is the central problem, though I would describe it more as engineering than research: the fact that we have examples of massively complicated yet interpretable systems means we collectively “know” how to solve it, and it’s mostly a matter of assembling a sufficiently large and coordinated engineering project. (The real problem with factored cognition for AI safety is not that it won’t work, but that equally powerful uninterpretable systems might be much easier to build.)
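As a toy illustration of what engineering interpretability-of-the-whole could look like, here is a minimal sketch (all names hypothetical, not any existing system’s API) of a factored-cognition-style solver where the composition rule is explicit and every sub-question/answer pair is logged to an auditable trace:

```python
# Toy sketch of factored cognition with an auditable composition trace.
# All names are hypothetical illustrations, not any particular system's API.

from dataclasses import dataclass, field

@dataclass
class Trace:
    """Records every (sub-question, answer) pair so the whole
    computation stays inspectable, not just its pieces."""
    steps: list = field(default_factory=list)

    def log(self, question: str, answer: str) -> None:
        self.steps.append((question, answer))

def solve(question: str, decompose, answer_leaf, trace: Trace) -> str:
    """Recursively split a question into sub-questions; each leaf is
    answered by a small, individually interpretable unit."""
    subquestions = decompose(question)
    if not subquestions:              # base case: small enough to answer directly
        result = answer_leaf(question)
    else:
        parts = [solve(q, decompose, answer_leaf, trace) for q in subquestions]
        result = " ".join(parts)      # the composition rule is explicit and simple
    trace.log(question, result)
    return result

# Usage: trivially decompose "X and Y" style questions.
trace = Trace()
solve("2+2 and 3+3",
      decompose=lambda q: q.split(" and ") if " and " in q else [],
      answer_leaf=lambda q: str(eval(q)),  # stand-in for an interpretable leaf solver
      trace=trace)
print(trace.steps)  # every sub-question and its answer can be audited
```

The point of the sketch is that the leaves being simple does not make the whole inspectable for free; the explicit composition rule and the recorded trace are what do that work, and scaling them up is the engineering project described above.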
Thought-provoking post, thanks.
One important implication is that pure-play AI companies such as OpenAI, Anthropic, Conjecture, and Cohere are likely to fall behind companies with access to large amounts of non-public-internet text data, like Facebook, Google, Apple, and perhaps Slack. Email and messaging are especially massive sources of “dark” data, provided they can be used legally and safely (e.g. without exposing private user information). Taking just email, something like 500 billion emails are sent daily, which is more text than any LLM has ever been trained on (admittedly with a ton of duplication and low-quality content).
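A quick back-of-envelope check of that comparison (the tokens-per-email figure is a rough assumption; the GPT-3 corpus size is the published ~300B tokens):

```python
# Rough back-of-envelope: daily email volume vs. an LLM training corpus.
# The per-email token count is an order-of-magnitude assumption.

emails_per_day = 500e9         # ~500 billion emails/day (figure quoted above)
tokens_per_email = 50          # assume a short average body
daily_email_tokens = emails_per_day * tokens_per_email  # ~2.5e13 = 25T tokens/day

gpt3_training_tokens = 300e9   # GPT-3 was trained on ~300B tokens
print(daily_email_tokens / gpt3_training_tokens)  # ~83x GPT-3's corpus, per day
```

Even after aggressive deduplication and quality filtering cut that by an order of magnitude or two, a single day of email would still rival the largest public training corpora.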
Another implication is that federated learning, data democratization efforts, and privacy regulations like GDPR are much more likely to be critical levers on the future of AI than previously thought.
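Since federated learning is doing real work in that claim, here is a minimal sketch of its core mechanism, federated averaging (a generic FedAvg loop over toy NumPy vectors, not any particular framework’s API): each data silo trains locally on its private text and shares only weight updates, so the raw “dark” data never leaves the silo.

```python
# Minimal federated averaging (FedAvg) sketch: each data silo trains
# locally on its private data and only shares weight updates.
import numpy as np

def local_step(weights: np.ndarray, private_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Stand-in for local training: one gradient step on a toy
    least-squares objective over data that never leaves the silo."""
    grad = weights - private_data.mean(axis=0)   # toy gradient
    return weights - lr * grad

def fedavg_round(global_weights: np.ndarray, silos: list) -> np.ndarray:
    """One communication round: silos train locally, server averages.
    The server only ever sees weight vectors, never raw data."""
    updates = [local_step(global_weights.copy(), data) for data in silos]
    return np.mean(updates, axis=0)

# Usage: three silos holding private "data" (e.g., embedded email text).
rng = np.random.default_rng(0)
silos = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]
w = np.zeros(4)
for _ in range(20):
    w = fedavg_round(w, silos)
print(w)  # converges toward the mean of the silo means (~1.0 per coordinate)
```

Whether this kind of scheme counts as legally and practically safe enough for email-scale text is exactly where GDPR-style regulation becomes the lever.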