Most of my posts and comments are about AI and alignment. Posts I’m most proud of, which also provide a good introduction to my worldview:
Without a trajectory change, the development of AGI is likely to go badly
Steering systems, and a follow-up on corrigibility.
I also created Forum Karma, and wrote a longer self-introduction here.
PMs and private feedback are always welcome.
NOTE: I am not Max Harms, author of Crystal Society. I’d prefer for now that my LW postings not be attached to my full name when people Google me for other reasons, but you can PM me here or on Discord (m4xed) if you want to know who I am.
Do “subagents” in this paragraph refer to different people, or different reasoning modes / perspectives within a single person? (I think it’s the latter, since otherwise they would just be “agents” rather than subagents.)
Either way, I think this is a neat way of modeling disagreement and reasoning processes, but for me it leads to a different conclusion on the object-level question of AI doom.
A big part of why I find Eliezer’s arguments about AI compelling is that they cohere with my own understanding of diverse subjects (economics, biology, engineering, philosophy, etc.) that are not directly related to AI—my subagents for these fields are convinced and in agreement.
Conversely, I find many of the strongest skeptical arguments about AI doom to be unconvincing precisely because they seem overly reliant on a “current-paradigm ML subagent” that their proponents feel should be dominant, or at least more heavily weighted than I think is justified.
This might be true and useful for getting some kind of initial outside-view estimate, but I think you need a weighting rule to make this work as a reasoning strategy even at a meta level. Otherwise, aren’t you vulnerable to other people inventing lots of new frames and disciplines? I think the answer in geometric rationality terms is that some subagents will perform poorly and quickly lose their Nash bargaining resources, and their contribution to future decision-making / conclusion-drawing will then be down-weighted. But I don’t think the only way for a subagent to “perform” for the purposes of deciding on a weight is by making externally legible advance predictions.
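To make the down-weighting idea concrete, here’s a minimal toy sketch, assuming each subagent reports a probability for a binary question and that “bargaining resources” are tracked as normalized weights updated multiplicatively by how much probability each subagent put on what actually happened. The subagent names, the update rule, and the scoring-by-likelihood choice are all my own illustrative assumptions, not anything taken from the geometric rationality posts:

```python
# Toy sketch: subagents as weighted forecasters whose "bargaining resources"
# shrink when they predict poorly. All names and numbers are hypothetical.

def aggregate(weights, predictions):
    """Pooled forecast: weight-averaged probability across subagents."""
    return sum(weights[name] * predictions[name] for name in weights)

def update_weights(weights, predictions, outcome):
    """Multiplicatively reweight each subagent by the likelihood it assigned
    to the realized outcome (0 or 1), then renormalize."""
    new_weights = {}
    for name, w in weights.items():
        p = predictions[name]
        likelihood = p if outcome == 1 else 1 - p
        new_weights[name] = w * likelihood
    total = sum(new_weights.values())
    return {name: w / total for name, w in new_weights.items()}

# One round on a hypothetical binary question.
weights = {"econ": 1 / 3, "biology": 1 / 3, "current-paradigm ML": 1 / 3}
predictions = {"econ": 0.7, "biology": 0.6, "current-paradigm ML": 0.2}

print(aggregate(weights, predictions))                 # pooled estimate: 0.5
weights = update_weights(weights, predictions, outcome=1)
print(weights)                                         # the ML subagent's share drops after a miss
```

The `likelihood` term here is just one possible scoring function; swapping in something other than legible advance predictions is exactly the kind of alternative I have in mind above.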