Veedrac

Karma: 1,155

Veedrac 22 Mar 2024 8:24 UTC
18 points
3
on: “Deep Learning” Is Function Approximation
It took me a good while reading this to figure out whether it was a deconstruction of tabooing words. I would have felt less so if the post didn’t keep replacing terms with ones that are both no less charged and also no more descriptive of the underlying system, and then start drawing conclusions from the resulting terms’ aesthetics.
With regards to Yudkowsky’s takes, the key thing to keep in mind is that Yudkowsky started down his path by reasoning backwards from properties ASI would have, not from reasoning forward from a particular implementation strategy. The key reason to be concerned that outer optimization doesn’t define inner optimization isn’t a specific hypothesis about whether some specific strategy with neural networks will have inner optimizers, it’s because ASI will by necessity involve active optimization on things, and we want our alignment techniques to have at least any reason to work in that regime at all.

Veedrac 14 Mar 2024 2:45 UTC
2 points
0
in reply to: gwern’s comment on: artifex0′s Shortform
There is no ‘the final token’ for weights not at the final layer.

Because that is where all the gradients flow from, and why the dog wags the tail.

Aggregations of things need not be of the same kind as their constituent things? This is a lot like calling an LLM an activation optimizer. While strictly in some sense true of the pieces that make up the training regime, it’s also kind of a wild way to talk about things in the context of ascribing motivation to the resulting network.

I think maybe you’re intending ‘next token prediction’ to mean something more like ‘represents the data distribution, as opposed to some metric on the output’, but if you are this seems like a rather unclear way of stating it.

Veedrac 13 Mar 2024 19:09 UTC
6 points
2
in reply to: gwern’s comment on: artifex0′s Shortform
You’re at token i in a non-final layer. Which token’s output are you optimizing for? i+1?

By construction a decoder-only transformer is agnostic over what future token it should be informative to within the context limit, except in the sense that it doesn’t need to represent detail that will be more cheaply available from future tokens.

As a transformer is also unrolled in the context dimension, the architecture itself is effectively required to be generic both in what information it gathers and where that information is used. Bias towards next token prediction is not so much a consequence of reward in isolation, but of competitive advantage: at position i, the network has an advantage in predicting i+1 over the network at previous locations by having more recent tokens, and an advantage over the network at future tokens by virtue of still needing to predict token i+1. However, if a token is more predictive of some abstract future token than the next token precisely, say it’s a name that might be referenced later, one would expect the dominant learnt effect to be non-myopically optimizing for later use in some timestamp-invariant way.

Veedrac 13 Mar 2024 2:27 UTC
2 points
0
in reply to: gwern’s comment on: artifex0′s Shortform
If they appear to care about predicting future tokens, (which they do because they are not myopic and they are imitating agents who do care about future states which will be encoded into future tokens), it is solely as a way to improve the next-token prediction.
I think you’re just fundamentally misunderstanding the backwards pass in an autoregressive transformer here. Only a very tiny portion of the model is exclusively trained on next token prediction. Most of the model is trained on what might be called instead, say, conditioned future informativity.

Veedrac 10 Mar 2024 20:28 UTC
4 points
0
in reply to: adastra22’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
I greatly appreciate the effort in this reply, but I think it’s increasingly unclear to me how to make efficient progress on our disagreements, so I’m going to hop.

Veedrac 10 Mar 2024 19:10 UTC
5 points
1
in reply to: adastra22’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
If you say “Indeed it’s provable that you can’t have a faster algorithm than those O(n^3) and O(n^4) approximations which cover all relevant edge cases accurately” I am quite likely to go on a digression where I try to figure out what proof you’re pointing at and why you think it’s a fundamental barrier, and it seems now that per a couple of your comments you don’t believe it’s a fundamental barrier, but at the same time it doesn’t feel like any position has been moved, so I’m left rather foggy about where progress has been made.
I think it’s very useful that you say
I’m not saying that AI can’t develop useful heuristic approximations for the simulation of gemstone-based nano-mechanical machinery operating in ultra-high vacuum. I’m saying that it can’t do so as a one-shot inference without any new experimental work
since this seems like a narrower place to scope our conversation. I read this to mean:
1. You don’t know of any in principle barrier to solving this problem,
2. You believe the solution is underconstrained by available evidence.
I find the second point hard to believe, and don’t really see anywhere you have evidenced it.
As a maybe-relevant aside to that, wrt.
You’re saying that AI could take the garbage and by mere application of thought turn it into something useful. That’s not in line with the actual history of the development of useful AI outputs.
I think you’re talking of ‘mere application of thought’ like it’s not the distinguishing feature humanity has. I don’t care what’s ‘in line with the actual history’ of AI, I care what a literal superintelligence could do, and this includes a bunch of possibilities like:
- Making inhumanly close observation of all existing data,
- Noticing new, inhumanly-complex regularities in said data,
- Proving new simplifying regularities from theory,
- Inventing new algorithms for heuristic simulation,
- Finding restricted domains where easier regularities hold,
- Bifurcating problem space and operating over each plausible set,
- Sending an interesting email to a research lab to get choice high-ROI data.
We can ignore the last one for this conversation. I still don’t understand why the others are deemed unreasonable ways of making progress on this task.
I appreciated the comments on time complexity but am skipping it because I don’t expect at this point that it lies at the crux.

Veedrac 10 Mar 2024 2:36 UTC
5 points
0
in reply to: adastra22’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
Thanks, I appreciate the attempt to clarify. I do though think there’s some fundamental disagreement about what we’re arguing over here that’s making it less productive than it could be. For example,
The fact that this has been an extremely active area of research for over 80 years with massive real-world implications, and we’re no closer to finding such a simplified heuristic.
I think both:
1. Lack of human progress doesn’t necessarily mean the problem is intrinsically unsolvable by advanced AI. Humans often take a bunch of time before proving things.
2. It seems not at all the case that algorithmic progress isn’t happening, so it’s hardly a given that we’re no closer to a solution unless you first circularly assume that there’s no solution to arrive at.
If you’re starting out with an argument that we’re not there yet, this makes me think more that there’s some fundamental disagreement about how we should reason about ASI, more than your belief being backed by a justification that would be convincing to me had only I succeeded at eliciting it. Claiming that a thing is hard is at most a reason not to rule out that it’s impossible. It’s not a reason on its own to believe that it is impossible.
With regard to complexity,
- I failed to understand the specific difference with protein folding. Protein folding is NP-hard, which is significantly harder than O(n³).
- I failed to find the source for the claim that O(n³) or O(n⁴) are optimal. Actually I’m pretty confused how this is even a likely concept; surely if the O(n³) algorithm is widely useful then the O(n⁴) proof can’t be that strong of a bound on practical usefulness? So why is this not true of the O(n³) proof as well?
It’s maybe true that protein folding is easier to computationally verify solutions to, but first, can you prove this, and second, on what basis are you claiming that existing knowledge is necessarily insufficient to develop better heuristics than the ones we already have? The claim doesn’t seem to complete to me.
It’s magical thinking to assume that an AI will just one-shot this into existence.
Please note that I’ve not been making the claim that ASI could necessarily solve this problem. I have been making the claim that the arguments in this post don’t usefully support the claim that it can’t. It is true that largely on priors I expect it should be able to, but my priors also aren’t particularly useful ones to this debate and I have tried to avoid making comments that are dependent on them.

Veedrac 5 Mar 2024 17:41 UTC
2 points
0
in reply to: adastra22’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
And what reason do you have for thinking it can’t be usefully approximated in some sufficiently productive domain, that wouldn’t also invalidly apply to protein folding? I think it’s not useful to just restate that there exist reasons you know of, I’m aiming to actually elicit those arguments here.

Veedrac 5 Mar 2024 2:30 UTC
2 points
0
on: Claude 3 claims it’s conscious, doesn’t want to die or be modified
Given Claude is not particularly censored in this regard (in the sense of refusing to discuss the subject), I expect the jailbreak here to only serve as priming.

Veedrac 5 Mar 2024 2:23 UTC
5 points
1
in reply to: adastra22’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
Well yes, nobody thinks that existing techniques suffice to build de-novo self-replicating nano machines, but that means it’s not very informative to comment on the fallibility of this or that package or the time complexity of some currently known best approach without grounding in the necessity of that approach.

One has to argue instead based on the fundamental underlying shape of the problem, and saying accurate simulation is O(n⁷) is not particularly more informative to that than saying accurate protein folding is NP. I think if the claim is that you can’t make directionally informative predictions via simulation for things meaningfully larger than helium then one is taking the argument beyond where it can be validly applied. If the claim is not that, it would be good to hear it clearly stated.

Veedrac 4 Mar 2024 10:20 UTC
2 points
0
in reply to: adastra22’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
Could you quote or else clearly reference a specific argument from the post you found convincing on that topic?

Veedrac 3 Mar 2024 3:42 UTC
4 points
0
in reply to: Tomás B.’s comment on: Supposing the 1bit LLM paper pans out
Communication overhead won’t drop faster than linear.

Veedrac 16 Feb 2024 8:52 UTC
2 points
in reply to: dkirmani’s comment on: Optimality is the tiger, and agents are its teeth
Which is equivalent to saying if you only care about a situation where none of your observations correlate with any of your other observations and none of your actions interact with any of your observations then your observations are valueless. Which is a true but empty statement, and doesn’t meaningfully affect whether there is an optimality axis that it’s possible to be better on.

Veedrac 13 Jan 2024 7:26 UTC
2 points
in reply to: dkirmani’s comment on: Optimality is the tiger, and agents are its teeth
There is a useful generality axis and a useful optimality axis and you can meaningfully progress along both at the same time. If you think no free lunch theorems disprove this then you are confused about no free lunch theorems.

Veedrac 10 Dec 2023 21:56 UTC
3 points
1
on: The Offense-Defense Balance Rarely Changes
Cybersecurity — this is balanced by being a market question. You invest resources until your wall makes attacks uneconomical, or otherwise economically survivable. This is balanced over time because the defense is ‘people spend time to think about how to write code that isn’t wrong.’ In a world where cyber attacks were orders of magnitude more powerful, people would just spend more time making their code and computing infrastructure less wrong. This has happened in the past.

Deaths in conflicts — this is balanced by being a market question. People will only spend so much to protect their regime. This varies, as you see in the graph, by a good few orders of magnitude, but there’s little historic reason to think that there should be a clean linear trend over time across the distant past, or that how much people value their regime is proportionate somehow to the absolute military technology of the opposing one.

Genome sequencing biorisk — 8ish years is not a long time to claim nobody is going to use it for highly damaging terror attacks? Most of this graph is irrelevant to that; unaffordable by 100x and unaffordable by 1000x still just equally resolve to it not happening. 8ish years at an affordable price is maybe at best enough for terrorists with limited resources to start catching on that they might want to pay attention.

Veedrac 6 Oct 2023 0:48 UTC
24 points
11
in reply to: Rob Bensinger’s comment on: Evaluating the historical value misspecification argument
This is a bad analogy. Phoning a human fails dominantly because humans are less smart than the ASI they would be trying to wrangle. Contra, Yudkowsky has even said that were you to bootstrap human intelligence directly, there is a nontrivial short that the result is good. This difference is load bearing!

This does get to the heart of the disagreement, which I’m going to try to badly tap out on my phone.

The old, MIRI-style framing was essentially: we are going to build an AGI out of parts that are not intrinsically grounded in human values, but rather good abstract reasoning, during execution of which human values will be accurately deduced, and as this is after the point of construction, we hit the challenge of formally specifying what properties we want to preserve without being able to point to those runtime properties at specification.

The newer, contrasting framing is essentially: we are going to bulld an AGI out of parts that already have strong intrinsic, conceptual-level understanding of the values we want them to preserve, and being able to directly point at those values is actually needle-moving towards getting a good outcome. This is hard to do right now, with poor interpretability and steerability of these systems, but is nonetheless a relevant component of a potential solution.

Veedrac 2 Oct 2023 6:20 UTC
2 points
0
on: Fifty Flips
I liked this! The game was plenty interesting and reasonably introduced. It’s a fun twist on induction games with the addition of reasoning over uncertainty rather than exactly guessing a rule, though it does have the downside the relatively small number of samples can make the payoff dominated by randomness.

To offer one small piece of constructive advice on the execution, I did wish the flip history autoscrolled to the newest entry.

Veedrac 1 Oct 2023 18:55 UTC
2 points
0
in reply to: Portia’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
I think I implicitly answered you elsewhere, though I’ll add a more literal response to your question here.

On a personal level, none of this is relevant to AI risk. Yudkowsky’s interest in it seems like more of a byproduct of his reading choices when he was young and impressionable than anything else, which is not reading I shared. Neither he nor I think this is necessary for xrisk scenarios, with me probably being on the more skeptical side, and me believing more in practical impediments that strongly encourage doing the simple things that work, eg. conventional biotech.

Due to this not being a crux and not having the same personal draw towards discussing it, I basically don’t think about this when I think about modelling AI risk scenarios. I think about it when it comes up because it’s technically interesting. If someone is reasoning about this because they do think it’s a crux for their AI risk scenarios, and they came to me for advice, I’d suggest testing that crux before I suggested being more clever about de novo nanotech arguments.

Veedrac 1 Oct 2023 18:24 UTC
3 points
1
in reply to: titotal’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
Rather than focusing on where I disagree with this, I want to emphasize the part where I said that I liked a lot of the discussion if I frame it in my head differently. I think if you opened the Introduction section with the second paragraph of this reply (“In my post I have explained”), rather than first quoting Yudkowsky, you’d set the right expectations going into it. The points you raise are genuinely interesting, and tons of people have worldviews that this would be much more convincing to than Yudkowsky’s.
What links here?
- Veedrac's comment on “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation by titotal (1 Oct 2023 18:55 UTC; 2 points)

Veedrac 1 Oct 2023 18:09 UTC
7 points
1
in reply to: 1a3orn’s comment on: “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

apart from pointing out the actual physical difficulties in doing the thing

This excludes most of the potential good arguments! If you can show that large areas of the solution space seem physically unrealizable, that’s an argument that potentially generalizes to ASI. For example, I think people can suggest good limits on how ASI could and couldn’t traverse the galaxy, and trivially rule out threats like ‘the AI crashes the moon into Earth’, because of physical argument.

To hypothesize an argument of this sort that might be persuasive, at least to people able to verify such claims: ‘Synthesis of these chemicals is not energetically feasible at these scales because these bonds take $X energy to form, but it’s only feasible to store $Y energy in available bonds. This limits you to a very narrow set of reactions which seems unable to produce the desired state. Thus larger devices are required, absent construction under an external power source.’ I think a similar argument could plausibly exist around object stickiness, though I don’t have the chemistry knowledge to give a good framing for how that might look.

There aren’t as many arguments once we exclude physical arguments. If you wanted to argue that it was plausibly physically realizable but that strong ASI wouldn’t figure it out, I suppose some in-principle argument that it involves solving a computationally intractable challenge in leu of experiment might work, though that seems hard to argue in reality.

It’s generally hard to use weaker claims to limit far ASI, because, being by definition qualitatively and quantitatively smarter than us, it can reason about things in ways that we can’t. I’m happy to think there might exist important, practically-solvable-in-principle tasks that an ASI fails to solve, but it seems implausible for me to know ahead of time which tasks those are.