By “reliable” I mean the same thing we mean for self-driving cars. A self-driving car that is great 99% of the time and fatally crashes 1% of the time isn’t really “highly skilled but unreliable”: part of having “skill” at driving is being reliable.
In the same way, I’m not sure I would want to employ an AI software engineer that was great 99% of the time but, 1% of the time, had totally weird, inexplicable failure modes you’d never see in a human. It would be stressful to supervise, to limit its potential for harmful impact on the company, and so on. So it seems to me that AIs won’t be given control of many things, and therefore won’t be transformative, until that reliability threshold is met.
I would say:
A theory always takes the following form: “given [premises], I expect to observe [outcomes]”. The only way to say that an experiment has falsified a theory is to correctly observe/set up [premises] but then not observe [outcomes].
If an experiment does not correctly set up [premises], then that experiment is invalid for falsifying or supporting the theory. The experiment gives no (or nearly no) Bayesian evidence either way.
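To make the “no Bayesian evidence” claim concrete: if the premises weren’t actually set up, the observed outcome is (roughly) equally likely whether or not the theory is true, so the likelihood ratio is 1 and the posterior equals the prior. A minimal sketch, with made-up numbers:

```python
def posterior(prior, p_obs_given_theory, p_obs_given_not_theory):
    """Bayes' rule for a binary hypothesis (theory true vs. not)."""
    num = p_obs_given_theory * prior
    return num / (num + p_obs_given_not_theory * (1 - prior))

prior = 0.8  # hypothetical prior credence in the theory

# Valid experiment: the outcome is much likelier if the theory is true,
# so observing it shifts our credence upward.
print(posterior(prior, 0.9, 0.1))

# Invalid premises: the outcome is equally likely either way, so the
# likelihood ratio is 1 and the posterior equals the prior
# (up to floating-point error).
print(posterior(prior, 0.5, 0.5))
```

The numbers here are illustrative, not derived from any real experiment; the point is just that equal likelihoods leave the prior untouched.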
In this case, [premises] are the assumptions we made in deriving the theoretical pendulum period: things like “the string length doesn’t change”, “the pivot point doesn’t move”, “gravity is constant”, “the pendulum does not undergo any collisions”, etc. The fact that (e.g.) the pivot point moved during the experiment invalidates the premises, and therefore the experiment does not give any Bayesian evidence one way or the other about our theory.
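For concreteness, the theoretical period in question is presumably the standard small-angle formula, which is only derived under premises like those (plus “small swing amplitude” and “negligible air resistance”). A quick sketch:

```python
import math

# Small-angle pendulum period: T = 2 * pi * sqrt(L / g).
# Valid only under the derivation's premises: fixed string length,
# fixed pivot, constant g, small swings, no collisions or drag.

def pendulum_period(length_m, g=9.81):
    return 2 * math.pi * math.sqrt(length_m / g)

print(pendulum_period(1.0))  # roughly 2.006 s for a 1 m pendulum
```

If the pivot moved mid-swing, the quantity this formula predicts simply isn’t the quantity the experiment measured.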
Then the students could say:
“But you didn’t tell us that the pivot point couldn’t move when we were doing the derivation! You could just be making up new ‘necessary premises’ for your theory every time it gets falsified!”
In which case I’m not 100% sure what I’d say. Obviously we could have listed out more assumptions than we did, but where do you stop? “The universe will not explode during the experiment”...?