Gurkenglas

Karma: 2,421

I operate by Crocker’s rules.

I try to not make people regret telling me things. So in particular:
- I expect it to be safe to ask me whether your post would give AI labs dangerous ideas.
- If you worry I’ll produce such posts, I’ll try to keep your worry from making them more likely, even if I disagree. Avoiding that line of thought will be easier if you don’t spell it out in the initial contact.

Gurkenglas Jun 11, 2025, 2:49 PM
2 points
0
in reply to: jefftk’s comment on: Ghiblification for Privacy
E.g., gender tends to make it into the picture, which is one bit. Are there 33 bits, enough to single out one person among roughly 8 billion? We don’t know the model’s idiosyncrasies, but I wouldn’t be surprised to learn of correlations like “scars on input faces translate into stoic expressions”. Separately, I can get a bunch of bits by assuming that the person has appeared in an earlier photo that includes one of the people in the picture or that was taken nearby.
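The 33-bit figure is the arithmetic of picking one person out of everyone alive. A minimal sketch, assuming a world population of about 8 billion and treating a roughly 50/50 trait (like the gender example) as worth one bit; the constant and function names are illustrative only:

```python
import math

# Assumption: roughly 8 billion people alive.
WORLD_POPULATION = 8_000_000_000

def bits_to_single_out(population: int) -> float:
    """Bits of evidence needed to narrow `population` candidates down to one."""
    return math.log2(population)

needed = bits_to_single_out(WORLD_POPULATION)

# A trait that splits the population roughly in half, such as the gender
# visible in the picture, supplies about one bit.
after_gender = needed - math.log2(2)

print(f"needed: {needed:.1f} bits, still needed after gender: {after_gender:.1f} bits")
# prints "needed: 32.9 bits, still needed after gender: 31.9 bits"
```

Every further correlation the model leaks, or that context supplies, subtracts from the remaining budget in the same way.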

Gurkenglas Jun 11, 2025, 2:22 PM
−3 points
0
in reply to: ozziegooen’s comment on: Ghiblification for Privacy
That would be ordinarily paranoid.
> Somebody who doesn’t understand cryptography might devise twenty clever-seeming amateur codes and apply them all in sequence, thinking that, even if one of the codes turns out to be breakable, surely they won’t all be breakable. The NSA will assign that mighty edifice of amateur encryption to an intern, and the intern will crack it in an afternoon.
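A toy illustration of why stacking weak codes need not help, using Caesar shifts as a stand-in for the “amateur codes” (my choice of cipher, not the quote’s): twenty shifts applied in sequence collapse into a single shift, so an attacker still faces only 26 keys.

```python
import random
import string

ALPHABET = string.ascii_lowercase

def shift(text: str, k: int) -> str:
    """Caesar-shift lowercase letters by k positions; leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c
        for c in text
    )

# Twenty "independent" layers of amateur encryption, with pseudorandom keys.
rng = random.Random(0)
keys = [rng.randrange(26) for _ in range(20)]

plaintext = "attack at dawn"
ciphertext = plaintext
for k in keys:
    ciphertext = shift(ciphertext, k)

# The twenty layers compose into one shift by sum(keys) mod 26 ...
effective_key = sum(keys) % 26
assert ciphertext == shift(plaintext, effective_key)

# ... so the intern only has to try 26 candidate decryptions.
candidates = [shift(ciphertext, -k) for k in range(26)]
print(effective_key, plaintext in candidates)
```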

Gurkenglas Jun 10, 2025, 4:26 PM
6 points
0
in reply to: Gordon Seidoh Worley’s comment on: Read the Pricing First
And then you open-source it and ruin them, right?

Gurkenglas Jun 10, 2025, 8:45 AM
22 points
13
on: Ghiblification for Privacy
You don’t know whether, given those pictures and the techniques available a year from now, I could find photos of the people who wanted to remain anonymous.

Gurkenglas Jun 8, 2025, 9:50 PM
18 points
6
in reply to: Kabir Kumar’s comment on: Kabir Kumar’s Shortform
If they’re talented, look for a way to search over search processes without incurring the unbounded loss that would result by default.
If they’re educated, skim the existing MIRI work and see if any results can be stolen from their own field.

Gurkenglas Jun 8, 2025, 10:29 AM
0 points
0
in reply to: RogerDearnaley’s comment on: The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?
Suppose we lived in a spatially-finite universe with simple deterministic laws of physics that we have fully colonized, in which we can run a computation for any finite number of steps that we can specify. (For example, everyone agrees to hibernate until it’s done.) Let’s use it to play Go.
Run all $\sim 2^{2^{33}}$ programs (“contestants”) that fit in a gigabyte ($2^{33}$ bits) against each other from all $\sim 3^{19^2}$ possible positions (each of the $19^2$ points is empty, black, or white). Delete all contestants that use more than $2^{2^{2^{2^{2^{100}}}}}$ CPU cycles on any one move. For every position from which some contestant wins every match, delete every contestant that doesn’t win every match.
This enforces ~perfect play. Is it safe to pick a surviving contestant pseudorandomly? Not clearly: Consider the following reasonably-common kind of contestant.
1. Most of it is written in a custom programming language. This means it’ll also need to contain an interpreter for that language, but probably overall this is more efficient than working in whatever language we picked. As a side effect, it knows most of its source code C.
2. Given input I, for each possible output O, it makes use of the logical consequences of “Source code C, given input I, produces output O.” For example, it might return the O for which it can prove the most consequences.
What logical consequences might it prove? “1=1” for one, but that will count towards every O. “Source code C, given input I, produces output O.” for another, but that’s a pretty long one. If it would be the survivor in line to be pseudorandomly selected, most consequences of its decision are via the effects on our universe!
So if it predicts that it would be selected^[1], it will output perfect play to survive, and then keep being consequentialist about any choice between two winning strategies—for example, it might spell out a message if we would watch the winner play, or it could steer our experiment’s statistics to inspire a follow-up experiment that will, due to a bug, run forever rather than ever waking us up from hibernation.
^[1] Or, with some tweaking of step 2, if it assumes that it would be selected, because otherwise the choice of O doesn’t matter.
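A scaled-down sketch of the filtering procedure, under loud substitutions: Go is swapped for a toy subtraction game (take 1 or 2 stones; whoever takes the last stone wins), programs are swapped for lookup tables from pile size to move, and the CPU-cycle cutoff is omitted because a lookup table always halts. All names are hypothetical.

```python
import itertools
import random

MAX_PILE = 6  # positions are piles of 1..MAX_PILE stones

def move(strategy, pile):
    """A contestant's move: remove 1 or 2 stones, never more than the pile holds."""
    return min(strategy[pile - 1], pile)

def first_player_wins(first, second, pile):
    """Play one match from `pile`, `first` moving first; taking the last stone wins."""
    players = (first, second)
    turn = 0
    while True:
        pile -= move(players[turn], pile)
        if pile == 0:
            return turn == 0
        turn ^= 1

# "Contestants": every lookup table from pile size to a move in {1, 2}.
contestants = list(itertools.product((1, 2), repeat=MAX_PILE))

survivors = set(contestants)
for position in range(1, MAX_PILE + 1):
    winners = {
        a for a in contestants
        if all(first_player_wins(a, b, position) for b in contestants)
    }
    if winners:  # some contestant wins every match from this position
        survivors &= winners

# Pseudorandom pick among the survivors.
chosen = random.Random(0).choice(sorted(survivors))
print(len(contestants), len(survivors), chosen)
```

The filter pins down every move that matters from a winning position (piles 2, 4, and 5), yet the survivors still differ at piles 1, 3, and 6, where no choice changes the outcome against a perfect opponent. In Go the analogous slack would also include choices between two winning lines, which is where the comment locates the danger.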

Gurkenglas Jun 7, 2025, 9:32 PM
4 points
0
on: Second order taste
I dunno, I look at thebestbikelock.com and it links to a quiz and that one strikes me as salesmanny.

Gurkenglas Jun 6, 2025, 9:00 AM
7 points
2
on: Discontinuous Linear Functions?!
> for any ε, we can take $k := \lceil \frac{1}{\varepsilon} \rceil$

You mean take $N := \lceil \frac{1}{\varepsilon} \rceil$.
$\sqrt{x}$ is continuous at 0 with an infinitely steep slope.
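A worked check of the last claim on the domain $x \ge 0$: choosing $\delta = \varepsilon^2$ witnesses continuity at 0, while the difference quotient at 0 grows without bound.

$$
\forall \varepsilon > 0:\; 0 \le x < \delta := \varepsilon^2 \implies \lvert \sqrt{x} - \sqrt{0} \rvert = \sqrt{x} < \varepsilon,
\qquad
\frac{\sqrt{h} - \sqrt{0}}{h} = \frac{1}{\sqrt{h}} \to \infty \text{ as } h \to 0^+ .
$$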

Gurkenglas Jun 4, 2025, 6:59 AM
4 points
−1
on: Draft: A concise theory of agentic consciousness
Fear is way older than consciousness.

Gurkenglas Jun 1, 2025, 7:27 PM
5 points
0
in reply to: Alex_Altair’s comment on: Alex_Altair’s Shortform
AFAIR, the usual culprit is subvocalizing as you read. Try https://www.spreeder.com/app.php?

Gurkenglas May 30, 2025, 8:56 PM
4 points
0
in reply to: tailcalled’s comment on: tailcalled’s Shortform
How did you come to believe this?

Gurkenglas May 30, 2025, 2:04 PM
6 points
0
in reply to: tailcalled’s comment on: tailcalled’s Shortform
Such as in a rock?

Gurkenglas May 30, 2025, 9:34 AM
5 points
2
in reply to: RogerDearnaley’s comment on: The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?
Presumably in some domains its capabilities will generalize better out of distribution (OOD) than its tau-classifier does (and vice versa). You could try to have it err in the direction of tau in such cases, though neither paper seems to gesture at this.
Now whether things are harmful depends on the capability level. For example, you might trust an AI to send an email to a politician arguing for action on climate change or for peacemaking if it’s human-level, but not if it’s smart enough to tell which second-order effects will dominate, such as inoculating the politician against the arguments, distracting them from their work on AI regulation, or maneuvering them into drama with another faction.
You could try to put the AI’s capabilities in context, if you know them, so things can be either-harmful-or-not again, though neither paper seems to gesture at this.
Such problems are characteristic of attempts to build an aligned system out of parts that are not, by themselves, aligned; they will search for ways to bypass your system. We could possibly figure out how to build aligned parts.

Gurkenglas May 29, 2025, 12:59 PM
12 points
0
on: The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?
“Algorithm 1: Safe Beam Search with Harmfulness Filtering” relies on a classifier of whether the sequence came from the training subdataset tagged with tau, or the training subdataset not tagged with tau. What happens when the sequence lies in neither distribution, such as because the AI is considering a plan that nobody has ever thought of?
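A toy sketch of the failure mode, not the paper’s Algorithm 1: a one-dimensional logistic regression stands in for the tau-classifier and is trained only on two clusters, a hypothetical “tau-tagged” one near +2 and an untagged one near −2. A discriminative classifier learns only which side of a boundary a point falls on, so a point from neither distribution still receives a confident label.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_1d(xs, ys, lr=0.1, steps=2000):
    """Plain gradient descent on the mean logistic loss; returns (w, b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical 1-D feature of a candidate sequence.
xs = [1.5, 2.0, 2.5, -2.5, -2.0, -1.5]
ys = [1, 1, 1, 0, 0, 0]  # 1 = tau-tagged subdataset, 0 = untagged
w, b = fit_logistic_1d(xs, ys)

def p_tau(x: float) -> float:
    return sigmoid(w * x + b)

# A novel plan far from both training clusters still gets a near-certain
# "not tau" verdict, so a filter thresholding p_tau would let it through.
novel_plan = -50.0
print(p_tau(2.0), p_tau(-2.0), p_tau(novel_plan))
```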

Gurkenglas May 24, 2025, 11:13 AM
1 point
0
on: Too Soon
Death is bad and should go away.
Consider whether working on cryonics does better both on would-have-saved-your-mom and on risks-the-lightcone than working on AI.

Gurkenglas May 20, 2025, 6:28 PM
5 points
2
in reply to: Mart_Korz’s comment on: Thinking Insect Suffering Is The Biggest Deal In The World Is Surprisingly Intuitive
If there is an objective morality, I also expect an objective method for making decisions under moral uncertainty. Math that is discovered rather than invented does not contain special-case handling.
A reasonable prior puts nonzero mass on any hypothesis its holder can imagine, else they could not be convinced of it. To demonstrate that the contents of the hypotheses must not interact directly, I picked a hypothesis that contains an infinity.
So I’d expect that method to naturally handle infinities just like insects or humans, in a way that adds up to normality. As the masses on whether insect lives are net good or net bad oscillate around 10% each, the method shouldn’t pivot on a dime between maximizing and minimizing the number of insects, either.

Gurkenglas May 19, 2025, 11:54 PM
6 points
2
in reply to: Bentham's Bulldog’s comment on: Thinking Insect Suffering Is The Biggest Deal In The World Is Surprisingly Intuitive
I don’t think a child would need −log₂(0.2%) ≈ 9 bits of evidence to be convinced that story characters matter. I recommend that your aggregation method treat the hypotheses as untrusted user input and therefore bring them to a common format before you let them interact. I see more than one such possible format.

Gurkenglas May 19, 2025, 5:26 PM
4 points
0
in reply to: johnswentworth’s comment on: $500 + $500 Bounty Problem: Does An (Approximately) Deterministic Maximal Redund Always Exist?
If the expressions cease to be equivalent in some natural generalization of this setting, then I recommend that you try to find a proof there, because the proof space should be narrower and thus easier to search.

Gurkenglas May 19, 2025, 5:11 PM
6 points
7
on: Thinking Insect Suffering Is The Biggest Deal In The World Is Surprisingly Intuitive
> But if there’s even a 1% chance that they suffer 20% as intensely as we do, then insect suffering is still, in expectation, responsible for nearly all of the world’s extreme suffering.

Suppose there’s an objective morality that we’re subjectively uncertain about. A reasonable prior does not put zero mass on the hypothesis that the literally infinite characters in our stories are moral patients. A reasonable protocol does not therefore let this hypothesis dominate its decisions regardless of evidence. Aggregate the uncertainty in some other way.
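A minimal sketch of why naive expected-value aggregation lets such a hypothesis dominate regardless of evidence. The numbers are made up: a hypothetical action worth +∞ if story characters are moral patients, and a large finite loss otherwise.

```python
import math

def naive_expected_value(hypotheses):
    """Naive aggregation: sum of credence * value over (credence, value) pairs."""
    return sum(credence * value for credence, value in hypotheses)

# Hypothetical stakes: +infinity under the story-characters hypothesis,
# a finite loss of 1e9 units under the alternative.
results = {}
for credence in (1e-3, 1e-12, 1e-300):
    hypotheses = [(credence, math.inf), (1 - credence, -1e9)]
    results[credence] = naive_expected_value(hypotheses)

print(results)  # every entry is inf, however small the credence
```

Shrinking the credence by hundreds of orders of magnitude never changes the verdict, which is the behavior a reasonable protocol should avoid.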

Gurkenglas May 19, 2025, 4:46 PM
4 points
0
in reply to: johnswentworth’s comment on: $500 + $500 Bounty Problem: Does An (Approximately) Deterministic Maximal Redund Always Exist?
Then how do you choose between the three directed representatives? Is there some connotation, some coming-apart of concepts that would become apparent after generalization, or did you just pick X ← Y → X because it’s symmetric?