Looking at this and this, I’d guess that it’s just harder to produce super toxic toxins artificially than it is to produce super sweet sweeteners. IIRC the mass of neotame it takes to taste any sweetness is lower than the mass of VX it takes to kill someone.
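As a rough back-of-envelope check (ballpark figures from memory, not from the linked posts): sucrose's detection threshold is around 3 g/L, neotame is roughly 8,000 times sweeter than sucrose by weight, and the human dermal LD50 of VX is commonly cited at around 10 mg. For a 30 mL sip, that gives

$$m_{\text{neotame}} \approx \frac{3\ \text{g/L} \times 0.03\ \text{L}}{8000} \approx 10\ \mu\text{g} \;\ll\; m_{\text{VX}} \approx 10\ \text{mg},$$

so the sweetness-detection dose comes out about three orders of magnitude below the lethal dose.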
An explanation of decision theories
A nod to Lena Forsen, a photo of whom is often used as a test image in image processing papers.
This is the first LW census I’ve taken!
[Question] What’s up with psychonetics?
What if we had a setting to hide upvotes/hide reactions/randomize order of comments so they aren’t biased by the desire to conform?
[Question] Is anyone working on formally verified AI toolchains?
Aw man, there go my weekend plans
Notice your everything
My immediate thought is that the cat is already out of the bag: whatever risk there was of AI safety people accelerating capabilities is by now far outweighed by capabilities hype and, more generally, much larger incentives, so the most we can do is continue building awareness of AI risk. Something about this line of reasoning strikes me as uncritical, though.
I actually dislike making everything a set; it feels similar to programming in Brainfuck. Sure, it’s Turing complete, but the way programs are structured doesn’t map cleanly onto how a human would conceptualize them, and you need to write a lot of boilerplate for things you ordinarily don’t even think about.
In practice, this leads to confusing notation like using “subset of” or “element of” for “less than”, which makes it harder to see whether to think of something as a number or just a generic set. Here, since X is not “typed”, it is hard to tell that it should be thought of as a set of sets rather than just a generic set.
Also, you get weird pathological stuff like {1, 2} being a topological space.
As a formalism for mathematics, I much prefer type theory, which not only maps more cleanly onto how humans think but also uses simpler axioms. It also has connections to logic, computer science, and category theory (and, by extension, many other fields of math).
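To make the contrast concrete, here’s a minimal Lean 4 sketch of the “typed vs. untyped” point (core `Nat` only, no Mathlib; my own illustration):

```lean
-- In type theory, 2 is a natural number, not a set:
#check (2 : Nat)      -- 2 : Nat

-- Ordering on naturals uses the order relation `<`, not `∈` or `⊆`:
example : 1 < 2 := by decide

-- In ZFC with the von Neumann encoding, 2 = {∅, {∅}} and 1 = {∅},
-- so "1 ∈ 2" is a (junk) theorem. Here the analogous claim is not
-- even well-formed: there is no `Membership Nat Nat` instance, so
-- this line would be rejected at type-checking time if uncommented:
-- #check 1 ∈ 2
```

The point isn’t that you can’t prove things about numbers in set theory, it’s that the encoding leaks: questions that should be type errors become theorems.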
[Question] Has anyone thought about how to proceed now that AI notkilleveryoneism is becoming more relevant/is approaching the Overton window?
I feel like just “gender” would be better than “physical sex” here. For instance, I’d expect trans women to fall in the female cluster (being trans intersects with that in ways that strain this model, but it still rounds to what I said).
I don’t think the specific corner of decision theory where people argue over Newcomb’s problem is large enough as a field to be subject to the EMH, and I don’t think the incentives are awfully strong either. I’d compare it to ordinal analysis, a field which does have PhDs but has very few experts overall and not many strong incentives. One significant recent result (if the proof holds, the ordinal notation in question would be the most powerful one proven well-founded) was done entirely by an amateur building off of work by other amateurs (see the section on the Bashicu Matrix System): https://cp4space.hatsya.com/2023/07/23/miscellaneous-discoveries/
We don’t actually know for sure that it’s GPT-4.5. It could be an alternative training run that preceded the current version of GPT-4, or even a different model entirely.
How do you know that this isn’t how human consciousness works?
I think it makes more sense to word this as “others are not remarkably more irrational than you are” rather than saying that disagreements are not caused by irrationality.
Formal alignment proposals avoid this problem by doing metaethics, mostly something like determining what a person would want if they were perfectly rational (so no cognitive biases or logical errors), otherwise basically omniscient, and had an unlimited amount of time to think about it. This is called reflective equilibrium. I think this approach would work for most people, even pretty terrible ones. If you extrapolated a terrorist who commits acts of violence for some supposed greater good, for example, they’d realize that the reasoning they used to conclude those acts were good was wrong.
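Schematically (my own notation, not from any particular proposal): writing $V^*(p)$ for person $p$’s extrapolated values,

$$V^*(p) \;=\; \lim_{t \to \infty} V\big(p \mid \text{no reasoning errors, full relevant information, time } t \text{ to reflect}\big),$$

the claim is that the terrorist’s endorsement of violence doesn’t survive the limit, even though their current values $V(p)$ include it.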
Corrigibility, on the other hand, is more susceptible to this problem, so you’d want to get the AI to perform a pivotal act, for example destroying every GPU to prevent other people from deploying harmful (or, for that matter, merely unaligned) AI.
Realistically, I think that most entities who’d want to use a superintelligent AI like a nuke would probably be too short-sighted to care about alignment, but don’t quote me on that.
I think artificial sweeteners are so often discovered serendipitously because they tend to be insanely sweet (you usually find them mixed with a higher volume of filler because of how sweet they are), which makes them easy to notice even with standard safety measures.