I lurk and tag stuff.
Agree it doesn’t belong; I have downvoted it.
There is no tag that encompasses all of AI alignment and nothing else.
I think the reason you gave is basically correct—when I look at the 15 posts with the highest relevance score on the AI tag, about 12 of them are about alignment.
On the other hand, when a tag doesn’t exist it may just be because no one ever felt like making it.
Merge candidate with startups?
“Transformer Circuits” seems like too specific of a tag—I doubt it applies to much beyond this one post. Probably should be broadened to encompass https://www.lesswrong.com/posts/MG4ZjWQDrdpgeu8wG/zoom-in-an-introduction-to-circuits and related stuff.
“Circuits (AI)” to distinguish from normal electronic circuits?
This sounds a lot like the “Precisely Bound Demons and their Behavior” concept that Yudkowsky described but never wrote the story for.
Ra also features magic-as-engineering.
Chiming in later to say that I think the tag should stay, especially now that multiple people are doing them. Compare “Rationality Quotes” and “Open Threads” for other tags that could be accused of just being sequences.
Should this tag include stuff about print versions of HPMOR or Rationality: From AI to Zombies, or just the review collections from 2018 forward?
Something similar came up in the post:
If it has some sensory dominion over the world, it can probably estimate a pretty high mainline probability of no humans booting up a competing superintelligence in the next day; to the extent that it lacks this surety, or that humans actually are going to boot a competing superintelligence soon, the probability of losing that way would dominate in its calculations over a small fraction of materially lost galaxies, and it would act sooner.
Though rereading it, it’s not addressing your exact question.
Removed this from the page itself now that talk pages exist:
[pre-talk-page note] I think this should maybe be merged with Distillation and Pedagogy – Ray
That 80k guide seems aimed at people who don’t yet have any software engineering experience. I’m curious what you think the path is from “Average software engineer with 5+ years experience” to the kind of engineer you’re looking for, since that’s the point I’m starting from.
I don’t quite understand what this tag is supposed to be about from the title, nor does the single example clarify sufficiently.
Another potential problem with the first scenario: the AI is indifferent about every long-term consequence of its actions, not just how many paper clips it gets long-term. If it finds a plan that creates a small number of paperclips immediately but results in the universe being destroyed tomorrow, it takes it.
In the Romeo and Juliet example, the final summary gets a key fact disastrously wrong:
Romeo buys poison to kill Juliet at her grave.
(In the original, he buys it to kill himself.)
It looks like a single snippet of the original got blown up until it was 10% of the final summary, and the surrounding context was not sufficient to fix it.
Come, cordial and not poison, go with me
To Juliet’s grave; for there must I use thee.
Low conversion rate to grabbiness is only needed in the model if you think there are non-grabby aliens nearby. High conversion rate is possible if the great filter is in our past and industrial civilizations are incredibly rare.
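To spell out the back-of-envelope logic (my own shorthand, not notation from the grabby-aliens model itself):

$$N_{\text{grabby}} \approx N_{\text{industrial}} \cdot p_{\text{convert}}$$

Our observations roughly pin down $N_{\text{grabby}}$, so the two factors on the right trade off against each other: a high $p_{\text{convert}}$ is consistent with the data only if $N_{\text{industrial}}$ is tiny (great filter behind us, industrial civilizations incredibly rare), while a large $N_{\text{industrial}}$ (non-grabby aliens nearby) forces $p_{\text{convert}}$ to be low.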
What is this tag for? I don’t see how it applies to the one post tagged with it.
The implied machine learning technique here seems to be a model that has been pretrained on the reward signal and is then, during a single rollout, given a many-shot prompt interactively, with the reward signal on the early responses to the prompt used to shape the later responses.
This means a single pretrained model can be finetuned on many different tasks with high sample efficiency and no additional gradient descent needed, as long as the task is something within the model’s capabilities to understand.
(I’m assuming that this story represents a single rollout because the AI remembers what happened earlier, and that the pleasure/pain is part of its input rather than part of a training process because it is experienced as a stimulus rather than as mind control.)
Is this realistic for a future AI? Would adding this sort of online finetuning to a GPT improve its performance? I guess you can already do it in a hacky way by adding something like “Human: ‘That’s not what I meant at all!’” to the prompt.
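To make the hacky version concrete, here’s a minimal sketch of that loop (`generate` is a hypothetical stand-in for whatever completion API you’d call; none of these names come from a real library):

```python
# Minimal sketch of "online finetuning" via the prompt alone: the reaction to
# each response is appended to the context so it shapes later responses,
# with no gradient updates anywhere.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to a GPT-style completion API."""
    return " (model output would go here)"

def respond_with_feedback(task: str, get_feedback, max_turns: int = 5) -> str:
    prompt = f"Task: {task}\nAI:"
    response = ""
    for _ in range(max_turns):
        response = generate(prompt)
        feedback = get_feedback(response)  # e.g. the human's complaint, or None
        if feedback is None:
            break  # no complaint, so keep this response
        # Fold the reaction back into the prompt; this plays the role of the
        # reward signal in the story.
        prompt += f'{response}\nHuman: "{feedback}"\nAI:'
    return response

# Usage: reject the first draft once, then accept.
replies = iter(["That's not what I meant at all!", None])
print(respond_with_feedback("summarize the scene", lambda r: next(replies)))
```

Everything the model “learns” here lives in the growing prompt, so it carries across turns but not across context windows, which is why it’s only a hacky approximation of finetuning.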
I quickly disqualified III because of its inconsistent capitalization of “idea”, which doesn’t seem like something Hegel would do. From there I noticed that I is a completion that continues with vaguely the same sort of things as the first paragraph, while II specifically focuses on the things mentioned right at the end of the first paragraph. I wasn’t sure whether GPT or Hegel was more likely to choose the vague completion or the specific one. I ended up guessing correctly after scrutinizing the first paragraph, but I was uncertain about my guess.
“Mind Crime” was the term Bostrom used in Superintelligence. I don’t know of a better term that covers the same things.
Usually when people talk about mind crime they’re talking about torture simulations or something similar, which is different from the usual use of “thought crime”. My sense is that if you really believed that thinking certain thoughts was immoral, thought crime would be a type of mind crime, but I’m not sure if anyone has used the term in that way.
Edit: https://www.lesswrong.com/posts/BKjJJH2cRpJcAnP7T/thoughts-on-human-models says:
Many computations may produce entities that are morally relevant because, for example, they constitute sentient beings that experience pain or pleasure. Bostrom calls improper treatment of such entities “mind crime”.
so maybe the accepted meaning is narrower than I thought and this wiki page should be updated accordingly.
I reread the relevant section of Superintelligence, which is in line with that, and have rewritten the page.
Update 2021: The original link is dead. Link to the current version, archived for posterity: https://web.archive.org/web/20210225082506/http://becomingeden.com/summary-of-how-to-win-friends-and-influence-people/