For AIs, we can use the above organizational methods in concert with existing AI-specific training methodologies, which we can’t do with humans and human organizations.
It doesn’t seem particularly fair to compare all human organizations to what we might build specifically when trying to make aligned AI. Human organizations have existed in a wide variety of forms for a long time; they have mostly not been explicitly focused on a broad-based “promotion of human flourishing”, and they have had to fit within lots of ad hoc, historically contingent systems (like the distinction between for-profit and nonprofit entities) that have significant influence on the structure of newer human organizations.
I grew up in Arizona and live here again now. It has had a good system of open enrollment for schools for a long time, meaning that you could enroll your kid in a school in another district if that school has space (though you’d need to drive them, at least to a nearby school bus stop). And there are lots of charter schools here, for which district boundaries don’t matter. So I would expect the impact on housing prices to be minimal.
Godzilla strategies now in action: https://simonwillison.net/2022/Sep/12/prompt-injection/#more-ai :)
No super detailed references that touch on exactly what you mention here, but https://transformer-circuits.pub/2021/framework/index.html does deal with some similar concepts with slightly different terminology. I’m sure you’ve seen it, though.
Is the ordering intended to reflect your personal opinions, the opinions of people around you/society as a whole, or some objective view? Because I’m having a hard time correlating the order to anything in my world model.
This is the trippiest thing I’ve read here in a while: congratulations!
If you’d like to get some more concrete feedback from the community here, I’d recommend phrasing your ideas more precisely by using some common mathematical terminology, e.g. talking about sets, sequences, etc. Working out a small example with numbers (rather than just words) will make things easier to understand for other people as well.
My mental model here is something like the following:
a GPT-type model is trained on a bunch of human-written text, written within many different contexts (real and fictional)
it absorbs enough patterns from the training data to be able to complete a wide variety of prompts in ways that also look human-written, in part by being able to pick up on implications & likely context for said prompts and proceeding to generate text consistent with them
Slightly rewritten, your point above is that:
The training data is all written by authors in Context X. What we want is text written by someone who is from Context Y. Not the text which someone in Context X imagines someone in Context Y would write but the text which someone in Context Y would actually write.
After all, those of us writing in Context X don’t actually know what someone in Context Y would write; that’s why simulating/predicting someone in Context Y is useful in the first place.
If I understand the above correctly, the difference you’re referring to is the difference between:
prompt = “A lesswrong post from a researcher in 2050:”
GPT’s internal interpretation of context = “A fiction story, so better stick to tropes, plot structure, etc. coming from fiction”
versus
GPT’s internal interpretation of context = “A lesswrong post (so factual/researchy, rather than fiction) from 2050 (so better extrapolate current trends, etc. to write about what would be realistic in 2050)”
Similar things could be done re: the “stable, research-friendly environment”.
The internal interpretation is not something we can specify directly, but I believe sufficient prompting would be able to get close enough. Is that the part you disagree with?
Alas, querying counterfactual worlds is fundamentally not a thing one can do simply by prompting GPT.
Citation needed? There’s plenty of fiction to train on, and those works are set in counterfactual worlds. Similarly, historical, mistaken, etc. texts will not be talking about the Current True World. Sure, right now the prompting required is a little janky, e.g.:
But this should improve with model size, improved prompting approaches or other techniques like creating optimized virtual prompt tokens.
And also, if you’re going to be asking the model for something far outside its training distribution like “a post from a researcher in 2050”, why not instead ask for “a post from a researcher who’s been working in a stable, research-friendly environment for 30 years”?
Please consider aggregating these into a sequence, so it’s easier to find the 1/2 post from this one and vice versa.
Sounds similar to what this book claimed about some mental illnesses being memetic in certain ways: https://astralcodexten.substack.com/p/book-review-crazy-like-us
If you do get some good results out of talking with people, I’d recommend trying to talk to people about the topics you’re interested in via some chat system and then go back and extract out useful/interesting bits that were discussed into a more durable journal. I’d have recommended IRC in the distant past, but nowadays it seems like Discord is the more modern version where this kind of conversation could be found. E.g. there’s a slatestarcodex discord at https://discord.com/invite/RTKtdut
YMMV and I haven’t personally tried this tactic :)
Well written post that will hopefully stir up some good discussion :)
My impression is that LW/EA people prefer to avoid conflict, and when conflict is necessary don’t want to use misleading arguments/tactics (with BS regulations seen as such).
I agree; I’ve felt something similar since having kids. I’d also read the relevant Paul Graham bit, though for me it wasn’t quite as sudden or dramatic. But it has had a noticeable long-term effect. I’d previously been okay with kids, though I didn’t especially seek out their company or anything. Now it’s more fun playing with them, even apart from my own children. No idea how it compares to others, including my parents.
Love this! Do consider citing the fictional source in a spoiler formatted section (ctrl+f for spoiler in https://www.lesswrong.com/posts/2rWKkWuPrgTMpLRbp/lesswrong-faq)
Also small error “from the insight” → “from the inside”
The most similar analysis tool I’m aware of is called an activation atlas (https://distill.pub/2019/activation-atlas/), though I’ve only seen it applied to visual networks. Would love to see it used on language models!
As it is now, this post seems like it would fit in better on Hacker News than on LessWrong. I don’t see how it addresses questions of developing or applying human rationality, broadly interpreted. It could be edited to talk more about how it applies more general principles of effective thinking, but I don’t really see that here right now. Hence my downvote for the time being.
Came here to post something along these lines. One very extensive commentary with reasons for this is in https://twitter.com/kamilkazani/status/1497993363076915204 (warning: long thread). I’ll summarize it when I can get to my laptop later tonight, or other people are welcome to do it.
Have you looked into LASIK much? I got it about a decade ago and have generally been super happy with the results. Now I just wear sunglasses when I expect to benefit from them, and that works a lot better than photochromatic glasses ever did for me.
The main real downside has been slight halos around bright lights in the dark, but this is mostly something you get used to within a few months. Nowadays I only notice it when stargazing.
This seems like something that would be better done as a Google form. That would make it easier for people to correlate questions + answers (especially on mobile) and it can be less stressful to answer questions when the answers are going to be kept private.