This was a beautifully written essay, thank you. You’re quite skilled with your hand axe.
RomanHauksson (Roman Hauksson-Neill)
Tools for finding information on the internet
Avoid large group discussions in your social events
[Question] What’s the deal with Effective Accelerationism (e/acc)?
During the past few months, I ran an undergraduate computer science research program at my university, and I chose to use Zulip to organize our communication (between 25 people). I wanted to use Zulip because it was open-source and, like you, I was a fan of the threads model. Unfortunately, the participants reported that the notifications were unreliable, the mobile app was janky, and the threads were confusing.
Keep in mind that these weren’t average software users but rather CS majors filtered through an application process – even for them, threads took a while to get used to. I concluded that Zulip would work well if every team member was on board with (and understood) the threads model, but a team that doesn’t care would prefer Discord or Slack.
I don’t think it’s a good idea to frame this as “AI ethicists vs. AI notkilleveryoneists”, as if anyone who cares about issues related to the development of powerful AI has to choose between caring only about existential risk and caring only about other issues. This framing unnecessarily excludes AI ethicists from the alignment field, which is unfortunate and counterproductive since they’re otherwise aligned with the broader idea that “AI is going to be a massive force for societal change and we should make sure it goes well”.
Suggestion: instead of addressing “AI ethicists” or “AI ethicists of the DAIR / Stochastic Parrots school of thought”, why not address “AI X-risk skeptics”?
Technological unemployment as another test for rationalist winning
How do we know it didn’t copy this code from somewhere on the internet?
Edit: I’ve expanded this into a full post.
Three related concepts.
On redundancy: “two is one, one is none”. It’s best to have copies of critical things in case they break or go missing, e.g. an extra cell phone.
On authentication: “something you know, have, and are”. These are three categories of ways you can authenticate yourself.
Something you know: password, PIN
Something you have: key, phone with 2FA keys, YubiKey
Something you are: fingerprint, facial scan, retina scan
On backups: the “3-2-1” strategy.
Maintain 3 copies of your data:
2 on-site but on different media (e.g. on your laptop and on an external drive) and
1 off-site (e.g. in the cloud).
Inspired by these concepts, I propose the “2/3” model for authentication:
Maintain at least three ways you can access a system (something you have, know, and are). If you can authenticate yourself using at least 2 out of the 3 ways, you’re allowed to access the system.
This guards against both false positives (an attacker would need to compromise at least two authentication factors) and false negatives (you don’t have to prove yourself with every factor). It provides redundancy on both fronts.
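Here’s a minimal sketch of what the access check could look like in code. The class and function names are placeholders I made up for illustration, not any real authentication library:

```python
# Minimal sketch of the "2/3" authentication model: grant access if at
# least two of the three factor checks succeed.
from dataclasses import dataclass


@dataclass
class AuthAttempt:
    knows_password: bool      # something you know
    has_hardware_key: bool    # something you have
    passes_biometrics: bool   # something you are


def is_authenticated(attempt: AuthAttempt, required: int = 2) -> bool:
    """Grant access when at least `required` of the three factors pass."""
    factors = [
        attempt.knows_password,
        attempt.has_hardware_key,
        attempt.passes_biometrics,
    ]
    return sum(factors) >= required


# A user who forgot their password but has their YubiKey and passes a
# fingerprint scan still gets in (fewer false negatives), while an attacker
# holding only a stolen password is rejected (fewer false positives).
print(is_authenticated(AuthAttempt(False, True, True)))   # True
print(is_authenticated(AuthAttempt(True, False, False)))  # False
```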
Medlife Crisis: “Why Do People Keep Falling For Things That Don’t Work?”
Why didn’t GPT-3.5 also copy it if it was in the training data?
Two possible answers:
The quine wasn’t in the training data of GPT-3.5 but was in the training data of GPT-4
GPT-4 is better at “retrieving” answers from the training data
That being said, I also briefly tried to search for this quine online and couldn’t find anything. So I agree, it probably does exhibit this new ability. The reason I was suspicious at first is that the quine prompt seemed generic enough that it could have existed before, but I see that’s not the case.
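For reference, here’s what a minimal quine looks like in Python, just to illustrate the kind of program being discussed (this is a classic construction, not the one GPT-4 produced):

```python
# A classic Python quine: this program prints its own source code exactly.
s = '# A classic Python quine: this program prints its own source code exactly.\ns = {!r}\nprint(s.format(s))'
print(s.format(s))
```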
Are there any organizations or research groups that are specifically working on improving the effectiveness of the alignment research community? E.g.
Reviewing the literature on intellectual progress, metascience, and social epistemology and applying the resulting insights to this community
Funding the development of experimental “epistemology software”, like Arbital or Mathopedia
Np! I actually did read it and thought it was high-quality and useful. Thanks for investigating this question :)
This is exactly the kind of strange, experimental, large-action-space idea that I’ve so far only seen in effective altruist / rationalist circles. I love it.
A research team’s ability to design a robust corporate structure doesn’t necessarily predict their ability to solve a hard technical problem. Maybe there’s some overlap, but machine learning and philosophy are different fields from business. Also, I suspect that the people doing the AI alignment research at OpenAI are not the same people who designed the corporate structure (but this might be wrong).
Welcome to LessWrong! Sorry for the harsh greeting. Standards of discourse are higher here than in other places on the internet, so quips usually aren’t well-tolerated (even if they have some element of truth).
I’ve also reflected on “microhabits” – I agree that the epistemics of maintaining a habit, even when you can’t observe causal evidence that it’s beneficial, are tricky. I’ll implement a habit if I’ve read some of the evidence and think it’s worth the cost, even if I don’t observe any effect in myself. Unfortunately, that’s the same mistake homeopaths make.
I’m motivated to follow microhabits mostly out of faith that they have some latent effects, but also out of a subconscious desire to uphold my identity, like what James Clear talks about in Atomic Habits.
Like when I take a vitamin D supplement in the morning, I’m not subconsciously thinking “oh man, the subtle effects this might have on my circadian rhythm and mood are totally worth the minimal cost!”. Instead, it’s more like “I’m taking this supplement because that’s what a thoughtful person who cares about their cognitive health does. This isn’t a chore; it’s a part of what it means to live Roman’s life”.
Here’s a list of some of my other microhabits (that weren’t mentioned in your post) in case anyone’s looking for inspiration. Or maybe I’m just trying to affirm my identity? ;P
Putting a grayscale filter on my phone
Paying attention to posture – e.g., not slouching as I walk
Many things to help me sleep better
Taking 0.3 mg of melatonin
Avoiding exercise, food, and caffeine too close to bedtime
Putting aggressive blue light filters on my laptop and phone in the evening and turning the lights down
Taking a warm shower before bed
Sleeping on my back
Turning the temperature down before bed
Wearing headphones to muffle noise and a blindfold
Backing up data and using some internet privacy and security tools
Anything related to being more attractive or likable
Whitening teeth
Following a skincare routine
Smiling more
Active listening
Avoiding giving criticism
Flossing, using toothpaste with Novamin, and tongue scraping
Shampooing twice a week instead of daily
I haven’t noticed any significant difference from any of these habits individually. But, like you suggested, I’ve found success with throwing many things at the wall: it used to take me a long time to fall asleep, and now it doesn’t. Unfortunately, I don’t know which microhabits did the trick (stuck to the wall).
It seems like there are three types of habits that require some faith:
Those that take a while to show effects, like weightlifting and eating a lot to gain muscle.
Those that only pay off for rare events, like backing up your data or looking both ways before crossing the street.
Those with subtle and/or uncertain effects, like supplementing vitamin D for your cognitive health or whitening your teeth to make a better first impression on people. This is what you’re calling microhabits.
Suppose a family values the positive effects that screening would have on their child at $30,000, but in their area, it would cost them $50,000. Them paying for it anyway would be like “donating” $20,000 towards the moral imperative that you propose. But would that really be the best counterfactual use of the money? E.g. donating it instead to the Against Malaria Foundation would save 4-5 lives in expectation.[1] Maybe it would be worth it at $10,000? $5,000?
That said, this doesn’t take into account the idea that each additional person doing polygenic screening would increase its public acceptance, incentivizing companies to innovate and drive the price down. So maybe the knock-on effects would make it worth it.
[1] Okay, I’ve heard that this scale of donations to short-termist charities is actually a lot more complicated than that, but this is just an example.
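To make the arithmetic behind that comparison explicit, here’s a quick back-of-the-envelope sketch. The ~$4,500 cost-per-life figure is just a placeholder consistent with the “4-5 lives” estimate above, not a vetted number:

```python
# Back-of-the-envelope comparison: paying a premium for polygenic screening
# vs. donating that premium to the Against Malaria Foundation.
value_to_family = 30_000   # what the family thinks screening is worth ($)
local_cost = 50_000        # what screening would actually cost them ($)
implicit_donation = local_cost - value_to_family   # the $20,000 "donation"

cost_per_life = 4_500      # assumed placeholder, consistent with "4-5 lives"
lives_saved_if_donated = implicit_donation / cost_per_life

print(f"Implicit donation: ${implicit_donation:,}")                  # $20,000
print(f"Expected lives saved via AMF: {lives_saved_if_donated:.1f}")  # ~4.4
```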
Sometimes I stumble across a strange genre of writing on the internet.
From GameB Home:
We’re gaining the power of gods, but without the love, wisdom and discernment of gods—that is a self-extinctionary scenario. Welcome to Game B, a transcontextual inquiry into a new social operating system for humanity. Game A is what got us to this time of metacrisis and collapse. Game B is what emerges in response. Come play with us as we learn to become wiser, together, gain coherence and begin to move towards a new social operating system emphasizing human wellbeing, metastability, and built on good values that we will be happy to call home and we will be proud to leave to our descendants.
From Bryan Johnson’s Blueprint:
The enemy is Entropy. The path is Goal Alignment via building your Autonomous Self; enabling compounded rates of progress to bravely explore the Zeroth Principle Future and play infinite games.
If you’ve also done random walks through cyberspace you might have read this kind of language in some deep corner of the internet as well. It’s characteristically esoteric and stuffed with complex vocabulary. I can tell that the writers have something in mind when they’re writing it – that they’re not totally off the rails – but it still comes off as spiritual nonsense.
Look, I get it! Rationalist writing is stuffed with jargon and machine learning analogies, and self-help books feature businessy pseudoframeworks and vapid motivational prose. It’s okay for your field to have its own linguistic subculture! But when you try to dress up your galaxy brain insights in similarly galaxy brain vocabulary, you lose me.
This kind of writing makes me uncomfortable in a way I can’t put into words, like the feeling one gets when they look at a liminal photograph. Maybe because it’s harder for me to judge the epistemics of the writing. I feel it trying to unfairly hijack the part of my brain which measures the insightfulness of text by presenting itself as mystical, like forbidden knowledge I’ve finally revealed. But if these insights were really all that, they’d have the balls to present themselves candidly!
Yes, metaphors and complex language are sometimes necessary to get your point across and make text engaging. In Cyborgism, @janus writes:
Corridors of possibility bloom like time-lapse flowers in your wake and burst like mineshafts into nothingness again. But for every one of these there are a far greater number of voids–futures which your mind refuses to touch. Your Loom of Time devours the boundary conditions of the present and traces a garment of glistening cobwebs over the still-forming future, teasing through your fingers and billowing out towards the shadowy unknown like an incoming tide.
But unlike the previous examples, this beautifully flowery depiction of GPT-assisted writing works because it’s clearly demarcated within a more down-to-earth post. Good insights survive scrutiny even when nude.
I find it interesting that all but one toy is a transportation device or a model thereof.
Maybe one upside to the influx of “agents made with GPT-N API calls and software glue” is that these types of AI agents are more likely to cause a fire alarm-y disaster which gets mitigated, thus spurring governments to take X-risk more seriously, as opposed to other types of AI agents, whose first disaster would blow right past fire alarm level straight to world-ending level?
For example, I think this situation is plausible: ~AutoGPT-N[1] hacks into a supercomputer cluster or social-engineers IT workers over email or whatever in the pursuit of some other goal, but ultimately gets shut down by OpenAI simply banning the agent from using their API. Maybe it even succeeds in some scarier instrumental goal, like obtaining more API keys and spawning multiple instances of itself. However, the crucial detail is that the main “cognitive engine” of the agent is bottlenecked by API calls, so for the agent to wipe everyone out, it needs to overcome the hurdle of pwning OpenAI specifically.
By contrast, if an agent that’s powered by an open-source language model gets to the “scary fire alarm” level of self-improvement/power-seeking, it might be too late, since it wouldn’t have a “stop button” controlled by one corporation like ~AutoGPT-N has. It could continue spinning up instances of itself while staying under the radar.
This isn’t to say that ~AutoGPT-N doesn’t pose any X-risk at all, but rather that it seems like it could cause the kind of disaster which doesn’t literally kill everyone but which is scary enough that the public freaks out and nations form treaties banning larger models from being trained, et cetera.
I’d like to make it very clear that I do not think it is a good thing that this type of agent might cause a disaster. Rather, I think it’s good that the first major disaster these agents will cause seems likely to be non-existential.
[1] A future iteration of AutoGPT or a similar project.