Born too late to explore Earth; born too early to explore the galaxy; born just the right time to save humanity.
Ulisse Mini
Thanks! I will definitely read those!
Thanks! Some other people recommended the Atlas Fellowship and I've applied. Regarding (9), I think I worded it badly; I meant reach out to local politicians (I thought the terms were interchangeable).
Read it; that study guide is really good. It really motivates me to branch out, since I've definitely over-focused on depth before and not done enough applications/"generalizing".
This also reminds me of Miyamoto Musashi’s 3rd principle: Become acquainted with every art
Noted
Yeah, I read about a third of the proof of Cox's theorem until I realized that even if I followed every step I wouldn't gain any intuition from it, so I skipped the rest.
Some realizations about memory and learning I've been thinking about recently. EDIT: here are some great posts on memory which are a deconfused version of this shortform (and written by EY's wife!)
Anki (and SRS in general) is a tool for efficiently writing directed-graph edges to the brain. Thinking about encoding knowledge as a directed graph can help with making good Anki cards (see the sketch after this list).
Memory techniques are somewhat analogous to data structures as well, e.g. the link method corresponds to a doubly linked list.
“Memory techniques” should be called “Memory principles” (or even laws).
The "Code is Data" concept makes me realize memorization is more widely applicable; you could, e.g., memorize the algorithm for integration in calculus. Many "creative" processes like integration can be reduced to an algorithm.
"Truly Part of You" is not orthogonal to memory
techniques/principles; it uses the fact that a densely connected graph is less likely to become disconnected when you randomly delete edges, similar to how the link and story methods work. Just because you aren't making silly images doesn't mean you aren't using the principles.
(Untested idea for math): Journal about your thought processes after solving each problem, then generalize to form a problem-solving algorithm/checklist and memorize the algorithm.
= finding shortest paths on a weighted directed graph, where the shortest path cost must be below some threshold :)
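A minimal sketch (my own toy example, not from the shortform) of the framing above: each card writes one directed edge (cue → answer) with a recall cost, and recalling a chain of associations is a shortest-path query that only "succeeds" if the total cost stays below a threshold. The card names, costs, and threshold here are made up for illustration.

```python
import heapq

# Each card is an edge: (cue, answer, recall_cost). Lower cost = stronger memory.
cards = [
    ("derivative", "slope", 1.0),
    ("slope", "rise over run", 1.0),
    ("derivative", "limit definition", 3.0),
    ("limit definition", "rise over run", 0.5),
]

# Build an adjacency list for the directed graph.
graph = {}
for cue, answer, cost in cards:
    graph.setdefault(cue, []).append((answer, cost))

def cheapest_recall(start, goal):
    """Dijkstra: cost of the cheapest chain of associations from start to goal."""
    frontier = [(0.0, start)]
    best = {start: 0.0}
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return None  # no chain of associations exists at all

THRESHOLD = 2.5  # recall "succeeds" only if the chain is cheap enough
cost = cheapest_recall("derivative", "rise over run")
print(cost, "recalled" if cost is not None and cost <= THRESHOLD else "forgotten")
```

Adding more (redundant) edges between the same concepts is what makes the graph robust to randomly forgetting a card, which is the dense-connectivity point above.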
I’ll write some posts when I get stuff working, I feel a Sense That More Is Possible in this area, but I don’t want to write stuff till I can at least say it works well for me.
Upvoted because I think there should be more of a discussion around this than "Obviously getting normal people involved will only make things worse" (which seems kind of arrogant / assumes there are no good unknown unknowns).
Yes, I’m not convinced either way myself but here are some arguments against:
If the USA regulates AGI, China will get it first, which seems worse as there's less alignment activity in China (as for US-China coordination: lol, lmao).
Raising awareness of AGI Alignment also raises awareness of AGI. If we communicate the “AGI” part without the “Alignment” part we could speed up timelines
If there's a massive influx of funding/interest from people who aren't well informed, it could lead to "substitution hazards" like work on aligning weak models with methods that don't scale to the superintelligent case (in climate change, people substitute "solve climate change" with "I'll reduce my own emissions", which is useless).
If we convince the public AGI is a threat, there could be widespread flailing (the bad kind), which reflects badly on alignment researchers (e.g. if DeepMind researchers are receiving threats, their system 1 might generalize to "people worried about AGI are a doomsday cult and should be disregarded").
Most of these I’ve heard from reading conversations on EleutherAI’s discord, Connor is typically the most pessimistic but some others are pessimistic too (Connor’s talk discusses substitution hazards in more detail)
TL;DR: It's hard to control the public once they're involved. Climate change startups aren't getting public funding; the public is more interested in virtue-signaling. (In the climate case the public doesn't really make things worse, but for AGI it could be different.)
EDIT: I think I’ve presented the arguments badly, re-reading them I don’t find them convincing. You should seek out someone who presents them better.
Personally my process goes something like:
Click a citation/link on LW that sends me to a sequence post
Read the post, opening any interesting citations in new tabs
Repeat until I run out of time or run out of interesting citations (the latter never happens)
People Power
To get a sense that more is possible, consider:
The AI box experiment, and its replication
Mentalists like Derren Brown (which is related to 1)
How the FBI gets hostages back with zero leverage (they aren’t allowed to pay ransoms)
(This is an excerpt from a post I'm writing which I may or may not publish. The link aggregation here might be useful in and of itself.)
It would be nice to be able to change 5 minutes to something else. I know this isn't in the spirit of "Try Harder, Luke", but 5 minutes is arbitrary; it could just as easily have been 10 minutes.
Interesting. I'm homeschooled (unschooled specifically) and that probably benefited my agency (though I could still be much more agentic). I guess parenting styles matter a lot more than surface-level "going to school".
You're super brave for sharing this; it's hard to stand up and say "Yes, I'm the stereotypical example of the problem mentioned here." Stay optimistic though; people starting lower have risen higher.
Those who take delight in their own might are merely pretenders to power. The true warrior of fate needs no adoration or fear, no tricks or overwhelming effort; he need not be stronger or smarter or innately more capable than everyone else; he need not even admit it to himself. All he needs to do is to stand there, at that moment when all hope is dead, and look upon the abyss without flinching.
I think even without point #4 you don't necessarily get an AI maximizing diamonds. Heuristically, it feels to me like you're bulldozing open problems without understanding them (e.g. ontology identification by training with multiple models of physics, getting it not to reward-hack by explicit training, etc.), all of which are vulnerable to a deceptively aligned model (just wait till you're out of training to reward-hack). Also, every time you say "train it by X so it learns Y" you're assuming alignment (e.g. "digital worlds where the sub-atomic physics is different, such that it learns to preserve the diamond-configuration despite ontological confusion").
IMO shard theory provides a great frame to think about this in, it’s a must-read for improving alignment intuitions.
If the title is meant to be a summary of the post, I think that would be analogous to someone saying "nuclear forces provide an untapped wealth of energy". It's true, but the reason the energy is untapped is that nobody has come up with a good way of tapping into it.
The difference is that people have been trying hard to harness nuclear forces for energy, while people have not been trying hard to research humans for alignment in the same way. Even accounting for the alignment field being far smaller, there hasn't been a real effort as far as I can see. Most people immediately respond with "AGI is different from humans for X, Y, Z reasons" (which are true) and then proceed to throw out the baby with the bathwater by not looking into human value formation at all.
Planes don’t fly like birds, but we sure as hell studied birds to make them.
If you come up with a strategy for how to do this then I'm much more interested; that's a big reason why I'm asking for a summary, since I think you might have tried to express something like this in the post and I'm missing it.
This is their current research direction, "The shard theory of human values", which they're currently making posts on.
I can't speak for Alex and Quintin, but I think if you were able to figure out how values like "caring about other humans" or generalizations like "caring about all sentient life" formed for you from hard-coded reward signals, that would be useful. Maybe ask on the shard theory Discord; also read their document if you haven't already, and maybe you'll come up with your own research ideas.
Alignment researchers have given up on aligning an AI with human values, it’s too hard! Human values are ill-defined, changing, and complicated things which they have no good proxy for. Humans don’t even agree on all their values!
Instead, the researchers decide to align their AI with the simpler goal of “creating as many paperclips as possible”. If the world is going to end, why not have it end in a funny way?
Sadly it wasn't so easy: the first prototype of Clippy grew addicted to watching YouTube videos of paperclip unboxings, and the second prototype hacked its camera feed, replacing it with an infinite scroll of paperclips. Clippy doesn't seem to care about paperclips in the real world.
How can the researchers make Clippy care about the real world? (and preferably real-world paperclips too)
This is basically the diamond-maximizer problem. In my opinion, the "preciseness" with which we can specify diamonds is a red herring; at the quantum level or below, what counts as a diamond could start to get fuzzy.
KL-divergence and the map/territory distinction
Crosspost from my blog
The cross-entropy $H(P, Q) = \mathbb{E}_{x \sim P}[-\log Q(x)]$ is defined as the expected surprise when drawing from $P$, which we're modeling as $Q$. Our map is $Q$ while $P$ is the territory.
Now it should be intuitively clear that $H(P, Q) \geq H(P, P) = H(P)$, because an imperfect model $Q$ will (on average) surprise us more than the perfect model $P$.
To measure the unnecessary surprise from approximating $P$ by $Q$ we define
$$D_{\mathrm{KL}}(P \| Q) = H(P, Q) - H(P)$$
This is KL-divergence! The average additional surprise from our map $Q$ approximating the territory $P$.
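As a quick numeric sanity check (my own illustration, not part of the original post), here is the identity $D_{\mathrm{KL}}(P \| Q) = H(P, Q) - H(P)$ on a small discrete example, with $P$ as the territory and $Q$ as the map:

```python
import math

P = [0.5, 0.25, 0.25]   # territory: true distribution over 3 outcomes
Q = [0.8, 0.1, 0.1]     # map: our (imperfect) model of it

H_P  = -sum(p * math.log2(p) for p in P)                # entropy: surprise under the perfect model
H_PQ = -sum(p * math.log2(q) for p, q in zip(P, Q))     # cross-entropy: surprise using the map Q
D_KL = sum(p * math.log2(p / q) for p, q in zip(P, Q))  # extra surprise from using the map

print(H_P, H_PQ, D_KL)                 # H(P, Q) >= H(P)
print(math.isclose(D_KL, H_PQ - H_P))  # True
```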
Now it's time for an exercise: in the following figure, $Q$ is the Gaussian that minimizes either $D_{\mathrm{KL}}(P \| Q)$ or $D_{\mathrm{KL}}(Q \| P)$; can you tell which is which?
Left is minimizing $D_{\mathrm{KL}}(P \| Q)$ while the right is minimizing $D_{\mathrm{KL}}(Q \| P)$.
Reason as follows:
If $P$ is the territory then the left $Q$ is a better map (of $P$) than the right $Q$.
If $P$ is the map, then the territory $Q$ on the right leads to us being less surprised than the territory on the left, because $P$ on the left will be very surprised at data in the middle, despite it being likely according to the territory $Q$.
On the left we fit the map to the territory; on the right we fit the territory to the map.
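Here is a rough numerical version of the exercise (my own sketch using numpy; it does not reproduce the original figure): fit a single Gaussian $Q$ to a bimodal territory $P$ by brute-force minimizing each direction of the KL-divergence. Minimizing $D_{\mathrm{KL}}(P \| Q)$ gives a wide, mass-covering Gaussian (fitting the map to the territory), while minimizing $D_{\mathrm{KL}}(Q \| P)$ locks onto a single mode (fitting the territory to the map).

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Territory: a mixture of two well-separated Gaussians.
P = 0.5 * gaussian(x, -3, 1) + 0.5 * gaussian(x, 3, 1)

def kl(a, b):
    """Numerical D_KL(a || b) on the grid, for densities a and b."""
    mask = a > 1e-12
    b_safe = np.maximum(b[mask], 1e-300)  # avoid log(0)
    return float(np.sum(a[mask] * np.log(a[mask] / b_safe)) * dx)

# Brute-force search over single-Gaussian maps Q = N(mu, sigma).
best_fwd = best_rev = (np.inf, None, None)
for mu in np.linspace(-5, 5, 101):
    for sigma in np.linspace(0.5, 6, 56):
        Q = gaussian(x, mu, sigma)
        fwd, rev = kl(P, Q), kl(Q, P)
        if fwd < best_fwd[0]:
            best_fwd = (fwd, mu, sigma)
        if rev < best_rev[0]:
            best_rev = (rev, mu, sigma)

# Minimizing D_KL(P||Q) gives a wide Gaussian covering both modes;
# minimizing D_KL(Q||P) gives a narrow Gaussian locked onto one mode.
print("argmin D_KL(P||Q): mu=%.1f, sigma=%.1f" % best_fwd[1:])
print("argmin D_KL(Q||P): mu=%.1f, sigma=%.1f" % best_rev[1:])
```

The broad fit loses in the reverse direction because it generates data in the low-probability region between the modes, which is exactly the "surprised at data in the middle" effect described above.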
In regard to priorities between young frontline workers and the at-risk elderly: I hope they're optimizing for saving life-years, not lives (i.e. if a healthy 20-year-old has 60 years ahead of them and a healthy 70-year-old has 10, saving the 20-year-old saves 6x as many life-years).
Other than that, interesting post; I'll be keeping an eye on that new strain.