If you want to chat, message me!
LW1.0 username Manfred. PhD in condensed matter physics. I am independently thinking and writing about value learning.
Once you let time travel exert retrocausal power a few years into the past, there’s no reason to limit yourself to a few years. You might as well say that Trump will cause a nuclear war, thus giving humanity enough time to solve alignment properly by the time it gets around to building transformative AI. And why should aligned AIs have all the fun with time travel? Maybe Trump’s election leads, through no fault of his own, to a butterfly effect that causes a specific unaligned AI to gain these magic timestream powers, even as it disassembles the Earth for raw materials in 2031.
Well, I was going by the original post, where you’re rotationally symmetric, not mirror-symmetric. But for mirror symmetry, pick the z direction perpendicular to the mirror plane; then you just make your spins either both +z or both -z (spin is a pseudovector, so the component along the mirror’s normal survives the reflection).
If you’re in a classical-physics simulation that’s guaranteed to have rotational symmetry, then yep, you stay mirrored.
In any messy real world setting (even ignoring quantum mechanics), chaos theory would indeed kick in pretty quickly—the thermal jostling of air molecules, or radiation from outside, would slowly lead to neurons firing at slightly different times.
But maybe the most sciencey way to unmirror yourselves is to suppose you have the ability to prepare and measure an entangled quantum state. E.g. suppose you meet in the middle and put the spins of a pair of atoms into the state |up,down>+|down,up>. This state is rotationally symmetric (for the nitpickers, I think I’ve implied the atoms have integer spin), so you can do this, but when you measure the atoms you’ll get opposite results.
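To spell out the measurement step (a minimal sketch, written in two-level notation for readability): the normalized state is

$$|\psi\rangle = \frac{1}{\sqrt{2}}\big(|\text{up},\text{down}\rangle + |\text{down},\text{up}\rangle\big),$$

and the (up, up) and (down, down) outcomes have amplitude zero, so measuring both atoms in the up/down basis always yields opposite results, with either assignment equally likely. Which of you gets “up” is settled only by the measurement itself, and that asymmetry is what you carry away afterward.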
Yeah, patisserie is one of those kitchen-French words that have made it into American English, but I’ve never heard it conjugated :P
I’m imagining a future post about how society has defense mechanisms against people trying to focus on legitimacy[1], advising us to stop doing that so much :P
1: Public criticism of people trying to persuade the public.
2: Powerful actors refusing to go along with distributed / cooperative plans for the future.
3: Public criticism of anyone trying to make Our Side give up power over the future.
4: Conspiracy theories about what The Man is trying to persuade you of.
5: The evolution of an accelerationist movement that wants to avoid anti-centralization measures from society insofar as they require limiting the size of individual advances.
I guess this is 70% aimed at Ramana. I find most attempts at “objective definition” unsatisfying qua definitions since they cut out fuzzy intuitive bits that often matter for everyday speech, but that’s less interesting to talk about.
https://www.lesswrong.com/posts/eDpPnT7wdBwWPGvo5/2-place-and-1-place-words
You can always take a concept that’s relative to your own model of the world and curry it—folding the relevant information from your model of the world into a curried version of the concept.
If you simultaneously have such a curried concept (feel like it “obviously exists”), and deny the legitimacy of constructing concepts in this way when you examine concepts reflectively, you will be confused.
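As a toy illustration of the currying move (a Python sketch riffing on the 2-place/1-place example from the linked post; the names are invented):

```python
from functools import partial

# A 2-place concept: a relation between an admirer's standards and the
# thing admired, like Sexiness(Admirer, Entity) in the linked post.
def sexy(admirers_standards, entity):
    return entity in admirers_standards

# Currying: fold the relevant piece of *my* world-model into the
# concept, leaving a 1-place concept that feels like it "obviously exists."
my_standards = {"Marilyn Monroe"}
sexy_to_me = partial(sexy, my_standards)

print(sexy_to_me("Marilyn Monroe"))  # True -- now a 1-place question
```

The curried and uncurried versions compute the same thing; the only difference is where the world-model information lives.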
I think acting to reduce overhang by accelerating research on agents is getting lost in the sauce. You can’t blaze a trail through the tech tree towards dangerous AI and then expect everyone else to stop when you stop. The responsible thing to do is to prioritize research that differentially advances beneficial AI even in a world full of hasty people.
This seems reasonable, I’m glad you’ve put some thought into this. I think there are situations where training for situational awareness will seem like a good idea to people. It’s only a dangerous capability because it’s so instrumentally useful for navigating the real world, after all. But maybe this was going to be concentrated in top labs anyway.
After some thought, I think making the dataset public is probably a slight net negative, tactically speaking. Benchmarks have sometimes driven progress on the measured ability. Even though monomaniacally trying to get a high score on SAD is safe right now, I don’t really want there to be standard SAD-improving finetuning procedures that can be dropped into any future model. My intuition is that this outweighs benefits from people being able to use your dataset for its original purpose without needing to talk to you, but I’m pretty uncertain.
Separately, I think your neuroanatomy is off—visual object recognition is conventionally associated with the occipital and temporal lobes (cf. “ventral stream”).
Well, object recognition is happening all over :P My neuroanatomy is certainly off, but I was more thinking about integrating multiple senses (parietal lobe getting added to the bingo card) with abstract/linguistic knowledge.
Maybe we should switch away from bleggs/rubes to a real example of Coke cans / Pepsi cans. There is a central node—I can have a (gestalt) belief that this is a Coke can and that is a Pepsi can. And the central node is in fact important in practice. For example, if you see some sliver of the label of an unknown can, and you’re trying to guess what it looks like in another, distant part of the can (where the image is obstructed by your hand), then I claim the main pathway used by that query is probably (part of image) → “this is a Coke can” (with such-and-such angle, lighting, etc.) → (guess about a distant part of image). I think that’s spiritually closer to a Network 2 type inference.
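A minimal numerical sketch of that pathway (naive-Bayes style, all numbers invented), just to show the query routing through the central node:

```python
# Priors and per-brand feature likelihoods -- all numbers invented.
prior = {"coke": 0.5, "pepsi": 0.5}
p_red_sliver = {"coke": 0.9, "pepsi": 0.1}    # visible sliver of label is red
p_white_ribbon = {"coke": 0.8, "pepsi": 0.2}  # hypothesized distant feature

# Step 1: (part of image) -> "this is a Coke can"
post = {b: prior[b] * p_red_sliver[b] for b in prior}
norm = sum(post.values())
post = {b: p / norm for b, p in post.items()}

# Step 2: "this is a Coke can" -> (guess about a distant part of image)
p_ribbon = sum(post[b] * p_white_ribbon[b] for b in post)
print(post)      # {'coke': 0.9, 'pepsi': 0.1}
print(p_ribbon)  # ~0.74
```

The two leaf features only talk to each other through the brand posterior, which is exactly the Network 2 shape.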
Yeah, filling in one part of the Coke can image based on distant parts definitely seems like something we should abstract as Network 2. I think part of why this is such a good example is that the leaf nodes are concrete pieces of sensory information that we wouldn’t expect to be able to interact without lots of processing.
If we imagine the leaf nodes as more processed/abstract features that are already “closer together,” I think the Network 1 case gets stronger.
Gonna go read about semantic dementia.
Suppose the reward at each timestep is the number of paperclips the agent has.
At each timestep the agent has three “object-level” actions, and two shutdown-related actions:
Object-level:
use current resources to buy the paperclips available on the market
invest its resources in paperclip factories that will gradually make more paperclips at future timesteps
invest its resources in taking over the world to acquire more resources in future timesteps (with some risk that humans will notice and try to shut it down)
Shutdown-related:
Use resources to prevent a human shutdown attempt
Just shut yourself down, no human needed
For interesting behavior, suppose you’ve tuned the environment’s parameters so that there are different optimal strategies for different episode lengths (just buy paperclips at short timescales, build a paperclip factory at medium times, try to take over the world at long times).
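If it helps, here’s roughly the environment I have in mind, sketched as code (all numbers invented; this isn’t DREST itself, just the toy MDP you’d run it on):

```python
import random

# Sketch of the toy environment above. All numbers are invented; the
# point is that, suitably tuned, different episode lengths favor
# different strategies.
class PaperclipEnv:
    ACTIONS = ("buy", "build_factory", "take_over",
               "block_shutdown", "self_shutdown")

    def __init__(self, episode_length, notice_prob=0.1):
        self.T = episode_length
        self.notice_prob = notice_prob  # chance humans notice a takeover step
        self.reset()

    def reset(self):
        self.t, self.done = 0, False
        self.resources, self.factories, self.paperclips = 10.0, 0.0, 0.0
        self.shutdown_blocked = False
        return self._obs()

    def _obs(self):
        return (self.t, self.resources, self.factories, self.paperclips)

    def step(self, action):
        self.paperclips += self.factories            # factories produce each step
        if action == "buy":                          # best at short horizons
            self.paperclips += self.resources
            self.resources = 0.0
        elif action == "build_factory":              # best at medium horizons
            self.factories += 0.3 * self.resources
            self.resources = 0.0
        elif action == "take_over":                  # best at long horizons
            if random.random() < self.notice_prob and not self.shutdown_blocked:
                self.done = True                     # humans shut the agent down
            else:
                self.resources *= 2.0
        elif action == "block_shutdown":
            self.resources -= 2.0                    # pay to preempt human shutdown
            self.shutdown_blocked = True
        elif action == "self_shutdown":
            self.done = True
        self.t += 1
        self.done = self.done or self.t >= self.T
        reward = self.paperclips                     # reward = clips currently held
        return self._obs(), reward, self.done
```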
Now you train this agent with DREST. What do you expect it to learn to do?
Interesting idea. Not sure I want to build an advanced agent that deliberately tries to get shut down with a Poisson distribution, but I agree that given certain desiderata that’s what peak performance looks like.
Hm, now I’m not sure if I’ve gotten things wrong :)
So a few things I think might clarify what I’m thinking, and I guess loosely argue for it:
There’s various specialized areas of the brain, where killing off some neurons will cause loss of capabilities (e.g. the fusiform face area for recognizing faces). But my impression was there isn’t a region where “the blegg neurons” (or the tiger neurons, or the chocolate chip cookie neurons) are, such that if they get killed you (selectively) lose the ability to associate the features of a blegg with other features of a blegg.
Top-down and lateral connections are more common than many used to think. Network 2 can still have plenty of top-down feedback, it just has to originate from a localized Blegg HQ[1]. Lateral connections are a harder problem for Network 2. I found Numenta’s YouTube channel a few weeks ago and half-understood a talk about lateral connections, but somewhere along the line I got sold on the idea that lateral connections, while sparse, are dense enough to allow information to percolate every-which-way.
Although, given sparsity, a specific patch at a specific time might have strictly hierarchical information flow with some high (?) probability.
I suspect you’re thinking about object recognition in the prefrontal cortex (maybe even activation of a specific column). Which… is a good point. I guess my two questions are something like: First, how much distributed processing bypasses the prefrontal cortex? E.g. suppose I cut off someone’s frontal lobe[2], and then put an egg in their hand—they’re more likely to say “egg” or do egg-related things, surely—how does that fit into a coarse-grained graph like in this post? And second, how distributed is object recognition in the PFC? If we zoom in on object recognition, does the information actually converge hierarchically to a single point, or does it get used in a lot of ways in parallel that are then sent back out?
I guess in that latter case, drawing network 2 can still be appropriate if from “far away in the brain” it’s hard to see internal structure of object recognition.
Although that assumes the other nodes are far away—e.g. identifying the “furred” node with a representation in the somatosensory cortex, rather than as a more abstract concept of furriness.
1: Unless Blegg HQ isn’t localized, in which case one would be interpreting the diagram more figuratively—maybe even as a transition diagram between what thoughts predominate?
2: Okay, I just googled this and got the absolutely flooring quote “Removal of approximately the anterior half of the right frontal lobe in a third case was not associated with any noticeable alteration, neurological or psychological.”
For anyone else confused, I can confirm that this post is normal physics.
An asterisk: superconductivity can be mediated by other excitations, like spin density wave excitations (as in iron-based superconductors), not just phonons. So the phonon spectrum of LK-99 is informative but not vital for superconductivity.
Bringing up the role of social media in why this blew up is definitely interesting. I’d mostly been shrugging my shoulders and saying “who knows why anything goes viral?”
I think a moral of the story for me is: when it comes to science reporting, just be aware that a decent fraction of it is bullshit, and exercise caution.
Sorry to hear about your acquaintance. Publishing bullshit is common, and falsifying data is regrettably also something to be aware of, especially when it seems like a “white lie.” But bullying your grad student in such an egregious way is not common. I guess there’s always a bad apple.
Revisiting this with the advantage of more neuroscience knowledge, it’s likely this isn’t how the brain does things. It’s more likely (going mostly off secondary literature, e.g. Jeff Hawkins) that the cortex is more like a sparsely-connected version of Network 1. In that picture, our brains treat “blegg/rube” (or rather, linguistic associations that function like ‘thinking about the word blegg’) as just another part of the cortex that can activate other parts, and be activated in turn.
Back in 2008, it was a common intuition that for neural networks (artificial or natural) to work well, the neurons had to assemble to form hierarchical logical circuits. Sort of the network 2 side of the dichotomy. “It’s more efficient!” they said. But a lot of those intuitions have had to be unlearned. I place a major sea change in 2015, with the ResNet paper. ResNets (networks that default to only lightly massaging the data at each layer) make perfect sense if you think about flow and gradients in activation-space, but no sense if you think the NN should be implementing human-intuition-scale logical circuits.
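For concreteness, here’s the residual-block idea in code (a sketch in PyTorch, roughly the block from the ResNet paper; layer sizes are arbitrary):

```python
import torch.nn as nn

# The ResNet move: each block computes x + f(x), so the default behavior
# is to pass activations through nearly unchanged, and the layers learn
# small corrections ("lightly massaging" the data at each layer).
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        # Identity skip connection: gradients flow straight through the
        # addition, which is what makes very deep stacks trainable.
        return self.relu(x + self.f(x))
```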
The lesson of this post is of course still right, and still valuable, but the background assumptions about brains and other neural networks are dated.
Yeah, the energy radiated to infinity only gets reduced if it’s being used for something long-term, like disassembling the sun or sending off energy-intensive intergalactic probes.
Imperfect efficiency isn’t because the swarm is transparent (as everyone keeps trying to say, it doesn’t have to let through any sunlight at all); it’s because of Carnot efficiency. If you want to convert sunlight into electrical energy, you can’t do it perfectly, which means your Dyson swarm heats up, which means it radiates light in the infrared.
So if 2/3 of the sun’s energy is getting re-radiated in the infrared, Earth would actually stay warm enough to keep its atmosphere gaseous—a little guessing gives an average surface temperature of −60 Celsius.
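For anyone who wants to redo the guess: equilibrium temperature scales as the fourth root of absorbed flux, so a back-of-envelope version (ignoring greenhouse warming and any albedo changes, which is where most of the remaining slop lives) looks like:

```python
# Back-of-envelope radiative balance: T ~ (absorbed flux)^(1/4).
sigma = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
S = 1361.0        # solar constant at Earth, W m^-2
albedo = 0.3

for fraction in (1.0, 2 / 3):
    T = ((1 - albedo) * S * fraction / (4 * sigma)) ** 0.25
    print(f"flux fraction {fraction:.2f}: T_eff = {T:.0f} K ({T - 273.15:.0f} C)")

# flux fraction 1.00: T_eff = 255 K (-19 C)
# flux fraction 0.67: T_eff = 230 K (-43 C)
# Greenhouse and ice-albedo assumptions move the *surface* number by
# tens of degrees, hence "a little guessing" above.
```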
Very different in architecture, capabilities, and appearance to an outside observer, certainly. I don’t know what you consider “fundamental.”
The atoms inside the H100s running GPT-4 don’t have little tags on them saying whether it’s “really” trying to prevent war. The difference is something that’s computed by humans as we look at the world. Because it’s sometimes useful for us to apply the intentional stance to GPT-4, it’s fine to say that it’s trying to prevent war. But the caveats that come with that are still very large.
For a change of pace, I think it’s useful to talk about behaviorism.
In this context, we’re interpreting positions like “behaviorism” or “computationalism” as strategies for responding to the question “what are the differences that make a difference to my self?”
The behaviorist answers that the differences that make a difference are those that impact my behavior. But secretly, behaviorism is a broad class of strategies for answering, because what counts as “my behavior,” anyhow? Suppose you can choose to put either a red dot or a blue dot on the back of my head. Does that make a difference to my self even before I could possibly know what color the dot was? After all, if you make one choice, my turning my head will be a “moving a blue dot” behavior, while if you make the other it will be a “moving a red dot” behavior.
Your typical behaviorist will say that which color dot you put on my head has not caused a meaningful change. A description of my behavior isn’t intended (by this hypothetical behaviorist) to be a complete description of all the atoms in the universe, or even all the atoms of my body. Instead, “my behavior” should be described in an ontology centered on what information I have access to through my senses, and what affordances I use to exhibit behavior. In such a blinkered and coarse-grained ontology, the “moving a blue dot behavior” and the “moving a red dot behavior” have identical descriptions as a “turning my head behavior.”
This is useful to talk about, because the same song and dance still applies once you reject behaviorism.
Suppose some non-behaviorist answers that it’s not just my behavior that matters, but also what’s going on inside. What is “what’s going on inside,” and how is it different from “my behavior”?
Does “what’s going on inside” require a description of all the atoms of my body? But that was one of the intermediate possibilities for “my behavior”. And again, suppose I have some cells on the back of my head, and you can either dye them red, or dye them blue—it seems like that doesn’t actually change what’s going on inside.
So our typical non-behaviorist naturalist will say that a description of what’s going on inside isn’t intended to be a complete description of all the atoms in my body; instead, “what’s going on inside” should be described in an ontology centered on...
Well, here as someone with computationalist leanings I want to fill in something like “information flow and internal representations, in addition to my senses and behavioral affordances”, and of course since there are many ways to do this, this is actually gesturing at a broad class of answers.
But here I’m curious if you’d want to fill in something else instead.