Still haven’t heard a better suggestion than CEV.
That makes sense. I read somewhere that in the hunter-gatherer contexts we evolved in, being shunned from the tribe could be life or death. I think to a certain extent that’s still true, but less so than in the past. In any case, it feels like a compelling reason that we would be hard-coded to find interpersonal conflict innately compelling.
(1) Future Ability to Remember Things
I don’t have this one! My ADHD in my youth and throughout my life has painfully and eternally etched into my mind that I do not have the ability to insert things into my future contexts through the will of my mind alone, and often physical interventions like notes can still fail. A classic example is writing a note that I fail to ever reference in the relevant context. So afaik I basically never think “I can let this detail leave my current context knowing I can remember it.” I can’t. Instead what I think is “I can keep this in my current context from now until it’s relevant” or “the amount of effort to return this to my future context is more expensive than what I hope to gain by having this detail in my future context”. I am still regularly blindsided by unforeseen failures in my strategy to insert details into future contexts. For example, setting three reminders on my phone, then turning off my phone for an exam and forgetting to turn it back on after.
(2) Local Optima of Comfort
I definitely have this one! I noticed one I called “sleepiness in the morning is because you are coming out of sleep, not because you didn’t get enough sleep” but “local optima of comfort” is a really good generalization.
Recently I’ve started swimming in the ocean again (about 9℃ where I live). I feel really good afterward, but always have an aversion to going in. Interestingly, I find myself motivated by the promise of eating a granola bar after the swim. It seems like the pre-swim version of myself still requires the future granola bar to motivate swimming, even though the post-swim version of myself that actually eats the granola bar enjoys the feeling of being capable of regulating my temperature, and the relaxed, calm feeling of having swum, much more than the granola bar itself.
(3) Interpersonal Conflict
I’m not sure about this one. I think in conflict I’m somewhat likely to try to depersonalize and describe myself and the people I’m in conflict with in the 3rd person… but I think I do experience similar effects around stubbornness and feeling like people owe me communication. For example, when people downvote without saying why, I get irritated: how am I supposed to understand why you are downvoting if you don’t say anything! But of course taking the time to put things into words is a scarce resource that strangers on the internet are, in fact, not obligated to spend on me.
(4) Bonus: Recognizing I’m in a Dream
I really like lucid dreaming and dream incubation. One strategy I used to use is regularly “trying to teleport to a predetermined location”. If I fail to actually teleport there, I conclude I’m awake. If I do teleport there, I conclude I’m dreaming and start doing whatever dream exploration I wanted to do in that location. A very fun one I used to do is teleporting to an open field and increasing the amount of force I can jump with, to the point where I can jump a hundred feet in the air, trying to do flips and focusing on the feeling of my legs pushing against the ground and the way the world seems to spin around me while I’m in the air.
...
This was a fun post. Thanks for writing it.
Being responsible for the blinding was annoying for my girlfriend, and it would be better to find ways to either do it myself in future experiments, or at least streamline the process to not have her involved every day of the experiment
I like this sort of thing. I think you could get a small lazy susan and 3 shot glasses. Fill and dose each glass, placing a note card under each one specifying the date and dose or placebo. Then close your eyes and spin the lazy susan. Choose a cup from the randomized lazy susan and add it to a larger drink to dilute the taste. Take the card from that glass and store it for reference. Pour out the other glasses and destroy their cards.
I think that would get you self administered blind testing at the cost of wasting some melatonin and note cards.
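If you wanted software to handle the randomization and record-keeping, here’s a toy sketch of my own (the file name and parameters are placeholders, not anything from the original post): generate the dose/placebo assignment for every day up front, write the key to a file you don’t open until the experiment ends, and prepare identical-looking doses in date-labelled containers in one batch session.

```python
import csv
import random
from datetime import date, timedelta

# Hypothetical parameters -- adjust for your own experiment.
N_DAYS = 30
CONDITIONS = ["melatonin", "placebo"]

start = date.today()
schedule = [(start + timedelta(days=i), random.choice(CONDITIONS))
            for i in range(N_DAYS)]

# The key file is the only record of which night was which.
# Don't open it until the experiment is over.
with open("blinding_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "condition"])
    writer.writerows(schedule)

print(f"Wrote {N_DAYS} assignments to blinding_key.csv -- no peeking!")
```

The catch is that you’d know the assignments during the one prep session, so the blinding relies on that knowledge fading over the following weeks, which is roughly the same trick the shot-glass method achieves physically in a single sitting.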
I end up mentally exhausted and resistant to demands.
I relate to this, but it makes me wonder if progressive overload / antifragile dynamics apply here. In resistance training you (1) expose your musculoskeletal system to stress, (2) consume nutrition for recovery, (3) rest to allow recovery beyond the initial setpoint, and (4) repeat step 1 with progressively increasing stress. So I wonder if mental stress works the same way.
I’ve tried looking into kinds of brain training such as dual-n-back, and it seems like research is generally pessimistic about skill transfer. It seems that if you practice one cognitively demanding task, you get better at that task, but the improvement doesn’t transfer to dissimilar tasks.
But I wonder if those trials have been misunderstanding mental effort. When first starting dual-n-back it feels effortful, but once you have practice it no longer feels effortful, even as n increases, so progressive overload would require a better model of mental effort. It seems like silentbob’s procrastination drill would come closer than any fixed cognitive test practice, but in general I would think it is the switching to cognitively dissimilar tasks that causes the cognitive stress. Feeling like you’ve been thrown in the deep end without knowing how to swim, so to speak. Once you figure out how to swim, you can’t use swimming to practice figuring out how to swim anymore; then you’re just practicing swimming.
Indeed. Thank you 🙏 I’ll edit the post based on this. I think “injective” is most correct for the claim, although I don’t know of any commonly used discontinuous activation functions.
You might also be interested in the second half of this comment.
Before engaging more I should note I don’t really know what the “sceptic” stance is, other than just being dubious of things. If you think I should be more informed please point me at resources.
From your linked post:
In the end the illusion that demon creates has to be all-encompassing.
I like this. This is a good point. I think there’s still unstated nuance but I like noticing the way the scope of demonic illusions must creep far outside of any capability we are aware of.
At which point… what is the actual difference between the “reality” and such “illusion”? They work according to the same rules and produce all the same observations.
Not so.
In the world model of the “naturalistic universe”, it feels right to assume the existence of a past and future that give my memories, past experiments, and future plans real context. A reality illusion demon would provide no such assurances. Perhaps it dreamed of this single moment, having fully fabricated the past and with no plan of further exploring the future. In this case, my plans to perform future tests and make a nice cup of tea gain me no real utility since the demon will not be emulating me in the future where I can see the results of those tests or enjoy that tea. That is the sort of demon to which I say “ok, whatever, I’m just gonna assume you don’t exist because if you do you clearly break and control all my attempts to influence reality in any way.”
I’ll be talking more about Münchhausen trilemma in future posts.
I wasn’t aware of the term “Munchhausen trilemma”. It’s a good term. Thanks!
People seemed to like this comment describing details of Maat.
This feels similar to what I was exploring in this post.
A few thoughts on dispelling illusions imposed by evil demons:
Invoking the anthropic principle makes me uncomfortable; I feel it is an unfinished concept. But I suspect there are vastly more instances of the experience of having memories of temporal events in worlds where there are indeed temporal events than in worlds lacking temporal events. This lends evidence against particularly pernicious demons, but indeed, it doesn’t disprove them, and it is not based on solid ground.
Setting aside that uncomfortable argument using deduction based on assumptions about the set of possible worlds, if everything I observe is a demonic illusion, I admit defeat, and so I gain more utility by focusing on the worlds where I expect my actions can, in principle, lead to more utility.
A similar principle applies to trusting my tools of logical analysis. First, I try to subject my tools of logical analysis themselves to logical analysis to improve my confidence they should work well. There is a long tradition of people attempting to do so. But ultimately all of this rests on using tools we already have. We could search for new tools, but how would we evaluate their effectiveness? We would need to use our existing tools. So there is no escaping the need to rely on our existing tools.
Math is a branch of deductive reason which is a branch of philosophy ;^p
Thanks : )
I am interested in semantic distance, but before that I am interested in semantic continuity. I think the idealized topology wouldn’t have a metric, but the geometric spaces in which that topology is embedded give it semantic distance, implicitly giving it a metric.
For example, in image space slight changes in lighting would give small distances, but translating or rotating an image would move it a very great distance away. So the visual space is great for humans to look at, but its metric describes pixel similarity, which we usually don’t care about outside of computer algorithms.
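A quick numpy illustration of that point (using random noise as a stand-in image; real photos behave the same way under the pixel metric):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))                # stand-in grayscale "image"

brighter = np.clip(img + 0.05, 0.0, 1.0)  # slight lighting change
shifted = np.roll(img, 8, axis=1)         # translate 8 pixels sideways

# Pixel-space (L2) distances: the lighting change barely moves the
# point, while translation throws it far away, despite the semantics
# being essentially identical in both cases.
print(np.linalg.norm(brighter - img))  # small (~3)
print(np.linalg.norm(shifted - img))   # large (~26)
```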
The labelling space would have a much more useful metric. Assuming a 1d logit, distance would correspond to how much something does or does not seem like a cat. With 2d or more logits the situation would become more complicated, but again, distance represents motion towards or away from confidence of whether we’re looking at a cat, a dog, or something else.
But in both cases, the metric is a choice that tells you something about certain kinds of semantics. I’m not confident there would exist a universal metric for semantic distance.
You could define the topology on the output space so that by definition the network is continuous (quotient topology) but then topology really is getting you nothing.
I’d actually be more inclined to do this. I agree it immediately gets you nothing, but it becomes more interesting when you start asking questions like “what are the open sets” and “what do the open sets look like in the latent spaces”.
Bringing back the cat identifier net, if I look at the set of high cat confidence, will the preimage be the set of all images that are definitely cats? I think that’s a common intuition, but could we prove it? Would there be a way to systematically explore diverse sections of that preimage to verify that they are indeed all definitely cats?
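One way to poke at that systematically, sketched below with a placeholder untrained net standing in for a real cat classifier (this is my own toy construction, not anyone’s established method): gradient-ascend from many random starting images toward high cat confidence and inspect what turns up in the preimage.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for a trained cat classifier; in practice you
# would load a real model. This untrained net just makes the sketch run.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1), nn.Sigmoid())

def probe_preimage(n_starts=5, steps=200, lr=0.1, threshold=0.9):
    """Gradient-ascend random images toward high 'cat' confidence."""
    found = []
    for _ in range(n_starts):
        x = torch.rand(1, 32, 32, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            (-model(x).sum()).backward()  # maximize confidence
            opt.step()
            with torch.no_grad():
                x.clamp_(0.0, 1.0)        # stay inside image space
        if model(x).item() > threshold:
            found.append(x.detach())      # a point in the preimage
    return found  # inspect these: are they all "definitely cats"?

samples = probe_preimage()
print(f"found {len(samples)} high-confidence images")
```

With real trained networks, this kind of optimization famously tends to surface high-confidence inputs that look nothing like cats (adversarial examples), which is itself an answer to the intuition: the preimage is bigger and stranger than “the set of all images that are definitely cats”.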
The fact that it’s starting from a trivial assertion doesn’t make it a bad place to start exploring imo.
I think that kind of direction might be what you’re getting at in mentioning “informal ideas I discuss in between the topology slop”. So it’s true, I might stop thinking in terms of topology eventually, but for now I think it’s helping guide my thinking. I want to try to move towards thinking in terms of manifolds, and I think noticing the idea of semantic connectivity, i.e., a semantic topological space, without requiring the idea of semantic distance is worthwhile.
I think that might be one of the ideas I’m trying to zero in on: The distributions in the data are always the same and what networks do is change from embedding that distribution in one geometry to embedding it in a different geometry which has different (more useful?) semantic properties.
Oh yeah, that makes sense. I wouldn’t want to make that assumption though, since activation functions are explicitly non-linear; otherwise the layers could be multiplied together, and a multi-layer perceptron would just be an indirect way of doing a single linear map.
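For concreteness, the collapse is just associativity of matrix multiplication; a two-line numpy check (the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 10))   # "layer 1" weights
W2 = rng.standard_normal((3, 5))    # "layer 2" weights
x = rng.standard_normal(10)

# With no nonlinearity between them, two layers are exactly one
# linear map with weights W2 @ W1 (biases fold in similarly).
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)
```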
Yeah, that’s probably part of it. Although technically they are only the same when the quotient function is the very natural one that throws away whatever component is not in the vector subspace, projecting straight down into that subspace. But this is not the only possible choice of function, and so not the only possible space to get as a result.
I think all monotonic functions would give a homeomorphic space, but functions with discontinuities would not, and I’m not sure about functions that are surjective but not injective. And functions that are not surjective fail the criteria for generating a quotient space.
Edit: I think maybe functions with discontinuities do still give a topological space, so long as they are surjective (which is required for a quotient map anyway). It would just break the vector properties between the two spaces, but that’s not required for a topological space. This is inspiring me to want to study topology more : )
I do not have a formal definition, but it’s the sort of thing I’m interested in.
In future posts I’d like to explore how I’m sorta talking about the distribution that exists in the actual data structures while gesturing at the idea of an idealized semantic space representing the natural phenomena being described. The natural phenomena and the idealized semantic space are what I’m interested in; the actual data structures are a way to learn about that ideal space. The motivation is that understanding of the ideal space could be applied inside the domain of neural nets and machine learning, and potentially applied in broader scientific/engineering domains directly.
Trying to formalize what I’m talking about would be a big part of that exploration.
I did describe this stuff in more detail in Zoom Out: Distributions in Semantic Spaces so you might want to read and comment there, but I’ll try to answer your questions a bit here.
By the “input space” and “output space” I am fuzzily referring both to the space of possible values that the data structures of the network’s input and output can take, and to the space of configurations of phenomena generating that data. I might call these the “digital space” and “ideal space” respectively.
So in the case of visual/image space, the digital space would be the set of possible rgb values, while the ideal space would be the space of possible configurations of cameras capturing images (or other ways of generating images). Although most of the digital space consists of images that look like nothing but static to people, the ideal space is actually a much larger space than the resulting data structure can distinguish, because, for example, I can aim a camera at a monitor and display any pattern of static on that monitor, or aim a different camera at a different monitor and generate all the same images of static. The same data resulting from different phenomena.
So you could think of the underlying sets being:
For digital space the underlying set is a nice clean finite digital set with cardinality two to the power of however many bits there are in your data structure (see the sketch after this list).
For ideal space the underlying set is the incomprehensibly large set of possible phenomena.
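To make the digital-space cardinality concrete, here’s the arithmetic for a hypothetical 256×256 RGB image at 8 bits per channel (the resolution is just an example of mine, not anything from the discussion):

```python
# Cardinality of the digital space for a 256x256 RGB image, 8 bits/channel.
bits = 256 * 256 * 3 * 8          # 1,572,864 bits
print(f"|digital space| = 2^{bits}")
# log10(2^bits) = bits * log10(2), so roughly 10^473,000 elements:
print(f"which is about 10^{round(bits * 0.30103)}")
```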
I also have two topologies in mind. The topology I’m more interested in I might call the “semantic topology”, which would have as open sets the sets of semantically related objects. But I’m also thinking of the semantic topology as being approximated by sufficiently high dimensional spaces with the usual topology, although the semantic topology is probably coarser than the usual topology. But that is all very ungrounded speculation.
Wouldn’t the output space just be the interval [0,1]
That depends on the network architecture and training. I think it’s more natural to have [0,1]^2 with one dimension mapped to “likelihood of cat” and the other to “likelihood of dog”, rather than have some “cat-not-cat” classifier which might be predisposed to think dogs are even more not a cat than nothing at all. But you could train such a network and in that case, yes, the output space would be the interval [0,1].
But another consideration is whether the semantics you’re actually interested in span the entire input space. It’s very likely they do not, in which case it’s likely they also don’t span the output [0,1], but maybe [0.003, 0.998] or (0.1, pi/4) or some other arbitrary bound. This is quite certain in the case of logits which get normalized by a softmax, in which case it would surprise me if the semantic distribution spanned from -infinity to infinity on any dimension.
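A small numpy check of that softmax point (nothing fancy, just the standard numerically stable softmax):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Any finite logits land strictly inside (0, 1); a coordinate can only
# reach 0 or 1 in the limit of infinite logits.
print(softmax(np.array([5.0, -5.0])))    # ~[0.99995, 0.00005]
print(softmax(np.array([50.0, -50.0])))  # first entry rounds to exactly
                                         # 1.0 in float64, though the
                                         # true value is 1 - ~3.7e-44
```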
and the input space [0,1]^N
My answer is essentially the same as the above, with the exception that the digital space might be quite explicitly the entire [0,1]^N, even if most of it is in an open set of the semantic topology linked by the semantics of “it’s a picture of static noise”.
I also note [0,1]^N has infinite resolution of colour variability between white and black. This is not true for actual pixels which have a large but finite set of possible values.
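The quantization point in a couple of lines (assuming the usual 8-bit-per-channel encoding):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1000)          # 1000 distinct reals in [0,1]
q = np.round(x * 255).astype(np.uint8)   # 8-bit pixels: only 256 levels
print(len(np.unique(x)), "distinct reals ->",
      len(np.unique(q)), "pixel values")
# 1000 distinct reals -> 256 pixel values
```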
ToW: Timelines for AI progress: Epistemics of Prediction, Distributions, and Chained Mappings
You are of course correct!
I just opened up a topology textbook and found that I was using the word “subspace” while thinking about quotient topologies induced by a surjective function. (I wonder if there is a shorthand word for that like there is for induced subspace topologies? I think I’ll just say a “quotient space”.)
I’m getting the impression you are more familiar with this than me, but in case you want help recalling, or for the sake of other readers:
A subspace of a topological space X is a set S containing a subset of the elements of X that is “clipped”, so it does not contain the topological information about elements of X not found in S. Or more formally, the open sets of S are the intersections of each open set in X with S.
A quotient topology Y from X wrt a function f, where f is a surjective function from X to Y, is the topology I had been thinking of, where f can be thought of as squishing, folding, or projecting points of a larger set into a smaller set. Formally, the open sets of Y are the sets U for which the inverse image of U is an open set in X.
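Or, in symbols (just the standard textbook definitions restated, with $\mathcal{T}_X$ the topology on $X$ and $f : X \to Y$ surjective):

$$\mathcal{T}_S = \{\, U \cap S \;:\; U \in \mathcal{T}_X \,\} \qquad \text{(subspace topology on } S \subseteq X\text{)}$$

$$\mathcal{T}_Y = \{\, U \subseteq Y \;:\; f^{-1}(U) \in \mathcal{T}_X \,\} \qquad \text{(quotient topology on } Y\text{)}$$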
Thanks for catching that! I’m thinking I should change the article with some kind of note of correction.
(I’m not sure how embarrassed I should be about making this mistake. I think if I was a professor this would be quite embarrassing. It’s less embarrassing as a recent BSc graduate who has only struggled through one course on topology, but is nevertheless very interested in it. Next time I’ll try to notice I should reference my textbook while writing the article. I think I got confused because I was thinking about vector subspaces, which are topological subspaces of the larger vector space; but a different topology getting mapped into that vector subspace would be a quotient space, not a subspace.)
Awesome! I need to state things that are obviously true more since they might be false or at least not obvious.
I think it is false for actual neural networks since floating point doesn’t perfectly approximate real numbers, so if that’s your intuition I very much agree, but it doesn’t seem likely to matter much in practice.
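For anyone who wants the short version of why floats break the real-analysis picture (standard floating-point facts, nothing specific to any library beyond numpy’s `nextafter`):

```python
import numpy as np

# Floats are a finite set, so any "function on floats" is a map between
# finite sets -- continuity in the real-analysis sense doesn't apply.
print(np.nextafter(1.0, 2.0) - 1.0)  # ~2.22e-16: the gap to the next
                                     # float above 1.0
print(0.1 + 0.2 == 0.3)              # False: binary rounding error
```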
The claim may also fail to hold for more exotic architectures, such as transformers. Not sure. I should maybe have specified vanilla nets, but I’m not sure which architectures it would and wouldn’t apply to.
Is your intuition that it is false coming from another direction than those I mentioned? I’m interested to know more about your perspective.
ToW: Some thinking along the lines of claim 1 of Semantic Topological Spaces and The Natural Abstraction Hypothesis: Implications and Evidence. Should we expect the path networks take through geometries to “bend” towards certain geometries that are “more abstract”? What would that really mean, how would we tell, and what would the implications be?
ToW: Exploration of different kinds of “wanting” within OIS with different kinds of minds.
a natural immunity (well, aversion) to adopting complex generators, and a natural affinity for simple explanations
I think this is wrong in an important way: most people find math complex, even when it describes simple things, and they find (simple) human and animal behaviours and emotions simple, even though they are some of the most complex natural phenomena I am aware of. So a more accurate statement would be “people have biases in the space of possible explanations that sometimes lead them toward overly complex explanations and sometimes toward overly simple ones”.
Then we could recover your original claim that in the space of explanations of human behaviour, people are more likely to look for overly simple explanations than for overly complex explanations. But I think in explaining human behaviour people often do both, and when looking for sufficiently complex explanations, it is still true that more of them are wrong than are right.
Nope, that’s all coming from your expectations, not from me.
Right. Sorry if it caused any offence. People often seem motivated to misunderstand and condemn others, so I wouldn’t fault anyone for not wanting to say what they truly mean when discussing topics like these.
ToW: Exploration of what the space is that semantics map references into. Since a statement can be false, it can’t be reality. But since we can talk about true or false, reality must be involved somehow.