Sci-Hub has a Telegram bot which you can use with a desktop application. It is fast and, more importantly, reliable. No more scrounging through proxy lists to find a link that works. Unfortunately, you need to install Telegram on a phone first. Still, it has saved me some time and is probably necessary for an independent researcher.
I agree with everything you’ve said, but note that there’s some evidence they were in a relationship (AllAmericanBreakfast, a friend of Shekinah, says she was in “distress about being separated from Alex”. Moreover, they appear to have gone swimming together, which is somewhat intimate). Plus, I don’t see how someone can be raped (non consensual coitus instead of fingering) in a moment unless both parties are naked, one is sexually aroused and they are physically very near. But possibly I’m reading too much into the words there: memory can be faulty, and sometimes people block traumatic memories from their minds, or could just be bad writers.
In some fraction of worlds, I can see how someone could think that consent was implicitly given. Human communication is highly context-dependent, with all of us living in our own models of reality. So maybe Alex made an honest mistake or Shekinah gave and retracted consent or so on. But in the vast majority of those worlds, I would guess Shekinah would have been fine with having sex. And in those worlds, I imagine Alex would have described things quite differently in his post.
Anyway, I’m going to stop replying because I don’t want to keep losing karma and I expect further discussion will not be that useful.
Thanks for posting this. I was not aware of the incident you linked to, and though I am left with some confusions about how things played out, it seems plausible to me that Alex is a rapist. Until other information comes to light, I’d caution people (maybe not men?) to avoid one on one situations with him.
EDIT: The other reason I found this comment valuable was because I realized I hadn’t updated enough in light of other rationalist-adjacent orgs doing shady stuff. Nor did I update enough in light of philh’s spoilered reply to this post of his. Henceforth, I am going to be more wary of attending month-long training sessions hosted by rat-adjacent orgs, especially those with features similar to OAK.
How could someone be penetrated in an instant without being in a compromising position? What exactly did the penetration? If Alex viewed the situation as someone propositioning him, why didn’t he give more details? I am confused.
Wouldn’t “Shard theory requires work” or “Shard theory requires novel insights” work?
Complex Singularities ⟺ Fewer Parameters ⟺ Simpler Functions ⟺ Better Generalization
In a Bayesian learning process, the relevant singularity becomes progressively simpler with more data. In general, learning processes involve trading off a more accurate fit against “regularizing” singularities. Based on Figure 7.6 in .
What’s going on here? Are you claiming that you get better generalization if you have a large complexity gap between the local singularities you start out with and the local singularities you end up with?
But ReLU networks are not analytic. Idk man, seems unimportant.
But smoothness is nice.
There’s speculation that we might be able to transfer the machinery of the renormalization group, a set of techniques and ideas developed in physics to deal with critical phenomena and scaling, to understand phase transitions in learning machines, and ultimately to compute the scaling coefficients from first principles.
I thought the original scaling laws paper was based on techniques from statistical mechanics? Anyway, that does sound exciting. Do you know if anyone has a plausible model for the Chinchilla scaling laws? Also, I’d like to see if anyone has tried predicting scaling laws for systems with active learning.
Thank you for writing this, I feel like it makes the core idea you’re expressing much clearer.
My intuition, based off my experience with measures over big spaces in physics, is that abstract Wiener spaces alone won’t get you the sort of measure you’re looking for. But, that said, I feel like there should be some such measure over large physical spaces, as presumably power has a definition in terms of physical concepts, or else how the heck can we recover our intuition of power in our world? It should all add up to normality, after all. To build up the latter intuition, it seems to me that a good place to start would be those physics papers which described single particles as agentic, because our distributions over them tend towards max entropy, which we can view as the particle seeking the greatest “option value” it can.
I think I am undecided as to whether you can use the rich structure of reward functions to limit the allowed transformations in a useful way. Partly because I suspect that this rich structure reflects a physical structure (something like the natural abstractions thesis plus selection pressure from reality for the sorts of rewards we typically see) or perhaps a simplicity prior of some sort. But maybe it will work out. I don’t know.
My lack of optimism about your agenda’s prospects is basically why I was willing to accept the strange probability distribution TurnTrout went with, I guess. But on reflection, perhaps I should have used that as an existence proof of a distribution over rewards which allows something like our intuitive picture of power-seeking. And tried to see if I could interpret it as something less weird, use it to find something less weird, or just go look for less weird things because maybe they’d work.
Sorry for the long post, but I just realized I didn’t update based off TurnTrout’s results. It seems more likely to me now that your agenda might work. Though I’d be more optimistic if you were using TurnTrout’s distribution as inspiration for what to look for in some way.
We failed to point out any interesting concrete priors over distributions of reward functions, but I am optimistic that it should be possible with better understanding of the metric (topological, linear, differential, …?) structure of the reward functions on an MDP.
I want to know what spark of intuition led to your optimism. The technical details didn’t feel like they contributed to conveying this intuition of yours. It would help if you gave some examples of using the metric structure of a space of functions to pin down some sort of probability distribution.
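To illustrate the sort of example I’m asking for, here’s a toy one of my own (all choices here are my assumptions, not anything from your post): identify reward functions on a finite MDP with |S| states with vectors in R^|S|, and let the L2 metric pin down the unique rotation-invariant distribution on the unit sphere.

```python
import numpy as np

def sample_reward(n_states: int, rng: np.random.Generator) -> np.ndarray:
    """Sample a reward function uniformly from the unit sphere in R^n_states.

    The L2 metric on reward-vectors singles this distribution out: it is the
    unique one invariant under all L2 isometries (rotations) of reward space.
    """
    r = rng.normal(size=n_states)  # isotropic Gaussian is rotation-invariant...
    return r / np.linalg.norm(r)   # ...so its projection to the sphere is uniform

rng = np.random.default_rng(0)
r = sample_reward(5, rng)
assert abs(np.linalg.norm(r) - 1.0) < 1e-9
```

That is, a metric fixes a symmetry group, and asking for invariance under that group can pin down a measure. I’d like to see what the analogous move looks like for whatever richer structure you have in mind.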
I predict that 10-100k is the sweet spot, provided that there is some kind of competitive scene. So storybookbrawl looks like it might work.
I’m very interested in Wei Dai’s work, but I haven’t followed closely in recent years. Any pointers to what I might read of his recent writings?
Unfortunately, everything I’m thinking of was written as comments, and I can’t remember when or where he wrote them. You’d have to talk to Wei Dai on the topic if you want an accurate summary of his position.
So I think his account of meta-ethics is helpful but not complete.
Yeah, I agree he didn’t seem to come to a conclusion. The most in-depth examples I’ve seen on LW of people trying to answer these questions are about as in-depth as this post in lukeprog’s no-nonsense metaethics sequence. Maybe there’s more available, but I don’t know how to locate it.
How would the semantics for that work out?
A set of axioms Σ within a vanilla first-order-logic language does not necessarily allow you to speak about what statements are provable from Σ within that language. Consider the axiom scheme ∀x∀y(x=y). There’s not enough strength in Σ to create encodings of statements, which is necessary to make formal statements in the language which could be interpreted in some standard model to mean “Statement X is provable”. Seriously, try doing it. It should make clear the need to build some notion of provability into the syntax and semantics of a logic if you want to talk about provability within a language. Then read up to section 3 in this SEP article.
EDIT: That is assuming you know stuff about provability predicates already, and things like arithmetization. If you don’t, then the above exercise is way too hard. In that case, I’d just say that Σ⊢ψ and Σ⊢□(ψ) represent “axioms Σ proves statement ψ” and “axioms Σ proves that ψ is provable in some set of axioms T”. T is a “sufficiently strong” set of axioms, like Peano arithmetic, which can do interesting things.
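For context (this is standard textbook material, not something from the thread): when T is strong enough to arithmetize provability, the □ is usually required to satisfy the Hilbert–Bernays–Löb derivability conditions, which are exactly what make Löb’s theorem and Gödel’s second incompleteness theorem go through:

```latex
\begin{align*}
&\text{(D1)} \quad T \vdash \psi \;\Rightarrow\; T \vdash \Box_T\,\psi \\
&\text{(D2)} \quad T \vdash \Box_T(\psi \to \varphi) \to (\Box_T\,\psi \to \Box_T\,\varphi) \\
&\text{(D3)} \quad T \vdash \Box_T\,\psi \to \Box_T\,\Box_T\,\psi
\end{align*}
```

The toy axiom scheme ∀x∀y(x=y) above can’t even get off the ground with (D1), since it can’t define a □ at all.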
EDIT: I meant what Wei Dai has been “talking about”, not “trying to do”.
This problem you are pointing at sounds like what Wei Dai has been trying to do for years. In some sense, it is like getting a fully specified meta-ethical framework, of the kind Eliezer attempted to describe in the Sequences. Does that sound right?
because I should also myself use a reciprocity-like strategy where my based on my prediction
Shouldn’t this be ”… where I based my prediction …”?
Over time, a virtuous cycle emerged that eased some long-standing tense disconnection and left us more secure/connected and free
The footnote at the end seems very important. Please don’t leave it out of the main body of text.
I think I read a thread somewhere which said that Google has a lot of tooling built and many teams already dedicated to integrating LLMs into their products. But the economics don’t make sense at the moment, apparently. The cost of using these models would need to come down by 1-2 OOM before they’d deploy things. And that seems plausible? Like, I haven’t done a detailed analysis, but Davinci is at around $0.1/1000 words, which sounds way too high to use to augment search.
On the other hand, I expect that few people will need Gopher-like models. The mythical average person probably wants to hear what’s new about celebrity X, or a link to a YouTuber’s channel, or so on. When they need a link to a Wikipedia page, or an answer to a pub quiz question, I suspect GOFAI is enough. So maybe cost is slightly less of an issue if only 1/10-1/100 queries need these models.
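A quick back-of-envelope check of the above, with every input being my own rough assumption rather than a sourced figure:

```python
# All numbers are assumptions for illustration, not sourced figures.
searches_per_day = 8.5e9      # rough public estimate of daily Google searches
words_per_query = 500         # assumed prompt + completion length
cost_per_1000_words = 0.10    # the Davinci-ish price from the comment above
fraction_needing_llm = 0.1    # the optimistic 1-in-10 queries figure

daily_cost = (searches_per_day * fraction_needing_llm
              * (words_per_query / 1000) * cost_per_1000_words)
print(f"${daily_cost:,.0f}/day")  # prints "$42,500,000/day"
```

Tens of millions of dollars a day even with the 1/10 filter, which is why a 1-2 OOM cost drop seems like the relevant threshold.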
Chat? That is how most people will use it, I imagine. EDIT: It is still early days though, and the shape of things is unclear. What will be the most popular use cases, the ones that stand out in people’s minds as what you use LLMs for? I don’t know yet, so any naming seems premature.
Vanessa’s reply is excellent, and Steven clearly described the gauge transforms of EM that I was gesturing at. That said, if you want to see some examples of fibre bundles in physics other than those in non-relativistic QM, nLab has a great article on the topic.
Also, if you know some GR, the inability to have more than local charts in general spacetimes makes clear that you do need to use bundles other than trivial ones, i.e. ones of the form M×T. In my view, GR makes the connection between global structure and fibre bundles clearer than QM does, but maybe you have different intuitions.
Do you want to get some transcripts for this? I could do it pretty cheaply using Whisper and editing the maths. I think I’d charge $30/hour of work.
For some reason I haven’t understood yet, some madlad physicist was like “ok, but what if we postulate some sort of superpowered version of phase invariance, where the angle that we’re rotating each of the complex numbers by depends on where things are in space.”
This just corresponds to the gauge symmetry of the classical EM field, which was the first gauge theory; there, instead of multiplying a field by a field-dependent factor, you added a field to another field, subject to some constraints. General relativity is another example of a gauge theory, with the gauge freedom given by coordinate transformations: the second gauge theory, or at least the second one that was apparent. With two historical examples of local gauge invariance, in particular the classical analogue of QED, I think the story of how U(1) local invariance was discovered becomes less strange. Though admittedly, you have to wonder why it took so many decades to be discovered.
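Concretely, in the standard conventions (with covariant derivative D_μ = ∂_μ − iqA_μ), the “add a field” and “multiply by a local phase” transformations are two halves of the same gauge symmetry, and compensate each other:

```latex
\begin{align*}
A_\mu &\to A_\mu + \partial_\mu \chi(x) && \text{classical EM: add (the gradient of) a field} \\
\psi &\to e^{\,iq\chi(x)}\,\psi && \text{QED matter field: multiply by a local phase}
\end{align*}
```

One can check that under both transformations together, D_μψ picks up only the overall phase e^{iqχ(x)}, so gauge-invariant quantities built from it are unchanged.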
By doing something like(?) taking a configuration on n particles, and multiplying the amplitude of that configuration by the n different points on the unit circle (aka U(1)) corresponding to the positions of each of the n different particles? I think? So far I’ve mostly stared at simple versions of the equation with 1-particle systems, and haven’t managed to fully understand the texts about the more general case here, and it’s also plausible to me that there’s n different functions from spacetime to U(1), or that I’ve totally misunderstood what I’m doing.
Not sure what you mean by this. Maybe you mean a field with n excitations, but that’s a weird way to put it. I’m guessing you’re talking about a QM system with n particles? In which case, you can ask for the state space of n particles to be invariant under n copies of a constant-across-space U(1) transformation. That’s just the state space of n qubits! I’d recommend looking at the first 2-3 chapters of Peter Woit’s book on this, as it is unusually clear. He has a free draft version on his university web page. But that leaves me confused as to why you’d bring that up in the context of QFT.
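Setting aside my confusion about the QFT framing, the transformation the quoted paragraph gestures at can be written down concretely in ordinary n-particle QM. Here’s a numpy sketch on a toy discretized space (the lattice size, particle number, and random ψ are all my own assumptions): one map θ from space to U(1), applied once per particle coordinate.

```python
import numpy as np

# Toy discretized space: L sites, n = 2 particles, so psi[x1, x2] is the
# amplitude of the configuration (particle 1 at x1, particle 2 at x2).
L = 4
rng = np.random.default_rng(0)
psi = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))

theta = rng.uniform(0, 2 * np.pi, size=L)  # a single map: space -> U(1)
phase = np.exp(1j * theta)

# Multiply each configuration's amplitude by exp(i * theta(x_j)) for
# every particle position x_j -- one function theta, not n of them.
psi_rotated = psi * phase[:, None] * phase[None, :]

# All probabilities |psi|^2 are unchanged by this local phase rotation:
assert np.allclose(np.abs(psi_rotated), np.abs(psi))
```

So at least in the QM picture, there is one function from space to U(1), evaluated at each particle’s position, rather than n different functions.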
What does this mean, for our purposes? Well, if I understand correctly, there is not (in general) a way to choose the map to U(1) such that there seem to be no photons. (You can probably choose a map to U(1) such that any given photon looks ‘fictitious’, but you can’t in general make them all disappear simultaneously.) My guess is that there’s not a canonical way to choose “the” constant map to U(1), similar to how you can’t just choose “the” rest frame in relativity: somebody’s gotta pick out the basis vectors, and picking them out involves choosing quite a lot of data.
You certainly can’t eliminate all photons from spacetime, in general. And there is no canonical choice of gauge. Just use whatever simplifies the calculations, like you would in classical EM.