I honestly don’t think the tradeoff is real (but please tell me if you don’t find my reasons compelling). If I study category theory next and it does some cool stuff with the base map, I won’t reject that on the basis of it contradicting this book. Ditto if I actually use LA and want to do calculations. The philosophical understanding that matrix-vector multiplication isn’t ultimately a thing can peacefully coexist with me doing matrix-vector multiplication whenever I want to. Just like the understanding that the natural number 1 is a different object from the integer number 1 peacefully coexists with me treating them as equal in any other context.
I don’t agree that this view is theoretically limiting (if you were meaning to imply that), because it allows any calculation that was possible before. It’s even compatible with the base map.
I wouldn’t be heartbroken if it were defined like that, but I wouldn’t do it if I were writing a textbook myself. I think the LADR approach makes the most sense – vectors and matrices are fundamentally different – and if you want to bring a vector into the matrix world, then why not demand that you do it explicitly?
If you actually use LA in practice, there is nothing stopping you from writing Av. You can be ‘sloppy’ in practice if you know what you’re doing, while still thinking that drawing this distinction is a good idea in a theoretical textbook.
That looks like it also works. It’s a different philosophy, I think: LADR says “vectors and matrices are fundamentally different objects and vectors aren’t dependent on bases, ever”, while your view says “each basis defines a bijective function that maps vectors from the no-basis world into the basis world (or from the basis-1 world into the basis-2 world)” but doesn’t insist on them being fundamentally different objects. Like, if V=Rn, then they’re the same kind of object, and you just need to know which world you’re in (i.e. relative to which basis, if any, your vector should be interpreted).
I don’t think not having matrix-vector multiplication is an issue. The LADR model still allows you to do everything you can do in normal LA. If you want to multiply a matrix A with a vector v, you just turn v into an n-by-1 matrix and then multiply two matrices. So you multiply A⋅M(B,v) rather than A⋅v. It forces you to be explicit about which basis you want the vector to be relative to, which seems like a good thing to me. If B is the standard basis, then M(B,v) will have the same entries as v, it’ll just be written as a column (an n-by-1 matrix) rather than as the tuple (v1,...,vn).
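To make that concrete, here’s what it looks like for a 3×3 matrix A and v = (v1, v2, v3), taking B to be the standard basis (my rendering, using the book’s M notation):

\[
\mathcal{M}(B, v) = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},
\qquad
A \cdot \mathcal{M}(B, v) = \begin{bmatrix} A_{11}v_1 + A_{12}v_2 + A_{13}v_3 \\ A_{21}v_1 + A_{22}v_2 + A_{23}v_3 \\ A_{31}v_1 + A_{32}v_2 + A_{33}v_3 \end{bmatrix},
\]

which has exactly the entries the usual A⋅v would give you, just produced by matrix-matrix multiplication.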
Afaik, in ML, the term bias is used to describe any move away from the uniform / mean case. But in common speech, such a move would only be called a bias if it’s inaccurate. So if the algorithm learns a true pattern in the data (X is more likely to be classified as 1 than Y is) that wouldn’t be called a bias. Unless I misunderstand your point.
Ow. Yes, you do. This wasn’t a typo either, I remembered the result incorrectly. Thanks for pointing it out, and props for being attentive enough to catch it.
Or to be more precise, you only need one scalar, but the scalar is for y not z, because z isn’t given. The theorem says that, given x and y, there is a scalar a and a vector z such that x=ay+z and y is orthogonal to z.
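For reference, the standard construction behind that theorem (assuming an inner product space and y ≠ 0; the comment above doesn’t spell this out) is

\[
a = \frac{\langle x, y \rangle}{\lVert y \rVert^2}, \qquad z = x - a y,
\qquad
\langle z, y \rangle = \langle x, y \rangle - \frac{\langle x, y \rangle}{\lVert y \rVert^2}\,\lVert y \rVert^2 = 0,
\]

so the scalar attaches to y and z is whatever is left over.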
I wonder, what do you think about the chapter on dual spaces, dual maps, annihilators, etc.?
Nothing, because it wasn’t in the material. I worked through the second edition of the book, and the parts on duality seem to be new to the third edition.
I believe when mathematicians say that in general P(x) holds, they mean that P(x) holds for any x in the domain of interest. Perhaps you want to use ‘typical’ instead of ‘general’ here. E.g., there is a notion called the typical tensor rank of tensors of a given shape, which means a tensor rank that occurs with non-zero probability when a random tensor of that shape is sampled.
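One way to pin down the distinction (my formalization, not from the comment above): for a domain D and a probability distribution μ on it,

\[
\text{“in general”: } \forall x \in D,\ P(x)
\qquad\text{vs.}\qquad
\text{“typical”: } \Pr_{x \sim \mu}\bigl[P(x)\bigr] > 0,
\]

so a typical property only needs to occur with positive probability, not hold everywhere.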
Thanks for that, I changed it.
It seems to me that one of the qualities of a good question for the LBO is that a correct answer has high utility, while another is that you can judge the oracle’s answer. If we take only the intersection of the two sets of questions that meet these criteria, it’ll be a relatively small set. But if there is a set of questions where one might or might not be able to judge the answer but which are high utility, one could ask the LBO a set of such questions. Then, once the episode is over, give reward on those where one can evaluate the answers, and give null reward on the others.
Not really a submission, just thinking out loud.
Submission for LBO:
Input a corpus of text (could be multiple posts) describing technical approaches to align a powerful AI. Split this into a finite number of items that are relatively short (such as paragraphs). Ask the oracle to choose the part that is most worth spending more time on. (For example, there might be a paragraph with a dangerous hidden assumption in an otherwise promising approach, and thinking more about it might reveal that and lead to conceptual progress.)
Have a team of researchers look into it for an adequate amount of time which is fixed (and told to the oracle) in advance (maybe three months?). After the time is over, have them rate the progress they made compared to some sensible baseline. Use this as the oracle’s reward.
Of course this has the problem of maximizing for apparent insight rather than actual insight.
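A minimal sketch of the protocol above, just to pin down the moving parts (all names, the splitting rule, and the time period are stand-ins I made up, not part of the submission):

```python
def run_lbo_episode(corpus: str, oracle, research_team, baseline: float) -> float:
    """Sketch of the proposed episode; oracle and research_team are stand-ins."""
    # Split the corpus into a finite number of relatively short items
    # (here: blank-line-separated paragraphs, as one possible choice).
    items = [p.strip() for p in corpus.split("\n\n") if p.strip()]

    # Ask the oracle which item is most worth spending more time on,
    # telling it the fixed research period in advance.
    chosen = oracle.choose(items, research_time_months=3)

    # Have the researchers look into the chosen item for the fixed time,
    # then rate the progress made against some sensible baseline.
    progress = research_team.investigate(chosen, months=3)

    # The progress rating relative to the baseline is the oracle's reward.
    return progress - baseline
```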
Yes. Well put. This is related (though not identical) to the excellent Rest in Motion post from Nate Soares.
Either charities like the Gates Foundation and Good Ventures are hoarding money at the price of millions of preventable deaths
My assumption before reading this has been that this is the case. Given that, is there still a reason to update away from the position that the GiveWell claim is basically correct?
For the rest of this post, let’s suppose the true amount of money needed to save a life through GiveWell’s top charities is $50,000. I don’t think anything about Singer’s main point changes.
For one, it’s my understanding that decreasing animal suffering is at least an order of magnitude more effective than decreasing human suffering. If the arguments you make here apply equally to that (which I don’t think they do), and we take the above number, well, that’s $5,000 for a benefit-as-large-as-one-life-saved, which is still sufficient for Singer’s argument.
Secondly, I don’t think your arguments apply to existential risk prevention, and even if they did and we decreased effectiveness there by an order of magnitude, that’d still validate Singer’s argument given my priors.
I notice that I’m very annoyed at your on-the-side link to the article about OpenAI with the claim that they’re doing the opposite of what the argument justifying the intervention recommends. It’s my understanding that the article, though plausible at the time, was very speculative and has been falsified since it was written. In particular, OpenAI has pledged not to take part in an arms race under reasonable conditions, which directly contradicts one of the points of that article. Quote:
Therefore, if a value-aligned, safety-conscious project comes close to building AGI before we do, we commit to stop competing with and start assisting this project. We will work out specifics in case-by-case agreements, but a typical triggering condition might be “a better-than-even chance of success in the next two years.”
That, and they seem to have an ethics board with significant power (this is based on their decision not to release the full version of GPT). I believe they also said that they won’t publish capability results in the future, which also contradicts one of the main concerns (which, again, was reasonable at the time). Please either reply or amend your post.
I’ll also be attending the full 10 day version. I’ve only been meditating for a couple of months so the prospect of such a long retreat feels fairly threatening, but looking at the mean outcome, I think it’s the correct call.
What is the best textbook on analysis out there?
My go-to source is MIRI’s guide, but analysis seems to be the one topic that’s missing. TurnTrout mentioned this book, which looks decent at first glance. Are there any competing opinions?
I’ve noticed that I cannot tell, from casual conversation, whether someone is intelligent in the IQ sense.
I can’t really do anything except to state this as a claim: I think a few minutes of conversation with anyone almost always gives me significant information about their intelligence in an IQ sense. That is, I couldn’t tell you the exact number, and probably not even reliably predict it with an error of less than 20 (maybe more), but nonetheless, I know significantly more than zero. Like, if I talked to 9 people evenly spaced within [70, 130], I’m pretty confident that I’d get most of them into the correct half.
This does not translate into any kind of disagreement wrt GPT’s texts seeming normal if I just skim them, or wrt Robin Hanson’s thesis.
No, but I’ve read almost all of the sequences on the website, I think. I didn’t do it systematically, so it’s almost a guarantee that I missed a few, but not many. I read some of them twice, but again, not systematically.
I think they’re amazing, and they’ve had a profound impact on me.
I do have a spreadsheet where I keep track of predictions, though only tracking the prediction, my confidence, and whether it came true or false. It’s low effort and I think worth doing, but I can’t confidently say that it has improved my calibration.
This does not answer the question, but it seems plausible to me that the leftist-centrist axis only has a very small impact on who is likely to win, which would be consistent with PredictIt’s estimates.
6.7 Systems composed of rational agents need not maximize a utility function

There is no canonical way to aggregate utilities over agents, and game theory shows that interacting sets of rational agents need not achieve even Pareto optimality.
Is [underlined] true? I know it’s true if you have agents following CDT, but does it still hold if agents follow FDT? (I think if you say ‘rational’ it should not mean ‘CDT’ since CDT is strictly worse than FDT).
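For the CDT case, the standard example I have in mind is a one-shot Prisoner’s Dilemma (payoffs are the usual illustrative ones, not taken from the quoted text):

\[
\begin{array}{c|cc}
 & C & D \\ \hline
C & (2,2) & (0,3) \\
D & (3,0) & (1,1)
\end{array}
\]

Two CDT agents each defect, since D strictly dominates C, and land on (1,1) even though (2,2) is better for both, so the pair fails Pareto optimality; whether agents reasoning like FDT would cooperate here is exactly the question above.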
(v,w) is defined just for one particular graph. It’s the first edge in that graph such that f(v)<0<f(w). (So it could have been called (vn,wn).) Then for the next graph, it’s a different v. Basically, x1 looks at where the first graph skips over the zero mark, then picks the last vertex before that point, then x2 looks at the next larger graph, and if that graph skips later, it updates to the last vertex before that point in that graph, etc. I think the reason I didn’t add indices to (v,w) was just that there are already the v’s with two indices, but I see how it can be confusing since having no index makes it sound like it’s the same value throughout.
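In case it helps, here is a small sketch of the selection rule I’m describing, assuming each graph is given as its vertices in order along with the values of f (the names are mine, not from the post):

```python
def last_vertex_before_crossing(vertices, f):
    """Return v from the first edge (v, w) of this graph with f(v) < 0 < f(w),
    i.e. the last vertex before the graph skips over the zero mark."""
    for v, w in zip(vertices, vertices[1:]):
        if f(v) < 0 < f(w):
            return v
    return None  # no such edge in this graph

# x_n then comes from applying this to the n-th (larger) graph:
# xs = [last_vertex_before_crossing(G_n, f) for G_n in graphs]
```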