Gratitude thread.
What a load of work, Ingres. Thank you for doing this.
Today in Hacker News there’s a research article speaking exactly of this.
https://news.ycombinator.com/item?id=11909111
Makes me think that a possible method to mitigate spam would be to answer each email with an LSTM-generated blob of text, so the attackers are swarmed with false positives and cannot continue the attack. Of course, this would have to be implemented by the email provider.
This is what I thought. But ChristianKl is right: it doesn’t need to. From the first false positive you’re already doing damage with almost no cost to you. Sure your address will start to receive more spam, but it will be filtered like the spam you already have is.
But having it in the ISP, or as a really popular extension, would deal a big blow to spam.
I don’t think we are that far away from AGI.
At the very least 20 years. And yes Alphabet are the closest, but in 20 years a lot of things can change.
Almost 5 years now.
Is it possible to enter the contest as a group? Meaning, can the article written for the contest have several coauthors?
Would it be possible to just apply model-based planning and show the treacherous turn on the first time?
Model-based planning is also AI, and we clearly have an available model of this environment.
Typo in pg. 31 of the ceremony guide: “sir ead” → “is read”.
Upvoting so people see that the project failed earlier, and don’t have to spend a couple of hours reading the main article given how this turned out.
I usually think that logic-based reasoning systems are the canonical example of an AI without goal-directed behaviour. They just try to prove or disprove a statement, given a database of atoms and relationships. (Usually they’re restricted to statements that are decidable by construction, so that this is always possible.)
You can also frame their behaviour as a utility function: U(time, state) = 1 if you have correctly decided the statement at t ≤ time, 0 otherwise. But your statement that
>It seems possible to build systems in such a way that these properties are inherent in the way that they reason, such that it’s not even coherent to ask what happens if we “get the utility function slightly wrong”.
very much applies. I’m fairly sure you can specify the behaviour of _anything_, including “dumb” things like trousers, screwdrivers, rocks and saucepans, as a utility function + perfect optimization, even though for most things this is a very unhelpful way of thinking. Or at least human artifacts. E.g. a screwdriver optimizes “transmit the rotational force that is applied to you”, a rock optimizes “keep these molecules bound and respond to forces according to the laws of physics”.
Yup. I actually made this argument two posts ago.
Ah, that’s good. I should probably read the rest of the sequence too.
Though it’s not clear how you’d use a logic-based reasoning system to act in the world
The easy way to use them would be as they are intended: oracles that will answer questions about factual statements. Humans would still do the questioning and implementing here. It’s unclear how exactly you’d ask really complicated, natural-language-based questions (obviously, otherwise we’d have solved AI), but I think it serves as an example of the paradigm.
Yes, though I’m fairly sure he’s talking about using trained neural networks to e.g. classify an image, which is known to be fairly cheap, rather than training them. In other words, he’s talking about using an AI service rather than creating one.
He also says that “Machine learning and human learning differ in their relationship to costs” which is also evidence for my interpretation: training is expensive, testing on one example is very cheap.
This idea also excludes the robotic direction in AI development, which will anyway produce agential AIs.
Recursive self-improvement that quickly makes the intelligence “super” is what makes the misaligned utility actually dangerous, as opposed to dangerous in the way that, say, a present-day automated assembly line is.
A robot that self-improves would need to have the capacity to control its actuators and also to self-improve. Since none of these capabilities directly depends on the other, each time one of them improves, the improvement is much more likely to be first demonstrated independently of an improvement in the other one.
Thus we’re likely to already have some experience with self-improving AI, or the recursively improved AI to help us, when we get to dealing with people wanting to build self-improving robots. Even though with advanced AI in hand to help we should maybe still start early on that, it seems more important to get the not-necessarily-and-also-probably-not-robotic AI right.
Great post, thank you for writing it!
By taking squares we are more forgiving when the model gets the answer almost right but much less forgiving when the model is way off
It’s really unclear why this would be better. I was going to ask for a clarification but I found something better instead.
The first justification for taking the least-squares estimator that came to mind is that it’s the maximum likelihood solution if you assume your errors E are independent and Gaussian. Under those conditions, the probability density of the data is $\prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - g(x_i))^2}{2\sigma^2}\right)$, where $(x_i, y_i)$ are the data points and $\sigma^2$ is the variance of the errors. We take the product because the errors are independent for each point. If you take the logarithm (which is a transformation that preserves the maximum), this comes out to be $-\frac{1}{2\sigma^2} \sum_i (y_i - g(x_i))^2$ plus a constant that doesn’t depend on g.
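As a rough check of the maximum-likelihood argument above, here is a minimal sketch. The toy data, the one-parameter model g(x) = slope * x, the grid of candidate slopes, and sigma = 1 are all assumptions made for illustration. It shows that the slope maximising the Gaussian log-likelihood is the same one that minimises the sum of squared errors.

```python
import math

def sum_squared_errors(slope, xs, ys):
    # Squared residuals of the model g(x) = slope * x
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))

def gaussian_log_likelihood(slope, xs, ys, sigma=1.0):
    # Log of the product of independent Gaussian densities of the residuals
    n = len(xs)
    constant = -0.5 * n * math.log(2 * math.pi * sigma ** 2)
    return constant - sum_squared_errors(slope, xs, ys) / (2 * sigma ** 2)

# Hypothetical toy data, roughly following y = 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

# Candidate slopes 1.00, 1.01, ..., 3.00
candidates = [i / 100 for i in range(100, 301)]

best_by_likelihood = max(candidates, key=lambda s: gaussian_log_likelihood(s, xs, ys))
best_by_squares = min(candidates, key=lambda s: sum_squared_errors(s, xs, ys))

print(best_by_likelihood, best_by_squares)
```

The log-likelihood is just a constant minus a positive multiple of the squared error, so both searches pick the same candidate whatever value of sigma is assumed.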
It turns out there’s a stronger justification, which doesn’t require assuming the errors are Gaussian. The Gauss-Markov Theorem shows that least squares is the best unbiased estimator of the coefficients of g among estimators that are linear in the powers of the data. That it’s unbiased means that, for several independent samples of data, the mean of the least-squares estimator will be the true coefficients of the polynomial you used to generate the data. The estimator is the best because it has the lowest variance under the above sampling scheme. If you assume you have a lot of data, by the central limit theorem, this is similar to lowest squared error in coefficient-space.
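The unbiasedness claim can be illustrated with a small simulation. This is only a sketch under assumed settings: a straight line as the true polynomial, uniform (non-Gaussian) noise, and hypothetical helper names `fit_line` and `average_fit`. Averaged over many independent datasets, the least-squares coefficients should land close to the true ones.

```python
import random

def fit_line(xs, ys):
    # Closed-form least-squares fit of y = intercept + slope * x
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def average_fit(true_intercept, true_slope, n_datasets=2000, seed=0):
    # Average the fitted coefficients over many independently sampled datasets
    rng = random.Random(seed)
    xs = [i / 10 for i in range(20)]
    total_intercept = 0.0
    total_slope = 0.0
    for _ in range(n_datasets):
        # Zero-mean uniform noise: deliberately not Gaussian
        ys = [true_intercept + true_slope * x + rng.uniform(-1, 1) for x in xs]
        intercept, slope = fit_line(xs, ys)
        total_intercept += intercept
        total_slope += slope
    return total_intercept / n_datasets, total_slope / n_datasets

mean_intercept, mean_slope = average_fit(true_intercept=1.0, true_slope=2.0)
print(mean_intercept, mean_slope)
```

Any single fit can be noticeably off; it is the average over datasets that approaches the true intercept and slope.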
However, if you’re willing to drop the requirement that the estimator is unbiased, you can get even better error on average. The James-Stein estimator has less variance than least-squares, at the cost of biasing your estimates of the coefficients towards zero (or some other point). So, even within a certain maximum allowed degree of the polynomial, it is helpful to penalise complexity of the coefficients themselves.
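To make the shrinkage idea concrete, here is a sketch of the James-Stein estimator in its classic setting, which is simpler than polynomial regression: estimating a vector of means from a single noisy observation with unit-variance Gaussian noise. The true mean vector, the number of trials, and the function names are assumptions chosen for illustration.

```python
import random

def james_stein(observation):
    # Shrink the observed vector towards zero (intended for dimension >= 3)
    p = len(observation)
    norm_sq = sum(v * v for v in observation)
    shrink = 1.0 - (p - 2) / norm_sq
    return [shrink * v for v in observation]

def average_squared_errors(true_mean, n_trials=2000, seed=0):
    # Compare total squared error of the raw observation and the shrunk one
    rng = random.Random(seed)
    raw_total = 0.0
    js_total = 0.0
    for _ in range(n_trials):
        obs = [rng.gauss(m, 1.0) for m in true_mean]
        shrunk = james_stein(obs)
        raw_total += sum((o - m) ** 2 for o, m in zip(obs, true_mean))
        js_total += sum((s - m) ** 2 for s, m in zip(shrunk, true_mean))
    return raw_total / n_trials, js_total / n_trials

# Hypothetical true means: ten coordinates, all equal to 0.5
raw_error, js_error = average_squared_errors([0.5] * 10)
print(raw_error, js_error)
```

The raw observation is unbiased, while the shrunk estimate is biased towards zero, yet its average squared error comes out lower in this setup.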
On a related note, for some yet unknown reason, neural networks that have way more parameters than is necessary in a single layer seem to generalize better than networks with few parameters. See for example Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks. It seems to be related to the fact that we’re not actually finding the optimal configuration of parameters with gradient descent, but only approximating it. So the standard statistical analysis that you outlined here doesn’t apply in that case.
What about short technical reports / forum posts like the original “All Mathematicians are Trollable: Divergence of Naturalistic Logical Updates”? It’s short, doesn’t have any references, doesn’t hold your hand with much background like academic articles do. On the other hand it contains more detailed information than “An Untrollable Mathematician Illustrated”.
Is this a clarification post because it’s the first example of an idea?
I’ve spotted a mistake, I think: the relationship inside the first prover should be an if and only if relationship. This is because if says everything is a theorem, then trivially holds. Thus, the condition to give “proof privileges” should be . There might still be some problems from the modal logic, I’ll check when I have some more time.
I can’t believe that I was the first one to spot this. The post also has very few upvotes. Did nobody that knows this stuff see this and spot the mistake immediately? Was the post dismissed to start with? (in my opinion unfairly, but perhaps not).
After reading the title, my main objections to voting theory were:
The theory is already understood well enough, what is hard is convincing existing institutions to change.
Convincing existing institutions to change is really hard, so there’s not much point in advancing the theory
Though I do agree that public elections are a process by which a huge amount of resources gets allocated, in many of the world’s richest countries, so it’s an important problem.
You make some arguments against these two:
The debate among reform activists between various voting methods (IRV, approval, Condorcet, score, STAR, etc.; as well as the understanding of proportional representation) has progressed substantially (emphasis mine) in the 20 years that I’ve been a part of it, and I think it can progress further.
Can you point to an example of this? Something the understanding of which has improved in the last 20 years. Also, are activists and academia one and the same? Is this improved understanding because new theory was developed, or because the activists started understanding theory that had already been developed in academia?
Voting reform has happened before in various contexts, and it should be expected that sooner or later and somewhere or other it will happen again. Will it happen in the particular ways and places I’d like it to? There’s no way to be sure either way, but I’d say that the probability is certainly over 1%,
The key question is not whether it will happen somewhere with a probability of at least 1%, but rather whether you or the people you inspire can move the needle with a probability of 1 in a million (or less), in a place that’s sufficiently big (or even bigger). Are you thinking of influencing the US Gov in particular? That would certainly qualify as a big entity. What wins has the voting reform movement had in the past few years? (I suppose the #1 USA example is Fargo, in North Dakota, which passed approval voting.)
(There’s probably a lot of people unhappy with voting in some way, so if you can convince them that your proposal is going to make their group more powerful, maybe it’s not so hard).
Frequentist statistics were invented in a (failed) attempt to keep subjectivity out of science in a time before humanity really understood the laws of probability theory
I’m a Bayesian, but do you have a source for this claim? It was my understanding that Frequentism was mostly promoted by Ron Fisher in the 20th century, well after the work of Bayes.
Synthesised from Wikipedia:
While the first cited frequentist work (the weak law of large numbers, 1713, Jacob Bernoulli, Frequentist probability) predates Bayes’ work (edited by Price in 1763, Bayes’ Theorem), it’s not by much. Further, according to the article on “Frequentist Probability”, “[Bernoulli] is also credited with some appreciation for subjective probability (prior to and without Bayes theorem).”
The ones that pushed frequentism in order to achieve objectivity were Fisher, Neyman and Pearson. From “Frequentist probability”: “All valued objectivity, so the best interpretation of probability available to them was frequentist”. Fisher did other nasty things, such as using the fact that causality is really hard to soundly establish to argue that tobacco was not proven to cause cancer. But nothing indicates that this was done out of not understanding the laws of probability theory.
AI scientists use the Bayesian interpretation
Sometimes yes, sometimes not. Even Bayesian AI scientists use frequentist statistics pretty often.
This post makes it sound like frequentism is useless, and that is not true. The concepts of a stochastic estimator for a quantity, its bias, and its variance were developed by frequentists to look at real-world data. AI scientists use them to analyse algorithms like gradient descent or approximate Bayesian inference schemes, and the tools are definitely useful.
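A small example of the frequentist notion of bias, as a sketch only: the sample size, number of repetitions, and function names are assumptions. It compares two estimators of the variance of a standard normal, dividing the sum of squared deviations by n or by n - 1, and averages each over many repeated samples.

```python
import random

def variance_estimates(sample):
    # Return (divide-by-n estimate, divide-by-(n-1) estimate)
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((v - mean) ** 2 for v in sample)
    return sum_sq / n, sum_sq / (n - 1)

def average_estimates(n_samples=20000, sample_size=5, seed=0):
    # Average both estimators over repeated samples from a standard normal
    rng = random.Random(seed)
    total_n = 0.0
    total_n_minus_1 = 0.0
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        by_n, by_n_minus_1 = variance_estimates(sample)
        total_n += by_n
        total_n_minus_1 += by_n_minus_1
    return total_n / n_samples, total_n_minus_1 / n_samples

mean_by_n, mean_by_n_minus_1 = average_estimates()
print(mean_by_n, mean_by_n_minus_1)
```

With samples of size 5 and true variance 1, the divide-by-n estimator averages to about 0.8 (it is biased), while the divide-by-(n-1) estimator averages to about 1.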
I clicked this because it seemed interesting, but reading the Q&A:
In a typical game we consider, one player offers bets, another decides how to bet, and a third decides the outcome of the bet. We often call the first player Forecaster, the second Skeptic, and the third Reality.
How is this any different from the classical Dutch Book argument, that unless you maintain beliefs as probabilities you will inevitably lose money?
I would probably use better spelling in the messages. Poor spelling reduces the credibility of the scammer.