I’m a different person, but I have a similar experience: I don’t use the “Your Feed > For You” section because the posts it shows me seem random to me, though perhaps I could tune it to work better. Previously there was a “Top Comments” section on the front page, which I personally found more useful.
cubefox
Another one: Nathan Young, Dear AGI,
Mine is that it’s indeed simply saying the government doesn’t want to work with their products, even indirectly.
The first is very different from the second. “Indirectly” is an extremely wide concept, and very few companies are designated supply chain risks. Usually when deciding to buy or not buy some product, you don’t care at all which tools the company uses, or the company from which the first company buys tools, etc.
I assume there is a reason you don’t want New Foundations?
I thought usually ∈ is assumed to be acyclic, then one could identify a set with its ∈ DAG. But if you have a universal set it probably must be an element of itself, so we don’t even have a DAG. (I recently learned that the classical definition of “acyclic” only rules out finite cycles, so even a DAG allows for infinite cycles? Strange.)
So identify a set with its directed ∈ graph? Not sure whether this makes sense.
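A minimal sketch of what identifying a set with its ∈ graph could look like in the well-founded case. Everything here is an assumption for illustration: sets are modeled as nested Python `frozenset`s (so only hereditarily finite sets), and `membership_edges` is a hypothetical helper name. Since a `frozenset` cannot contain itself, a universal set with V ∈ V cannot be represented this way, which is the case where the graph stops being a DAG.

```python
def membership_edges(s):
    """Return all edges (x, y), meaning y ∈ x, reachable from the set s."""
    edges = set()
    seen = set()
    stack = [s]
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        for y in x:              # every element y of x gives an edge x -> y
            edges.add((x, y))
            stack.append(y)
    return edges

# The first von Neumann ordinals, as an illustrative example.
zero = frozenset()               # {}
one = frozenset({zero})          # {{}}
two = frozenset({zero, one})     # {{}, {{}}}

edges = membership_edges(two)
print(len(edges))                # number of membership edges below `two`
```

Two sets built this way are equal exactly when their frozensets are equal, so in this toy model the graph rooted at a set carries the same information as the set itself.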
More to the point, this SEP article might be interesting.
This is cool research. But I want to emphasize that the usage of AUROC for the evaluation of any binary classifier is generally questionable because a high AUROC value only indicates 1) a large true positive rate and 2) a large true negative rate (= low false positive rate).
But a high value of an appropriate statistic for a binary predictor should indicate that all four of these values are large:
1. P(predicted true | actually true) (= true positive rate, recall, sensitivity)
2. P(predicted false | actually false) (= true negative rate, specificity, complement of the false positive rate)
3. P(actually true | predicted true) (= positive predictive value, precision)
4. P(actually false | predicted false) (= negative predictive value)
A high value of the AUROC only means 1 and 2 are large, while 3 or 4 could still be very small, in which case we would clearly have a bad classifier despite a high AUROC value. For details see this paper.
There is another statistic which can be used instead of the AUROC while avoiding this problem: the phi coefficient / “MCC”. This is simply the binary version of the standard Pearson correlation.
A high value of the MCC (close to +1) indicates that all four of the above probabilities are large, and a high negative value (close to −1) that all four are small. If the predictor and the measured variable are independent (the classifier guesses randomly) the value of the MCC is 0.
The linked paper above goes so far as to say
In this short study, we explain why the Matthews correlation coefficient should replace the ROC AUC as standard statistic in all the scientific studies involving a binary classification, in all scientific fields.
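To make the point about the four probabilities concrete, here is a small sketch with made-up confusion-matrix counts for a heavily imbalanced dataset (10 positives, 990 negatives). The counts and the helper names `rates`, `mcc` and `pearson` are assumptions for illustration only. The block looks at a single decision threshold and does not compute an AUROC; it only shows that the true positive and true negative rates can both be 0.9 while precision is below 0.1, and that the MCC comes out the same as the Pearson correlation of the two 0/1 vectors.

```python
from math import sqrt

def rates(tp, fn, fp, tn):
    """Four conditional probabilities from confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),  # P(predicted true | actually true)
        "tnr": tn / (tn + fp),  # P(predicted false | actually false)
        "ppv": tp / (tp + fp),  # P(actually true | predicted true)
        "npv": tn / (tn + fn),  # P(actually false | predicted false)
    }

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient (phi coefficient)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical counts: 10 actual positives, 990 actual negatives.
TP, FN, FP, TN = 9, 1, 99, 891

r = rates(TP, FN, FP, TN)
phi = mcc(TP, FN, FP, TN)

# Expand the counts into 0/1 vectors to check phi against Pearson.
actual = [1] * (TP + FN) + [0] * (FP + TN)
predicted = [1] * TP + [0] * FN + [1] * FP + [0] * TN
rho = pearson(actual, predicted)

for name, value in r.items():
    print(f"{name}: {value:.3f}")
print(f"MCC: {phi:.3f}, Pearson on 0/1 vectors: {rho:.3f}")
```

With these counts the MCC is far from 1 even though both rates that the ROC curve is built from are high, because most predicted positives are false positives.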
I consider it coherent to have qualia and be unaware of the fact, insofar as we’re taking qualia to be a good concept.
This would only make sense if they misunderstand the word “qualia”. Which seems quite unlikely for Carl Zimmerman who said he studied philosophy at MIT.
There is a difference between solving intent alignment for instruction following, and full value alignment. The latter would plausibly be guaranteed to be “safe”, or even more optimal than merely safe. (Utopia etc.)
Related: The Zombie Preacher of Somerset
So AI papers are currently good enough that they can’t be trivially distinguished from human papers, making Pangram necessary, but not yet good enough to produce AI research that is at least on a human level. From the outside this looks like a sign that RSI is fairly close now.
Tangentially, it’s somewhat interesting that Pangram is a twist on Turing’s original test: In the original, it was a human who had to distinguish between a human and an AI based on text; now it is an AI that distinguishes between the two, since AIs are apparently now better than humans at distinguishing between humans and AIs. So Pangram is a CAPTCHA, but conventional captchas weren’t better than humans at distinguishing between AIs and humans.
I don’t think this is plausible either, but even if it’s true: The data center operators could just not build any new electric power. At some point the price for electricity could increase, but as was pointed out, energy is currently only 7% of data center cost.
This is a great post. Though this doesn’t seem convincing:
Solar is very cheap; falling battery costs are fixing the intermittency issue.
All existing energy technologies are more expensive. There’s little prospect for them to catch up with solar today, much less the solar a decade from now.
There is a big difference between something being cheap and estimating something to be cheap at some point in the future. Currently Germany (with a relatively large amount of solar) has much higher electricity costs than France (mostly nuclear) and many other countries. This suggests that solar is not very cheap today, even if it will be in the future.
(Maybe Germany isn’t sunny enough? Well, solar being cheap in certain regions doesn’t generally mean that it “is very cheap”.)
The current rules don’t prevent data centers (prevention in some areas is different from overall prevention) and it seems highly improbable that they will prevent them in the future.
Related to coherence: humans usually have an “agenda” when writing an article. They want to express something, some intention or world-view, while LLMs, prompted to write an article, don’t have an agenda. They only want to satisfy the prompt. They are usually happy to write some bullshit if that fulfills the prompt. When reading something, we automatically (and perhaps unconsciously) try to infer the agenda behind the text, but in the case of bullshit text there is no agenda. There is nobody who would defend the text when challenged, and nobody who is intellectually responsible for it.
hot response: analogical inference does clearly provide some evidence. For example: I am conscious, you are otherwise pretty similar to me, therefore you are probably also conscious.
Yeah. Even more generally, it’s hard to notice your own mistakes because otherwise you wouldn’t have made them in the first place.
There is also the related but probably much larger group “has something to say but is bad at English”, which indirectly also results in bad writing: broken English. Many people claim they don’t mind broken English, but I think that’s an illusion. Any essay that’s unpleasant to read, no matter the reason, will be perceived as substantially worse on average.
Though automatic text translation to English was already pretty decent a few years before LLMs, so that group would survive an LLM ban. The problem is more that people who are bad at English often still insist on writing directly in English. I regularly skim Chinese AI papers with lots of bad English and wonder why these clearly intelligent people don’t just write in Chinese, let an LLM (or even Google Translate) do the translation, and proofread the result.
Some form of “prior uncertainty” can be described with the Beta distribution. If we have a coin (or any other binary event) there is a difference between having observed 0 outcomes (0 heads and 0 tails), having observed one heads and one tails, and having observed 50 heads and 50 tails. All of them can be quantified as evidence for “50% probability for heads”, but the degree of reliability of this estimate varies widely between these cases. The Beta distribution is a subjective probability distribution over possible but unknown objective probability values of an event. If you are very sure the objective probability of your event is around 50%, the Beta distribution has a narrower peak than if you are more uncertain about your probability.
Of course, this doesn’t cover the more general types of prior uncertainty (e.g. rational vs irrational) you are discussing here, but at least it is something.
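The three coin cases above can be sketched numerically. This assumes a uniform Beta(1, 1) starting point, so that observing h heads and t tails gives Beta(1 + h, 1 + t); that choice of prior and the helper name `beta_summary` are assumptions for illustration, not something fixed by the argument. The mean stays at 0.5 in all three cases, while the standard deviation (the width of the peak) shrinks as observations accumulate.

```python
from math import sqrt

def beta_summary(heads, tails, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of Beta(prior_a + heads, prior_b + tails)."""
    a = prior_a + heads
    b = prior_b + tails
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# The three cases from the text: no data, 1/1, and 50/50.
cases = [(0, 0), (1, 1), (50, 50)]
summaries = [beta_summary(h, t) for h, t in cases]

for (h, t), (mean, sd) in zip(cases, summaries):
    print(f"{h:>2} heads, {t:>2} tails: mean={mean:.3f}, sd={sd:.3f}")
```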