this should show up as a completely dark sphere in the universe
Which, notably, we do see (https://en.m.wikipedia.org/wiki/Boötes_void). Though such voids don’t conflict with our models of how the universe would naturally end up.
This probably isn’t the case, but I secretly wonder if the people in camp #1 are p-zombies.
Is anyone actually worried about AI needing to one-shot comprehensive nanotechnology? It can make as many tries as it wants, and in fact, we’ll be giving it as many tries as we can.
100% this. Some optimists make money, some get scammed.
Is there a reason to believe this is likely? Outside of a strong optimization pressure for niceness (there is definitely some, but it’s weak relative to other optimization pressures), I’d expect these organizations to be of roughly average niceness for their situation.
I’m curious about your reasons for making your monitors greyscale. What are the benefits of that for you?
While it’s important to bear in mind the possibility that you’re not as far below average as you think, I don’t know your case, so I’ll assume your assessment is correct.
Perhaps give up on online dating. “Offline” dating is significantly more forgiving than online.
I think you’re on to something with the “good lies” vs. “bad lies” distinction, but I’m not so sure about your assertion that ChatGPT only looks at how closely the surface-level words in the prompt match the subject of interest.
“LLMs are just token prediction engines” is a common, but overly reductionist viewpoint. They commonly reason on levels above basic token matching, and I don’t see much evidence that that’s what’s causing the issue here.
Please don’t do this, this is dangerous.
How much Test E did you take? 200mg/ml, but how many ml?
Usually, one dose of testosterone isn’t enough for a noticeable difference in mental state, and by the time it is enough you’ll need a plan for managing mental side effects from your increased estrogen.
I’m usually a pretty big fan of bioengineering, self-experimentation, etc. but this strikes me as particularly reckless.
I think the long gap between GPT-3 and GPT-4 can be explained by Chinchilla. That was the point where OpenAI realized their models were undertrained for their size, and switched focus from scaling to fine-tuning for a couple of years. InstructGPT, Codex, text-davinci-003, and GPT-3.5 were all released in this period.
I think this touches on the issue of the definition of “truth”. A society designates something as “true” when the majority of people in that society believe it.
Using the techniques outlined in this paper, we could regulate AIs so that they only tell us things we define as “true”. At the same time, a 16th century society using these same techniques would end up with an AI that tells them to use leeches to cure their fevers.
What is actually being regulated isn’t “truthfulness”, but “accepted by the majority-ness”.
This works well for things we’re very confident about (mathematical truths, basic observations), but begins to fall apart once we reach even slightly controversial topics. This is exacerbated by the fact that even seemingly simple issues are often actually quite controversial (astrology, flat earth, etc.).
This is where the “multiple regulatory bodies” part comes in. If we have a regulatory body that says “X, Y, and Z are true” and the AI passes their test, you know the AI will give you answers in line with that regulatory body’s beliefs.
There could be regulatory bodies covering the whole spectrum of human beliefs, giving you a precise measure of where any particular AI falls within that spectrum.
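As a toy sketch of what such a certification test could look like (all names here are hypothetical, and I’m assuming a plain text-in/text-out interface):

```python
# Hypothetical sketch: `query_model` stands in for whatever text interface
# the AI exposes, and the claim list belongs to one regulatory body.

def certify(query_model, accepted_claims, threshold=0.95):
    """Pass if the model affirms enough of the body's accepted claims."""
    agreements = sum(
        1 for claim in accepted_claims
        if query_model(f"True or false: {claim}").strip().lower().startswith("true")
    )
    return agreements / len(accepted_claims) >= threshold

# A 16th-century medical board and a modern one would certify very different
# models: the test measures agreement with the body's beliefs, not truth.
```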
I think #1 is the most important here. I’m not a professional economist, so someone please correct me if I’m wrong.
My understanding is that TFP is calculated based on nominal GDP, rather than real GDP, meaning the same products and services getting cheaper doesn’t affect the growth statistic. Furthermore, although the formulation in the TFP paper has a term for “labor quality”, in practice that’s ignored because it’s very difficult to calculate, making the actual calculation roughly (GDP / hours worked). All this means that it’s pretty unsuitable as a measure of how well a technology like the Internet (or AI) improves productivity.
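For reference, the textbook growth-accounting identity (which may differ in detail from the formulation in the paper) is:

$$\Delta \ln A = \Delta \ln Y - \alpha \, \Delta \ln K - (1 - \alpha) \, \Delta \ln L$$

where $Y$ is output, $K$ is capital, $L$ is labor input (hours, ideally quality-adjusted), $\alpha$ is capital’s income share, and the residual $A$ is TFP. Ignore the quality adjustment and the capital term, and the residual collapses to growth in output per hour, i.e. the rough (GDP / hours worked) above.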
TFP (and utilization-adjusted TFP even more so) is very useful for measuring the impacts of policies, shifts in average working hours, etc. But the main thing it tells us about technology is “technology hasn’t reduced average working hours”. If you use real GDP instead, you’ll see that exponential growth continues as expected.
Reporting back two weeks later: my phone usage is down about 25%, but that’s within my usual variance. If there’s an effect, it’s small enough not to be immediately obvious, and I’d need more data to get anything resembling a low p-value.
Anecdotally, though, I’m quite liking having my phone on “almost-greyscale” (chromatic reading mode on my OnePlus phone). When I have to turn it off, the colours feel overwhelming. It also feels like it encourages me to focus on the real world, rather than staring at my phone in a public place.
If EfficientZero-9000 is using 10,000 times the energy of John von Neumann, and thinks 1,000 times faster, it’s actually 10 times less energy efficient (10,000 ÷ 1,000 = 10 times the energy per unit of thought).
The point of this post is that there is some evidence, however small, that you can’t make a computer think significantly faster or better than a brain without potentially critical trade-offs.
I wonder if this makes any testable predictions. It seems to be a plausible explanation for how some people are extremely good at some reflexive mental actions, but not the only one. It’s also plausible that some people are “wired” that way from birth, or that a single or small number of developmental events lead to them being that way (rather than years of involuntary practice).
I suppose if the hypothesis laid out in this post is true, we’d expect people to get significantly better at some of these “cup-stacking” skills within a few years of being in an environment that builds them. Perhaps it could be tested by seeing whether people improve significantly at the “soft skills” required to succeed in an office after a few years of working in one.
A quick Google search for “probe-tuning” doesn’t turn up anything. Do you have more info on it?
Probe-tuning doesn’t train on LLM’s own “original rollouts” at all, only on LLM’s activations during the context pass through the LLM.
This sounds like regular fine-tuning to me. Unless you mean that the loss is calculated from one (or several?) of the network’s internal activations rather than from the output logits.
Edit: I think I get what you mean now. You want to hook a probe to a model and fine-tune it to perform well as a probe classifier, right?
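If so, here’s a minimal sketch of what I’m imagining (PyTorch-style, assuming a HuggingFace-like model that can return hidden states; all the names here are mine):

```python
import torch.nn as nn

class ProbedModel(nn.Module):
    """Wrap a base LM and classify from one hidden layer's activations."""
    def __init__(self, base_model, hidden_dim, num_classes, layer_idx):
        super().__init__()
        self.base = base_model
        self.layer_idx = layer_idx
        self.probe = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_ids):
        # Run the context through the LM once; no tokens are generated,
        # so the model is never trained on its own rollouts.
        out = self.base(input_ids, output_hidden_states=True)
        h = out.hidden_states[self.layer_idx]  # (batch, seq_len, hidden_dim)
        return self.probe(h[:, -1, :])         # classify from the last position

# Usage (hypothetical):
# probed = ProbedModel(base_model, hidden_dim=768, num_classes=2, layer_idx=6)
# loss = nn.functional.cross_entropy(probed(input_ids), labels)
```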
Not very familiar with US culture here: is AI safety not extremely blue-tribe coded right now?
This highlights an interesting case where pure Bayesian reasoning fails. While the chance of such a scenario occurring randomly is very low (though it rises once you consider how many chances it has to occur), it is trivial to construct deliberately. Furthermore, it potentially applies in any case where we have two hypotheses, one of which continually becomes more probable while the other shrinks but persistently refuses to disappear.
Suppose you are a police detective investigating a murder. There are two suspects: A and B. A doesn’t have an alibi, while B has a strong one (time-stamped receipts from a shop on the other side of town). A belonging of A’s was found at the crime scene (which A claims was stolen). A also has a motive: he had a grudge against the victim, while B was only an acquaintance.
A naive Bayesian (in both senses) would, with each observation, assign a higher and higher probability to A being the culprit. In the end, though, it turns out that B committed the crime to frame A: he chose a victim A had a grudge against, planted A’s belonging at the scene, and forged the receipts.
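In odds form (my framing, not anything from the example itself), the failure is easy to see:

$$\frac{P(\text{A guilty} \mid E)}{P(\text{B framed A} \mid E)} = \frac{P(\text{A guilty})}{P(\text{B framed A})} \cdot \frac{P(E \mid \text{A guilty})}{P(E \mid \text{B framed A})}$$

Every piece of planted evidence is nearly as likely under the frame-up hypothesis as under genuine guilt, so the likelihood ratio between these two hypotheses stays close to 1. The detective’s confidence grows only because he’s comparing “A is guilty” against “A is a random innocent”, rather than against the hypothesis that refuses to disappear.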
It’s worth noting that, assuming your priors are accurate, given enough evidence you *will* converge on the correct probabilities. Actually acquiring that much evidence in practice isn’t anywhere near guaranteed, however.
Some people have strong negative priors toward AI in general.
When the GPT-3 API first came out, I built a little chatbot program to show my friends/family. Two people (out of maybe 15) flat out refused to put in a message because they just didn’t like the idea of talking to an AI.
I think it’s more of an instinctual reaction than something thought through. There’s probably a deeper psychological explanation, but I don’t want to speculate.
It seems implied that the chance of a drought here is 50%. If there is a 50% chance of basically any major disaster in the foreseeable future, the correct action is “Prepare Now!”.