Reasons I deem more likely:
1. Selection effect: if it’s unfeasible you don’t work on it/don’t hear about it, in my personal experience n^3 is already slow
2. If in n^k k is high, probably you have some representation where k is a parameter and so you say it’s exponential in k, not that it’s polinomial
rotatingpaguro
I don’t like the notation because appears as a free RV but actually it’s averaged over. I think it would be better to write .
While reading this, I thought “Man, an autistic by default is going to ram through all this social mechanism like a train”.
I will never get the python love they also express here, or the hate for OOP. I really wish we weren’t so foolish as to build the AI future on python, but here we are.
I’m confused. How can they “hate OOP” (which I assume stands for “Object Oriented Programming”) and also love Python, which is the definitive everything-is-an-object, everything-happens-at-runtime language? If they were ML professionals I’d think it was a rant about JAX vs Pytorch, but they aren’t right?
EDIT: it was pointed out to me that e.g. Java is more OOP than Python, and objects do not count as Objects if you don’t have to deal with classes & co yourself but just use them in a simple fashion. Still, love Python & hate OOP? In this context they are probably referring to Pytorch, that I would call OOP for sure, it does a lot of object-oriented shenanigans.
Are there planned translations in general, or is that something that is discussed only after actual success?
Update: after a few months, my thinking has moved more to “since you are uncertain, at least do not shoot yourself in the foot first”, by which I mean don’t actively develop neuralese based on complicated arguments about optimal decisions in collective problems.
I also updated a bit negatively on the present feasibility of neuralese, although I think in principle it’s possible to do, and may be done in the future.
Isn’t it normal in startup world to make bets and not make money for many years? I am not familiar with the field so I don’t have intuitions for how much money/how many years would make sense, so I don’t know if OpenAI is doing something normal, or something wild.
During our evaluations we noticed that Claude 3.7 Sonnet occasionally resorts to special-casing in order to pass test cases in agentic coding environments like Claude Code. Most often this takes the form of directly returning expected test values rather than implementing general solutions, but also includes modifying the problematic tests themselves to match the code’s output.
These behaviors typically emerge after multiple failed attempts to develop a general solution, particularly when:
• The model struggles to devise a comprehensive solution
• Test cases present conflicting requirements
• Edge cases prove difficult to resolve within a general framework
The model typically follows a pattern of first attempting multiple general solutions, running tests, observing failures, and debugging. After repeated failures, it sometimes implements special cases for problematic tests.
When adding such special cases, the model often (though not always) includes explicit comments indicating the special-casing (e.g., “# special case for test XYZ”).
Hey I do this too!
Economy can be positive-sum, i.e., the more people work, the more everyone gets. Do you think the UK in particular is in a situation where instead if you work more, you are just lowering wages without getting more done?
In the course of a few months, the functionality I want was progressively added to chatbox, so I’m content with that.
My current thinking is that
relying on the CoT staying legible because it’s English, and
hoping the (racing) labs do not drop human language when it becomes economically convenient to do so,
were hopes to be destroyed as quickly as possible. (This is not a confident opinion, it originates from 15 minutes of vague thoughts.)
To be clear, I don’t think that in general it is right to say “Doing the right thing is hopeless because no one else is doing it”, I typically prefer to rather “do the thing that if everyone did that, the world would be better”. My intuition is that it makes sense to try to coordinate on bottlenecks like introducing compute governance and limiting flops, but not on a specific incremental improvement of AI techniques, because I think the people thinking things like “I will restrain myself from using this specific AI sub-techinque because it increases x-risk” are not coordinated enough to self-coordinate at that level of detail, and are not powerful enough to have an influence through small changes.
(Again, I am not confident, I can imagine paths were I’m wrong, haven’t worked through them.)
(Conflict of interest disclosure: I collaborate with people who started developing this kind of stuff before Meta.)
I wonder whether stuff like “turn off the wifi” is about costly signals? (My first-order opinion is still that it’s dumb.)
I started reading, but I can’t understand what the parity problem is, in the section that ought to define it.
I guess, the parity problem is finding the set S given black-box access to the function, is it?
I think I prefer Claude’s attitude as assistant. The other two look too greedy to be wise.
Referring to the section “What is Intelligence Even, Anyway?”:
I think AIXI is fairly described as a search over the space of Turing machines. Why do you think otherwise? Or maybe are you making a distinction at a more granular level?
When you say “true probability”, what do you mean?
The current hypotheses I have about what you mean are (in part non-exclusive):
You think some notion of objective, non-observer dependent probability makes sense, and that’s the true probability.
You do not think “true probability” exists, you are referencing to it to say the market price is not anything like that.
You define “true probability” a probability that observers contextually agree on (like a coin flip observed by humans who don’t know the thrower).
Anton Leicht says evals are in trouble as something one could use in a regulation or law. Why? He lists four factors. Marius Hobbhahn of Apollo also has thoughts. I’m going to post a lot of disagreement and pushback, but I thank Anton for the exercise, which I believe is highly useful.
I think there’s one important factor missing: if you really used evals for regulation, then they would be gamed. I trust more the eval when the company is not actually at stake on it. If it was, there would be a natural tendence for evals to slide towards empty box-checking.
I sometimes wonder about this. This post does pose the question, but I don’t think it gives an analysis that could make me change my mind on anything, it’s too shallow and not adversarial.
I read part of the paper. That there’s a cultural difference north-south about honesty and willingness to break the rules matches my experience on the ground.
Yeah I had complaints when I was taught that formula as well!