I also write at https://splittinginfinity.substack.com/
harsimony
- These are good points. I’m uncertain about what models will form the foundation of RLaaS. But I think your point about where the task-specific data teams are working is more important. Off the top of my head, I think there are 3 bins: - For a lot of programming tasks, big AI companies already have lots of expertise and users in-house, so I expect them to dominate code generation. 
- For some tasks like writing marketing copy, LLMs are already good enough. There’s no business case for training models further here. 
- Most interesting are tasks that require lots of tacit knowledge or iteration. For example, getting to self-driving cars required a decade plus of iterating on algorithms and data. I imagine lots of corporations will privately put a bunch of effort into making AI work on their specific problems. Physical tasks in specialized trades are another example. 
 - For tasks in #3, the question is whether to join up with the big AI companies, or develop your own solution to the problem and keep it private. 
RL-as-a-Service will outcompete AGI companies (and that’s good)
- You may be interested in this series, especially the post on “three prong bundle theory”: https://www.greaterwrong.com/s/EA2uNqKjmu2NzFhRx - One good framing is to consider the rights digital minds need in order to participate in a market economy. They need property rights, freedom of speech, freedom of association, and so on. By being able to participate in market exchange, digital minds may prefer to be part of society rather than fight against it. Comparative advantage is a particularly good reason to cooperate with others. - Market rights: https://splittinginfinity.substack.com/p/markets-dont-work-without-individual - My comment on personhood and the value of being punish-able: https://www.lesswrong.com/posts/4m2MTPass3Ri2zZ43/legal-personhood-three-prong-bundle-theory#bpavKPBwJbCJtv8QA 
The Data Scaling Hypothesis
- Good points on an important topic. Thank you for this series. - One thing I’d like to point out is that receiving this form of personhood is highly valuable. In other words, being punishable makes you more trustworthy and safer to engage with. So AIs and digital minds might voluntarily construct methods by which they can be punished for wrongdoing. - This right-to-be-sued is an important legal right. I covered some discussion on Twitter about this in point 3 here: https://splittinginfinity.substack.com/p/links-15 - The key quote from Isaac King here: - If the laws are enforced on everyone equally, then people can safely interact with each other, knowing that they have recourse if they are wronged. But when one particular group of people is exempt from the law, the safest thing for everyone else to do is to avoid having any contact with that group, because they are now uniquely threatening. - The people being stolen from are not the only victims of the decriminalization of theft. The victims that nobody sees are all of the unlucky but perfectly trustworthy people who are now pariahs because society has decided to remove their ability to enter into binding agreements. To remove the social safety net that allows everyone else to feel safe around them. 
- Nicely written proposal, thank you. - In truth, I’m quite concerned about such a proposal being implemented. Part of this is, as you mentioned, the risk of lock-in. In particular, a global entity with a monopoly on violence is alarmingly close to global authoritarianism. This is an existential risk all on its own. - Such an entity would have to: - Limit access to space (to avoid self-sustaining expansion faster than everyone else). 
- Limit/monitor the uses of AI and AI training. 
- Possess a credible monopoly on violence. 
- Surveil enough to prevent surprise attacks and overthrow attempts. 
 - Considering the risk that such a system is corrupted or misaligned or permanent, I feel better about a future that emphasizes freedom and acceleration of defensive technologies. - (I could be convinced that the “multiple night watchmen overseeing each other” model is viable. Rather than oversee each other, it might be better to give them completely separate jurisdictions. Federalism and free movement allow people to choose night watchmen that suit their needs. Risk of war between jurisdictions is low since they both have watchmen. Some watchmen may allow AIs to develop and entities to leak into space, but this laxity is a feature to avoid global totalitarianism.) 
- I feel like people are dismissing this study out of hand without updating appropriately. If there’s at least a chance that this result replicates, that should shift our opinions somewhat. - First, a few reasons why the common counterarguments aren’t strong enough to dismiss the study: - I’ve been seeing arguments against this result based on vibes or claims that the next generation of LLMs will overturn this result. But that is directly contradicted by the results of this study: people’s feelings are poor indicators of actual productivity. 
- On Cursor experience, I think Joel Becker had a reasonable response here. Essentially, many of the coders had tried Cursor, had some experience with it, and had a lot of experience using LLMs for programming. Is the learning curve really so steep that we shouldn’t see them improve over the many tasks? See image below. Perhaps the fact that these programmers don’t use it and saw little improvement is a sign that Cursor isn’t very helpful. 
- While this is a challenging environment for LLM coding tools, this is the sort of environment I want to see improvement in for AI to have a transformative impact on coding. Accelerating experienced devs is where a lot of the value of automating coding will come from. 
 - That aside, how should we change our opinions with regard to the study? - Getting AI to be useful in a particular domain is tricky; you have to actually run tests and establish good practices. 
- Anecdotes about needing discipline to stay on task with coding tools and the Cursor learning curve suggest that AI adoption has frictions and requires tacit knowledge. 
- Coding is one of the cleanest, most data-rich, most LLM-developer-supported domains. As of yet, AI automation is not a slam dunk, even here. Every other domain will require its own iteration, testing, and practice to see a benefit. 
- If this holds, the points above slow AI diffusion, particularly when AI is used as a tool for humans. Modelling the impact of current and near-future AIs should take this into account. 
 
- Yeah I think sleep probably serves other roles, I just don’t see why those roles require 7 hours of sleep rather than say 5 hours. - I do agree that basic research is what will actually get sleep need reduction therapies to work at scale. I’m hoping that citizen science and discussion of the topic will encourage more work on this. 
- Oh I see. So the hypothesis is “In a healthy animal, stress is a highly-informative signal that inhibits risk-taking. Sleep ensures the stress system continues to inhibit risk taking appropriately.” - Makes sense. It’s consistent with sleep deprivation raising the level of cortisol and the brain developing a tolerance to high levels of certain hormones. 
- Oh I think sleep probably plays other roles today! But I don’t think those roles require exactly 7 hours of sleep. - And agreed, need to look at long-term effects of sleep need reduction too. My vision is more that people have 3-4 nights of 1-2 hours less sleep and then take a break for 3 nights rather than taking a drug to stop sleeping entirely. 
- We are fundraising for a self-experiment soon! - I think there’s a substantial chance that orexin agonists are “just stimulants” and you can’t reduce sleep need much with them. But short sleepers prove it’s biologically possible and I want to encourage people to start working on this. 
- The point about children is a good one, I have to think on it more. But it seems consistent with children needing more calories to grow (and they are too young to gather their own calories), so they rest more. - It might be something more complex like: (1) Animals that aren’t careful tend to take a lot of risks that result in them dying. (2) There’s a process that builds up stress that’s about taking less risks. (3) Sleep exists to process that build-up stress and resolve it. - By this do you mean: “when stress builds up, animals take more risks. Risk-taking animals die. Sleep relieves stress, thus enabling less risky behavior?” - It’s a reasonable hypothesis, and I’m open to it happening to some degree. But I think all the arguments against “neurons need rest” apply here just as well. Aren’t there brain regions more or less exposed to stress? Don’t some neurons experience constant (metabolic) stress simply by being active all the time? Are animals with more stress (prey) sleeping more? Do more stressed people adapt by sleeping more? - Why? Orexin-A is perfectly capable of crossing the blood-brain barrier. If you create a gene therapy to produce more of it, it doesn’t need to be produced in the brain. - I should have provided more context: the proposals of Minjune Song and Isaak Freeman center on short sleep mutations found in the literature. These are typically mutations of receptors found on neurons in the brain. - Overproducing orexin-A is fine, but a gene therapy for a hormone that can cross the BBB is overkill when you can supply it exogenously, right? - By the way, Orexin-A supplementation is the subject of the proposal I mentioned at the end. Should come out today. - Why do you believe that? For Orexin-A? You can buy it these days from a neurotropic store if you want as a nasal spray. - I think you’re mostly asking about production costs, right? 
Yes, you can buy orexin peptides and custom RNA, but the cost-per-unit-effect is orders of magnitude higher than small molecules. It’s hard to beat the ~$100/kg that you can get with generic drug manufacturing. 
- Good to have a number for this. Though I think a better counterfactual is between sleeping and actively foraging. Foraging + thermoregulation costs even more calories. - But let’s say for the sake of argument that being awake + foraging takes 20% more calories compared to sleeping. Would sleeping actually get selected for? I think so. Evolution can make pretty fine distinctions given enough generations. - For example, cavefish (which live in an environment without light) quickly evolve less pigmentation and underdeveloped eyes to save energy. This is a convergent trait; it’s been observed in several different species. Though I’m not sure what kind of energy penalty eyes and pigment have. 
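To make the “fine distinctions given enough generations” point concrete, here is a minimal sketch using a textbook one-locus haploid selection model. The selection coefficient `s`, the starting and target frequencies, and the function name are illustrative assumptions; none of them is derived from the 20% calorie figure, since how calories map onto fitness is unknown.

```python
def generations_to_spread(s, start_freq=0.01, target_freq=0.99):
    """Count generations for a variant with relative fitness 1 + s to go
    from start_freq to target_freq under deterministic haploid selection."""
    if s <= 0:
        raise ValueError("s must be positive for the variant to spread")
    p = start_freq
    generations = 0
    while p < target_freq:
        # Standard update: the variant's share is reweighted by its fitness.
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


if __name__ == "__main__":
    # Hypothetical fitness advantages, from large to tiny.
    for s in (0.1, 0.01, 0.001):
        print(f"s = {s}: {generations_to_spread(s)} generations")
```

Under these assumptions a 1% advantage spreads through the population in on the order of a thousand generations. The model ignores drift, which can wipe out small advantages in small populations, so treat it as an upper bound on how cleanly selection works.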
- I’m not as familiar with insomnia treatments, but orexin antagonists seem to be an improvement over existing meds. Probably the biggest improvement is the lower risk of abuse and tolerance compared to other medications. Belsomra has been around for over 10 years and seems to be well tolerated and effective. Though it doesn’t work for everyone. - The argument that orexin antagonists could help people sleep more without making them sleepy during the day makes sense to me, with one caveat. If the half-life is long enough, the antagonist could block the orexin signal your body normally produces in the morning, making you feel sleepier than you otherwise would have. But perhaps this would be outweighed by the alertness one gets from a good night’s sleep. - As for side effects (beyond daytime sleepiness and things we already know to watch for) I guess I would look for changes in motivation. Orexin has a (tenuous) link to reward seeking, addiction, and motivation. If you feel less energy and motivation during the day that might be something to look into. Indeed the point of a good night’s sleep is to feel good and energetic the following day, so it’s good to check if an insomnia treatment actually delivers on that. - You may also want to consider adding psychological approaches like CBT-i or paradoxical intention. It seems like some insomnia features a self-reinforcing loop of bad sleep leading to frustration which begets more bad sleep. - (ofc, none of this is medical advice) 
Sleep need reduction therapies
- Wonderful to get more numbers on this! - These examples seem to contradict note 2 where D/N falls for larger C. Now I’m not sure what the trend should be. - It feels like you could derive a rule of thumb based on the loss and the entropy of the dataset e.g. “If my model starts at a loss of 4 bits/token and the asymptote is 2 bits/token, I need X tokens of data to fully specify a model with Y bits stored in the parameters.” 
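One naive way to turn that rule of thumb into arithmetic, as a sketch only: assume each token can teach the model at most the gap between the starting loss and the asymptote, so the tokens needed are at least Y divided by that gap. The bound, the function name, and the example value of Y are assumptions for illustration, not an established scaling law.

```python
def min_tokens_to_specify(param_bits, start_loss, asymptote_loss):
    """Crude lower bound on tokens needed to convey `param_bits` bits,
    assuming each token carries at most (start_loss - asymptote_loss)
    bits of learnable information. Losses are in bits per token."""
    gap = start_loss - asymptote_loss
    if gap <= 0:
        raise ValueError("start_loss must exceed asymptote_loss")
    return param_bits / gap


if __name__ == "__main__":
    # Hypothetical model storing 2e9 bits, with the 4 -> 2 bits/token
    # losses from the example in the text.
    print(min_tokens_to_specify(2e9, start_loss=4.0, asymptote_loss=2.0))
```

Since the learnable gap shrinks as training proceeds, the real X would be larger than this; the sketch only shows the shape such a rule might take.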
On AI Scaling
- Oh that makes sense! - If the predictors can influence the world in addition to making a prediction, they would also have an incentive to change the world in ways that make their predictions more accurate than their opponents right? For example, if everyone else thinks Bob is going to win the presidency, one of the predictors can bribe Bob to drop out and then bet on Alice winning the presidency. - Is there work on this? To be fair, it seems like every AI safety proposal has to deal with something like this. 
- This is super cool stuff, thank you for posting! - I may have missed this, but do these scoring rules prevent agents from trying to make the environment more unpredictable? In other words, if you’re competing against other predictors, it may make sense to influence the world to be more random and harder to understand. - I think this prediction-market-type issue has been discussed elsewhere but I can’t find a name for it. 
I would count your consulting service as RLaaS essentially. I’ll admit, RLaaS is a buzzword that obscures a lot. “Have AI researchers and domain experts iterate on current AI models until they are performant at a particular task” would be more accurate. Things I think this model will apply to:
Anything involving robots. Consider the journey to self-driving cars with lots of human data collection, updating the hardware, cleaning the dataset, and tweaking algorithms. Any physical manipulation task that has to be economically competitive will need a lot of input from experts. Factory managers will need robots that operate under idiosyncratic requirements. It’ll take time to iron out the kinks.
To a lesser extent, repetitive internal company processes will need some fine-tuning. Filling out forms specific to a company, filing reports in the local format, etc. Current LLMs can probably do this with 90% success, but pushing that to 99% is valuable and will take a little work.
Research-heavy domains. The stuff covered in publications is 10% of the knowledge you need to do science. I expect LLM research assistants to need adjustment for things like “write code using all these niche software packages”, “this is the important information we need from this paper”, “results from this lab are BS so ignore them”.
My priors are that reality is detailed and getting a general purpose technology like modern AI to actually work in a particular domain takes some iteration. That’s my key takeaway from that METR study:
https://www.lesswrong.com/posts/m2QeMwD7mGKH6vDe2/?commentId=T5MNnpneEZho2CuZS