Null Results From An Orexin RCT
Over the last few months we[1] have been doing a sleep experiment inspired by our suspicion that orexin is an exciting target for sleep need reduction.
We mildly deprived ourselves of sleep (5-5.5 hours, relative to 7-7.5 hours normally) and took either a placebo or orexin intranasally. We tracked our sleep the night before and after taking a dose in the morning and completed various tests of mental acuity during the day.
The results from our initial experiment are exclusively null: none of them cross standard thresholds for statistical significance. This wasn’t particularly surprising, since we expected a ~60% chance of this happening. We’re considering next steps, and need your feedback!
For now, there are a few things to cover in the results.
Trial Design
We performed a self-blinded randomized controlled trial with blocking. Each participant took either the placebo (2.5 mL of sterile water) or the orexin (100 μg of orexin-A dissolved in 2.5 mL of sterile water). Here’s the procedure, repeated for every block:
1. Prepare two nasal atomizers, one with sterile water and one with orexin dissolved in sterile water.
2. Night to the first day: Sleep 5-5.5 hours.
3. First day:
3.1. At a consistent time of day, randomly select one atomizer and administer it.
3.2. Take mental acuity tests: once mid-day, once in the evening.
3.3. Track sleep, heart rate, etc. using a Fitbit Inspire 3.
4. Night to the second day: Sleep normally.
5. Night to the third day: Sleep 5-5.5 hours.
6. Third day: Repeat 3.1-3.3, using the remaining dose/placebo you prepared.
7. Night to the fourth day and fifth day: Sleep normally.
8. Fifth day: Record baseline sleep measurements and mental tests.
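The within-block randomization in the procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling we actually used: the function names, the seed, and the number of blocks are all hypothetical, and in a real self-blinded setup the printed key would have to stay hidden from the participant until the analysis.

```python
import random


def randomize_block(rng: random.Random) -> list[str]:
    """Return the treatment order for the two dosing days of one block."""
    order = ["orexin", "placebo"]
    rng.shuffle(order)  # each block contains exactly one of each, in random order
    return order


def make_schedule(n_blocks: int, seed: int) -> list[list[str]]:
    """Build a blocked schedule: one [first dosing day, second dosing day] pair per block."""
    rng = random.Random(seed)  # hypothetical fixed seed, only for reproducibility
    return [randomize_block(rng) for _ in range(n_blocks)]


schedule = make_schedule(n_blocks=5, seed=0)
for i, (first_day, third_day) in enumerate(schedule, start=1):
    print(f"block {i}: first day = {first_day}, third day = {third_day}")
```

Blocking this way guarantees that orexin and placebo days stay balanced within each participant, whatever the random draws turn out to be.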
Each person had a substantial amount of leeway in how they structured their day. On sleep-deprived days, Sam preferred to get up early while No Magic Pill and niplav preferred to stay up late and get up at the usual time. We each took doses at a consistent time, but the time differed between people.
We didn’t standardize things because we thought it was more important to have ecological validity, i.e. that we were using orexin the way we would actually use it in everyday life. This is a higher-variance but lower-bias approach.
The Results
In our initial proposal, we pointed out that the main thing we wanted to see was orexin causing less rebound sleep the following night. A simple stimulant effect isn’t enough for us; we wanted to use orexin to sleep less and get away with it.
So here’s the average sleep time for the night after taking orexin vs the night after taking placebo:
Unfortunately, the difference wasn’t significant and the effect size is small. This could be for a couple of reasons that we want to address in the next trial.
Did orexin have any sort of stimulant effect during the day? Nope, none of the mental acuity tests are significantly different.
In setting up this trial we had a sneaky second hypothesis: Does sleep deprivation actually make you dumber?
One caveat before we look at the data. Typically our “baseline” days would come after our sleep-deprivation days. So that means baseline days enjoy more cumulative practice compared to sleep-deprivation days. That should bias the results by making baseline days look better. On the other hand, if sleep deprivation has long term cumulative effects, then perhaps baseline days are at a disadvantage. But that doesn’t match our experience of feeling significantly better on baseline days.
So, does sleep deprivation make you dumber? Not really!
Depending on how you correct for multiple comparisons, the psychomotor vigilance task (PVT) differences might be significant, and I’d expect them to become significant with more data points. From what I (Sam) remember from doing the PVT on sleep-deprivation days, I felt just as fast, but I would slip up more from inattention or distractions. This is consistent with the large gap in the slowest-10% measure.
But overall, this is a nice example of how our intuitions around sleep can lead us astray. It sure feels like sleep deprivation should make you dumber. But we don’t see that here. It’s important to actually check what changes our productivity because our intuitions around this are pretty fuzzy.
The Next Trial
There are a few reasons why we might be getting a null result. We might have too few data points, or the dose might be too low, or, more concerningly, we might be storing the orexin improperly.
So the first next step is a slightly bigger trial where we see if a higher dose of orexin changes our results. From anecdotes online, some have felt effects while others haven’t. But even if orexin doesn’t have obvious effects, it might still reduce sleep need. We need to try higher doses and collect more data to find out.
That said, sleep deprivation is uncomfortable for Sam and No Magic Pill and extremely uncomfortable for niplav. We’ve decided to try a different design: sleep ad libitum on all nights of the week, but observe whether orexin reduces the amount we sleep the night after. This should make it sustainable to collect a lot more data.
Appendix A: Details about the Data Analysis
We collected two separate datasets:
Data collected automatically from the Fitbit Inspire 3
Measures of sleep
Sleep duration
Sleep efficiency
Time spent in deep sleep
Time spent in light sleep
Time spent in REM sleep
Time spent in nocturnal awakenings
Additional measures
HRV Daily RMSSD (ms)
HRV Deep RMSSD (ms)
SpO2 Avg (%)
SpO2 Min (%)
Breathing rate (breaths/min)
Skin temperature Δ (°C)
Steps
Data from the mental acuity testing
Description of subjective state (free text)
We aggregated mental acuity tests per-test to avoid pseudoreplication (so two data points per day), and aggregated Fitbit data per-day. We analyzed the data via matched controls (with days in a participant-block matched so as to analyze within-pair differences) and ran two separate analyses on the data: one frequentist and one Bayesian. The code for the analysis, written in Julia by Claude Opus 4.6, is available here. Our mental acuity test data is available here, and the aggregated full data is available here.
Frequentist Analysis and Additional Results
In our frequentist analysis we ran the paired t-test on the paired data with cardinal measurements and the Wilcoxon signed-rank test on paired data with ordinal measurements. We also report an effect size for each measurement (Cohen’s d, or r for the rank-based tests). We Bonferroni-corrected the p-values, not that that was necessary…
Variable | Effect Size | p-value | p-corrected | Orexin | Placebo | Difference |
PVT Mean RT (ms) | 0.100 (Cohen’s d) | 0.624 | 1.000 | 256.0 ± 28.0 (n=50) | 253.3 ± 26.2 (n=46) | +2.7 |
PVT Median RT (ms) | 0.149 (d) | 0.469 | 1.000 | 243.6 ± 18.3 (n=50) | 240.8 ± 18.9 (n=46) | +2.8 |
PVT Slowest 10% (ms) | -0.024 (d) | 0.908 | 1.000 | 296.7 ± 59.9 (n=50) | 298.2 ± 68.3 (n=46) | -1.5 |
DSST Correct | 0.211 (d) | 0.303 | 1.000 | 69.7 ± 10.6 (n=51) | 67.4 ± 11.3 (n=46) | +2.3 |
DigitSpan Forward | 0.148 | 1.000 | 7.86 ± 1.00 (n=42) | 8.10 ± 1.13 (n=40) | -0.24 | |
DigitSpan Backward | 0.061 (r) | 0.627 | 1.000 | 7.31 ± 0.95 (n=42) | 7.38 ± 1.25 (n=40) | -0.07 |
DigitSpan Total | 0.127 (r) | 0.318 | 1.000 | 15.2 ± 1.7 (n=42) | 15.5 ± 2.0 (n=40) | -0.3 |
SSS Rating | -0.178 (r) | 0.112 | 1.000 | 3.29 ± 1.02 (n=52) | 2.98 ± 0.86 (n=46) | +0.31 |
Sleep Duration (hrs) | 0.212 (d) | 0.542 | 1.000 | 8.60 ± 1.91 (n=17) | 8.27 ± 1.05 (n=17) | +0.33 |
Sleep Efficiency (%) | -0.257 (d) | 0.460 | 1.000 | 89.4 ± 5.3 (n=17) | 90.5 ± 3.7 (n=17) | -1.2 |
Sleep Deep (min) | -0.011 (d) | 0.974 | 1.000 | 74.5 ± 22.9 (n=17) | 74.7 ± 19.3 (n=17) | -0.2 |
Sleep Light (min) | 0.232 (d) | 0.505 | 1.000 | 283 ± 69 (n=17) | 270 ± 40 (n=17) | +13 |
Sleep REM (min) | -0.150 (d) | 0.665 | 1.000 | 101.2 ± 27.3 (n=17) | 104.9 ± 21.8 (n=17) | -3.7 |
Sleep Wake (min) | 0.341 (d) | 0.331 | 1.000 | 56.9 ± 38.0 (n=17) | 46.6 ± 19.6 (n=17) | +10.3 |
HRV Daily RMSSD (ms) | 0.079 (d) | 0.814 | 1.000 | 32.8 ± 13.0 (n=18) | 31.7 ± 15.1 (n=18) | +1.1 |
HRV Deep RMSSD (ms) | 0.369 (d) | 0.276 | 1.000 | 31.8 ± 12.2 (n=18) | 27.2 ± 13.0 (n=18) | +4.6 |
SpO2 Avg (%) | -0.286 (d) | 0.397 | 1.000 | 95.7 ± 1.0 (n=18) | 96.0 ± 1.0 (n=18) | -0.3 |
SpO2 Min (%) | -0.059 (d) | 0.861 | 1.000 | 93.6 ± 1.4 (n=18) | 93.7 ± 1.6 (n=18) | -0.1 |
Breathing Rate (breaths/min) | 0.314 (d) | 0.382 | 1.000 | 16.4 ± 2.0 (n=17) | 15.8 ± 1.9 (n=15) | +0.6 |
Skin Temp Δ (°C) | -0.041 (d) | 0.905 | 1.000 | 0.01 ± 0.65 (n=17) | 0.04 ± 0.49 (n=17) | -0.02 |
Steps | 0.032 (d) | 0.909 | 1.000 | 6478 ± 6403 (n=27) | 6282 ± 5996 (n=26) | +196 |
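To make the effect-size and p-corrected columns easier to read, here is a small Python sketch of how such numbers can be computed from summary statistics. Two things are assumptions rather than facts from the analysis code: that the reported d uses a pooled standard deviation (the table values look consistent with that), and that the Bonferroni family is the 21 rows of the table. The function names are hypothetical.

```python
import math


def cohens_d_pooled(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d using the pooled standard deviation of the two conditions (assumed definition)."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def bonferroni(p_values, n_tests):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# PVT Mean RT row: orexin 256.0 ± 28.0 (n=50), placebo 253.3 ± 26.2 (n=46)
d_pvt_mean = cohens_d_pooled(256.0, 28.0, 50, 253.3, 26.2, 46)
print(f"d for PVT Mean RT: {d_pvt_mean:.3f}")  # close to the 0.100 in the table

# The three smallest raw p-values in the table (SSS, HRV Deep RMSSD, DSST)
N_TESTS = 21  # assumption: one test per table row
corrected = bonferroni([0.112, 0.276, 0.303], N_TESTS)
print(corrected)
```

Because even the smallest raw p-value (0.112) times the assumed family size is well above 1, every corrected value is capped at 1.000, which matches the p-corrected column.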
Bayesian Analysis and Additional Results
We fit a hierarchical Bayesian linear model with participant random intercepts, using NUTS (4 chains × 2000 samples per metric). The primary estimand is δ, a standardized treatment effect (Cohen’s d-like), with a weakly informative N(0,1) prior.
Formally, the likelihood is yᵢ ~ N(μ + δσ·treatmentᵢ + α[pᵢ], σ), where treatmentᵢ ∈ {0,1} encodes placebo/orexin. The raw treatment effect on the outcome scale is δσ; δ alone is dimensionless. Priors: μ ~ N(0,10) (vague grand mean), σ ~ half-N(0,10) (residual SD), τ ~ half-N(0,5) (between-participant SD), α[j] ~ N(0,τ) iid for each participant j.
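As a rough illustration of what this model says, the Python sketch below draws one set of parameters from the priors and simulates data from the likelihood (a prior-predictive draw). It is not the Julia analysis code and does no inference; it reads N(a, b) as mean a and standard deviation b, which is an assumption, and the function name and dataset sizes are made up.

```python
import random


def simulate_dataset(n_participants, n_obs_per_arm, rng):
    """Draw parameters from the priors, then simulate outcomes from the likelihood."""
    mu = rng.gauss(0, 10)          # grand mean, mu ~ N(0, 10)
    sigma = abs(rng.gauss(0, 10))  # residual SD, sigma ~ half-N(0, 10)
    tau = abs(rng.gauss(0, 5))     # between-participant SD, tau ~ half-N(0, 5)
    delta = rng.gauss(0, 1)        # standardized treatment effect, delta ~ N(0, 1)
    alpha = [rng.gauss(0, tau) for _ in range(n_participants)]  # random intercepts

    rows = []
    for participant in range(n_participants):
        for treatment in (0, 1):  # 0 = placebo, 1 = orexin
            for _ in range(n_obs_per_arm):
                mean = mu + delta * sigma * treatment + alpha[participant]
                rows.append((participant, treatment, rng.gauss(mean, sigma)))
    return {"delta": delta, "sigma": sigma, "rows": rows}


sim = simulate_dataset(n_participants=3, n_obs_per_arm=4, rng=random.Random(1))
raw_effect = sim["delta"] * sim["sigma"]  # treatment effect on the outcome scale
print(f"delta = {sim['delta']:.2f}, raw effect = {raw_effect:.2f}, n = {len(sim['rows'])}")
```

Writing the effect as δσ is what makes δ comparable across metrics with very different units, such as milliseconds of reaction time and hours of sleep.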
Priors and posteriors for cognitive acuity tests and sleep measurements:
Priors and posteriors for additional Fitbit data:
Learning Effects on Mental Acuity Tests
Circles for the first test of the day, diamonds for the second test of the day.
Appendix B: Threats to Validity
Our method seems simple on its face, but there were a lot of annoyances along the way.
Orexin was delivered at room temperature, and while the vendor claims the orexin was lyophilized, we are uncertain if the lyophilization was sufficient to prevent damage.
In niplav’s case the orexin sat uncooled in customs for over a week during the delivery in July.
In order to distribute the orexin into vials, we had to dissolve it in water. This meant that we had to both store the orexin dissolved in water for almost a week, and freeze the rest. We are uncertain if any of those damaged the peptide structure.
We are uncertain if our route of administration can cross the blood-brain barrier.
One participant wasn’t aware that Fitbit data needs to be regularly synced, so we only have sleep data for two individuals. Additionally, Fitbit syncing and data collection are unreliable, leaving us with only 17 datapoints for nights of sleep following orexin.
Another headache was making sure sleep deprivation nights were scheduled between nights where we could sleep ad libitum. We also tried to keep a consistent schedule on trial days so that variations in exercise, nootropic consumption or other activity didn’t change our results.
Appendix C: Personal Experiences
Niplav:
So much stuff can go wrong
2.5ml is way too much, let’s do 1ml next time
Filling the syringes was fun! It felt very scientist-y.
Plausibly we should’ve started with a higher dose but also safety concerns so whatever
5½ hours of sleep feels horrible
I was ~completely unproductive on sleep deprivation days, and will put a high premium on this
Caffeine was really helpful
Two nights of normal sleep & then one more sleep deprivation would’ve been better
Inform yourself about the reliability of the data collection tools
Didn’t nap at all except in week two, when I just couldn’t stay awake
I’m happy we did a short trial first so we discovered the data collection issue early
Beat-by-beat experiment log here
Sam:
Feel so much better and alive from my rebound sleep, better than a normal day’s sleep even.
Consistently napped on placebo days, but orexin does seem to have a stimulant effect.
Couldn’t really tell which was which in general
Felt mentally the same on trial days. But doing tests felt a little slower than on baseline day. In general I didn’t have a good sense of how productive/smart I was.
Extra hours in the morning did get used, for somewhat intellectual tasks like reading papers and writing.
Much easier to be up early when sun was up
No Magic Pill:
Tests:
PVT: I don’t think my reaction time improved at all over the course of the testing (75% confident). I did not mind this test.
DSST: I don’t think my skills improved much over the course of the testing. I disliked this test the most.
Digit span: I am 75% confident that I got better over the course of testing. I was consistently able to get 9 both forward and reverse towards the end. I disliked this test the second most (behind DSST).
Sleepiness: I never scored that high and probably erred on the side of scoring higher because of a feeling that I needed to utilize most of the scale. I did not mind this test.
Feelings: I could have been more verbose here.
It was wayyyyyyy easier to stay up late and get up at my normal time than it was to go to bed at my normal time and wake up early.
I was fairly productive when staying up late
I was NOT productive when getting up early
Most of the time I had full-body “tingles” when awakening after a sleep-deprived night. I’ve experienced this phenomenon for years: anything less than ~6 hours, or normal duration after a hard exercise session, leaves me “tingley” in the morning.
I was a normal level of irritable and quick to anger after a sleep-deprived night. This has been consistent for years (like the tingles).
I did not nap on any day (pre-test or day-of test) because I thought that would throw off the testing data.
My motivation on sleep-deprived days often waned faster than non-sleep deprived days. This has been consistent for the past few years and is what I expected.
I did not feel anything physically or mentally immediately following the orexin administration.
I should have done a better job of isolating myself during testing. Sometimes it was a bit noisy or visually distracting, especially if I was at work. I should have noted down if I was distracted during the test.
Maybe a feature can be added to add comments at the end of each test?
I agree with Niplav that 2.5 mL was too much water. 1.25 mL was good, if not a tad much as well. I think 1 mL is probably good for the future?
This is an awesome self-experiment. Sadly the 100ug dose used might be too low by at least an order of magnitude. How was it chosen?
We’ve been doing extensive placebo-controlled preclinical mouse trials with orexins and stabilized orexins in the last 6 months. Typically, we use 1 − 1000 ug per mouse per dose. We’ve found that 100ug intranasally is our common effective dose in mice, to achieve measurable behavioral effects such as improved wakefulness or increased locomotion.
Happy to share more preclinical research data here privately.
Scaling drug dosing from mice to humans via the conventional allometric scaling used in research would suggest at least a 100x higher dose in humans, if not more.
(Though this is complicated in this case by intranasal dosing, peptides, etc).
Another interesting recent data point for needing higher doses: In narcoleptic humans, the potent small-molecule orexin agonist oveporexton when dosed in milligrams achieves “improved attention, memory, and executive function over 8 weeks.”
“Least-squares (LS) mean placebo-adjusted changes [in attention, memory, executive function] from baseline were −10.77 (95% CI, −16.74 to −4.79), −9.45 (95% CI, −15.66 to −3.24), −8.60 (95% CI, −14.84 to −2.36), and −8.69 (95% CI, −14.90 to −2.47) PVT lapses with 0.5/0.5 mg, 2/2 mg, 2/5 mg, and 7 mg/placebo doses, respectively.” (Ref) Note that this is in narcoleptics, who have less orexin signalling, hence complicating these otherwise very impressive results.
Key point: Oveporexton is dosed in a 0.5 to 7 mg range with good systemic absorption, is as potent as or more potent than natural orexins, and is much more stable, with a multi-hour half-life compared with orexin-A’s rapid degradation. It’s worth considering that 100 µg of natural orexin might be underdosed.
Lastly, published human intranasal orexin-A studies were already in the 1.55-1.78 mg range (435-500 nmol), whereas this experiment used 0.10 mg (100 µg, about 28 nmol). Human intranasal orexin-A references with explicit doses:
Baier et al. 2011, Sleep Medicine
https://pubmed.ncbi.nlm.nih.gov/22036605/
Dose: 1.55 mg (435 nmol) in narcolepsy with cataplexy
Finding: reduced REM sleep quantity and reduced direct wake-to-REM transitions.
Weinhold et al. 2014, Behavioural Brain Research
https://pubmed.ncbi.nlm.nih.gov/24406723/
Dose: 1.55 mg (435 nmol) in narcolepsy with cataplexy
Finding: fewer false reactions on divided-attention testing, plus REM-stabilizing effects.
Meusel et al. 2022, Journal of Neurophysiology
https://pubmed.ncbi.nlm.nih.gov/35044844/
Dose: 1.78 mg (500 nmol) in healthy lean males
Finding: increased resting muscle sympathetic nerve activity after intranasal orexin-A (meh?)
Sayk et al. 2015, Exp Clin Endocrinol Diabetes abstract
https://www.thieme-connect.com/products/ejournals/abstract/10.1055/s-0035-1549071
Dose: 1.78 mg (500 nmol) in healthy normotensive males
Finding: null on sympathetic baroreflex
In other words, this trial dose was only about 1/15 to 1/18 of the doses already used in prior human intranasal orexin-A studies. That likely matters even more because orexin-A is a fragile peptide with rapid degradation and uncertain intranasal CNS delivery, so effective brain exposure is probably well below the nominal administered dose.
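For anyone who wants to check the dose arithmetic, here is a short Python sketch. It does not look up a molar mass for orexin-A; it uses the value implied by the 1.55 mg = 435 nmol pairing quoted above, which is an assumption of the sketch, and the helper name is hypothetical.

```python
# Molar mass implied by the quoted pairing 1.55 mg = 435 nmol (about 3563 g/mol).
IMPLIED_MOLAR_MASS_G_PER_MOL = 1.55e-3 / 435e-9


def mg_to_nmol(dose_mg: float) -> float:
    """Convert a dose in milligrams to nanomoles using the implied molar mass."""
    grams = dose_mg * 1e-3
    return grams / IMPLIED_MOLAR_MASS_G_PER_MOL * 1e9


trial_dose_mg = 0.10               # 100 µg, the dose used in this trial
published_doses_mg = [1.55, 1.78]  # doses from the human intranasal studies listed above

print(f"trial dose: about {mg_to_nmol(trial_dose_mg):.0f} nmol")
for dose in published_doses_mg:
    fold = dose / trial_dose_mg
    print(f"{dose} mg is about {mg_to_nmol(dose):.0f} nmol, {fold:.1f}x the trial dose")
```

The fold differences come out at 15.5x and 17.8x, which is where the rounded "1/15 to 1/18" figure comes from.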
The 2.5 mL intranasal volume also seems large enough to increase runoff and swallowing, which could further reduce effective exposure. For our human phase I we’re planning to keep volume to 500ul or less.
That makes me think this result reflects an overly conservative peptide dose and formulation rather than strong evidence against the target.
I’m very impressed by the self-experimentation and rigor, and would be excited to see and fund more of this work on Manifund.
Thanks for this background, it’s super helpful!
For dosing, I think we based the dose on the Deadwyler et al. study in monkeys as well as user experiences.
Unfortunate that a 10-20x dose in humans seems to have small effects? I would have expected them to use a C-terminal -NH2 modification on their Orexin-A to prevent degradation, but it doesn’t look like it (but maybe they did and I missed it). If that’s the case, might reduce the effective dose gap between ours and theirs somewhat.
I’m excited to see our results at a higher dose, though part of me is frustrated by how difficult peptides are to work with. But hopefully Takeda or someone else will perfect small-molecule orexin agonists!
I find it great that you took up the task to run the experiment. I’m a bit curious about whether part of you getting interested in orexin was downstream from my post Orexin and the quest for more waking hours.
When I asked ChatGPT “What happens to orexin if you store it in a saline solution?” I got as a response “If you store orexin in saline, expect it to become unreliable fairly quickly.” Using, as freemany detailed, a low dose and additionally a poor way to store it probably resulted in the null effect.
That’s a great post! It did more to popularize the idea than I ever could. I’ve been thinking about this for a while and my first writing on the topic was in 2021. I’m going to refrain from linking to it because I’m planning on deprecating that blog soon though.
Re storage and handling: this part was tricky, we opted to dissolve in sterile water (not saline) and froze the batch after mixing. So doses were only exposed to room temperature for ~minutes. We also used a C-terminal NH2 peptide that is less susceptible to degradation.
There’s still no guarantee that the peptide is stable under these conditions and we don’t have a good way to check. This is a big reason why we thought there was a ~60% chance of a null on this trial (and the next one too perhaps). But hope springs eternal!
I think this is one of the cases where a quick discussion with an LLM can be helpful to check trial protocols. ChatGPT did find https://cdn.caymanchem.com/cdn/insert/15073.pdf which suggests dissolving the Orexin first in DMSO.
This is enormous overanalysis of an underpowered study design. N=2 to evaluate what you hypothesise to be a small effect is pointless. Did you perform a power analysis before you started?
I don’t know why others are downvoting this. Almost the first thing I did on opening this article was Cmd+F search for “power.” When I hear about a null result for something I care about, whether the study had enough power to detect a positive result, if there is one, is one of the first things I want to know; if the answer is no, then there’s little to be gained by reading it.
I don’t like the jump from “N=2” to “underpowered” (I read a good part of a book on single-case study design), but that’s more analysis than I found skimming and searching through TFA.
Thank you for offering a more constructive comment.
We did a power analysis to set the total number of trials (iirc assumed d=0.5, alpha=0.05, 80% power, so ~30 total test weeks and 10 weeks/person). However, the design proved unsustainable for us and the Fitbit dropped one person’s data.
Though in some sense it worked out, we can pursue a better trial now.
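For readers who want to reproduce the rough size of that number, here is a minimal Python sketch of a paired-design sample size calculation. It uses the standard normal approximation, not necessarily the calculation we originally ran, and it treats d as the standardized mean of the within-pair differences, which is an assumption. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pairs_needed(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of pairs for a two-sided paired test (normal approximation)."""
    z = NormalDist()                     # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile corresponding to the target power
    return math.ceil(((z_alpha + z_beta) / d) ** 2)


print(pairs_needed(0.5))  # 32 pairs, in the same ballpark as ~30 test weeks
print(pairs_needed(0.2))  # 197 pairs for a small effect
```

An exact t-based calculation would ask for a couple more pairs than the normal approximation, and the requirement grows with 1/d², which is why small effects get expensive quickly.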
Nit: this article’s first author is niplav and I have no idea who is “I” here.
Also your footnote got fucked by the wysiwyg → markdown conversion
Oops, that should read “Sam preferred to get up early …” my bad!
Very interesting. My crude guess is that not enough gets into the brain before the peptide degrades. Orexin nasal spray trials for narcolepsy have been kind of disappointing so far, which is why companies like Takeda are developing orexin agonists.
Keep up the experimentation. I wrote about something related, by the way: S-modafinil, the shorter-acting enantiomer of modafinil (modafinil, as you know, boosts orexin, or orexin signaling, something like that, and also boosts dopamine).
Yeah, we used C-terminal NH2 modified orexin to prevent degradation, but it’s possible it simply wasn’t effective.
Interesting that orexin sprays haven’t been working, I’ll have to look into this. Do you know the names of any off the top of your head?
Love that post! Made me realize that sleep need reduction therapies have to be pretty specific in what receptors they hit. A stimulant like modafinil that hits orexin + other stuff doesn’t reduce long-term sleep need in healthy individuals right? So a sleep need therapy needs to stimulate for just the right window, while enabling efficient sleep at night.
My experience from playing a lot of online chess is that tiredness, exhaustion, illness etc doesn’t necessarily immediately crash my performance. Often I feel bad but still play well. Performance then crashes over the following days.
Interesting! Once you get a good night’s sleep or a break does your performance go back to normal? Or does it take a few days?