Melatonin Self-Experiment Results

Throughout the first half of 2025 I did a blinded experiment to see how low to moderate melatonin intake affects me. In this post I summarize my findings.

Tl;dr: In my blinded self-experiment on melatonin, conducted over n = 60 days and analyzed with hypothesize.io, I found significantly positive effects for a dosage as low as 0.15mg of melatonin, taken on average ~1h before going to bed. Time to fall asleep dropped (averaging ~25 instead of ~35 minutes, p ~= 0.001), and I felt more awake the following morning (5.74/10 instead of 4.95/10, p = 0.006). The second effect only showed up for the lower of the two dosages and did not persist throughout the rest of the day, so it could well be a false positive, as I didn’t correct for multiple hypothesis testing.

Feel free to jump ahead to the Results section if you don’t care much about my methodology.

Experiment Setup

I randomized between 3 groups on a (roughly) nightly basis:

  • No melatonin at all (control group)

  • 1 drop dissolved in water (~0.15mg of melatonin)

  • 2 drops dissolved in water (~0.3mg of melatonin)

The dosage may seem low, but past experiences with melatonin showed that taking more than 2 drops often caused me to wake up around 4am, not feeling well rested, which I wanted to avoid.

For blinding, I asked my girlfriend to prepare a glass with ~20ml of water and the 0-2 drops for me, which I then drank. I think blinding worked well, as I almost never had any idea whether the water I consumed contained melatonin or not (there were one or two exceptions where I had a feeling that the water tasted unusual, at which point I did my best not to think about it further).
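
A nightly draw between the three groups could look like the following. This is a minimal sketch under assumptions, not a record of the actual procedure; the names `DOSE_MG_BY_DROPS` and `draw_drops` are hypothetical.

```python
import random

# Number of drops per group, mapped to the approximate dose stated above.
DOSE_MG_BY_DROPS = {0: 0.0, 1: 0.15, 2: 0.3}


def draw_drops(rng=random):
    """Pick 0, 1 or 2 drops with equal probability (1/3 each)."""
    return rng.choice(sorted(DOSE_MG_BY_DROPS))


drops = draw_drops()
print(f"Tonight: {drops} drop(s), ~{DOSE_MG_BY_DROPS[drops]}mg")
```

Whoever prepares the glass would run the draw and keep the result to themselves until the next day's ratings are written down.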

Before starting the experiment I discussed my setup with GPT-4o (nowadays I would certainly use a reasoning model) and we concluded that ~60 nights of measurements would be a suitable number for the effect sizes I was hoping to find.
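
For a rough idea of how such a sample-size estimate works, here is a sketch based on the standard normal-approximation formula for comparing two group means: nights per group ≈ 2·(z_(1-α/2) + z_power)² / d², where d is the expected difference divided by the standard deviation. This is not a reconstruction of the original planning discussion. The standard deviations plugged in below are assumptions for illustration only, and `nights_per_group` is a made-up name.

```python
import math
from statistics import NormalDist


def nights_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate nights needed per group for a two-sided two-sample comparison.

    effect: expected difference in means.
    sd: assumed standard deviation of the nightly measurements (same units).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    d = effect / sd  # standardized effect size
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hoped-for 1-point wakefulness gain; the SDs of 1.0 and 1.5 points are assumed.
for assumed_sd in (1.0, 1.5):
    print(assumed_sd, nights_per_group(effect=1.0, sd=assumed_sd))
```

The required number of nights grows with the square of the noise-to-effect ratio, which is why a realistic guess at the night-to-night standard deviation matters so much when planning.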

As I failed to run the experiment on many evenings (e.g. due to traveling, because either my girlfriend or I wasn’t home, plus skipping some days, sometimes weeks, where I felt like my situation or mental state wasn’t suitable to yield representative results), it took almost 5 months (January 29 till June 19) to collect all the data. This means more than half of the days were skipped. However, given that the decision of when to skip was not causally downstream of the group assignment, but independent of it, it should be fine to analyze the data on a per-protocol basis (filtering for only the 60 days where I actually did the experiment) rather than intention-to-treat (where I would take all days into account, including those where I didn’t actually do the experiment).

This is what I measured each evening that I did run the experiment:

  • Time when I took the melatonin (or pure water)

  • Time I went to bed

The next morning/day, I then further tracked:

  • My best guess on how long it took me to fall asleep, in minutes

  • My subjective sleep quality, on a scale of 0 to 10

  • The time I first woke up that morning

  • The time I got out of bed (usually by alarm—often this was identical with the previous value)

  • My level of wakefulness right after waking up, on a scale of 0 to 10

  • My level of wakefulness at noon

  • My level of wakefulness at 4pm

Once I had collected 60 such data points, they turned out to comprise 20 cases of the control group, 25 cases of 1 drop (0.15mg), and 15 cases of 2 drops (0.3mg). The round numbers are a coincidence; all 3 groups had the same chance of occurring each day.

I was originally planning to evaluate the data “by hand”, possibly within Google Sheets, but shortly before concluding the experiment, I learned of Hypothesize, a new sister project of Clearer Thinking and a website that makes data analysis like this very easy, so I used that instead and can indeed recommend it[1]. All I had to do was slightly reformat my spreadsheet (making sure my table starts at the upper-left-most cell), turn times (from hh:mm format) into numbers, export the result as a CSV file, and drop it into Hypothesize, which pretty much walked me through the analysis process from that point. All the charts in the results section come directly from Hypothesize.
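
Turning hh:mm times into numbers can be done in a few lines. Here is a minimal Python sketch, assuming 24-hour hh:mm strings and decimal hours as the target format; the exact encoding used in the spreadsheet may have differed, and the function names are made up for this example.

```python
def hhmm_to_hours(text):
    """Convert a 24-hour 'hh:mm' string into decimal hours, e.g. '07:30' -> 7.5."""
    hours, minutes = text.strip().split(":")
    return int(hours) + int(minutes) / 60


def hours_between(bedtime, wake_time):
    """Hours from bedtime to wake-up, allowing for the night to cross midnight."""
    return (hhmm_to_hours(wake_time) - hhmm_to_hours(bedtime)) % 24


print(hhmm_to_hours("07:30"))           # 7.5
print(hours_between("23:15", "07:30"))  # 8.25
```

The second call uses the 11:15pm median bedtime and 7:30 alarm mentioned later in this post, which gives the ~8h15 sleep opportunity.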

Hypotheses

When planning out the experiment, I primarily expected (/hoped for) an improvement in wakefulness upon waking up the next morning, expecting around a 1-point improvement on a 10-point scale (which didn’t materialize; and, in hindsight, also seems like a pretty large expected effect anyway).

I’m surprised now to notice I didn’t put much thought into how melatonin would affect how long it takes me to fall asleep, which ended up being the clearest effect of all. However, I did at least think an effect there is plausible, given I explicitly decided to measure this. Based on the measurements I decided to make, my “implied hypotheses” were that melatonin could have an impact on:

  • Time to fall asleep

  • Sleep quality

  • Wakefulness in the morning and throughout the day

But I didn’t quantify these further.

Results

A summary of the clearest findings:

  • Duration to fall asleep dropped significantly (from 34.5m in control to ~25m for both test groups (p = 0.0006 for 0.15mg and 0.0012 for 0.3mg))

  • Wakefulness after waking up was significantly increased for 0.15mg (from 4.95/10 to 5.74/10, p = 0.006), but was not increased for 0.3mg (from 4.95/10 to 5.07/10, p = 0.72) – kinda sus

  • Effects on wakefulness throughout the next day (noon and afternoon) were very small and not-at-all significant, so probably nothing to see here

  • On average, I woke up very slightly earlier when taking melatonin than in the control condition, but the difference was not statistically significant

Besides these findings, the data showed very little of interest (or significance).

Given that melatonin is very cheap and taking a single drop of it takes merely a few seconds, the results seem promising enough that I’ll keep taking one drop of melatonin (0.15mg) within an hour before going to bed for the foreseeable future. Saving 10 minutes of lying awake each night seems like a great deal on its own, whether or not there are any effects on sleep quality or wakefulness the next day.

Limitations

Of course a relevant question is how well my results generalize to other people. Hard to say! I just liked the idea of putting my results out there in a structured manner. Claude 4 claims my average time to fall asleep of ~35 minutes (under control conditions) suggests I may have gone into the experiment with pre-existing sleep issues, which I never really considered. To summarize my general approach to sleep:

  • I typically get up around 7:30 with an alarm, including weekends, but there’s definitely some variation; average get-up time throughout the entire experiment was 7:51.

  • Throughout the experiment my median time of going to bed was 11:15pm, giving me a typical sleep opportunity time of around 8h15. My (estimated) net sleep time per day throughout the experiment was 7h53.

  • I have blue light filters installed on all my devices, kicking in around 8-9pm.

  • I avoid listening to music after ~8pm.

  • I tend to listen to (not too demanding and 1.0x speed) podcasts while falling asleep.

I also have no meaningful insights into possible tolerance effects, but would be surprised if taking such low dosages sporadically, 40 times over 5 months, ran into that issue. And if tolerance did play a role, the data is probably not sufficient to reach any meaningful conclusions about it.

A quick look did, to my surprise, yield a correlation of 0.24 (p = 0.02) between fall-asleep time and day of the experiment (looking only at intervention days, n=40), which could be tolerance-related, but could also have any number of other possible causes, such as the warmer weather in the summer months. On control days (n=20) the correlation is 0.283 (p>0.2), so it appears I generally took longer to fall asleep in later months. I guess I can’t rule out tolerance effects, but would expect other (such as seasonal) causes to explain this trend.

Detailed Results

Sleep Quality

Subjectively reported sleep quality, on a 0-10 scale. The chart shows the sleep quality of the two test groups compared to the control group. The dashed line would be effect size 0. The three colors (dark blue, light blue, gray) represent confidence intervals of 80%, 95% and 99% respectively.

| Group | Mean (0…10) | Change | p-value |
|---|---|---|---|
| 0mg (Control) | 7.15 | | |
| 0.15mg | 7.24 | +1.3% | 0.73 |
| 0.3mg | 7.25 | +1.4% | 0.72 |

Conclusion: No meaningful effect. Very slightly promising direction, but would require a much larger study (or more accurate measurements than my subjective self assessment) to test whether there’s something there, and is probably not worth the effort. In short, nothing to see here.

Fall Asleep Duration

Subjective assessment made the next morning of how many minutes it took from going to bed to actually falling asleep. Naturally, there will be a lot of noise here and my guessed time can easily be off by several minutes. But as this noise affects all groups equally and I didn’t have any better measurement available for this, it seems useful enough.

Note: if the experiment had not been blinded, I would be highly skeptical of this, but given that blinding worked well, I place considerable credence in these findings.

| Group | Mean (minutes) | Change | p-value |
|---|---|---|---|
| 0mg (Control) | 34.5 | | |
| 0.15mg | 25.24 | -26.8% | 0.0013 |
| 0.3mg | 24.29 | -29.6% | 0.0009 |

Conclusion: Pretty strong positive effect. Lying awake for ten minutes less each night seems like a win.
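
The “Change” column is the relative difference to the control mean. A quick check in Python, using the means reported above (`relative_change_pct` is a name made up for this sketch):

```python
def relative_change_pct(treatment_mean, control_mean):
    """Relative difference to the control mean, in percent."""
    return (treatment_mean - control_mean) / control_mean * 100


CONTROL_MINUTES = 34.5  # mean fall-asleep duration on control nights

for label, mean_minutes in (("0.15mg", 25.24), ("0.3mg", 24.29)):
    change = relative_change_pct(mean_minutes, CONTROL_MINUTES)
    saved = CONTROL_MINUTES - mean_minutes
    print(f"{label}: {change:+.1f}% ({saved:.1f} minutes less)")
```

This prints -26.8% (9.3 minutes less) for 0.15mg and -29.6% (10.2 minutes less) for 0.3mg.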

Wakefulness after Waking Up

Subjective assessment of wakefulness (on the following day) on a 0-10 scale.

| Group | Mean (0…10) | Change | p-value |
|---|---|---|---|
| 0mg (Control) | 4.95 | | |
| 0.15mg | 5.74 | +16% | 0.006 |
| 0.3mg | 5.07 | +2.4% | 0.72 |

Conclusion: Questionable. The naive interpretation that 0.15mg has a notable effect on my wakefulness, yet 0.3mg has no effect may of course be possible, but I certainly wouldn’t have predicted such an outcome ahead of time. It’s somewhat reassuring that both values point in a positive direction, but I wouldn’t rely much on these – particularly in light of the two findings that follow. Plus, given the many things I’ve tested here (without correcting for multiple hypothesis testing), risk of running into false positives is high, so this may well be one.
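
To get a rough feeling for how much the missing correction matters, one can apply a Bonferroni or Holm correction to the p-values from the detailed tables in this post. The sketch below rests on two assumptions: that the 12 comparisons listed in those tables (6 metrics × 2 dosages) make up the full family of tests, and that a family-wise error rate of 0.05 is the right target. Neither was fixed before the experiment, so treat the output as illustrative.

```python
# p-values as reported in the detailed results tables of this post.
P_VALUES = {
    "sleep quality, 0.15mg": 0.73,
    "sleep quality, 0.3mg": 0.72,
    "fall-asleep duration, 0.15mg": 0.0013,
    "fall-asleep duration, 0.3mg": 0.0009,
    "wakefulness after waking, 0.15mg": 0.006,
    "wakefulness after waking, 0.3mg": 0.72,
    "wakefulness at noon, 0.15mg": 0.40,
    "wakefulness at noon, 0.3mg": 0.73,
    "wakefulness at 4pm, 0.15mg": 0.72,
    "wakefulness at 4pm, 0.3mg": 0.68,
    "wake-up time, 0.15mg": 0.55,
    "wake-up time, 0.3mg": 0.62,
}


def bonferroni(p_values, alpha=0.05):
    """Return the tests whose p-value is at most alpha / number_of_tests."""
    threshold = alpha / len(p_values)
    return [name for name, p in p_values.items() if p <= threshold]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the sorted p-values to alpha / (m - rank)."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    significant = []
    for rank, (name, p) in enumerate(ordered):
        if p > alpha / (len(ordered) - rank):
            break  # once one test fails, all larger p-values fail too
        significant.append(name)
    return significant


print("Bonferroni:", bonferroni(P_VALUES))
print("Holm:", holm(P_VALUES))
```

Under these assumptions, both fall-asleep results survive either correction, while the 0.15mg morning-wakefulness result (p = 0.006) does not: it misses both the Bonferroni threshold of 0.05/12 ≈ 0.0042 and the Holm threshold of 0.05/10 = 0.005 that applies to the third-smallest p-value.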

Wakefulness at Noon

Subjective assessment of wakefulness (on the following day) on a 0-10 scale.

| Group | Mean (0…10) | Change | p-value |
|---|---|---|---|
| 0mg (Control) | 7.05 | | |
| 0.15mg | 7.22 | +2.4% | 0.40 |
| 0.3mg | 7.13 | +1.1% | 0.73 |

Conclusion: Tiny effect sizes and very far from statistical significance, so nothing to see here.

Wakefulness at 4pm

Subjective assessment of wakefulness (on the following day) on a 0-10 scale.

| Group | Mean (0…10) | Change | p-value |
|---|---|---|---|
| 0mg (Control) | 7.05 | | |
| 0.15mg | 6.96 | -1.3% | 0.72 |
| 0.3mg | 6.93 | -1.7% | 0.68 |

(It’s a coincidence that the control mean is exactly the same value as for the noon measurement.)

Conclusion: Again, probably nothing to see here.

Wake-Up Time

This is not necessarily a metric I care that much about or where I would even have a strong opinion on which direction is preferable. I just wanted to have a look at effects on this, as my past experience seemed to suggest that, when taking melatonin, I occasionally would wake up at e.g. 4AM, which doesn’t really happen otherwise. Also note that the values here were pretty strongly capped by me typically getting up via a 7:30AM alarm.

In this chart, the y-axis shows the difference of the two test groups’ wake-up time to the control group in hours:

| Group | Mean wake-up time (hh:mm, AM) | Change | p-value |
|---|---|---|---|
| 0mg (Control) | 07:03 | | |
| 0.15mg | 06:51 | -12m | 0.55 |
| 0.3mg | 06:52 | -11m | 0.62 |

Conclusion: Direction as expected, but effect size too small to find anything conclusive. Definitely not as clear or strong an effect as I would have assumed based on past experiences.

When Best to Take Melatonin

Lastly, a speculative section about timing of the intake. This is probably mostly useless but has some nice looking charts.

While running the analysis, I thought it might be interesting to figure out what is likely to be the best time to take melatonin. I wasn’t at all systematic about this during the experiment, and the time between taking melatonin and going to bed ranged from 19 minutes to 174 minutes. This is the distribution (x-axis shows time difference in hours, green dots are the data points):

Distribution of the time elapsed between taking melatonin and going to bed. Mean: 67 minutes, median: 48 minutes

Note that this time delay was not randomized, so any conclusions derived from this data are much more uncertain and likely subject to confounding factors.

To test if the time makes a difference, I discarded the control group, looking only at the data where I took melatonin (not distinguishing in this case whether 0.15mg or 0.3mg), and split this resulting data (consisting of n = 40 data points) into three eye-balled groups: <36 minutes (n = 11, group 0), 36-60 minutes (n = 12, group 1), and >60 minutes (n = 17, group 2).
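
The grouping can be written as a small helper. A minimal sketch, assuming delays are recorded in whole minutes and that the boundaries are exactly as stated above; `delay_group` is a hypothetical name.

```python
def delay_group(delay_minutes):
    """Map the minutes between intake and bedtime to group 0, 1 or 2."""
    if delay_minutes < 36:
        return 0  # <36 minutes
    if delay_minutes <= 60:
        return 1  # 36-60 minutes
    return 2      # >60 minutes


# The shortest and longest delays mentioned above, plus the two boundaries.
for minutes in (19, 36, 60, 174):
    print(minutes, "->", "group", delay_group(minutes))
```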

For these three groups, I then compared the time to fall asleep (as this one had the largest effect sizes and hence the best chance of yielding results):

So, it appears, directionally, that going to bed relatively early after taking melatonin worked better for me, although the differences don’t quite reach statistical significance.

Here’s each of the three groups compared to the other two groups:

| Group (time between intake and bed) | Mean time to fall asleep | p-value (group compared to other two groups) |
|---|---|---|
| <36 minutes | 22.4 minutes | 0.12 |
| 36-60 minutes | 24.2 minutes | 0.35 |
| >60 minutes | 29.8 minutes | 0.02 |

Additionally, here’s a scatter plot showing the association between the time diff between taking melatonin and going to bed (x-axis) and the time it took me to fall asleep (y-axis):

Conclusion: limited and very noisy (and also in this case only observational!) data, might very weakly suggest that taking melatonin relatively shortly before going to bed (0-60 minutes) may be beneficial – but getting more conclusive evidence on this would require an experiment design with actual randomization of the timing.

Learnings for Future Self-Experiments

  • Being responsible for the blinding was annoying for my girlfriend, and it would be better to find ways to either do it myself in future experiments, or at least streamline the process to not have her involved every day of the experiment

  • I should make more of an effort next time to think about my hypotheses beforehand, and quantify them

  • Do a cleaner power analysis next time

  • It probably would have made sense to invest in an app or device that tracks sleep metrics more accurately than just going by my subjective assessment the next morning

  • Hypothesize is nice and I’ll use it again

  1. Full disclosure, I’m affiliated with some people involved in Hypothesize and may thus have a more positive inclination towards the tool. But it’s a fact that it made my job analysing this data much easier and this post would likely be much less insightful/accurate without it. :)