The Machine that Broke My Heart
Obesity is one of the world’s greatest health problems. It contributes to everything from diabetes to cancer. Obesity is hard to prevent because tracking calories requires effort. It’s too easy to eat absent-mindedly if you’re a middle-aged woman in charge of your household’s food. Manually logging your food intake is hard. Food tracking should be automatic, like a FitBit. If we had a cheap machine that could effortlessly track food intake it would save many lives and improve the quality of life of even more people.
I talked to the lead engineer at a company that is a leading manufacturer of wearable electronics. The company is worth tens of billions of dollars. His manager tasked his team with making a device that used IMU data to detect when a person is eating. He failed.
Me and my team succeeded. Our device worked so well our test subjects thought we were cheating. They thought we had a man behind the curtain. It was all software. My friends and family would try to fake eating by pantomiming it and the device wouldn’t go off until they took a bite for real. Our algorithm was so good it often detected when people were eating before they even lifted their fork. We had to program a delay into the alarm system so the predictive signal didn’t disrupt the user’s actual behavior.
We trained our machine learning model on six hours of annotated data. The algorithm we deployed was so small and efficient you could run it all day on a Nordic SoC the size of a fingernail. I trained the whole thing on a laptop from 2016.
Our system was perfectly general. You could dump a different hand movement into it and the algorithm could detect the new behavior too. All you had to do was strap a sensor to your test subject, video record them and then tell the computer where in the video the new hand movements began and ended. Press one button and my toolkit would turn that training data into a binary executable ready to be deployed on your embedded system. It did not need to be customized for individual users (except left-handers). Just put it on and you’re good to go. Individual units cost us $30 to produce. Large-scale manufacturing could drive the price much lower.
You might wonder “Where is this magical machine?” I threw it away.
I could reconstruct it. Strap an IMU onto people’s wrists. Manually annotate where fork movements begin and end. Apply a complementary filter and a moving average filter. Normalize everything. Turn the continuous datastream into discrete symbols by feeding it into a random forest classifier[1]. Dump the random forest classifier into the recently-invented WarpingLCSS algorithm[2]. Add some threshold cutoffs for trigger and reset. Use Facebook’s Adaptive Experimentation Platform to tell you what hyperparameters to use. Port the whole thing to C. Deploy it on a microcontroller.
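For readers who have not met these signal-processing steps before, here is a minimal Python sketch of the filtering, normalization and trigger/reset stages named above. It is an illustration under assumptions, not the code from the project: the function names, the blend factor `alpha`, the window size and the threshold values are all hypothetical, and the random forest and WarpingLCSS stages are left out entirely.

```python
from statistics import mean, pstdev


def complementary_filter(accel_angles, gyro_rates, dt, alpha=0.98):
    """Fuse accelerometer-derived angles (noisy, no drift) with gyro rates (smooth, drifts).

    alpha is an assumed blend factor, not a value from the original post.
    """
    angle = accel_angles[0]
    fused = []
    for acc, rate in zip(accel_angles, gyro_rates):
        # Trust the integrated gyro in the short term, the accelerometer in the long term.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc
        fused.append(angle)
    return fused


def moving_average(values, window):
    """Trailing moving average; early samples average over what is available so far."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        out.append(total / min(i + 1, window))
    return out


def normalize(values):
    """Z-score normalization; a constant signal maps to all zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def hysteresis_events(scores, trigger, reset):
    """Fire when a score reaches `trigger`; re-arm only after it falls to `reset`."""
    armed = True
    events = []
    for i, s in enumerate(scores):
        if armed and s >= trigger:
            events.append(i)
            armed = False
        elif not armed and s <= reset:
            armed = True
    return events


if __name__ == "__main__":
    smoothed = moving_average([1, 2, 3, 4], window=2)
    print(smoothed)
    print(hysteresis_events([0.1, 0.9, 0.95, 0.5, 0.2, 0.85], trigger=0.8, reset=0.3))
```

Using two thresholds instead of one keeps a single bite from firing the alarm repeatedly while the detection score hovers near the cutoff.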
We had experience selling consumer electronics. We had the hardware for hundreds of these wearable food trackers just sitting in boxes.
But startups are hard. We had been working full-time for (basically) no pay for six years. We shut the project down and parted amicably. We all had different reasons for leaving. Personally, I didn’t want to spend the rest of my life running a diabetes prevention program. The reason our algorithm could be trained on my laptop instead of a supercomputer was because I had invented a machine learning trick so new it had been invented independently and published in an academic journal within the previous two years. I had more tricks where that came from. I wanted to find out what would happen if I pushed my machine learning skills as hard as I could.
I see obese people everywhere. People die every single day from a health problem I could have done something about. I was the only person on Planet Earth who could have built a passive eating tracker with the technology available in 2020. I didn’t.
How did this system actually track calories? Detecting that the user is consuming food seems like a fairly solvable problem; tracking what they’re eating—which is going to have order-of-magnitude effects on caloric intake—seems like a much harder problem.
I can’t see any obvious ways to do it, other than by requiring significant user input, and that would rather negate any benefits that a passive, low effort tracker had.
Am I missing something here?
(Was it a “beep to remind you not to snack” device, rather than a calorie tracker?)
It didn’t track calories. It tracked bites. The problem we were attacking was eating awareness and frequency. It might’ve been possible to track food types too but we didn’t get that far.
Yes.
Even a tool like this was useful enough that people repeatedly and independently, on their own initiative, asked us to build it for them. They were all from the same demographic: women, often stay-at-home, between the ages of 30 and 50. Their problem, as they saw it, wasn’t eating calorie-dense food. It was frequent unconscious snacking.
I read it as “beep to remind you to log what you’re eating”.
Thank you for sharing this story!
I think there’s a small bit of illusion of transparency going on here. You mention “IMU” (“inertial measurement unit”?) twice but don’t explain the acronym, so I think some readers didn’t necessarily understand what you’d built, or what the value-add is.
Now, that also means I am not quite sure whether I’ve understood you correctly, but I think it’s supposed to be:
Picture a tracker or smartwatch, e.g. a Fitbit. Or even some smartphones. These all include various sensors, like accelerometers and gyroscopes. The combination of the latter two allows tracking time-series data of body movements. If you wear a tracker around your wrist, that includes arm movement, among other things.
People wear these trackers for various reasons, e.g. to change their behavior. The features offered by popular trackers like the Fitbit depend partly on the included sensors, and partly on the algorithm used to interpret the sensor data. For instance, if you were able to identify the signal in the noisy data, you could interpret certain accelerometer & gyroscope data as “you’re taking a bite of a snack”.
As I understand it, there is not in fact a commercial bite tracker, so Lsusr is saying his startup found a way to reliably identify when a tracked user takes a bite of a snack, and then to alert the user of this behavior, in a way companies like Fitbit have not managed to. And if you have something like that, you have a prototype of a valuable product, e.g. because it could help people become conscious of their unconscious bad habit of snacking, which in turn would make it easier to stop.
Lsusr, I understand that you’ve personally decided to stop this project, because your time and resources are limited and better spent elsewhere. That said, if you still think this is an unusually good idea but that you can’t execute on it, consider posting a more technical version of this story on the EA forum. Most likely nothing will come of it, but very occasionally there are people in search of projects. I haven’t done the EV-calculation myself, but this project is the kind of thing that, if it actually worked, sounds like it could actually have a somewhat competitive EV, given the abysmal efficiency of other obesity interventions?
If you have patents or would (for the right price) be available to consult others who gave this project a try, mention that, too.
Your description of the invention is precisely correct. Cross posting on the EA forum is a good idea.
I’d enthusiastically do part-time contract work to help make a machine like this happen again, for the right price (which might be surprisingly high or low, depending on one’s expectations). It’d be way faster and cheaper for a company to hire me than to do it themselves from scratch. It’d be less risky too. My mental model is that the bottleneck on a project like this is good founders/executives. It is a question of leadership and will.
So, you and your team spent six years of effort working full time for no pay (what did you even eat then?). You developed a product that worked just great, was in demand and could make a difference in fighting obesity by making a beep whenever the wearer eats. But even though the product was ready—“just put it on and good to go”—and you can easily reconstruct it, you and your whole team decided to abandon it and part ways. Because you simply aren’t that into diabetes prevention, and also your time is limited and you have more important things to do. But you would enthusiastically do part-time contract work on this project again.
I feel like this story doesn’t quite make sense. If the company was doing so well and you just didn’t want to run it anymore, why didn’t you sell it?
What was it that made you know or decide it was time to throw in the towel?
My time is limited. I had more important things to do.
Was there something specific that caused you to stop when you did, rather than sooner or later?
I’d rather not get into the details.
“Turn the continuous datastream into discrete symbols by feeding it into a random forest classifier.”
Would you be willing to say any more about this? It sounds really interesting.
The LCSS algorithm (of which WarpingLCSS is an optimization) operates on a sequence of discrete symbols such as letters or nucleotides. This is in contrast to a neural network whose natural inputs come from a continuous vector space. IMU data is continuous. We needed to bucket it before feeding it into LCSS. The random forest buckets continuous data into discrete categories. We tried support vector machines too but random forest classification worked better.
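To make the bucketing idea concrete, here is a small Python sketch, assuming that LCSS here means the classic longest-common-subsequence comparison. It is not WarpingLCSS itself, which is a streaming optimization, and the fixed bin edges below are only a stand-in for the random forest, which would learn the categories from annotated data. The names `bucket`, `lcss_length` and `similarity`, and the symbols "L", "M" and "H", are hypothetical.

```python
from bisect import bisect_right


def bucket(values, edges, symbols):
    """Map each continuous value to a discrete symbol using fixed bin edges.

    Stand-in for a learned classifier: len(symbols) must be len(edges) + 1.
    """
    return [symbols[bisect_right(edges, v)] for v in values]


def lcss_length(a, b):
    """Length of the longest common subsequence of two symbol sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(template, window):
    """Fraction of the template's symbols matched, in order, inside the window."""
    if not template:
        return 0.0
    return lcss_length(template, window) / len(template)


if __name__ == "__main__":
    # Hypothetical gesture template: low, then middle, then high wrist angle.
    template = ["L", "M", "H"]
    stream = bucket([-0.9, -0.7, 0.1, 0.2, 0.8], edges=[-0.5, 0.5], symbols="LMH")
    print(stream)
    print(similarity(template, stream))
```

Because a subsequence match tolerates extra symbols between the matching ones, the same gesture performed slowly or quickly can still score highly against the template. A score like this is what a trigger threshold would then be applied to.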
Aha—so like how something like word2vec maps discrete symbols into a shared vector space, this does the other direction, mapping vectors into discrete symbols. Never thought about that direction; thanks!
Did you really throw away the software and not keep it in a VCS or on an extra storage device? I feel a sort of pain thinking about it—that if it really worked as well as described, it may not at all be easy to rebuild.
We kept the software. It’s somewhere on the cloud in a VCS. Not sure about the data. This was an embedded system. The hardest, most frustrating part to maintain was hardware integration. The prototype was a specific wearable device attached to a specific laptop. I erased that laptop and gave it away to a relative.
In a perfect world, I’d redo this project as a contractor for an established wearables company. I’d do the machine learning, they’d do the hardware and we’d outsource the data annotation to Mechanical Turk. (Data collection is easy. The data bottleneck is annotation.) But that takes industry connections I don’t have.
My other comment is that you probably didn’t succeed as well as you thought you did. I am taking your story at face value—that your model was spookily accurate, that it worked way better than you could reasonably expect, etc. But scale matters. Many tech prototypes work perfectly at small scales but would fail if you had built a few thousand or million hardware instances and tried them with the user wearing them across the full range of human activity and cultures.
But say you did, and your model that’s cheap enough to train on a laptop and run on an embedded CPU is perfectly accurate. How does this solve the actual problem? Obesity is caused by humans getting incredibly strong, insidious urges to eat (and maintain their weight once fat). The actual root cause is probably poisons in the food supply, explaining the delta between specific countries of genetically comparable people, but in any case, how does this gadget help?
In The Circle everyone wears cameras all the time and thus gets peer shamed into not eating, but this has issues other than the privacy ones, such as most people not being attractive enough/popular enough to have any peers to shame them.
I’ve on many occasions tried the myfitnesspal calorie counting. It does work but the longer you do it the stronger your urges to restore your missing mass. It’s like compressing a spring. Significant weight loss is very difficult. It’s not impossible, I’ve been −35 lbs for 4 years now, but I have not been able to get from being merely ‘overweight’ to ‘normal’ BMI.
If this technique you’re talking about is actually as good as you say it is, and can run very performantly on regular PCs, I would be willing to pay you to help build a PoC for a particular problem class that this kind of pipeline would add a ton of value to. Although to be honest, I’m kind of curious why your cofounders decided to quit at this particular moment, if people liked your product so much and it worked so well.
We should continue this conversation privately in either Less Wrong private messages or email.
I’m obese and struggle with weight loss, so this is a particularly sad story to hear. My experience makes me think a lot of my issues could be improved just by having someone, like, standing 24/7 by my side going “hey” when I go to buy ice cream, stress-eat etc.
Would you be willing to make the pieces you made available to someone who wanted to pick up where you ended? I’d probably not be that person, though (because don’t have the spoons).
I don’t recommend anyone use that actual literal hardware we used. Hardware advances fast and some of the components we used are no longer manufactured by Nordic Semiconductor. It would be better to start from scratch with new hardware. The hardware was not complicated. It was just an industry-standard IMU attached to an industry-standard microcontroller attached to a battery, a vibrating motor and a charging light.
If someone wants to take on this project the thing to steal from my experience would be the machine learning architecture. That’s where all the hard technical challenges were. I think I have left behind enough hints in my story to save them 80% of the algorithmic work. Anyone competent enough to pull this project off could probably muddle through the remaining 20% on their own, but I recommend they hire me instead.
Is this something that can now be replicated on a sports watch or wearable via an app?
Not on an Apple Watch. At least, not on an Apple Watch the one time we tried porting our code to it. But it’s theoretically possible to implement the algorithm on any wearable with an IMU which meets basic power efficiency requirements and which lets you run raw C close to the metal. Apple didn’t let us run the app with the necessary privileges.
In the very beginning we wrote our software for Android smartwatches but that approach caused several problems:
Our customers almost never had an Android smartwatch already. They bought whatever hardware we told them to. We earned $0 per sale even though Android smartwatches cost several times more money for our customers to buy. Customers kept asking us to make our own devices.
Many of our customers were iPhone users. Android smartwatches do not integrate optimally with iPhones.
Google kept changing the API, the user interface and the update/installation process. They even changed the name.
Smartwatches were frustratingly power inefficient in ways we could not alter.
Manufacturers frequently discontinued the smartwatch models we used, including discontinuing software updates, which eventually bricked them.
The smartwatches weren’t standardized enough that we could write software once and it would run well on all varieties.
The problem is that no wearable platform good enough was already owned by enough of the population. As counterintuitive as our approach might seem to a software developer, it made more sense for us to manufacture our own hardware.
You sound very confident your device would have worked really well. I’m curious, how much testing did you do?
I have a Garmin Vivosmart 3 and it tries to detect when I’m either running, biking, or going up stairs. It works amazingly well considering the tiny amount of hardware and battery power it has, but it also fails sometimes, like randomly thinking I’ve been running for a while when I’ve been doing some other high heart rate thing. Maddeningly, I can’t figure out how to turn off some of the alerts, like when I’ve met my “stair goal” for the day.
Only eating with a fork. A full system would require more data than that. We tested on real people in real-world conditions who were not part of the training dataset. If someone ate in a different style we could add just a little bit of annotated training data for the eating style, run the toolchain overnight and the algorithm would be noticeably better for that person and everyone else. The reason why I’m so confident in our algorithm was because ① it required very little data to do updates and ② I had lots of experience in the field which meant I knew exactly what quality level was and wasn’t acceptable to customers.
To update the code in response to user feedback we would have to push the new code. Building an update system was theoretically straightforward. It was a (theoretically) solved problem with little technical risk. But it was not a problem that we had personally built a toolchain for and the whole firmware update system involved more technical maintenance than I wanted to commit myself to.
This is a beautiful piece of writing. I can feel you clearly here. Your care, your hope, your desolated disappointment. This short post took me on a journey. Thank you.
As a probably annoying but potentially enlightening aside, you might get a lot out of reading Lynne Forrest’s article on Karpman’s Drama Triangle. My guess is that you haven’t touched the true core of your heartbreak yet. If you want to, this might be a powerful direction for doing so.
I appreciate your supportive encouragement. This story took place over a year ago. I have had plenty of time to wrestle with the competing values. This wasn’t the first time I chose to abandon a project which potentially could have helped people at scale. I have limited resources. I have to make hard decisions. I like making hard decisions because the act of facing hard decisions implies I’m living life to the fullest.
I don’t think I need Karpman’s Drama Triangle right now but I do see the connection. It definitely would have helped me if I had read it ten years ago, but that is an unrelated story I do not expect to ever publish.
Why the heck is this at −5, am I missing something here? Tentatively upvoted unless someone tells me why this should be so far down.
[edit: Though I disagree with the downvotes, I now understand them due to the explanations in child comments]
Apart from what Richard said, the second paragraph has a very… “handing down nuggets of wisdom from on high”? vibe to me. Like, Val apparently thinks he knows better than lsusr what’s going on with lsusr’s emotions or something? (I interpret “you haven’t touched the true core” as something like “you think you know what’s going on but there’s more to it than that”, but it’s not clear and that’s part of the problem.) And if lsusr wants to learn, here’s a long article he can read. (Firefox reader mode puts it at 57-73 minutes.)
Val acknowledges that the paragraph is probably annoying, and uses words “guess” and “might”, and that makes it less obnoxious to me than it would be otherwise. But still obnoxious. More things that would make it less obnoxious to me:
What makes Val think lsusr hasn’t touched the true core of his heartbreak? This probably partly comes from details that don’t make sense if you don’t know the framework, but it should at least be possible to point at something in what lsusr wrote. Things like “you spend N words on this and 3N words on that, if you were in touch I’d expect roughly equal numbers of words”; or “your writing style when talking about this is much more concise than your writing style when talking about that, like you’re trying to avoid thinking about it in detail”. If Val can’t point at something like this, I think that’s a bad sign and he should admit that he can’t.
Is there something specific that makes Val recommend this particular framework here? Or is it just his standard recommendation for getting in touch with emotions?
Give lsusr some way to say “no, that seems wrong” that doesn’t involve reading the long article. Like, “another explanation for what I see might be ___, and if you think that’s what’s going on then this probably won’t help you”.
More explicit acknowledgment that what he’s doing here is thinking he knows better than lsusr what’s going on with lsusr’s emotions; that this is the sort of guess people frequently make while being dead wrong; and some reason why he thought it was worth making anyway.
(I don’t know that I would like the comment if it had those things. But I do think I would find it less bad.)
I appreciate the thorough explanation, it helped me to understand things here quite a bit.
I wondered that too. I miss the old LW social norm where downvotes were expensive and came with an explanation. Here I’m just left shrugging because I’m not sure what update to make.
(I mean this as a sharing of my experience, not a critique of how LW is designed. I’m sure Oli & Ben et al. put a ton of thought into details like this and landed on the current karma model for good reasons.)
I didn’t vote on it either way, but I read the Forrest article, and if someone made a top-level post promoting the “Drama Triangle” I would very likely give it a strong downvote. It’s yet another universal Procrustean psychological theory, every possible response to which can be shoehorned into the theory itself.
I also found the first paragraph of Valentine’s comment icky. It comes across to me as histrionic emoting, and I would not care to encounter that sort of thing addressed to myself. But it was addressed to lsusr and it is up to him how he takes it.
For what it’s worth, I took Valentine’s first paragraph as high praise. I write narratives with the deliberate intention of eliciting specific emotional responses and that was the exact emotional response I aimed to elicit when I wrote this story.
I’d feel icky too to read such a response to one of your [Richard_Kennaway’s] posts or to one of my drier posts. But I feel the emoting is appropriate in this context.
Is the implication that this story is a rescuer → victim arc?
I didn’t mean to imply that per se. But yes, I do see that playing a strong role here, and that’s why I thought to bring Forrest’s article forward here.