Screencasts could be scalable data + evals for single-user emulation (Guardian Angels)

This is a response to Gwern’s Guardian Angels post and borrows its terminology extensively.

Epistemic status: draft and to-be-updated once I do more experiments.

Suppose you have a Guardian Angel (GA) model and you want to know how well it predicts your knowledge/​personality/​values/​preferences.

However, you don’t have an extensive list of personal writings to personalize the model (because you’re not one of the most prominent internet writers), and you don’t want to do a lot of data labeling/​supervision either. How would you train/​evaluate the model then?

Motivating examples

  • We give the GA model a held-out replay of your Twitter-browsing session containing 10 tweets (either as text captions or as a stitched-together screenshot). Can the model actually predict which ones you’ll click on?

    • Doing it correctly depends on a lot of personalization! At least for me, I don’t click on most tweets on my timeline, and what I click depends a lot on my specific, mostly implicit taste, and twitter’s recommender system doesn’t seem to grok it well.

  • You’re reading a technical blogpost/​paper. Can the model predict which sections you’ll read or which concepts you’ll look up?

    • This depends on your knowledge (e.g. are you new to the domain?) and your taste (did you only read the abstract/intro before you closed the tab, or did you read it top-to-bottom?)
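To make the first example concrete, here is a minimal sketch of how one held-out session could be scored. It assumes the GA model outputs a click probability per tweet; the `Session` schema, the example numbers, and the choice of average precision as the metric are all illustrative assumptions, not something I've settled on. Comparing against the base click rate matters because I click on so few tweets that "predict no clicks" already looks accurate.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One held-out browsing replay (hypothetical schema)."""
    scores: list[float]  # model's predicted click probability per tweet
    clicked: list[bool]  # what the user actually did


def average_precision(scores, clicked):
    """Mean of precision@k over the ranks k where a clicked tweet appears."""
    ranked = sorted(zip(scores, clicked), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, was_clicked) in enumerate(ranked, start=1):
        if was_clicked:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def base_rate(clicked):
    """Fraction of tweets clicked; what a random ranking scores in expectation."""
    return sum(clicked) / len(clicked)


# Made-up numbers for illustration: 10 tweets, 2 of them clicked.
session = Session(
    scores=[0.9, 0.1, 0.2, 0.05, 0.7, 0.1, 0.3, 0.1, 0.05, 0.6],
    clicked=[True, False, False, False, False,
             False, False, False, False, True],
)
ap = average_precision(session.scores, session.clicked)
print(f"AP={ap:.3f} vs base rate={base_rate(session.clicked):.2f}")
```

A ranking metric fits here because the interesting question is whether the model puts the few clicked tweets near the top, not whether it gets the many non-clicks right.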

My plan

I vibe-coded a macOS app that continuously takes 4fps screencasts alongside a 4fps webcam feed (for possible eye tracking), plus timestamp-aligned user inputs, and sends them to Cloudflare’s S3 equivalent.
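Here is a rough sketch of what "timestamp-aligned" could mean in practice: each input event gets attached to the latest frame captured at or before it. The `InputEvent` and `Frame` types and the event labels are hypothetical, and this is not the app's actual code; only the 4fps rate comes from the setup above.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

FPS = 4  # capture rate mentioned above


@dataclass
class InputEvent:
    t: float   # seconds since recording start
    kind: str  # e.g. "click", "key", "scroll" (hypothetical labels)


@dataclass
class Frame:
    index: int
    t: float   # capture time in seconds since recording start
    events: list = field(default_factory=list)


def align(events, n_frames, fps=FPS):
    """Attach each event to the latest frame captured at or before it."""
    frames = [Frame(i, i / fps) for i in range(n_frames)]
    frame_times = [f.t for f in frames]
    for ev in sorted(events, key=lambda e: e.t):
        i = bisect_right(frame_times, ev.t) - 1
        if i >= 0:  # skip events stamped before the first frame
            frames[i].events.append(ev)
    return frames


frames = align(
    [InputEvent(0.30, "click"), InputEvent(0.26, "scroll"), InputEvent(1.0, "key")],
    n_frames=8,
)
for f in frames:
    if f.events:
        print(f.index, f.t, [e.kind for e in f.events])
```

Keying everything to the frame that was on screen when the input happened is what lets a click be paired with the screenshot the user was actually looking at.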


So far we’re at ~8 hours of recordings at roughly 1GB/hr, so doing this for 1 year is <10TB / <$150 on Cloudflare based on my usage patterns. (It goes up linearly if you keep the recordings forever.)
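The storage bound is easy to sanity-check. The sketch below takes the ~1GB/hr rate from above and treats recording 24 hours a day as the worst case; `USD_PER_GB_MONTH` is a placeholder price I'm assuming for illustration, so check the provider's current pricing before trusting the dollar figure.

```python
GB_PER_HOUR = 1.0         # recording rate reported above
USD_PER_GB_MONTH = 0.015  # ASSUMED placeholder price, not a quoted rate


def yearly_storage_tb(hours_per_day, gb_per_hour=GB_PER_HOUR, days=365):
    """Data accumulated over one year, in decimal TB."""
    return hours_per_day * days * gb_per_hour / 1000


def monthly_cost_usd(stored_tb, usd_per_gb_month=USD_PER_GB_MONTH):
    """Cost of keeping `stored_tb` in object storage for one month."""
    return stored_tb * 1000 * usd_per_gb_month


upper_bound_tb = yearly_storage_tb(24)  # recording around the clock
typical_tb = yearly_storage_tb(8)       # hypothetical 8 hours/day of use
print(f"worst case: {upper_bound_tb:.2f} TB/year")
print(f"8h/day:     {typical_tb:.2f} TB/year")
print(f"monthly cost at worst case: ${monthly_cost_usd(upper_bound_tb):.0f}")
```

Even recording around the clock comes to about 8.76TB, which stays under 10TB. The cost depends on actual hours per day and on how long the recordings are kept.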

I intend to make a benchmark out of this using a bunch of heuristics, then try a bunch of methods to hill-climb it. Will report back if there are any updates.

Other things to read

  • Michael S. Bernstein’s lab from StanfordHCI is thinking about user modeling among many other related things. (The company Simile.ai also spun off of it.)
