Elliot Callender

Karma: 54

Elliot Callender 27 Jan 2026 22:03 UTC
1 point
0
on: Bounty: Detecting Steganography via Ontology Translation
Bounties (fractional funds distributed in good faith if you solve part of a problem):
- 1500$ for an algo which individuates a sufficient portion of activation space into semantically meaningful polytopes (or fuzzy loci) such that we can detect steganography during training with minimal human oversight in polynomial (constant exponent across architectures) or faster time
- 750$ for strong handles on the sorts of downstream activation patterns by which we can cluster upstream polytopes, and an additional 300$ for a polynomial or faster clustering algo
- Happy to fund solutions to other subproblems as well. Comment or dm.

Elliot Callender 30 Dec 2025 22:41 UTC
2 points
0
on: How to game the METR plot
There’s something completely different going on for tasks longer than 1 minute, clearly not explained by the log-linear fit.
Perhaps humans generating training data are, for longer tasks, taking cognitive steps which are opaque to these models, or at least relatively more difficult to learn?
I’d wager 1:1 that this sort of abstraction-domain mismatch between human training data and LLMs is causing more of the HCAST weirdness than skewed finetuning investment.

Elliot Callender 22 Dec 2025 17:55 UTC
3 points
0
on: Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance
Interesting!
What do we see if we apply interpretability tools to the filler tokens or repeats of the problem?
I would be especially interested in how this evolves through training, perhaps by training a more accessible model to do math / code classification with many filler tokens.
Overall, these results demonstrate a case where LLMs can do (very basic) meta-cognition without CoT.
Can you clarify what you mean by meta-cognition? I’m intuiting that these LLMs are using the extra embeddings afforded by appended tokens to do more parallel ops, which does not sound like meta-cognition to me.

Elliot Callender 22 Dec 2025 17:08 UTC
1 point
0
in reply to: Kiyota’s comment on: Cognition Augmentation Org
I am aiming all of my resources at this, which for now looks externally like saving/investing personal capital, writing biological (molecular, NN) simulations, and searching for advice. Feel free to message me on Signal at (+1)-478-456-9667 if you want specific examples of my ideas; I expect that the entities I’m worried about accessing my research will do so after (if) it is legibly useful.

Elliot Callender 17 Dec 2025 11:52 UTC
2 points
0
on: Scientific breakthroughs of the year
Awesome! I’m looking forward to reading many of these while traveling in the coming weeks.
Might I suggest, though, that you add $\log P(\text{big})$ to the importance score instead of multiplying? It doesn’t make sense to multiply a non-log term by a logspace term.
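To spell out the suggestion, here is a short sketch. It assumes the importance score is already a log-scale quantity, written $s = \log I$ for some underlying linear-scale importance $I$; both $s$ and $I$ are illustrative symbols, not notation from the post.

```latex
\begin{align*}
\log\bigl(I \cdot P(\text{big})\bigr) &= \log I + \log P(\text{big}) \\
                                      &= s + \log P(\text{big})
\end{align*}
```

Under that assumption, adding $\log P(\text{big})$ to $s$ is the log-space equivalent of weighting $I$ by the probability, whereas $s \cdot P(\text{big})$ mixes the two scales.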

Elliot Callender 17 Dec 2025 10:34 UTC
1 point
0
on: Eliezer’s Unteachable Methods of Sanity
And a fiat decision to stay sane, implemented by not instructing myself that any particular stupidity or failure will be my reaction to future stress.
I have not implemented the other two, but this decision I made during HPPD-like psychosis; yes, it is for some a learnable skill.

Elliot Callender 13 Feb 2025 0:41 UTC
1 point
0
in reply to: Ariel’s comment on: So You Want To Make Marginal Progress...
How much would you say (3) supports (1) on your model? I’m still pretty new to AIS and am updating from your model.
I agree that marginal improvements are good for fields like medicine, and perhaps so too AIS. E.g. I can imagine self-other overlap scaling to near-ASI, though I’m doubtful about stability under reflection. I’ll put 35% we find a semi-robust solution sufficient to not kill everyone.
Given my model, I think 20% generalizability is worth a person’s time. Given yours, I’d say 1% is enough.
I think that the distribution of success probability of typical optimal-from-our-perspective solutions is very wide for both of the ways we describe generalizability; within that, we should weight generalizability more heavily than my understanding of your model does.
Earlier:
Designing only best-worst-case subproblem solutions while waiting for Alice would be like restricting strategies in a game to ones agnostic to the opponent’s moves
Is this saying people should coordinate in case valuable solutions aren’t in the apriori generalizable space?

Elliot Callender 10 Feb 2025 16:52 UTC
2 points
0
in reply to: Ariel’s comment on: So You Want To Make Marginal Progress...
I strongly think cancer research has a huge space and can’t think of anything more difficult within biology.
I was being careless / unreflective about the size of the cancer solution space, by splitting the solution spaces of alignment and cancer differently; nor do I know enough about cancer to make such claims. I split the space into immunotherapies, things which target epigenetics / stem cells, and “other”, where in retrospect the latter probably has the optimal solution. This groups many small problems with possibly weakly-general solutions into a “bottleneck”, as you mentioned:
aging may be a general factor to many diseases, but research into many of the things aging relates to is composed of solving many small problems that do not directly relate to aging, and defining solving aging as a bottleneck problem and judging generalizability with respect to it doesn’t seem useful.
Later:
Define the baseline distribution generalizability is defined on.
For a given problem, generalizability is how likely a given sub-solution is to be part of the final solution, assuming you solve the whole problem. You might choose to model expected utility, if that differs between full solutions; I chose not to here because I natively separate generality from power.
Give a little intuition about why a threshold is meaningful, rather than a linear “more general is better”.
I agree that “more general is better” with a linear or slightly superlinear (because you can make plans which rely more heavily on the solution) association with success probability. We were already making different value statements about “weakly” vs “strongly” general, where putting concrete probabilities / ranges might reveal that we agree w.r.t. the baseline distribution of generalizability and disagree only on semantics.
I.e. thresholds are only useful for communication.
Perhaps a better way to frame this is in ratios of tractability (how hard to identify and solve) and usefulness (conditional on the solution working) between solutions with different levels of generalizability. E.g. suppose some solution $w$ is 5x less general than $g$. Then you expect, for the types of problems and solutions humans encounter, that $w$ will be more than 5x as tractable * useful as $g$.
I disagree in expectation, meaning for now I target most of my search at general solutions.
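As a rough formalization of the ratio framing (the symbols $G$ for generalizability, $T$ for tractability, and $U$ for usefulness conditional on the solution working are introduced here only for illustration), the claim being disputed can be written as:

```latex
G(w) = \tfrac{1}{5}\,G(g)
\quad\Longrightarrow\quad
\mathbb{E}\bigl[T(w)\,U(w)\bigr] > 5\,\mathbb{E}\bigl[T(g)\,U(g)\bigr]
```

Disagreeing in expectation then amounts to thinking the right-hand inequality usually fails, so the less general solution does not make up for its lost generality.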
My model of the central AIS problems:
1. How to make some AI do what we want? (under immense functionally adversarial pressures)
  1. Why does the AI do things? (Abstractions / context-dependent heuristics; how do agents split reality given assumptions about training / architecture)
  2. How do we change those things-which-cause-AI-behavior?
2. How do we use behavior specification to maximize our lightcone?
  1. How to actually get technical alignment into a capable AI? (AI labs / governments)
  2. What do we want the AI to do? (“Long reflection” / CEV / other)
I’d be extremely interested to hear anyone’s take on my model of the central problems.

Elliot Callender 9 Feb 2025 1:32 UTC
3 points
0
in reply to: Ariel’s comment on: So You Want To Make Marginal Progress...
I think general solutions are especially important for fields with big solution spaces / few researchers, like alignment. ~~If you were optimizing for, say, curing cancer, it might be different (I think both the paradigm- and subproblem-spaces are smaller there).~~
From my reading of John Wentworth’s Framing Practicum sequence, implicit in his (and my) model is that solution spaces for these sorts of problems are apriori enormous. We (you and I) might also disagree on what apriori feasibility would be “weakly” vs “strongly” generalizable; I think my transition is around 15-30%.

Elliot Callender 26 Jun 2024 17:44 UTC
1 point
0
in reply to: RogerDearnaley’s comment on: Contrapositive Natural Abstraction—Project Intro
Shoot, thanks. Hopefully it’s clearer now.

Elliot Callender 26 Jun 2024 17:43 UTC
1 point
0
in reply to: RogerDearnaley’s comment on: Contrapositive Natural Abstraction—Project Intro
Yes, I agree. I expect abstractions, typically, to involve much more than 4-8 bits of information. On my model, any neural network, be it MLP, KAN or something new, will approximate abstractions with multiple nodes in parallel when the network is wide enough. I.e. the causal graph I mentioned is very distinct from the NN which might be running it.
Though now that you mentioned it, I wonder if low-precision NN weights are acceptable because of some network property (maybe SGD is so stochastic that higher precision doesn’t help) or the environment (maybe natural latents tend to be lower-entropy)?
Anyways, thanks for engaging. It’s encouraging to see someone comment.

Elliot Callender 23 Jun 2024 17:57 UTC
3 points
0
on: Framing Practicum: Dynamic Equilibrium
This one was a lot of fun!
1. ROS activity in some region of the body is a function of antioxidant bioavailability, heat, and oxidant bioavailability. I imagine this relationship is the inverse of some chemical rate laws, i.e. dependent on which antioxidants we’re looking at. But since I expect most antioxidants to work as individual molecules, the relationship is probably $\frac{1}{ax}$, i.e. ROS activity is inverse w.r.t. some antioxidant’s potency and concentration if we ignore other antioxidants. The bottom term can also be a sum across all antioxidants, given no synergistic / antagonistic interactions!
2. Transistor reliability is probably a function of heat, band gap and voltage? I imagine that, in fact, reliability is hysteretic in terms of band gap and voltage! When the gap is lower, noise can cross more easily, and when it’s too high there won’t be enough voltage for it to pass (without overheating your circuit). And heat increases noise. I think that information transmission might be exponential or Gaussian centered around the optimum, parameterized by $\frac{\text{voltage}}{\text{gap}}$. Does anyone have an equation for this?
3. Ant movement speed is probably an equilibrium between evolved energy-conservation priors, available calories and pheromones. Let’s just focus on pheromones which make the ant move faster. Energy (perhaps as $\frac{\text{kcal}}{\text{sec}}$) and pheromones (say, $\text{mol}/\text{L}$) are probably each about $O(\sqrt{x})$ predictors of speed, since I’m imagining material stress of movement ($O(v^{2})$) to be the main energy sink. Let $y = \sqrt{e} \cdot \sqrt{p}$, where $p = \text{pheromones}$. I don’t know what the evolved frugality priors look like, but expect they can just map $y \to \text{speed}$ without needing the subcomponents $e$ and $p$, at least as far as big-O notation goes.
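For concreteness, here is a minimal Python sketch of the guesses in items 1 and 3. The function names, argument layout, and unit-free numbers are all made up for illustration, and the functional forms are only the rough ones guessed above, not established models.

```python
import math


def ros_activity(antioxidants):
    """Guessed ROS activity: 1 / sum(a * x) over (potency a, concentration x) pairs.

    Assumes no synergistic or antagonistic interactions between antioxidants,
    and ignores heat and oxidant bioavailability entirely.
    """
    total = sum(a * x for a, x in antioxidants)
    if total <= 0:
        raise ValueError("need a positive total of potency * concentration")
    return 1.0 / total


def ant_speed_predictor(energy, pheromones):
    """Guessed combined predictor y = sqrt(e) * sqrt(p).

    The mapping from y to actual speed (the evolved frugality prior) is left out.
    """
    return math.sqrt(energy) * math.sqrt(pheromones)


if __name__ == "__main__":
    # One antioxidant vs. two antioxidants with the same a * x product each.
    print(ros_activity([(2.0, 0.5)]))
    print(ros_activity([(2.0, 0.5), (1.0, 1.0)]))
    # Quadrupling energy at fixed pheromone level doubles y.
    print(ant_speed_predictor(4.0, 9.0), ant_speed_predictor(16.0, 9.0))
```

Under these assumptions, adding a second antioxidant with the same potency-concentration product halves the predicted ROS activity, and quadrupling available energy only doubles $y$.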

Elliot Callender 23 Jun 2024 17:01 UTC
3 points
0
on: Framing Practicum: Bistability
1. Sleep / wakefulness; hypnagogia seems transient and requires conscious effort to maintain. Outside stimuli and internal volition can wake people up; lack thereof can do the opposite.
2. Friendships; I tend to have few, close friendships. I don’t interact much with more distant friends because it’s less emotionally fulfilling, so they slowly fade towards being acquaintances. I distance myself from people I don’t connect with / feel safe around, and try to strengthen bonds with people I think are emotionally mature and interesting.
3. Focus; I tend to either be checked out or deeply zoned-in. There’s strong momentum here, especially for cognitively engaging tasks. Anything which I expect to impair my work will push me into “maintenance” mode, where I conserve energy and do less object-level work. This takes engagement with interesting stuff plus willed focus to recover from.

Elliot Callender 23 Jun 2024 16:37 UTC
3 points
0
on: Framing Practicum: Stable Equilibrium
I know this post is old(ish), but still think this exercise is worth doing!
1. Deep ocean currents; I expect changes in ocean floor topography and deep-water inertial/thermal changes to matter. I don’t expect shallow-water topography to matter, nor wind (unless we have sustained 300+kph winds for weeks straight).
2. Earth’s magnetic pole directions; I’m not sure what causes them. I think they’re generated by induction from magma movement? In that case, our knobs are those currents. I don’t think anything can change the equilibrium without changing the flow patterns, minus stuff like magma composition which can eliminate magnetism.
3. Tourism to, say, Tokyo; the following factors matter both relative to other destinations and for Tokyo on its own, and don’t span our knob-space. Public opinion and salience, travel costs (time and money), hotel availability, and number of people who speak Japanese. I think that if we know these, most other markets become rounding errors, though I wouldn’t be too sure.

Elliot Callender 18 Jun 2024 2:09 UTC
3 points
0
on: Towards a Less Bullshit Model of Semantics
I agree that this seems like a very promising direction.
Beyond that, we of course want our class of random variables to be reasonably general and cognitively plausible as an approximation—e.g. we shouldn’t assume some specific parametric form.
Could you elaborate on this; “reasonably general” sounds to me like the redundancy axiom, so I’m unclear about whether this sentence is an intuition pump.

Elliot Callender 12 Jun 2024 18:38 UTC
21 points
5
on: My AI Model Delta Compared To Christiano
I think it depends on which domain you’re delegating in. E.g. physical objects, especially complex systems like an AC unit, are plausibly much harder to validate than a mathematical proof.
In that vein, I wonder if requiring the AI to construct a validation proof would be feasible for alignment delegation? In that case, I’d expect us to find more use and safety from [ETA: delegation of] theoretical work than empirical.

Elliot Callender 5 Jun 2024 20:47 UTC
7 points
4
on: How should I think about my career?
I’m in a very similar situation, graduating next spring with a math degree in the US. I’ll sketch out my personal situation (to help contextualize my advice) followed by my approach for career scouting. If you haven’t checked out 80k hours, I really suggest doing so, because they have much more thorough and likely wiser advice than I do.
I’m a 19-year-old undergrad in a rural part of the US. My dad’s a linguistics professor and pushing me to do a PhD. I want to do AI safety research, and am currently weighing the usefulness of a PhD compared to saving money to do self-funded work. I’m also sort-of Buddhist / nihilist / absurdist, which points me towards utilitarianism.
I strongly encourage anything to do with AI safety. Specific examples here include working to donate money to Open Phil’s longtermism fund, policy research, nonprofit alignment research, being a DeepMind Scalable Alignment researcher, and software development for Lightcone Infrastructure. I’d be very careful here though; are you looking for local or global goods? E.g. I’ve a friend working to improve ethical data collection, which I think is important in a platonic sense, but not comparable to x-risk work.
Onto processes. Writing out all of my thoughts helps me to be rigorous and honest with myself. It increases my functional working memory because my thoughts are saved on screen, freeing up cognitive capacity for introspection.
Say for example that I’m weighing how I’d research in the EU vs US. I write down how I feel initially, including possible biases (EU probably has better living conditions; US has more researchers; I should be careful not to anchor on these feelings). As I go through, I find knowledge gaps (where will I have more free time, and by how much?) and brainstorm how to fill them (my dad knows German researchers. They’d be good to ask about this). I find extelligence helps me move much faster and build a game plan.
Another thing is to discuss your plans with others. I know LessWrong is an example, but in-person discussion is probably better.
If I can make a difference to enough people or to the world and leave it a better place than I found it then at least I wasn’t entirely pointless or a complete waste of space, oxygen and other natural resources. So far, I have spent my life learning and becoming a functioning adult, but now it’s time to start really earning my place here.
I strongly caution you to watch out for obligation / guilt. Even if you don’t feel it yet, the mindset “I owe this to the world” can push you to some dark and counterproductive places. As said here, make sure you’ve put your own oxygen mask on before helping others.
Feel free to message me. Best of luck.

Elliot Callender 8 Jan 2024 0:17 UTC
1 point
0
on: Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible
We know that some genes are only active in the womb, or in childhood, which should make us very skeptical that editing them would have an effect.
Would these edits result in demethylated DNA? A reversion of the epigenome could allow expression of infant genes. There may also be robust epigenomic therapies developed by the time this project would be scalable.
Companies like 23&Me genotyped their 12 millionth customer two years ago and could probably get at perhaps 3 million customers to take an IQ test or submit SAT scores.
Just as you mentioned academics’ aversion to this area, I think genomics companies would be reluctant at best to ask their customers for test scores. Perhaps it wouldn’t be bad PR once the public is more concerned about existential AI. Governments might be more willing to provide data.