I have a PhD in Computational Neuroscience from UCSD (my Bachelor’s was in Biomedical Engineering, with minors in Math and Computer Science). Ever since junior high, I’ve been trying to figure out how to engineer artificial minds, and I’ve been coding up artificial neural networks since I first learned to program. Obviously, all my early designs were almost completely wrong/unworkable/poorly defined, but I think those experiences did prime my brain with inductive biases that are well suited for working on AGI.
Although I now work as a data scientist in R&D at a large medical device company, I continue to spend my free time studying the latest developments in AI/ML/DL/RL and neuroscience and trying to come up with models for how to bring it all together into systems that could actually be implemented. Unfortunately, I don’t seem to have much time to develop my ideas into publishable models, but I would love the opportunity to share ideas with those who do.
Of course, I’m also very interested in AI Alignment (hence the account here). My ideas on that front mostly fall into the “learn (invertible) generative models of human needs/goals and hook those up to the AI’s own reward signal” camp. I think methods of achieving alignment that depend on restricting the AI’s intelligence or behavior are about as doomed to fail in the long term as Prohibition or the War on Drugs in the USA. We need a better theory of what reward signals are for in general (probably something to do with maximizing attainable utility, or minimizing attainable disutility, with respect to the survival needs of a system) before we can hope to model human values usefully. This could even extend to modeling the “values” of the ecological/socioeconomic/political supersystems in which humans are embedded, or of the biological subsystems embedded within humans, both of which would be crucial for creating a better future.
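To make the wiring of that camp a bit more concrete, here’s a minimal toy sketch in Python. It is entirely hypothetical: the class, function, and parameter names are mine, and the “learned model” is just an identity placeholder, so this is an illustration of the idea rather than a real method.

```python
import numpy as np

class NeedsModel:
    """Stand-in for a learned (invertible) generative model of human needs/goals.

    In a real system, encode() would be a trained flow/VAE-style model mapping
    observed world states to latent need-satisfaction variables, and its inverse
    would generate the states a human with those needs would prefer.
    """

    def encode(self, world_state: np.ndarray) -> np.ndarray:
        # Identity placeholder so the sketch runs; a trained model goes here.
        return world_state


def aligned_reward(world_state: np.ndarray,
                   needs_model: NeedsModel,
                   survival_floor: np.ndarray) -> float:
    """Toy reward: penalize shortfall of the inferred need variables below the
    system's survival requirements, rather than rewarding raw task success."""
    needs = needs_model.encode(world_state)
    shortfall = np.maximum(survival_floor - needs, 0.0)
    return -float(shortfall.sum())


# Example: needs above the floor contribute nothing; deficits make reward negative.
model = NeedsModel()
print(aligned_reward(np.array([0.9, 0.4]), model,
                     survival_floor=np.array([0.5, 0.5])))  # -> -0.1
```

The only point of the sketch is the plumbing: the reward is computed from inferred human-need variables relative to a survival floor, not from a hand-specified task metric.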
Yeah, using ChatGPT as a sounding board for developing ideas and getting constructive criticism, I was definitely starting to notice a whole lot of fawning: “Brilliant,” “extremely insightful,” and so on, when there is no way the model could actually have investigated the ideas thoroughly enough to make such an assessment.
That’s not even mentioning that those insertions didn’t add anything substantial to the conversation; they just hogged space in the context window that could otherwise have been used for helpful feedback.
What would have to change on a structural level for LLMs to meet that “helpful, honest, harmless” goal in a robust way? People are going to want AI partners that make them feel good, but could that be transformed into a goal of making people feel satisfied with how much they have been challenged to improve their critical thinking skills, their understanding of the world, and the health of their lifestyle choices?