I have a PhD in Computational Neuroscience from UCSD (my Bachelor’s was in Biomedical Engineering, with minors in Math and Computer Science). Ever since junior high, I’ve been trying to figure out how to engineer artificial minds, and I’ve been coding up artificial neural networks since I first learned to program. Obviously, all my early designs were almost completely wrong/unworkable/poorly defined, but I think those experiences did prime my brain with inductive biases that are well suited for working on AGI.
Although I now work as a data scientist in R&D at a large medical device company, I continue to spend my free time studying the latest developments in AI/ML/DL/RL and neuroscience and trying to come up with models for how to bring it all together into systems that could actually be implemented. Unfortunately, I don’t seem to have much time to develop my ideas into publishable models, but I would love the opportunity to share ideas with those who do.
Of course, I’m also very interested in AI Alignment (hence the account here). My ideas on that front mostly fall into the “learn (invertible) generative models of human needs/goals and hook those up to the AI’s own reward signal” camp. I think methods of achieving alignment that depend on restricting the AI’s intelligence or behavior are about as doomed to fail in the long term as Prohibition or the War on Drugs in the USA. We need a better theory of what reward signals are for in general (probably something to do with maximizing attainable utility, or minimizing attainable disutility, with respect to the survival needs of a system) before we can hope to model human values usefully. This could even extend to modeling the “values” of the ecological/socioeconomic/political supersystems in which humans are embedded, or of the biological subsystems embedded within humans, both of which would be crucial for creating a better future.
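To make the wiring of that camp a bit more concrete, here’s a minimal toy sketch in Python. It is entirely hypothetical: the class, function, and parameter names are mine, and the “learned model” is just an identity placeholder, so this is an illustration of the idea rather than a real method.

```python
import numpy as np

class NeedsModel:
    """Stand-in for a learned (invertible) generative model of human needs/goals.

    In a real system, encode() would be a trained flow/VAE-style model mapping
    observed world states to latent need-satisfaction variables, and its inverse
    would generate the states a human with those needs would prefer.
    """

    def encode(self, world_state: np.ndarray) -> np.ndarray:
        # Identity placeholder so the sketch runs; a trained model goes here.
        return world_state


def aligned_reward(world_state: np.ndarray,
                   needs_model: NeedsModel,
                   survival_floor: np.ndarray) -> float:
    """Toy reward: penalize shortfall of the inferred need variables below the
    system's survival requirements, rather than rewarding raw task success."""
    needs = needs_model.encode(world_state)
    shortfall = np.maximum(survival_floor - needs, 0.0)
    return -float(shortfall.sum())


# Example: needs above the floor contribute nothing; deficits make reward negative.
model = NeedsModel()
print(aligned_reward(np.array([0.9, 0.4]), model,
                     survival_floor=np.array([0.5, 0.5])))  # -> -0.1
```

The only point of the sketch is the plumbing: the reward is computed from inferred human-need variables relative to a survival floor, not from a hand-specified task metric.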
Yeah, using ChatGPT as a sounding board for developing ideas and getting constructive criticism, I was definitely starting to notice a whole lot of fawning: “Brilliant,” “extremely insightful,” and so on, when there is no way the model could actually have investigated the ideas thoroughly enough to make such an assessment.
That’s not even mentioning that those insertions didn’t add anything substantial to the conversation; they just hogged space in the context window that could otherwise have been used for helpful feedback.
What would have to change on a structural level for LLMs to meet that “helpful, honest, harmless” goal in a robust way? People are going to want AI partners that make them feel good, but could that be transformed into a goal of making people feel satisfied with how much they have been challenged to improve their critical thinking skills, their understanding of the world, and the health of their lifestyle choices?