My agenda for research into transformer capabilities—Introduction

I recently applied for funding for my research agenda. I probably did a pretty bad job. In fact, just now, before writing this second sentence, it occurred to me to check my emails and, lo and behold, this specific application has just been denied.

This is not particularly surprising or depressing. However, the process of writing the grant application, and the subsequent thinking about what I should have written, led me to a clearer understanding of where I would like to go with my research.

I want to pour that new-found clarity into a series of posts in which I describe the questions I would like to investigate and how I plan to tackle them with relatively minimal resources. Given that I have absolutely zero slack right now (writing this post is already cutting into sleep) and funding seems a remote possibility, this is also a way to put these ideas and approaches out there so that others can be inspired by them.

My motivation

As a chess fanatic and a Deep Learning enthusiast, I have long dreamt about and occasionally worked on combining these two passions. Originally the idea was simply to model human concepts with Deep Learning-based classifiers and integrate these classifiers into a system that could give human-understandable feedback on games and moves. [Example: Label positions by whether the player is going to succumb to a mating attack later in the game. Train a classifier to predict this. Call it “king safety” and give feedback like “This move weakens your king’s position.” A minimal sketch of such a labeling pipeline follows below.]
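
To make the bracketed example concrete, here is a minimal sketch of the kind of labeling pipeline I have in mind, using python-chess. The PGN path, the 30-ply window, and the labeling rule itself are placeholder assumptions, not a finished definition of “king safety”.

```python
# Sketch of the "king safety" labeling idea from the bracketed example.
# Assumes python-chess and some PGN file of games; PGN_PATH and WINDOW are
# placeholders, and the labeling rule is deliberately crude.
import chess
import chess.pgn

PGN_PATH = "games.pgn"   # placeholder
WINDOW = 30              # label positive if the mate happens within 30 plies

def label_game(game):
    """Yield (fen, label) pairs: label 1 if the side to move gets mated within WINDOW plies."""
    board = game.board()
    positions = []                       # (fen, side_to_move, ply_number)
    node = game
    while node.variations:
        positions.append((board.fen(), board.turn, board.ply()))
        node = node.variation(0)
        board.push(node.move)

    mated_side, mate_ply = None, None
    if board.is_checkmate():
        mated_side = board.turn          # side to move in the final position is the mated side
        mate_ply = board.ply()

    for fen, side, ply in positions:
        positive = (
            mated_side is not None
            and side == mated_side
            and mate_ply - ply <= WINDOW
        )
        yield fen, int(positive)

with open(PGN_PATH) as handle:
    while True:
        game = chess.pgn.read_game(handle)
        if game is None:
            break
        for fen, label in label_game(game):
            print(label, fen)
```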

However, as I got deeper into Deep Learning, my motivation shifted towards “I want to try out all these cool techniques and ideas on my own little use case”. Now, as Transformers are seemingly being scaled in the general direction of AGI, my motivation has changed again, towards understanding what transformers are capable of in the limit.

In late 2014 I got interested in Deep Learning and came across a paper that used a CNN to quite accurately predict human moves in the game of Go. Having some interest in and a little knowledge about chess and Go engines, I realized that this was the missing piece in the Go engines of the time. I briefly considered trying to implement this myself, but then decided that some industry giant would probably create a human-competitive Go engine within a relatively short time.

Maybe the certainty I felt was misplaced. But obviously (in hindsight) this turned out to be exactly right. I think if we can get some understanding of what the gaps in current architectures’ capabilities are, we do have a chance to recognize when something comes along that fills such a gap. AlphaGo was predictable; maybe AGI is too.

General Outline

These are some core ideas and questions that I would like to tackle by training different transformer models on chess games and chess comments. I think chess has some unique properties that make the results of training transformers on games and moves much more interpretable than doing the same on language alone, or on language and images. It is also big data on a small budget, with roughly 80 billion game positions freely available.

  1. Is system 2 thinking a gap in the capabilities of current large models, or does it emerge at scale? In chess, system 1 thinking (positional intuition) and system 2 thinking (calculation) can be clearly differentiated. The transformers I have trained show a clear lack of calculation. This can be quantified, and it can then be investigated whether it changes with scale (see the first sketch after this list).

  2. The world, what the model knows about the world, and what the model says about the world. How these three hang together seems to me to be a core question. In chess, the state of the world can be automatically analysed by powerful engines. This extends to the detection of high-level concepts, like zugzwang, initiative, king safety, development, corresponding squares, etc. This means a multi-modal chess language model could be investigated for symbol grounding on several levels of complexity (see the second sketch after this list).

  3. Chess is an antagonistic setting. Imagine RL scenarios where playing and communicating have different effects on the outcome. For example, several chess-language models giving verbal advice to one move-selection model. How quickly do they learn to lie, if incentivised to do so? How well do they lie, if the move-selection model learns to discard lies?
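
For the first idea, one way I imagine quantifying the “lack of calculation” is to measure a model’s first-move accuracy on tactical puzzles, bucketed by the length of the forced solution. The sketch below assumes a CSV roughly in the format of the freely available Lichess puzzle dump (a FEN column plus a space-separated UCI move list whose first move is the opponent’s move that sets up the puzzle), and a hypothetical predict_move function standing in for whichever transformer is being probed.

```python
# Sketch: quantify "calculation" by bucketing tactical puzzles by the length of
# their forced solution and measuring first-move accuracy per bucket.
# Assumes a CSV with FEN and Moves columns (as in the Lichess puzzle dump) and a
# caller-supplied predict_move(board) standing in for the model under investigation.
import csv
from collections import defaultdict

import chess

def evaluate(puzzle_csv, predict_move):
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(puzzle_csv, newline="") as f:
        for row in csv.DictReader(f):
            board = chess.Board(row["FEN"])
            moves = row["Moves"].split()
            board.push_uci(moves[0])      # opponent's move that sets up the puzzle
            solution = moves[1:]          # the forced line the solver has to find
            depth = len(solution)         # proxy for required calculation depth
            total[depth] += 1
            if predict_move(board) == chess.Move.from_uci(solution[0]):
                correct[depth] += 1
    for depth in sorted(total):
        print(f"{depth:2d}-ply solutions: {correct[depth] / total[depth]:.1%} "
              f"({total[depth]} puzzles)")
```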

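For the second idea, this is roughly what I mean by the state of the world being automatically analysable: an engine turns a position into ground-truth facts that a comment-generating model’s statements could later be checked against. The engine path and the crude thresholds below are my own placeholder assumptions; real concept detectors (zugzwang, initiative, etc.) would need much more care.

```python
# Sketch: use an engine as ground truth for the "world" a chess-language model
# talks about. Assumes a local Stockfish binary; the thresholds are placeholders,
# not serious concept definitions.
import chess
import chess.engine

ENGINE_PATH = "stockfish"   # placeholder path to a UCI engine binary

def ground_truth(fen, engine):
    """Return a few engine-derived facts that model statements could be checked against."""
    board = chess.Board(fen)
    info = engine.analyse(board, chess.engine.Limit(depth=18))
    score = info["score"].pov(board.turn)
    return {
        "side_to_move_is_winning": score.score(mate_score=100000) > 200,  # > ~2 pawns
        "forced_mate_available": score.is_mate() and score.mate() > 0,
        "in_check": board.is_check(),
        "best_move": info["pv"][0].uci() if "pv" in info else None,
    }

if __name__ == "__main__":
    engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
    try:
        # Position after 1.e4 e5 2.Bc4 Nc6 3.Qf3 (White threatens Qxf7#), Black to move.
        print(ground_truth(
            "r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR b KQkq - 3 3",
            engine,
        ))
    finally:
        engine.quit()
```
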
Right now I plan two further posts in which I will give technical details on how I intend to tackle the first two research ideas above: a short one about system 2 thinking, and a longer one on pulling out all the stops to make a multi-modal chess language model feasible (so far, chess comment generating models are extremely bad; I think I can do much better). I am not yet sure about part 3, partly because I don’t know enough about RL, and partly because I think it might be difficult to set up experiments that have a reasonable chance of producing a surprise.