Retrospective: Lessons from the Failed Alignment Startup AISafety.com

TL;DR: Attempted to create a startup to contribute to solving the AI alignment problem. Ultimately failed due to rapid advancements in large language models and the inherent challenges of startups.

In early 2021, I began considering shorter AI development timelines and started preparing to leave my comfortable software development job to work on AI safety. Since I didn’t feel competent enough to directly work on technical alignment, my goal was capacity-building, personal upskilling, and finding a way to contribute.

During our reading group sessions, we studied Cotra’s “Case for Aligning Narrowly Superhuman Models”, which made a compelling argument for working with genuinely useful models. This inspired us to structure our efforts as a startup. Our team consisted of Volkan Erdogan, Timothy Aris, Robert Miles, and myself, Søren Elverlin. We planned to offer companies automation of certain business processes using GPT-3 in exchange for alignment-relevant data for research purposes.

Given my strong deontological aversion to increasing AI capabilities, I aimed to keep the startup as stealthy as possible without triggering the Streisand effect. This decision significantly complicated fundraising and customer acquisition.

In November 2021, I estimated a 20% probability of success, a view shared by my colleagues. I was fully committed, investing DKK420,000 (USD55,000), drawing no salary for myself, and providing modest compensation to the others.

Startup literature generally advises against one-person startups. Despite our team of four, I was taking on a disproportionate amount of work and responsibility, which should have raised red flags.

My confidence in our success grew during the spring of 2022 when a personal contact helped me secure a preliminary project with a large company that wished to remain anonymous. For $1,300/month, I sold them a business automation solution that relied solely on a large language model for code generation. However, it didn’t provide us with the data we sought. Both parties understood this was a preliminary project, and the company seemed eager about the full project.

Securing this project early on made Rob’s role redundant, and we amicably parted ways. Half a year later, Tim was offered a PhD, leaving Volkan and me (with minimal help from Berk and Ali).

The preliminary project involved validating several not-quite-standardized Word documents, and I developed a VSTO plugin for Outlook to handle this task. It took longer than anticipated, mainly due to late-discovered requirements. Despite the iterative process, the client was ultimately very satisfied, and I focused on building trust with them during this phase.

The full project aimed to execute business processes in response to incoming emails using multiple fine-tuned GPT-3 models in stages and incorporating as much context as possible into the prompts. Our first practical target was sorting emails in a shared mailbox and delegating tasks to different department members. Initial experiments suggested this process was likely feasible to automate.
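The staged idea can be illustrated with a minimal sketch. This is not the project's actual code: the names `Email`, `build_prompt`, `route_email`, `stub_classifier`, and `stub_assigner` are hypothetical, and the two "models" are keyword rules standing in for fine-tuned GPT-3 models so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" is any function from prompt text to completion text.
# In the design described above this would be a fine-tuned GPT-3 model;
# here it is an assumption-laden stand-in.
Model = Callable[[str], str]


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def build_prompt(email: Email, context: str, question: str) -> str:
    """Put the available context ahead of the email, then ask one question."""
    return (
        f"Context: {context}\n"
        f"From: {email.sender}\n"
        f"Subject: {email.subject}\n"
        f"Body: {email.body}\n"
        f"{question}"
    )


def route_email(email: Email, classify: Model, assign: Model, context: str) -> Dict[str, str]:
    """Run two stages: categorize the email, then pick who should handle it."""
    category = classify(build_prompt(email, context, "Category:")).strip()
    # The output of stage one becomes extra context for stage two.
    stage_two_context = f"{context} Category: {category}."
    assignee = assign(build_prompt(email, stage_two_context, "Assignee:")).strip()
    return {"category": category, "assignee": assignee}


def stub_classifier(prompt: str) -> str:
    """Hypothetical stand-in for the first fine-tuned model."""
    return "delivery" if "deliver" in prompt.lower() else "other"


def stub_assigner(prompt: str) -> str:
    """Hypothetical stand-in for the second fine-tuned model."""
    return "logistics" if "Category: delivery" in prompt else "front-desk"


if __name__ == "__main__":
    mail = Email("customer@example.com", "Delivery date", "Can you deliver before Friday?")
    print(route_email(mail, stub_classifier, stub_assigner, "Shared mailbox for a sales department."))
```

Passing each stage's output forward as context keeps the individual prompts narrow, and either stub can be swapped for a real model call without changing `route_email`.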

We were more intrigued by experiments demonstrating the opposite: certain business processes could not be replicated by GPT-3. This was particularly evident when deviations were necessary for common-sense reasons or when deviations would yield greater value. For example, a customer asks whether product delivery is possible before a specific date. Under standard procedures the date is just barely out of reach, but a human operator would recognize the potential for high profits and deviate from those procedures to meet it. We could not persuade GPT-3 to do this, and exploring such discrepancies in strategic reasoning seemed worthwhile.

The technical aspects of the full project presented significant challenges. We discovered that we were less familiar with certain technologies than we initially thought, particularly OAuth and specific aspects of Azure, and this cost us considerable time. We tried hiring consultants through Upwork, but the results were mixed and the process was unexpectedly time-consuming. Both the VSTO plugin and the Azure solution are available on our GitHub.

Alarmingly, the client’s business automation department attempted to position itself between our software and the end-users. I objected to this, as I saw it as a considerable threat. However, it was a political battle I was destined to lose. Shortly after my loss, I discovered they had started an internal project competing with our full project.

When we presented our solution, the end-users were enthusiastic, but the procurement champion claimed not to “have time” to make a decision at that point. He continued to stall for a month, and eventually, I (correctly) deduced that they would choose their internal project. This setback was demoralizing, as customer acquisition is likely my weak point, and I somewhat lack the metis required for startups.

Simultaneously, it appears that Microsoft is planning to make a strong effort to integrate large language models into Microsoft Office. Although they are unlikely to offer the same level of customization as our solution, their presence as a competitor would likely diminish our ability to be profitable.

Furthermore, experiments with GPT-4 indicated that the misalignment effect we were interested in probably didn’t reflect any significant difference in strategic reasoning. Instead, it was likely a result of GPT-3’s relatively limited capabilities. This realization made our formal experiments unlikely to yield any interesting findings.

By mid-April, after 18 months, we faced three major obstacles: our primary customer was not going to commit, our business model appeared uncompetitive compared to Microsoft’s offerings, and the alignment data was unpromising. Consequently, I decided to discontinue business operations.

Although each individual external obstacle was difficult to predict, the existence of challenges was foreseeable. We received $50,000 in funding from LTFF with an application that estimated a 60% probability of failure. On the bright side, I believe I achieved significant personal upskilling during this process. While AISafety.com A/S is likely to undergo a substantial pivot, the organization may still contribute in a different way.