Linkpost: GitHub Copilot productivity experiment

We recruited 95 professional developers, split them randomly into two groups, and timed how long it took them to write an HTTP server in JavaScript. One group used GitHub Copilot to complete the task, and the other one didn’t. We tried to control as many factors as we could–all developers were already familiar with JavaScript, we gave everyone the same instructions, and we leveraged GitHub Classroom to automatically score submissions for correctness and completeness with a test suite. We’re sharing a behind-the-scenes blog post soon about how we set up our experiment!

In the experiment, we measured—on average—how successful each group was in completing the task and how long each group took to finish.

  • The group that used GitHub Copilot had a higher rate of completing the task (78%, compared to 70% in the group without Copilot).

  • The striking difference was that developers who used GitHub Copilot completed the task significantly faster–55% faster than the developers who didn’t use GitHub Copilot. Specifically, the developers using GitHub Copilot took on average 1 hour and 11 minutes to complete the task, while the developers who didn’t use GitHub Copilot took on average 2 hours and 41 minutes.
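The headline "55% faster" can be checked against the two reported averages. The sketch below assumes the figure means the relative reduction in average completion time. The post does not say how it was computed. Under that reading, 71 minutes against 161 minutes is roughly a 56% reduction, so "55%" was presumably rounded down or computed from unrounded times. The other common reading is a ratio of average times. Under that reading, the Copilot group was about 2.3 times as fast. All variable names are illustrative.

```python
# Average completion times reported in the quoted post, in minutes.
copilot_minutes = 1 * 60 + 11   # 1 h 11 min = 71
control_minutes = 2 * 60 + 41   # 2 h 41 min = 161

# Assumed reading of "X% faster": relative reduction in average task time.
time_reduction = (control_minutes - copilot_minutes) / control_minutes

# Alternative reading: ratio of average times, i.e. a speedup factor.
speedup = control_minutes / copilot_minutes

print(f"time reduction: {time_reduction:.1%}")  # time reduction: 55.9%
print(f"speedup factor: {speedup:.2f}x")        # speedup factor: 2.27x
```

Either way, these are group averages. Without the per-group spread, they say nothing about how robust the difference is.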

My opinion: Because of the usual reasons (publication bias, the replication crisis, the task being “easy,” etc.), I don’t think we should take this particularly seriously until many more independent experiments have been run. However, it’s at least worth knowing about.

Related: https://ai.googleblog.com/2022/07/ml-enhanced-code-completion-improves.html

We compare the hybrid semantic ML code completion of 10k+ Googlers (over three months across eight programming languages) to a control group and see a 6% reduction in coding iteration time (time between builds and tests) when exposed to single-line ML completion. These results demonstrate that the combination of ML and SEs can improve developer productivity. Currently, 3% of new code (measured in characters) is now generated from accepting ML completion suggestions.