The key idea behind Copilot and other programs like it, sometimes called code assistants, is to put the information that programmers need right next to the code they are writing. The tool tracks the code and comments (descriptions or notes written in natural language) in the file that a programmer is working on, as well as other files that it links to or that have been edited in the same project, and sends all this text to the large language model behind Copilot as a prompt. (GitHub co-developed Copilot’s model, called Codex, with OpenAI. It is a large language model fine-tuned on code.) Copilot then predicts what the programmer is trying to do and suggests code to do it.
This round trip between code and Codex happens multiple times a second, the prompt updating as the programmer types. At any moment, the programmer can accept what Copilot suggests by hitting the tab key, or ignore it and carry on typing.
The tab button seems to get hit a lot. A study of almost a million Copilot users published by GitHub and the consulting firm Keystone Strategy in June—a year after the tool’s general release—found that programmers accepted on average around 30% of its suggestions, according to GitHub’s user data.
“In the last year Copilot has suggested—and had okayed by developers—more than a billion lines of code,” says Dohmke. “Out there, running inside computers, is code generated by a stochastic parrot.”
Copilot has changed the basic skills of coding. As with ChatGPT or image makers like Stable Diffusion, the tool’s output is often not exactly what’s wanted—but it can be close. “Maybe it’s correct, maybe it’s not—but it’s a good start,” says Arghavan Moradi Dakhel, a researcher at Polytechnique Montréal in Canada who studies the use of machine-learning tools in software development. Programming becomes prompting: rather than coming up with code from scratch, the work involves tweaking half-formed code and nudging a large language model to produce something more on point.
But Copilot isn’t everywhere yet. Some firms, including Apple, have asked employees not to use it, wary of leaking IP and other private data to competitors. For Justin Gottschlich, CEO of Merly, a startup that uses AI to analyze code across large software projects, that will always be a deal-breaker: “If I’m Google or Intel and my IP is my source code, I’m never going to use it,” he says. “Why don’t I just send you all my trade secrets too? It’s just put-your-pants-on-before-you-leave-the-house kind of obvious.” Dohmke is aware this is a turn-off for key customers and says that the firm is working on a version of Copilot that businesses can run in-house, so that code isn’t sent to Microsoft’s servers.
Copilot is also at the center of a lawsuit filed by programmers unhappy that their code was used to train the models behind it without their consent. Microsoft has offered indemnity to users of its models who are wary of potential litigation. But the legal issues will take years to play out in the courts.
Dohmke is bullish, confident that the pros outweigh the cons: “We will adjust to whatever US, UK, or European lawmakers tell us to do,” he says. “But there is a middle balance here between protecting rights—and protecting privacy—and us as humanity making a step forward.” That’s the kind of fighting talk you’d expect from a CEO. But this is new, uncharted territory. If nothing else, GitHub is leading a brazen experiment that could pave the way for a wider range of AI-powered professional assistants.