Copilot’s license

As I said in my last post, GitHub Copilot has marketing issues. The bigger issues, however, are with licensing and the code it generates.

Copilot was trained on public repositories under a variety of licenses. With a 0.1% chance that a suggestion duplicates code verbatim from the training set, incompatibly licensed code can end up inserted into a new project. This scenario has already been demonstrated by @mitsuhiko. There are no safeguards in Copilot to prevent it.

One partial solution was suggested by @kelseyhightower:

GitHub’s Copilot would benefit from a compliance feature to help developers detect when any code, hand written or auto generated, possibly violates another projects license or copyright.


This would at least warn users of potential problems, though implementing it could be difficult.
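To give a rough idea of what such a check might look like, here is a minimal sketch that fingerprints overlapping runs of tokens and looks them up in an index of known licensed code. Everything in it is an assumption for illustration: the function names, the 8-token window, and the tiny in-memory corpus are hypothetical, and nothing here reflects how GitHub would actually build this. Indexing every public repository this way is exactly the part that could be difficult.

```python
import hashlib
import re


def fingerprints(code, n=8):
    """Hash every run of n consecutive tokens, ignoring whitespace."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    return {
        hashlib.sha1(" ".join(tokens[i:i + n]).encode()).hexdigest()
        for i in range(len(tokens) - n + 1)
    }


def build_index(corpus, n=8):
    """Map each fingerprint to the (source, license) pairs it appears in.

    `corpus` is a hypothetical dict of {(source, license_id): code}.
    """
    index = {}
    for (source, license_id), code in corpus.items():
        for fp in fingerprints(code, n):
            index.setdefault(fp, set()).add((source, license_id))
    return index


def find_matches(suggestion, index, n=8):
    """Return every (source, license) whose code shares a token run with the suggestion."""
    hits = set()
    for fp in fingerprints(suggestion, n):
        hits |= index.get(fp, set())
    return hits


# Hypothetical corpus with a single GPL-licensed snippet.
corpus = {
    ("example/repo", "GPL-3.0"): "def add(a, b):\n    return a + b  # sum two numbers\n",
}
index = build_index(corpus)
matches = find_matches("def add(a,b): return a+b", index)
```

Because the comparison works on tokens rather than raw text, reformatting a copied snippet does not hide it. Renaming variables would, so this only catches the verbatim case.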

A more effective solution may be for Copilot to detect the new project’s license and only suggest code under a compatible license. This would most likely require GitHub to re-train Copilot on training sets split by license. Splitting the code this way could reduce Copilot’s effectiveness by shrinking the code base used for generating suggestions.
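As a sketch of the filtering half of that idea, assume each suggestion already carries the license of the code it was derived from (which Copilot does not provide today). The license markers and the compatibility table below are simplified assumptions for illustration, not legal guidance, and all names are hypothetical.

```python
# Phrases that hint at a license when found in a project's LICENSE file.
# Assumption: a real detector would need to be far more thorough.
LICENSE_MARKERS = {
    "MIT": "Permission is hereby granted, free of charge",
    "Apache-2.0": "Apache License",
    "GPL-3.0": "GNU GENERAL PUBLIC LICENSE",
}

# Simplified, illustrative table: project license -> source licenses
# whose code could be suggested for that project.
COMPATIBLE_SOURCES = {
    "MIT": {"MIT"},
    "Apache-2.0": {"MIT", "Apache-2.0"},
    "GPL-3.0": {"MIT", "Apache-2.0", "GPL-3.0"},
}


def detect_license(license_text):
    """Return a license identifier for the given LICENSE text, or None."""
    lowered = license_text.lower()
    for license_id, marker in LICENSE_MARKERS.items():
        if marker.lower() in lowered:
            return license_id
    return None


def filter_suggestions(suggestions, project_license):
    """Keep only suggestions whose source license the project can accept.

    An unknown project license yields no suggestions at all.
    """
    allowed = COMPATIBLE_SOURCES.get(project_license, set())
    return [s for s in suggestions if s["license"] in allowed]


project_license = detect_license(
    "MIT License\n\nPermission is hereby granted, free of charge, to any person"
)
suggestions = [
    {"code": "return a + b", "license": "MIT"},
    {"code": "return a * b", "license": "GPL-3.0"},
]
safe = filter_suggestions(suggestions, project_license)
```

In this example the MIT project keeps only the MIT-derived suggestion. The smaller pool of usable suggestions is the effectiveness cost mentioned above.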

The demonstrated verbatim duplication of incompatibly licensed code shows that changes are required. GitHub can’t push the blame onto end users if the tool has flaws they can’t detect. The fix has to come from Copilot’s creators.