My Views on the Anti-GitHub Copilot Argument

Aug 4, 2022
ai github copilot

Since GitHub Copilot was announced, there has been a lot of backlash from the developer community. I've seen plenty of people say it's a bad thing that will ruin the industry in one way or another.

Argument 1: It’s the start of the end for developers

Many developers are afraid this is the start of an AI Armageddon for developers, the fear being that AI will take our jobs. I’m not sure I agree, but I completely understand that fear. At least as a knee-jerk reaction, it makes sense to me. However, I still don’t think AI is ready to take any complicated job from a human. Self-driving cars have not taken truck drivers’ or taxi drivers’ jobs. AI robot arms aren’t taking over the jobs of surgeons around the world. And Copilot is not good enough to code anything but simple applications on its own. Even then, it can’t do it without someone who already knows which steps need to be taken and in what order.

That’s not to say these things won’t happen in the future. In 20 to 30 years, I think AI will be good enough to take over many jobs, and it will be doing a lot more in our societies than it does today. But I think that will be a good thing. If AI in a vehicle can prevent more accidents on the roads, I’m all for it. If AI can help a surgeon save more lives, I’m all for it. And if Copilot can help me write code faster and more efficiently, I’m DEFINITELY all for it. That brings me to the next argument against it.

Argument 2: Copilot’s code is unsafe/bad

Yes, Copilot’s suggestions are certainly not perfect. I’ve seen examples of it spitting out very dubious SQL, poorly performing code, and complete overkill. If this is the best it has to offer, it’s certainly not taking our jobs, so we don’t have to worry about that just yet.

So I agree that Copilot’s code should, for the foreseeable future, be regarded as unsafe and/or bad. That’s why it’s called a “Copilot”: we’re the ones in charge, and it’s on us to vet its work. I prefer to think of it as GitHub’s “AI Junior Developer”. You put it to work on the small stuff so you can focus on the big-picture stuff. And part of that is reviewing the junior dev’s code before you put it live. If you don’t, that’s on you, not them. That’s nothing new. However, that understanding comes with experience; I’ve had over 10 years of it in the industry. Which brings me to the next argument.
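To make “vet its work” concrete, here is a hypothetical illustration (not an actual Copilot suggestion) of the kind of dubious SQL a reviewer should catch, written in Python with the standard-library `sqlite3` module. The `users` table and both function names are made up for this sketch. The first function builds its query by pasting user input straight into the SQL string; the second is what it might look like after review, using a parameter placeholder instead.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Dubious pattern: user input is spliced directly into the SQL text,
    # so a crafted username can change the query itself (SQL injection).
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str):
    # Reviewed version: the "?" placeholder lets sqlite3 bind the value,
    # so it is always treated as data and never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


if __name__ == "__main__":
    # Hypothetical in-memory table, just for demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

    malicious = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, malicious))  # matches every row
    print(find_user_safe(conn, malicious))    # matches no rows
```

Passing a crafted name such as `nobody' OR '1'='1` to the first function returns every row in the table, while the reviewed version returns none. That one-line difference is exactly the sort of thing you’d flag in a junior developer’s pull request.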

Argument 3: Schools, Colleges, and Students

Another argument comes from teachers and lecturers, who are worried that students will use it to skip learning the basics or, at worst, to cheat. I agree with this too, at least the part about students not learning a language or framework properly because Copilot does the “grunt work” for them. This is a problem we will now have to deal with.

My counter-argument is simply that this is nothing new. It’s new for our field, but when I was in school, the same argument was happening in maths classrooms over the use of calculators. I, for one, was never a fan of maths (you don’t need to be a maths whizz to be a coder). That’s because, as a child, I had a very cruel teacher who made an example of the “weak ones”, which gave me a mental block whenever I’m put on the spot to solve a maths problem, even basic addition. My brain just panics and shuts down for a moment. For me, a calculator would have meant working around that mental block instead of feeling ashamed in every maths test that I was “a dummy”.

That said, it’s not the same thing. And even though I would have liked to use a calculator in maths class, I don’t think it should be allowed for anyone who isn’t truly struggling to keep their head above water, or who doesn’t have a learning disability such as number dyslexia. I think you should be able to do maths and solve development problems with as little help as possible before using any tools that do that work for you.

However, when a mathematician or engineer is working professionally, I think we would all agree they should use the tools that help reduce errors and improve efficiency. I see Copilot in this category, at least on the efficiency side.

Argument 4: Copyleft/Copyright Infringement

The fourth argument, and the one I take the biggest issue with, is that Copilot was trained on possibly copyrighted code. This argument has been made in other industries as well, and in my view it has never had much legal merit, for various reasons.

The argument is that because Copilot’s AI (built by OpenAI) was trained on potentially copyrighted code, it infringes on the copyright of the original code. The most notable entity with this view is the Software Freedom Conservancy, which published an article in June 2022 urging everyone to “Give Up GitHub”.

They argue that because Copilot’s AI learned from copyrighted code on GitHub, it is breaking copyright law by using that code in its suggestions. Firstly, Copilot is not simply spitting out code it has memorized verbatim. It’s just not. It has “learned” to apply logic it has seen before and rework it based on your project’s files and the context of the file or code you are working on. It’s not copying and pasting. It’s not even close to that.

To call that copyright infringement would mean that you or I could never learn from a licensed piece of code on GitHub and then apply that knowledge in our own projects without attribution: “I saw something similar to this in XYZ. All copyright is theirs. Blah blah blah.” That’s ridiculous!

So what about the cases where Copilot, for whatever reason, does generate code that is identical to a piece of code in a copyrighted project? Well, then we have to ask: how much code duplication counts as copyright infringement? A line? A function? A class? One file?

This is very important to clarify if they want to go down this road, because if we end up in a situation where using even a single function from a copyrighted codebase is copyright infringement, regardless of the context of the project or file it is in, then we’re in serious trouble as an industry. Worse than AI taking our jobs. We’re getting into the territory of musicians and artists claiming that common phrases in lyrics are copyrighted and therefore infringing. And that’s a slippery slope.

If I write a song in the key of G, does that mean I’m breaking copyright because those notes have been used before? Of course not, that’s insane! Yet as time passes, artists are losing copyright cases for more and more convoluted reasons. So at what point is a series of notes “too similar”, and after how many consecutive notes is it no longer “original” but “copying”? This is the same argument. At what point is a line of code “too similar” to another line of code, and after how many consecutive lines is it no longer “inspired by” but “copying”? Is a class that does the same thing as a copyrighted class, but uses mostly different code to get there, a copy?

I don’t have answers to those questions, but I certainly don’t want lawyers and judges making rulings on them that would then be law forever. I’ll take rampant code plagiarism by everyone over lawyers and judges ruling on who can and can’t use certain code any day!