KPI Bait, and Github Copilot
musings by Bast on Sunday June 26th, 2022
KPI and Code
For the non-technical, KPI stands for "Key Performance Indicator". It's usually a number, but anything that "indicates" and can be measured and written down (i.e., it can become a number) will do. When applied to a person/employee/programmer, think attendance, features shipped, total revenue of projects they're responsible for, or total downtime of projects they're responsible for (a "negative" KPI, where lower is better).
KPIs are used to track performance in general.
They're used with cars:
- mileage
- noise level
- how many people it's carried versus empty space
Computers:
- amount of free RAM
- free cpu %
- load average
- disk wear (IOPS, total written, bad sector count)
Students:
- Test questions gotten/missed
- Attendance
- Project rubrics
These are all incredibly useful. A system with very low free RAM is "overperforming", that is, operating beyond its capabilities. Think of a car redlining: you want to do some minimal management work to move tasks either off the machine, or lessen their load. Why? Because any sufficiently complex task needs a buffer. What happens if a background task spins up and requires just a sliver more RAM than is available?
The OOM Killer takes out your application, that's what. Potential data loss and recovery, downtime, all the nightmares you don't want.
On top of that, running at such high memory thresholds is often bad in itself, both from the risk and from swapping (where the operating system makes additional memory available in exchange for slowing everything down to the speed of your hard disk). Swap is invaluable when you really need it, but compared to RAM's native speed it's horrible. The bar is "better than crashing", and it is certainly met. How bad is it, though? Let's grab a Samsung enterprise drive: currently advertised, up-to-date technology, the works. You won't have this; you will have worse than this:
Random write speeds up to 200,000 IOPS and sequential read speeds up to 6,900 MB/s are appropriate for the leading data centers that require fast performance for data processing.
7GBps isn't too shabby. But how fast is RAM? Let's assume you're using DDR4 (if you're not using DDR4 with a modern storage device, you're bottlenecking badly and need to stop). DDR4 starts at about 19GBps. The slowest DDR4 you can get is nearly two and a half times the speed of the fastest disk you'll likely have. And it just goes up from there: DDR4-3200 (middle of the pack) does 25.6GBps. That's roughly 3.7 times faster.
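The arithmetic is simple enough to sketch. A back-of-envelope check, using the advertised figures quoted above (not benchmarks):

```python
# Back-of-envelope bandwidth comparison using the advertised figures above.
ssd_gbps = 6.9          # top-end enterprise NVMe, sequential read
ddr4_slow_gbps = 19.2   # DDR4-2400, the slowest common DDR4 speed grade
ddr4_mid_gbps = 25.6    # DDR4-3200, middle of the pack

print(f"slowest DDR4 vs fastest SSD: {ddr4_slow_gbps / ssd_gbps:.1f}x")  # ~2.8x
print(f"DDR4-3200 vs fastest SSD: {ddr4_mid_gbps / ssd_gbps:.1f}x")      # ~3.7x
```

Per-channel, per-stick numbers; dual-channel setups double the RAM side and make the gap even wider.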
"Oh, it's just 3 times faster"? Speed is a direct multiplier on cost to run and scale. Provided you're not "minimally scaled" (a single server, with perhaps a single failover), doubling your speed halves the number of servers you need to purchase/maintain/rent. And halving servers is not just halving renting costs. Maintenance gets more complicated disproportionally to the number of servers you use: two servers are not double the work to maintain compared to one, but roughly three times, because not only do you now have to worry about them all being consistent, they probably also need a way to communicate with each other, and with communication comes complexity, pain, and exponential suffering (see: cloud engineer salaries). Typically, once you go beyond a single server you need to consider how different people performing similar actions on different servers will behave, and down that path lies madness. Thus, it's staked off with ever-greater tools and frameworks and "cloud" until it's "good enough".
Let's say you're running your devices at 95% ram, and they're swapping hard as a result. Let's also say you've got three servers, and one database server, which is about appropriate for a small but significant corporation's web product's production environment (we'll ignore machines used for testing and development).
They're running at 1/3 speed, and delivering roughly 1/2 of their capacity at best due to the overload. This means that of those three servers… you're only getting 1.5 well-loaded servers' worth of performance. And surprisingly enough, this often manifests as a periodic problem too, not showing up even if you test with the same number of users, yet for the entire work afternoon they'll be bogged down. Why is that?
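As a sketch of that capacity math (the 1/2 factor is a rough estimate of throughput lost to swapping, not a measurement):

```python
# Effective capacity of a swapping fleet: three servers, each delivering
# roughly half its normal throughput due to overload (rough estimate).
servers = 3
throughput_factor = 0.5   # assumed per-server slowdown under heavy swapping
effective = servers * throughput_factor
print(f"{servers} overloaded servers ~= {effective} healthy servers")  # 1.5
```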
Because people are waiting on their work, too. They try to load a page. It times out. They reload. It times out; they wait, reload, get results, repeat. Each timeout adds load to your server for no benefit. If, somehow, this load could be reduced, the timeouts would drop, the server would use less RAM as it scales down, swapping would stop, the application would speed up, and then it would be able to answer all the users it had before, except without breaking.
But if it could do that, how did it get broken in the first place?
Well, everyone logged into the system at 9:00 sharp and gave it the mother of all tidal waves of work. And, well, nobody wants to be put on hold, so the system isn't built for it, struggles, and then overloads, because it's configured to do so rather than crash. That overload then follows everyone throughout the day, turning extra server space into heat and wasted money. A giant, endless, cyberspace traffic jam.
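This feedback loop is easy to model. A minimal toy simulation (all numbers invented for illustration): the 9:00 spike leaves a backlog of timed-out requests, and since every timeout gets retried, the system stays jammed long after arrivals drop back below capacity.

```python
# Toy retry-storm model: timed-out requests are retried next tick,
# so the morning backlog drains far slower than you'd hope.
capacity = 100       # requests the servers can complete per tick
arrivals = 80        # steady post-spike arrival rate (below capacity!)
backlog = 300        # the 9:00 tidal wave

for tick in range(5):
    offered = arrivals + backlog       # fresh work plus all the retries
    completed = min(offered, capacity)
    backlog = offered - completed      # everything else times out, retries
    print(f"tick {tick}: offered={offered} completed={completed} retrying={backlog}")
```

Even though steady arrivals (80) are under capacity (100), the backlog only shrinks by 20 per tick; it takes 15 ticks to clear a spike that took 1 tick to arrive.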
So, that's a real-world example of KPIs: the problems with following KPIs too closely (95% RAM is bad, not good; 5% RAM is also bad, because the edges behave badly and nothing is a simple, constant stress), but also how they can be helpful for setting appropriate usage goals and plans, determining which servers are potentially unused, etc.
What's KPI Bait?
KPI Bait is something that appears to raise KPIs, but has little, no, or even a detrimental effect on actual, real-world performance/production/profits/etc.
The above example is pretty good: running too close to the line with RAM, if you're using "total used RAM" as part of your KPIs. More RAM! More usage! Numbers go up! We'll see so many completed requests, and completed requests are profit! Wait.. why are completed requests going down while RAM still goes up? Why are our applications crashing?
Another incredibly popular KPI Bait (although by now, hopefully, a widely known one) is "lines of code". Developer A wrote 500 lines of code today. Developer B wrote 400 lines of code. Is developer A more productive? Maybe? But what do you do about Developer C, (me), who wrote -50?
Wait a minute, you're probably asking, how do you write negative lines of code? Well, you delete it. "Removing features??" you may ask? Not really, either. Lines of code only roughly corresponds to features or performance. Replacing a complicated arrangement of code used to determine, say, tax rates with a connection to a cheap (legal-approved) service will reduce the amount of code needed, in addition to raising accuracy, flexibility, and value, and adding features (like, say, not overpaying in states with lower sales taxes).
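A hypothetical sketch of how that plays out (the branches, rates, and `tax_service` object are all invented for the example):

```python
# "Before": dozens of hand-maintained branches, one per state, each a
# maintenance liability and a chance to be out of date.
def sales_tax_before(state: str, amount: float) -> float:
    if state == "CA":
        return amount * 0.0725
    if state == "TX":
        return amount * 0.0625
    # ...dozens more branches...
    return amount * 0.05  # a guessed fallback rate


# "After": the rules live with the service whose whole job is keeping
# them current. Net lines of code: negative. Accuracy and features: up.
def sales_tax_after(state: str, amount: float, tax_service) -> float:
    return amount * tax_service.rate_for(state)
```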
On top of that, it's very easy in a digital environment to copy code; after all, it's just text. Sure, you could write two features in a way that is integrated and shares configuration (i.e., use the same settings page in a way that lets a user compare them).. or you could copy the code for both the original feature and its settings page. Now you have lots of code! High KPIs, great… oh.. no.. it's pretty slow now that we've got 15 settings pages with a few settings each. And why does it take forever to make simple changes? What do you mean you have to change the name in 35 different places?
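A toy illustration of the trap (all names invented): every pasted page bakes the product name in separately, so one rename means dozens of edits; the shared version has exactly one place to change.

```python
# Copy-pasted: each settings page repeats the app name by hand.
page_export = {"title": "FooApp - Export Settings", "fields": ["format"]}
page_import = {"title": "FooApp - Import Settings", "fields": ["source"]}
# ...13 more near-identical copies, each needing its own edit on rename...

# Shared: one definition of the name, one function building every page.
APP_NAME = "FooApp"

def settings_page(section, fields):
    return {"title": f"{APP_NAME} - {section} Settings", "fields": fields}
```

Renaming the product is now a one-line change instead of a hunt through 15 files.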
As a result, lines of code is known (or should be..) to be a particularly volatile and unreliable KPI. Don't take the bait. Running in place doesn't make you go any faster. Make sure perceived motion is actual motion.
Github Copilot is KPI Bait?
Github Copilot (TM) is fairly straightforward KPI bait. It promises:
GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real-time, right from your editor.
Ok, seems pretty futuristic. Can they hold to their promise?
Spend less time creating boilerplate and repetitive code patterns, and more time on what matters: building great software. Write a comment describing the logic you want and GitHub Copilot will immediately suggest code to implement the solution.
(Both of these are taken from their marketing site on Sun, Jun 26, 2022)
The beginning of this is good. Less time on boilerplate and repetitive.. actually.. it's not good. Boilerplate is fine, although a solution that makes it much easier to write is likely to incentivise boilerplate.. which isn't good. Lines of Code Go Up; functionality.. doesn't change. That means it's harder to find the important parts, and the code takes longer to compile and deploy, and runs slower. You want less boilerplate. Car analogy? Screws. You need some screws (boilerplate) to hold everything together, otherwise you're stuck doing cursed things like force-fitting body pieces together and using nails when you shouldn't, but after you start extracting the fiftieth screw to remove one plastic plate from the engine just so you can change a headlight, you will realize that perhaps more is not better.
Less time creating repetitive code patterns is the above, except exponentially worse. There is a constant tension between writing code that is repetitive but simple, and code that is complex but powerful. A good application strikes a balance. Parts that aren't core, change semi-frequently but not dramatically, and have simple repetitive implementations should be written as such. For example, say you're running in one state, and open operations in a second state. It's not (yet) worth it to write an entire management system for multiple states; it's much easier to just have a few fairly identical lines of code in strategic places that, in the future, can be expanded into something more comprehensive. You don't spend time developing features you don't need (you just expanded into another state; you're not going to go into 5 more just yet), and you save that time for working on your core competency (say, finding the right car parts and saving on shipping costs, which may be a behemoth of an application with custom solutions to industry-defining problems like the traveling salesman or box stacking problem).
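A sketch of what "repetitive but simple, expandable later" might look like (states, warehouse names, and structure all invented for the example):

```python
# Two states, two nearly identical functions: dumb, but trivially correct,
# and cheap to write in the handful of strategic places that need them.
def shipping_origin_tx() -> str:
    return "TX warehouse"

def shipping_origin_ok() -> str:
    return "OK warehouse"

# The future "management system", written only once a third state makes
# the repetition worth collapsing into a lookup:
WAREHOUSES = {"TX": "TX warehouse", "OK": "OK warehouse"}

def shipping_origin(state: str) -> str:
    return WAREHOUSES[state]
```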
The problem comes when writing repetitive code is too easy. It ends up spreading, and spreading quickly. Places where things really should be in an interface for non-programmers to edit rapidly become massive chunks of code with all those options hard- and "hand"-written in. It slowly becomes harder and harder to change anything, even with the tool, because Github Copilot does not help you with refactoring the way it does with writing the initial mountain of code. Bad code is like cancer: it slowly spreads, and repetitive code specifically ossifies as it does.
So, at this point, my point may be clear, but Github Copilot is something special. It knocks being a KPI lure far, far out of the park.
Github Copilot suggests bugs, including security vulnerabilities: https://cyber.nyu.edu/2021/10/15/ccs-researchers-find-github-copilot-generates-vulnerable-code-40-of-the-time/
Github Copilot suggests copyrighted code without warning:
https://twitter.com/mitsuhiko/status/1410886329924194309
You are on the hook if this is found in your codebase, by the way. Github Copilot's terms, as is custom, push indemnification onto you, which means your ass is strictly unpadded.
Github Copilot was trained on the "publicly available" code on github:
GitHub Copilot is powered by Codex, a generative pretrained AI model created by OpenAI. It has been trained on natural language text and source code from publicly available sources, including code in public repositories on GitHub.
(per their marketing site)
This includes obviously mislicensed code, code not licensed for commercial use, code with license conditions (such as Creative Commons with Attribution), and more, although it's nontrivial to figure out where a given chunk of code came from (sometimes it has been altered a bit; whether that counts as legally distinct.. better hope you have some very good lawyers).
But all of these do not reach the height of why I personally think Github Copilot is the worst programming tool to come out in recent memory.
It changes my job from solving problems, architecting solutions, and making cost-benefit analyses into being a nanny for a machine that can, and will, slip random bugs and glitches into the code. It's trained on publicly available code: including students' code, first-time projects, unmaintained source code hellholes using styles from the 70s that have long since been labeled radioactive, and more. What you're getting has almost certainly not been evaluated for accuracy, correctness, efficiency, or anything else; it's just what "happens to come up" and fits your parameters the best. And the reason this is so bad?
Finding bugs IS HARD.
Even in code we wrote ourselves, not subject to the "machine wrote it, it's probably right" effect that nearly everyone unfortunately falls for, we programmers spend somewhere between 25% and 50% of our time finding and fixing bugs (the rest is figuring out what we're going to write, what has been written, and what should be written). And that's not even counting the little typos that never make it past a build or automated tooling check.
Copilot lets you write lots of code. Lots of "reasonably" buggy code. Which means that my job goes from 25% bugs, to 80% bugs. This also means that I spend less time reading and understanding the code (+bugs), less time planning the code (-understanding, +hard to find bugs), and it just goes on.
What do you think a car mechanic would do if you told them their job was going to be "supervise the result of these 4 beginner mechanics following instructions they found on the internet somewhere, and make sure it works great. Oh, and you can't see the instructions they found to check whether they followed them correctly, or their source, or whether you'll be in legal hot water for following an Official Booklet without a license"?
They'd think it was a joke. Then they'd instantly quit.
I'll instantly quit too.
Not because the tool is taking my job, but because it's taking my job, and then not doing it, and on top of that I am asked to take responsibility? Are you serious?
Don't use Github Copilot.