A Year With GitHub Copilot in Production


December 2, 2022 · 4 min read · by Muhammad Amal ai

TL;DR: One year of daily Copilot use. Estimated ~25% productivity gain overall: ~40-50% time savings on boilerplate-heavy code (tests, CRUD, glue), roughly 0% on novel architecture. Most misleading suggestions: APIs that look real but don’t exist. The workflow shift is real; the language for it (“AI-native engineer”) only stuck in mid-2022.

December’s theme is AI-augmented development. To open it, here’s an honest retrospective on a year of using GitHub Copilot.

I subscribed to Copilot in November 2021. It’s been ~13 months of daily use across Go, Python, TypeScript, PHP, Rust. This isn’t a Copilot promotional post; it isn’t a Copilot hate piece either. It’s what I’ve actually observed.

What I use it for, ranked by value

Highest value:

Boilerplate: handler signatures, struct definitions, repetitive CRUD. Exactly the kind of work AI is good at.
Test scaffolding: “given this function, generate three test cases.” 80% correct; I edit assertions.
DTO mappings: convert a domain entity to a JSON response. Mechanical; Copilot handles it.
Documentation comments: Copilot drafts the docstring; I correct it.
Repetitive refactors: “do this same change to 12 functions.” Copilot suggests each next one.
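To make “mechanical” concrete, here is a minimal sketch of the kind of DTO mapping meant above. The User entity, its fields, and the camelCase response shape are hypothetical, invented for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    """Hypothetical domain entity, invented for this illustration."""
    id: int
    email: str
    display_name: str
    created_at: datetime
    password_hash: str  # internal field that must never leave the service


def user_to_response(user: User) -> dict:
    """Map the entity to a JSON-ready dict: rename fields to camelCase,
    serialize the timestamp, and leave out internal fields."""
    return {
        "id": user.id,
        "email": user.email,
        "displayName": user.display_name,
        "createdAt": user.created_at.isoformat(),
    }


user = User(
    id=1,
    email="ada@example.com",
    display_name="Ada",
    created_at=datetime(2022, 12, 2, tzinfo=timezone.utc),
    password_hash="not-a-real-hash",
)
print(json.dumps(user_to_response(user)))
```

Almost every line is a rename or a format call. The one line that takes judgment is the omission of password_hash, which is exactly what to check when accepting a suggestion like this.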

Medium value:

Algorithm starters: I describe what I want in a comment; Copilot gives a starting point. Usually 60% right.
Regex: I write // match phone numbers; it generates one. ~50% right.
Library API discovery: it suggests function calls that often exist in the library. Saves docs lookup.
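The ~50% hit rate on regex is easy to believe once you test a suggestion. A sketch, assuming US-style 555-123-4567 numbers (the post doesn’t name a format): a naive pattern of the kind autocomplete tends to offer, next to a tightened one.

```python
import re

# A typical first suggestion for "match phone numbers": looks right, but
# findall/search will happily match inside a longer run of digits.
NAIVE = re.compile(r"\d{3}-\d{3}-\d{4}")

# Tightened version: the lookbehind and lookahead reject matches that are
# embedded in longer digit runs.
STRICT = re.compile(r"(?<!\d)\d{3}-\d{3}-\d{4}(?!\d)")


def is_phone(value: str) -> bool:
    """Validate a whole string; fullmatch anchors both ends."""
    return STRICT.fullmatch(value) is not None


text = "order 555-123-45678, call 555-123-4567"
print(NAIVE.findall(text))   # also "finds" a number inside the order ID
print(STRICT.findall(text))  # only the real phone number
```

Both patterns look plausible in a diff; only a test with a near-miss input separates them.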

Low value / negative:

Anything architecturally novel: it pattern-matches against what’s typical, and what’s typical is sometimes wrong for my situation.
Anything security-sensitive: it sometimes suggests insecure patterns (SQL string concat, weak crypto).
Anything domain-specific: business logic for my specific app is wrong without exception.
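The SQL string-concatenation pattern is worth seeing side by side with the fix. A minimal sketch using Python’s built-in sqlite3 module and a hypothetical users table:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@example.com",), ("b@example.com",)],
    )
    return conn


def find_user_concat(conn: sqlite3.Connection, email: str) -> list:
    # Insecure: the input is spliced into the SQL text, so a crafted
    # value can change the query's logic.
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_param(conn: sqlite3.Connection, email: str) -> list:
    # Safe: the value is sent as a bound parameter and is never parsed as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


conn = make_db()
attack = "' OR '1'='1"
print(find_user_concat(conn, attack))  # every row in the table
print(find_user_param(conn, attack))   # no rows
```

Both functions return the right answer for a normal email address, which is why the concatenated version survives a quick skim.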

Estimated productivity gain

Hard to measure precisely. My best estimate:

Task type	Time savings
Boilerplate (CRUD, mappers, tests)	~40-50%
Code reviews	~10-15%
New feature design	~5-10%
Debugging	~5%
Novel architecture	~0%

Weighted by how much time I spend on each: maybe 25% overall.

That’s significant. Not 10× (“AI will replace developers”). Not 0% (“AI is hype”). Somewhere in the middle.
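The overall figure is a time-weighted average of the per-task savings in the table. A sketch of that arithmetic, using the midpoints of the table’s ranges and a set of hypothetical time shares; the post doesn’t report the real shares, so these are illustrative only.

```python
# Midpoints of the per-task savings ranges in the table above.
SAVINGS = {
    "boilerplate": 0.45,
    "code_review": 0.125,
    "feature_design": 0.075,
    "debugging": 0.05,
    "novel_architecture": 0.0,
}

# Hypothetical share of working time per task type (must sum to 1.0).
# Illustrative only: not measured numbers from the post.
TIME_SHARE = {
    "boilerplate": 0.45,
    "code_review": 0.15,
    "feature_design": 0.15,
    "debugging": 0.15,
    "novel_architecture": 0.10,
}


def weighted_savings(savings: dict, share: dict) -> float:
    """Overall fraction of time saved = sum of (savings x time share)."""
    if abs(sum(share.values()) - 1.0) > 1e-9:
        raise ValueError("time shares must sum to 1.0")
    return sum(savings[task] * share[task] for task in savings)


print(f"{weighted_savings(SAVINGS, TIME_SHARE):.0%}")
```

With these made-up shares the result lands near 24%, close to the ~25% estimate. The takeaway is structural: the overall number is driven mostly by how much of the week goes to boilerplate.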

What Copilot consistently gets wrong

Three patterns I see repeatedly:

1. Hallucinated APIs. Generates code calling lodash.flatMapDeep when the function is actually _.flatMapDeep — and sometimes invents APIs that don’t exist at all. The code looks right; it doesn’t run. ~20% of my “Tab to accept” sessions need correction.

2. Wrong context. Suggests Python 3 syntax in a Python 2 codebase. Suggests React 18 patterns in a React 16 codebase. It goes by statistical familiarity rather than checking what the project actually uses.

3. Subtle bugs. Off-by-one errors. Wrong sign on a comparison. Missing await. The code compiles and looks fine; misbehavior surfaces during testing.

For each, the workflow is: accept, scan, often correct. Faster than typing from scratch; not free.
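All three subtle-bug patterns read fine at a glance. A minimal Python sketch of each, with a buggy and a fixed version; the functions are hypothetical examples written for illustration, not captured Copilot output.

```python
import asyncio


# 1. Off-by-one: callers pass 1-based page numbers.
def page_buggy(items: list, page: int, size: int) -> list:
    start = page * size  # treats page as 0-based, so page 1 skips the first page
    return items[start:start + size]


def page_fixed(items: list, page: int, size: int) -> list:
    start = (page - 1) * size
    return items[start:start + size]


# 2. Wrong sign on a comparison.
def is_expired_buggy(expires_at: float, now: float) -> bool:
    return expires_at > now  # inverted: still-valid tokens look expired


def is_expired_fixed(expires_at: float, now: float) -> bool:
    return expires_at <= now


# 3. Missing await.
async def has_permission(user_id: int) -> bool:
    await asyncio.sleep(0)  # stands in for a hypothetical async DB or API call
    return user_id == 1


async def can_delete_buggy(user_id: int) -> bool:
    pending = has_permission(user_id)  # missing await: a coroutine, not a bool
    allowed = bool(pending)            # coroutine objects are always truthy
    pending.close()                    # demo only: avoids the "never awaited" warning
    return allowed


async def can_delete_fixed(user_id: int) -> bool:
    return await has_permission(user_id)


items = list(range(1, 11))
print(page_buggy(items, 1, 3), page_fixed(items, 1, 3))
print(asyncio.run(can_delete_buggy(2)), asyncio.run(can_delete_fixed(2)))
```

None of the buggy versions raises an error; each one only shows up when a test checks a concrete value.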

The workflow shift

Some real changes in how I work:

More planning out loud. I write more comments BEFORE writing code, because Copilot’s suggestion quality scales with context. “// this function takes a userID and returns…” is now my normal opening. Side effect: better code structure.

Less Stack Overflow. I haven’t searched “how to do X in Python” in months. Copilot autocompletes the syntax I’m trying to remember.

More skim-reading. Less time typing; more time eyeballing what was generated. Skimming code is a real skill now.

More test-writing. Trivial cost to add 5 more test cases. I write more tests than before.
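Part of why extra cases are cheap is the table-driven style: each new case is one line, and autocomplete is good at proposing the next row. A sketch with the standard library’s unittest and subTest, using a hypothetical slugify function as the code under test.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test: lowercase, collapse runs of
    non-alphanumeric characters into one hyphen, trim hyphens at the ends."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


class SlugifyTest(unittest.TestCase):
    # Each row is (input, expected); adding a case is one more line.
    CASES = [
        ("Hello World", "hello-world"),
        ("  leading and trailing  ", "leading-and-trailing"),
        ("Go, Python & Rust!", "go-python-rust"),
        ("already-slugged", "already-slugged"),
        ("", ""),
    ]

    def test_cases(self):
        for title, expected in self.CASES:
            # subTest reports each failing row separately instead of
            # stopping at the first failure.
            with self.subTest(title=title):
                self.assertEqual(slugify(title), expected)
```

Run it with python -m unittest against the file. The expected values are still the part worth writing by hand; that’s where the “I edit assertions” step happens.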

What’s NOT covered by Copilot

Copilot helps within a function or across nearby functions. It doesn’t:

Suggest architectural patterns
Find bugs in existing code (that’s a different product)
Refactor across modules
Generate PRs from descriptions
Plan work

For those, GPT-4 in ChatGPT (released March 2023, so outside the scope of this retrospective) starts being useful, but it’s its own product. Copilot is autocomplete++.

When I turn it off

Three contexts I disable Copilot:

Security-sensitive code (auth, crypto, payment). The risk of an insecure pattern slipping through outweighs the speed gain.
Code I’m trying to learn through. Working through Rust ownership the first time, autocomplete robs me of the learning.
Long writing sessions in markdown. Constant suggestions distract.

It has a button. I use it.

The license thing

In November 2022, a class-action lawsuit over Copilot’s use of licensed open-source code as training data was filed against GitHub, Microsoft, and OpenAI. As of December 2022 it’s still in litigation. Companies with strict open-source compliance requirements have set policies on it; mine hasn’t raised an issue.

I’ll cover the legal angle separately. For most teams in 2022: the practical impact is “your code may resemble training data.” Audit accordingly.

What this month covers

The rest of the month’s posts on AI-augmented dev:

Dec 5: What Copilot’s good at and what it isn’t
Dec 7: Prompt-style comments to steer Copilot
Dec 9: Reviewing AI-suggested code
Dec 12: Pair programming with AI
Dec 14: Copilot for tests — TDD or anti-TDD?
Dec 16: Codespaces + Copilot
Dec 19: Beyond Copilot — alternatives
Dec 21: Licensing and IP
Dec 23: Productivity metrics that matter

Wrapping Up

Copilot is good, not magic, sometimes wrong. A year in, I won’t go back. Monday: what it’s specifically good at and what it isn’t.

