A Year With GitHub Copilot in Production
TL;DR: One year of daily Copilot use. Estimated ~25% overall productivity gain, concentrated in boilerplate-heavy code (tests, CRUD, glue) at ~40-50%, and close to 0% on novel architecture. Most-misleading suggestions: APIs that look real but don’t exist. Workflow shift is real; the language for it (“AI-native engineer”) only stuck mid-2022.
December starts the AI-augmented dev theme. This post is the honest retrospective after a year of using GitHub Copilot.
I subscribed to Copilot in November 2021. It’s been ~13 months of daily use across Go, Python, TypeScript, PHP, and Rust. This isn’t a Copilot promotional post; it isn’t a Copilot hate piece either. It’s what I’ve actually observed.
What I use it for, ranked by value
Highest value:
- Boilerplate: handler signatures, struct definitions, repetitive CRUD. Exactly the kind of work AI is good at.
- Test scaffolding: “given this function, generate three test cases.” 80% correct; I edit assertions.
- DTO mappings: convert a domain entity to a JSON response. Mechanical; Copilot handles it.
- Documentation comments: Copilot starts a docstring; I correct it.
- Repetitive refactors: “do this same change to 12 functions.” Copilot suggests each next one.
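To make the DTO-mapping case concrete, here is a minimal Python sketch of the kind of mechanical code involved. The `User` entity, its fields, and `to_response` are hypothetical names chosen for illustration; they are not from any real codebase or from actual Copilot output.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    # Hypothetical domain entity; the field names are assumptions for this sketch.
    id: int
    email: str
    password_hash: str
    created_at: datetime


def to_response(user: User) -> dict:
    """Map a domain entity to a JSON-serializable response body."""
    return {
        "id": user.id,
        "email": user.email,
        # password_hash is deliberately left out of the response.
        "createdAt": user.created_at.isoformat(),
    }


user = User(1, "a@example.com", "not-a-real-hash", datetime(2022, 12, 2, tzinfo=timezone.utc))
print(to_response(user))
```

The field-by-field copying is the part an autocomplete tool can plausibly fill in; deciding which fields must never leave the service (here, `password_hash`) is still a human review step.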
Medium value:
- Algorithm starters: I describe what I want in a comment; Copilot gives a starting point. Usually 60% right.
- Regex: I write “// match phone numbers”; it generates one. ~50% right.
- Library API discovery: it suggests function calls that often exist in the library. Saves docs lookup.
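As an illustration of why a generated regex tends to be only partly right, here is a small Python sketch. The pattern is a hypothetical example of what a "match phone numbers" comment might produce, assuming US-style 10-digit numbers; it is not actual Copilot output.

```python
import re

# Hypothetical pattern: optional parentheses around the area code,
# with space, dot, or hyphen as optional separators.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def is_phone(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return PHONE_RE.fullmatch(text) is not None


for sample in ["555-123-4567", "(555) 123-4567", "+44 20 7946 0958", "(555-123-4567"]:
    print(sample, is_phone(sample))
```

The first two samples match as intended. The international number is rejected, which may or may not be what the application wants, and the last sample is accepted even though its parenthesis is never closed. Both are the kind of gap that only shows up when the pattern is tested against real inputs.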
Low value / negative:
- Anything architecturally novel: it pattern-matches against what’s typical; what’s typical is sometimes wrong for my situation.
- Anything security-sensitive: it sometimes suggests insecure patterns (SQL string concat, weak crypto).
- Anything domain-specific: business logic for my specific app is wrong without exception.
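The SQL string-concatenation pattern mentioned above is worth seeing side by side with the safe version. This is a minimal sketch using Python's standard `sqlite3` module; the `users` table, the two function names, and the payload are assumptions made up for the example.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # Insecure: user input is concatenated directly into the SQL text.
    query = "SELECT id, email FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str) -> list:
    # Parameterized: the value is passed separately from the SQL text.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

# A classic injection payload: closes the string and adds an always-true clause.
payload = "nobody@example.com' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # every row leaks
print(len(find_user_safe(conn, payload)))    # no rows match
```

Both functions look equally reasonable at a glance, which is the problem: the unsafe one returns every row for the crafted input, while the parameterized one treats the payload as a plain string.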
Estimated productivity gain
Hard to measure precisely. My best estimate:
| Task type | Time savings |
|---|---|
| Boilerplate (CRUD, mappers, tests) | ~40-50% |
| Code reviews | ~10-15% |
| New feature design | ~5-10% |
| Debugging | ~5% |
| Novel architecture | ~0% |
Weighted by how much time I spend on each: maybe 25% overall.
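The weighting is a plain weighted average. The sketch below uses the midpoints of the savings ranges from the table; the time shares are hypothetical numbers picked only to show the arithmetic, since the actual split of my time is not measured.

```python
# Midpoints of the savings ranges from the table above.
savings = {
    "boilerplate": 0.45,
    "code_reviews": 0.125,
    "feature_design": 0.075,
    "debugging": 0.05,
    "novel_architecture": 0.0,
}

# ASSUMPTION: hypothetical share of working time per task type (sums to 1.0).
time_share = {
    "boilerplate": 0.45,
    "code_reviews": 0.15,
    "feature_design": 0.15,
    "debugging": 0.15,
    "novel_architecture": 0.10,
}

# Overall saving = sum over task types of (saving on task * share of time on task).
overall = sum(savings[task] * time_share[task] for task in savings)
print(f"{overall:.1%}")
```

With these assumed shares the result is roughly 24%. The figure is dominated by the boilerplate row, so the overall estimate depends almost entirely on how much of the week is boilerplate.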
That’s significant. Not 10× (“AI will replace developers”). Not 0% (“AI is hype”). Somewhere in the middle.
What Copilot consistently gets wrong
Three patterns I see repeatedly:
1. Hallucinated APIs. Generates calls under a slightly wrong name or import path (for example, referencing lodash.flatMapDeep in a file where lodash is imported as _, so the call has to be _.flatMapDeep), and sometimes invents APIs that don’t exist at all. The code looks right; it doesn’t run. ~20% of my “Tab to accept” sessions need correction.
2. Wrong context. Suggests Python-3 syntax in a Python-2 codebase. Suggests React 18 patterns in a React 16 codebase. It uses statistical familiarity; it doesn’t deeply check the project.
3. Subtle bugs. Off-by-one errors. Wrong sign on a comparison. Missing await. The code compiles and looks fine; misbehavior surfaces during testing.
For each, the workflow is: accept, scan, often correct. Faster than typing from scratch; not free.
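A sketch of the "subtle bug" category, in Python: an off-by-one in pagination that runs without errors and reads fine in a quick scan. Both functions and the 1-based page convention are assumptions invented for this example.

```python
def page_buggy(items: list, page: int, size: int) -> list:
    # Off-by-one: treats a 1-based page number as if it were 0-based.
    start = page * size
    return items[start:start + size]


def page_fixed(items: list, page: int, size: int) -> list:
    # Pages are 1-based, so page 1 starts at index 0.
    start = (page - 1) * size
    return items[start:start + size]


items = list(range(1, 11))  # the numbers 1 through 10

print(page_buggy(items, 1, 3))  # silently skips the first page
print(page_fixed(items, 1, 3))
```

Neither version raises an exception, so only an assertion on the first page's contents exposes the difference. That is why the scan step works best when paired with a cheap test.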
The workflow shift
Some real changes in how I work:
More planning out loud. I write more comments BEFORE writing code, because Copilot’s suggestion quality scales with context. “// this function takes a userID and returns…” is now my normal opening. Side effect: better code structure.
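Here is a sketch of what comment-first planning can look like in Python. The `USERS` store and `display_name` function are hypothetical, and the body is hand-written for illustration rather than a recorded suggestion.

```python
from typing import Optional

# Hypothetical in-memory store standing in for a real database.
USERS = {
    42: {"name": "Ada", "active": True},
    43: {"name": "Bob", "active": False},
}


# Look up a user's display name by ID.
# - Unknown IDs return None instead of raising.
# - Inactive users are treated as unknown.
def display_name(user_id: int) -> Optional[str]:
    user = USERS.get(user_id)
    if user is None or not user["active"]:
        return None
    return user["name"]


print(display_name(42))
```

The comment block states the edge cases before any code exists, so a reviewer can check the body against the stated contract line by line, whether a person or a tool wrote it.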
Less Stack Overflow. I haven’t searched “how to do X in Python” in months. Copilot autocompletes the syntax I’m trying to remember.
More skim-reading. Less time typing; more time eyeballing what was generated. Skimming code is a real skill now.
More test-writing. Trivial cost to add 5 more test cases. I write more tests than before.
What’s NOT covered by Copilot
Copilot helps within a function or across nearby functions. It doesn’t:
- Suggest architectural patterns
- Find bugs in existing code (that’s a different product)
- Refactor across modules
- Generate PRs from descriptions
- Plan work
For those, GPT-4 in ChatGPT (released March 2023 — out of scope here) starts being useful but is its own product. Copilot is autocomplete++.
When I turn it off
Three contexts where I disable Copilot:
- Security-sensitive code (auth, crypto, payment). The risk of an insecure pattern slipping through outweighs the speed gain.
- Code I’m trying to learn from. When I’m working through Rust ownership for the first time, autocomplete robs me of the learning.
- Long writing sessions in markdown. Constant suggestions distract.
It has a button. I use it.
The license thing
In November 2022 a class-action lawsuit over the licensing of Copilot’s training data was filed against GitHub, Microsoft, and OpenAI. As of December 2022 it is still in litigation. Companies with strict open-source compliance concerns have policies on this; mine doesn’t have an issue.
I’ll cover the legal angle separately. For most teams in 2022: the practical impact is “your code may resemble training data.” Audit accordingly.
What this month covers
More posts on AI-augmented dev this month, including:
- Dec 5: What Copilot’s good at and what it isn’t
- Dec 7: Prompt-style comments to steer Copilot
- Dec 9: Reviewing AI-suggested code
- Dec 12: Pair programming with AI
- Dec 14: Copilot for tests — TDD or anti-TDD?
- Dec 16: Codespaces + Copilot
- Dec 19: Beyond Copilot — alternatives
- Dec 21: Licensing and IP
- Dec 23: Productivity metrics that matter
Wrapping Up
Copilot is good, not magic, sometimes wrong. A year in, I won’t go back. Monday: what it’s specifically good at and what it isn’t.