What Copilot Is Good At (and What It Isn't)
December 5, 2022 · 4 min read · by Muhammad Amal

TL;DR: Excels at: completing patterns, boilerplate, common API usage, test scaffolding, docstrings. Mediocre: business logic, complex algorithms. Negative: security-sensitive code, novel architecture, off-distribution languages.

Following up on the one-year retrospective, this post categorizes where Copilot actually helps. The simple “AI good / AI bad” framing misses the texture.

Five tiers of Copilot competence

Tier 1 — Excels at. Copilot’s suggestions are nearly always usable. Faster + correct.

Tier 2 — Useful. Suggestions need some editing but save real time.

Tier 3 — Mixed. Suggestions are starting points; often need rework.

Tier 4 — Misleading. Suggestions look right but are subtly wrong. Net-negative.

Tier 5 — Avoid. Don’t use Copilot here.

Tier 1 — Excels at

Boilerplate code:

// type Comment: id, post_id, body, created_at
// Copilot fills in the struct, JSON tags, constructor, etc.

Mechanical mappings:

// Convert User to UserResponse, dropping internal fields
// Copilot generates the mapper accurately

Standard idioms:

# Use context manager for the file
# Copilot does the with-open-as pattern correctly

Test scaffolding:

def test_email_validator():
    # Copilot suggests cases: valid, missing @, missing domain, etc.
    ...
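
A rough sketch of what that scaffold tends to look like once accepted and tidied. The `is_valid_email` helper and its regex are hypothetical, written only for this illustration; they are not from any library and are far simpler than real email validation.

```python
import re

# Hypothetical validator for illustration only: one "@", no whitespace,
# and at least one dot in the domain part.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    return _EMAIL_RE.fullmatch(value) is not None


def test_email_validator():
    assert is_valid_email("user@example.com")       # valid
    assert not is_valid_email("user.example.com")   # missing @
    assert not is_valid_email("user@")              # missing domain
    assert not is_valid_email("")                   # empty string
    assert not is_valid_email("a@b@example.com")    # two @ signs
```

These are the obvious cases. Anything unusual for your domain (internal addresses, length limits, unicode) still has to come from you.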

Docstring stubs:

def parse_phone(number: str) -> str:
    """
    Copilot completes a reasonable docstring based on signature.
    """

For these, my accept rate is 70-90%. Massive time saver.

Tier 2 — Useful

API calls in popular libraries. “How do I open a Postgres connection with pgx in Go?” — Copilot generates the right setup. Sometimes off in version-specific details but mostly works.

Regular expression starters. “// match a US phone number” — Copilot offers a regex. 50% right; faster than typing from scratch.
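
To make “50% right” concrete, here is a minimal sketch. `STARTER` is the kind of pattern a first suggestion might look like (an illustrative guess, not recorded Copilot output), and `EDITED` is one possible hand-fixed version. Both patterns and the sample numbers are assumptions made for this example.

```python
import re

# A plausible first suggestion for "match a US phone number" (illustrative).
STARTER = re.compile(r"\d{3}-\d{3}-\d{4}")

# One possible edited version: optional +1 prefix, optional parentheses
# around the area code, and space / dot / dash separators.
EDITED = re.compile(r"(?:\+?1[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}")

samples = [
    "555-867-5309",
    "(555) 867-5309",
    "+1 555.867.5309",
    "5558675309",
    "555-867-530",      # one digit short, should never match
]

starter_hits = [s for s in samples if STARTER.fullmatch(s)]
edited_hits = [s for s in samples if EDITED.fullmatch(s)]

print(starter_hits)  # only the dash-separated form
print(edited_hits)   # the first four samples
```

The starter saves typing, but the formats your users actually enter decide whether it is done.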

Configuration files. Dockerfiles, GitHub Actions YAML, nginx configs. Common patterns auto-complete.

Error handling boilerplate. “if err != nil { return …” Copilot fills in the return value with context.

For these, my accept rate is 40-70%. Faster, but careful review is needed.

Tier 3 — Mixed

Complex algorithms. “Compute the diff between two trees.” Copilot has seen this; the implementation is sometimes correct, sometimes a confident hallucination.

Database queries. SQL generation works for simple queries. Complex joins / window functions / CTEs: hit and miss.

Date/time logic. Time zones, ISO formats, date math. Subtle bugs.
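
One example of the kind of subtle bug meant here, as a sketch. Both function names are hypothetical, and the fixed UTC-5 offset stands in for a real time zone to keep the example dependency-free.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EST = timezone(timedelta(hours=-5))  # fixed offset, used here for simplicity


def same_day_as_written(a: datetime, b: datetime) -> bool:
    # Looks reasonable, but compares each value's local calendar date.
    return a.date() == b.date()


def same_day_in_zone(a: datetime, b: datetime, tz: timezone) -> bool:
    # Convert both values to one zone before comparing calendar dates.
    return a.astimezone(tz).date() == b.astimezone(tz).date()


a = datetime(2022, 12, 5, 23, 30, tzinfo=EST)  # 2022-12-06 04:30 UTC
b = datetime(2022, 12, 6, 4, 30, tzinfo=UTC)   # the same instant

print(a == b)                         # True: same moment in time
print(same_day_as_written(a, b))      # False: Dec 5 vs Dec 6
print(same_day_in_zone(a, b, UTC))    # True
```

The first version passes any test where both values share a zone, which is exactly why it survives review.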

Concurrent code. Mutex usage, channel patterns, async/await. Sometimes correct; often subtly wrong.
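
A small sketch of “subtly wrong” in concurrent code, using Python threads. The `Counter` class and `run` helper are made up for this example. The unlocked method is a read-modify-write that can lose updates under contention, though on any given run it may happen to produce the right total.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self) -> None:
        # Read, then write: another thread can update in between,
        # and that update is then overwritten.
        current = self.value
        self.value = current + 1

    def increment(self) -> None:
        # The lock makes the read-modify-write a single critical section.
        with self._lock:
            self.value += 1


def run(method_name: str, threads: int = 8, per_thread: int = 10_000) -> int:
    counter = Counter()

    def work() -> None:
        method = getattr(counter, method_name)
        for _ in range(per_thread):
            method()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


print(run("increment"))         # always threads * per_thread
print(run("increment_unsafe"))  # may be lower; varies from run to run
```

Both methods read fine in a diff. Only one of them is correct, and a quick manual test will often not tell you which.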

For these, accept rate is 30-50%. Worth scanning the suggestion; often a starting point for human work.

Tier 4 — Misleading

Domain-specific business logic. Subscription proration math. Tax calculations. Order workflow. Copilot pattern-matches against generic patterns; your business rules are specific.
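
A worked proration example shows why. The `prorate` function, the price, and both day-count conventions below are assumptions for illustration; which convention is right depends entirely on your billing rules, and that is the part Copilot cannot know.

```python
from decimal import Decimal, ROUND_HALF_UP


def prorate(price: Decimal, days_remaining: int, days_in_period: int) -> Decimal:
    """Credit for the unused part of a billing period, rounded to cents."""
    amount = price * days_remaining / days_in_period
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


price = Decimal("31.00")

# Same customer, 10 days left in January, two plausible conventions.
actual_days = prorate(price, 10, 31)  # calendar days in the month
thirty_day = prorate(price, 10, 30)   # fixed 30-day billing month

print(actual_days)  # 10.00
print(thirty_day)   # 10.33
```

Both results look sensible. Only one matches your contract, and the rounding mode is a third decision hiding in the same line.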

Cross-module refactors. Function signatures change mid-refactor; Copilot keeps suggesting call sites based on the old signatures.

Performance-sensitive code. Generates “reasonable” code that’s not optimal. Hot-path optimization needs careful human work.

Code that looks plausible but uses APIs that don’t exist. The hardest category to spot. The code looks fine in your editor (or the LSP error is only briefly visible); you accept it; then the import or call fails to resolve and you waste 5 minutes.
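
For Python, a cheap sanity check is to ask the interpreter whether the suggested name resolves at all. The `api_exists` helper below is a hypothetical utility written for this post, not part of any tool; it only confirms that the name exists, not that it does what the suggestion assumes.

```python
import importlib


def api_exists(dotted_name: str) -> bool:
    """Return True if "module.attr" resolves to a real attribute."""
    module_name, _, attr = dotted_name.rpartition(".")
    try:
        module = importlib.import_module(module_name)
    except (ImportError, ValueError):
        # Unknown module, or no module part in the name at all.
        return False
    return hasattr(module, attr)


print(api_exists("json.loads"))    # True
print(api_exists("json.parse"))    # False: plausible name, not in the stdlib
print(api_exists("os.path.join"))  # True
```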

For these, decline more often than accept. Manual coding is faster.

Tier 5 — Avoid

Security-sensitive code. Auth, crypto, secrets handling. Copilot suggestions in these areas sometimes include insecure patterns (a hardcoded HS256 secret for JWT, fixed seeds, SQL concatenation). Keep it off by default here; write from scratch with reference docs.
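
The SQL concatenation case is the easiest to demonstrate. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the table, the function names, and the payload are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_concat(name: str) -> list:
    # Insecure: user input becomes part of the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_param(name: str) -> list:
    # Parameterized: the driver passes the value separately from the SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"

leaked = find_user_concat(payload)  # the WHERE clause is now always true
safe = find_user_param(payload)     # no user has that literal name

print(len(leaked))  # 2
print(safe)         # []
```

Both functions return the same rows for ordinary input, so the insecure one passes a happy-path test.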

Code following a strict spec. OAuth implementations, payment processing, anything compliance-bound. The cost of being subtly wrong is too high.

Novel domains. If you’re working on something rare (medical devices, embedded firmware, esoteric protocols), Copilot has thin training data and makes confidently wrong suggestions.

Languages

By language, my accept rates:

Language                  Accept rate
JavaScript / TypeScript   ~65%
Python                    ~60%
Go                        ~55%
Rust                      ~40%
PHP                       ~50%
SQL                       ~45%
Shell scripts             ~50%
Markdown / docs           ~70%

Among programming languages, JavaScript / TypeScript wins, most likely because it has the most training data. Rust comes last: the language is young, and ownership and lifetimes need precise reasoning that statistical models struggle with.

Project-specific factors

Within a single language, accept rate varies by:

  • Codebase size: bigger codebase = more context = better suggestions
  • Naming conventions: consistent naming = Copilot picks up your idioms
  • Type system usage: strict TypeScript / Go > untyped Python
  • Comment density: more inline context = better completions

Investing in code clarity helps both humans AND Copilot.

When to ignore Copilot

Specific signals:

  • Suggestion uses a different design pattern than the rest of the file
  • Suggestion calls an API you don’t recognize
  • Suggestion involves crypto / security primitives
  • Suggestion is “too good” for the complexity of the prompt
  • You’re tired and can’t review carefully

When in doubt: write it yourself. The minute you save by accepting a wrong suggestion costs ten minutes debugging later.

Common Pitfalls

Accepting without reading. Hit Tab; ship; regret. Always scan.

Trusting hallucinated APIs. The code looks right; the function doesn’t exist. Verify with the LSP or the docs.

Accepting suggestions in security code. Disable Copilot there, or be extra careful.

Not investing in prompt context. Comments before functions = better suggestions. Free.

Letting Copilot write your tests entirely. Its test cases pattern-match against the expected cases and miss the unexpected edge cases.

Using Copilot’s confidence as signal. It always sounds confident. The output’s confidence isn’t related to its correctness.

Wrapping Up

Tier 1-2 = time saver. Tier 3 = think. Tier 4-5 = type yourself. Knowing where you are makes Copilot useful instead of dangerous. Wednesday: prompt-style comments.