Designing the Service Boundary: Lessons from Splitting a Billing Module
January 12, 2022 · 6 min read · by Muhammad Amal · programming

TL;DR — A service boundary is not a directory boundary. Define it by data ownership, transactional scope, and the questions other services need to ask. If the answer to “who owns this row” is fuzzy, you don’t have a boundary yet.

Now that we have a first Go microservice running, the obvious next question is which one to extract second. The answer in our case was billing. Not because billing is easy — it isn’t — but because the monolith’s billing code had become the worst pain point: every product change touched it, every PR risked breaking invoicing, every release required a billing engineer in the room.

What I learned drawing the billing service boundary surprised me. The hard work is not the code. The hard work is the conversations before the code, where you ask: who owns the customer’s plan record? When does a credit balance change become visible? Can a refund happen without an invoice? Most monolith codebases have implicit answers to these questions buried across twenty files. Extracting the service forces you to make them explicit.

What “boundary” actually means

In a monolith, a “boundary” is a folder name. Code outside app/Billing/ is supposed to call into it through the BillingService class, but in practice half the codebase reaches directly into Invoice::find()->subscriptions->charges. The boundary is theoretical.

In a microservice, the boundary is enforced by the network. Anything outside the billing service can only ask billing questions through billing’s public API. That sounds tautological until you try to draw the line, and discover that the rest of your application constantly asks billing questions that aren’t in any API:

  • “Is this user on a paid plan right now?” (account service)
  • “Show me this user’s last three invoices on the dashboard.” (web frontend)
  • “If they cancel today, what’s the prorated refund?” (support tooling)
  • “Did they have an active subscription on March 12, 2021?” (analytics)

If your billing service doesn’t answer all four, calling code will reach around it. That’s not a microservices architecture; that’s a distributed monolith with extra latency.

The boundary design exercise is enumerating every question other parts of your system ask of billing, and ensuring billing has an answer for each one — or explicitly delegating the question elsewhere.

Three boundary heuristics

Data ownership is non-negotiable. Pick a clear list of database tables that only the billing service writes to. Other services can’t put triggers on them, can’t run scheduled jobs against them, and can’t point their own read replicas at them. Reads from outside are fine as long as they go through billing’s API. Writes from outside are forbidden.

For us: subscriptions, invoices, invoice_lines, payment_methods, charges, credit_notes, subscription_plans. Seven tables. Anything that wasn’t on this list — like users or addresses — stayed in the monolith.

Transactional scope determines synchronous vs async. Operations that must complete atomically with the billing change need to be inside billing. Operations that can happen eventually should be on the outside, consuming billing events. The split is forced on you by the loss of cross-service transactions.

For us: a refund (credit note + payment reversal) must be atomic — both in the same DB transaction in the billing service. Sending the refund email is async — billing emits a RefundIssued event and the notifications service consumes it.

Stable invariants tell you where the boundary should be. If your business rules say “a subscription always has exactly one active plan, and changing it is a billable event,” that invariant points squarely at the billing service. If your business rules say “a customer is in one of three loyalty tiers based on rolling 90-day spend,” that’s not a billing invariant; it’s a marketing/CRM invariant that consumes billing data.

Drawing the boundary along invariant lines makes it clear what each service is responsible for keeping true.

The contract: what billing exposes

We landed on three categories of API.

Synchronous queries (gRPC): for things the rest of the system needs to ask in real time.

service Billing {
  rpc GetActiveSubscription(GetActiveSubscriptionRequest) returns (Subscription);
  rpc ListInvoices(ListInvoicesRequest) returns (ListInvoicesResponse);
  rpc PreviewCharge(PreviewChargeRequest) returns (PreviewChargeResponse);
}

These are read-heavy and need predictable latency. gRPC over HTTP/2, with deadlines on every call.

Synchronous commands (gRPC): for state changes that need an immediate result.

// Same Billing service as above; the methods are grouped by category here.
service Billing {
  rpc StartSubscription(StartSubscriptionRequest) returns (Subscription);
  rpc CancelSubscription(CancelSubscriptionRequest) returns (Subscription);
  rpc IssueRefund(IssueRefundRequest) returns (CreditNote);
}

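The caller waits while the billing service runs the transaction and returns the result. From the caller’s point of view the outcome is all-or-nothing: if anything fails, nothing changes. Strictly speaking, only the database writes are atomic. A payment gateway call and event emission cannot join a database transaction, so they have to be safe to retry or compensate (idempotency keys and an outbox table are the usual tools).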

Asynchronous events (Kafka in our case, NATS would have been fine): for downstream systems that need to react.

billing.subscription.started.v1
billing.subscription.renewed.v1
billing.subscription.canceled.v1
billing.invoice.issued.v1
billing.payment.succeeded.v1
billing.payment.failed.v1
billing.refund.issued.v1

Versioned in the topic name. Schema in protobuf, registered in a schema registry. Consumers (notifications, analytics, CRM, the dashboard) subscribe to whichever topics they need.
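The naming convention above is regular enough to build and check in code rather than by hand. This is a small sketch; the Topic and ParseTopic helpers are hypothetical names, not part of any Kafka client library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Topic builds a name of the form billing.<entity>.<action>.v<version>.
func Topic(entity, action string, version int) string {
	return fmt.Sprintf("billing.%s.%s.v%d", entity, action, version)
}

// ParseTopic splits a topic name back into its parts and rejects names
// that do not follow the convention.
func ParseTopic(topic string) (entity, action string, version int, err error) {
	parts := strings.Split(topic, ".")
	if len(parts) != 4 || parts[0] != "billing" || !strings.HasPrefix(parts[3], "v") {
		return "", "", 0, fmt.Errorf("unexpected topic %q", topic)
	}
	version, err = strconv.Atoi(strings.TrimPrefix(parts[3], "v"))
	if err != nil || version < 1 {
		return "", "", 0, fmt.Errorf("bad version in topic %q", topic)
	}
	return parts[1], parts[2], version, nil
}

func main() {
	fmt.Println(Topic("subscription", "started", 1))
	fmt.Println(ParseTopic("billing.refund.issued.v1"))
	fmt.Println(ParseTopic("billing.refund.issued"))
}
```

A consumer can use the parsed version to route a message to the right decoder, which keeps a future v2 topic from being read with v1 assumptions.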

The split between sync command, sync query, and async event is the most important architectural decision in the whole extraction. Get it wrong and you have a spaghetti dependency graph in three months.

Common Pitfalls

Treating gRPC as RPC instead of a contract. A gRPC method is a public commitment. Once a consumer integrates against IssueRefund, you cannot change the request shape in a breaking way (removing, renumbering, or retyping fields) without versioning. Treat each method addition like an API addition: design, review, document.

Letting upstream callers reach into your DB. A read-only Postgres user with access to billing’s tables seems harmless. It isn’t. The moment another team writes a query against your subscriptions table, your schema is a public API and you can never refactor it. Force every read through the gRPC API even when it’s annoying.

Emitting events that mirror your DB schema. The billing.subscription.started.v1 event should not contain every column of the subscriptions table. It should contain the business fact: who started subscribing, to what plan, when, at what price. The internal columns (foreign keys, soft-delete flags, billing-cycle-anchor timestamps) stay private.

Skipping event versioning until the first breaking change. Always include .v1 in the topic name. The day you need a v2, you’ll thank yourself for not having to migrate every consumer simultaneously.

Believing “we’ll add the API later.” If the rest of the system needs a piece of data, and your billing service doesn’t expose it, the rest of the system will get it some other way. That other way is always worse. Ship the API the first time someone asks for it, even if the implementation is “select from the table.”

Wrapping Up

Billing is the test case for whether your team can do microservices honestly. If the boundary is sharp, the rest of the year is mostly mechanical: extract, shadow, cutover. If the boundary is fuzzy, you’ll spend twelve months relitigating it. Spend a full week on the boundary design before writing any code. Next post: REST vs gRPC for the inter-service calls, and why we picked gRPC for billing specifically.