gRPC Deep Dive in 2025, Patterns for High Throughput Services


July 7, 2025 · 8 min read · by Muhammad Amal programming

TL;DR — gRPC-Go 1.71 ships with smarter flow control defaults, but the wins still come from how you configure HTTP/2, pool sub-connections, and shape your payloads. The patterns below cut p99 latency by half on a service I shipped last quarter.

I’ve been running gRPC in production since 2017. The protocol hasn’t fundamentally changed, but the runtimes around it have, and the failure modes keep mutating. What worked at 2k RPS on a single pod stops working at 40k RPS across 30 pods, and the usual culprit isn’t the network. It’s the way the client SDK keeps a single HTTP/2 connection pinned to a backend that’s already saturated.

This walkthrough is the one I’d hand to a mid-level engineer joining a team that already has gRPC-Go services in production. I’ll skip the “what is RPC” bits and focus on the patterns that pay rent. Code targets gRPC-Go 1.71 (released March 2025) with Go 1.24, and I’ll flag where the new experimental.WithRecvBufferPool API changes the calculus.

If you’ve read my earlier piece on edge computing with k3s, some of the same network-tuning instincts apply, but the kernel paths gRPC exercises are different enough that the knobs don’t transfer one-to-one. We’ll start with the wire and work outward to the application layer.

1. Why gRPC Still Wins, and Where It Loses

HTTP/2 multiplexing is the headline feature, but the actual reason gRPC keeps showing up in new architectures is the schema contract plus codegen. You get the wire-level efficiency of binary protobuf and the type-safety of generated stubs in one package. JSON-over-HTTP doesn’t compete on either axis.

Where it loses: browsers, polyglot teams without protobuf discipline, and any environment where you can’t terminate HTTP/2 cleanly. The first problem is what Connect-Go solves (see the Connect-Go tutorial for that path). The second is a process problem dressed up as a tech problem. The third is usually a load balancer issue, and it’s worth understanding before you touch any code.

1.1 Connection Topology Matters More Than Codec Choice

                +---------+        +-----------+
client SDK ---> | L4 LB   |  --->  | backend A | <-- 1 long-lived HTTP/2 conn
                | (TCP)   |  -+    +-----------+
                +---------+   |    +-----------+
                              +--> | backend B | <-- ignored until reconnect
                                   +-----------+

A naive L4 load balancer pins a client’s TCP connection to whichever backend it picked first. HTTP/2 keeps that connection alive for hours. The other backends sit cold while one melts. The fix is either client-side load balancing (the round_robin policy in gRPC-Go, fed by a resolver that returns every backend address) or an L7 proxy that splits at the request layer.
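To make the imbalance concrete, here is a toy model (not gRPC code) that counts where requests land when balancing happens once per connection versus once per request. The function name distribute and the request counts are illustrative assumptions.

```go
package main

import "fmt"

// distribute returns how many of n requests land on each backend.
// With perRequest=false, every request follows the one connection that the
// L4 balancer pinned to backend 0. With perRequest=true, requests rotate
// across backends, as an L7 proxy or client-side round_robin would do.
func distribute(n, backends int, perRequest bool) []int {
	counts := make([]int, backends)
	for i := 0; i < n; i++ {
		if perRequest {
			counts[i%backends]++
		} else {
			counts[0]++ // the connection was balanced, not the request
		}
	}
	return counts
}

func main() {
	fmt.Println("L4 pinned:  ", distribute(9000, 3, false))
	fmt.Println("per request:", distribute(9000, 3, true))
}
```

The pinned case puts all 9000 requests on one backend; the per-request case gives each backend 3000.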

2. gRPC-Go 1.71 Server Setup, Done Right

Here’s the server skeleton I start every new service with. It looks boring on purpose.

package main

import (
	"context"
	"log/slog"
	"net"
	"os"
	"os/signal"
	"syscall"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
	"google.golang.org/grpc/reflection"
)

func main() {
	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		slog.Error("listen", "err", err)
		os.Exit(1)
	}

	srv := grpc.NewServer(
		grpc.MaxConcurrentStreams(1000),
		grpc.MaxRecvMsgSize(8*1024*1024),  // 8 MiB
		grpc.MaxSendMsgSize(16*1024*1024), // 16 MiB
		grpc.KeepaliveParams(keepalive.ServerParameters{
			MaxConnectionIdle:     5 * time.Minute,
			MaxConnectionAge:      30 * time.Minute,
			MaxConnectionAgeGrace: 30 * time.Second,
			Time:                  20 * time.Second,
			Timeout:               5 * time.Second,
		}),
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             10 * time.Second,
			PermitWithoutStream: true,
		}),
	)

	reflection.Register(srv) // gate behind a build tag in prod

	go func() {
		slog.Info("serving", "addr", lis.Addr().String())
		if err := srv.Serve(lis); err != nil {
			slog.Error("serve", "err", err)
		}
	}()

	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
	<-stop

	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()
	done := make(chan struct{})
	go func() {
		srv.GracefulStop()
		close(done)
	}()
	select {
	case <-done:
	case <-ctx.Done():
		srv.Stop()
	}
}

A few things to call out:

MaxConnectionAge is the single most important knob if you’re behind an L4 LB. It forces clients to redial periodically, which redistributes load to backends that came up later.
MaxConcurrentStreams defaults to math.MaxUint32 in gRPC-Go. That’s a footgun. Cap it.
The enforcement policy’s MinTime must be lower than every client’s keepalive Time, or the server sends GOAWAY ENHANCE_YOUR_CALM and clients reconnect in a thundering herd.
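One way to catch a MinTime mismatch before it reaches production is a startup or CI assertion over the two settings. The sketch below is a hypothetical helper, not part of gRPC-Go. It applies the rule above conservatively by treating equal values as incompatible.

```go
package main

import (
	"fmt"
	"time"
)

// keepaliveCompatible reports whether a client that pings every clientTime
// stays within a server enforcement policy of serverMinTime. Equal values
// count as incompatible, since jitter could push a ping under the limit.
func keepaliveCompatible(clientTime, serverMinTime time.Duration) bool {
	return serverMinTime < clientTime
}

func main() {
	serverMinTime := 10 * time.Second // MinTime from the server skeleton above
	for _, clientTime := range []time.Duration{30 * time.Second, 10 * time.Second} {
		fmt.Printf("client Time=%v compatible=%v\n",
			clientTime, keepaliveCompatible(clientTime, serverMinTime))
	}
}
```

Keep in mind that gRPC-Go raises any client keepalive Time below 10 seconds to 10 seconds, so the value in your config is not always the value on the wire.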

3. Client-Side Patterns That Move p99

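The client is where most teams leave performance on the table. By default, a channel uses the pick_first policy and ends up with a single sub-connection to one backend. (grpc.Dial is deprecated in favor of grpc.NewClient, which the code below uses.) For high throughput, you want multiple.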

package main

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func newClient(target string) (*grpc.ClientConn, error) {
	return grpc.NewClient(
		target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{
			"loadBalancingConfig": [{"round_robin":{}}],
			"methodConfig": [{
				"name": [{"service": "orders.v1.OrderService"}],
				"retryPolicy": {
					"maxAttempts": 4,
					"initialBackoff": "0.1s",
					"maxBackoff": "2s",
					"backoffMultiplier": 2.0,
					"retryableStatusCodes": ["UNAVAILABLE", "DEADLINE_EXCEEDED"]
				}
			}]
		}`),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second,
			Timeout:             5 * time.Second,
			PermitWithoutStream: true,
		}),
	)
}

func call(ctx context.Context, conn *grpc.ClientConn) error {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()
	// ... use generated stub
	_ = conn
	return nil
}
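The retryPolicy in the service config above is easier to reason about once the backoff schedule is written out. The sketch below computes the nominal delay before each retry from the same four parameters. nominalBackoffs is a hypothetical helper, and gRPC randomizes the real delays, so treat the results as a guide, not exact timings.

```go
package main

import (
	"fmt"
	"time"
)

// nominalBackoffs returns the un-jittered delay before each retry.
// maxAttempts counts the original call, so there are maxAttempts-1 retries.
func nominalBackoffs(maxAttempts int, initial, maxBackoff time.Duration, multiplier float64) []time.Duration {
	var out []time.Duration
	cur := float64(initial)
	for retry := 1; retry < maxAttempts; retry++ {
		if cur > float64(maxBackoff) {
			cur = float64(maxBackoff) // never exceed the configured cap
		}
		out = append(out, time.Duration(cur))
		cur *= multiplier
	}
	return out
}

func main() {
	// Values from the service config: maxAttempts 4, 0.1s, 2s, multiplier 2.
	fmt.Println(nominalBackoffs(4, 100*time.Millisecond, 2*time.Second, 2.0))
	// prints [100ms 200ms 400ms]
}
```

With a 2-second per-call deadline, three retries at these delays still fit inside the budget only if each attempt fails quickly, which is worth checking against your own latency numbers.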

3.1 Sub-Connection Pooling

For services pushing more than 5k RPS through a single client process, one HTTP/2 connection becomes the bottleneck. Once the server’s MaxConcurrentStreams cap is reached, gRPC-Go does not fail new RPCs; it queues them on the client until a stream frees up, so the symptom is rising latency and DeadlineExceeded rather than an explicit stream-exhaustion error. The fix is a pool of *grpc.ClientConns with a hash or round-robin selector. Don’t use the same target string for each: use DNS records with multiple A entries, or set up a manual.Resolver and hand it endpoints explicitly.
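A round-robin selector over such a pool can be very small. The sketch below is one possible shape, not a gRPC-Go API: Pool, NewPool, and Pick are hypothetical names, and the type is generic so the example runs without the gRPC module. In a real client, T would be *grpc.ClientConn.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Pool hands out its members in round-robin order.
type Pool[T any] struct {
	members []T
	next    atomic.Uint64
}

// NewPool builds a pool. It must be given at least one member,
// otherwise Pick panics with a division by zero.
func NewPool[T any](members ...T) *Pool[T] {
	return &Pool[T]{members: members}
}

// Pick returns the next member and is safe for concurrent use.
func (p *Pool[T]) Pick() T {
	n := p.next.Add(1) - 1
	return p.members[n%uint64(len(p.members))]
}

func main() {
	pool := NewPool("conn-0", "conn-1", "conn-2")
	for i := 0; i < 4; i++ {
		fmt.Println(pool.Pick()) // conn-0, conn-1, conn-2, conn-0
	}
}
```

The atomic counter keeps Pick lock-free, which matters when every RPC on the hot path goes through it.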

3.2 Deadlines Are Not Optional

Every RPC needs a deadline. Every one. If you forget, your service will eventually wedge behind a downstream that’s slow but not dead. I enforce this with a server interceptor that rejects calls with no deadline set:

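// deadlineRequired is a unary server interceptor that rejects calls without a deadline.
// Imports: context, google.golang.org/grpc, google.golang.org/grpc/codes, google.golang.org/grpc/status.
func deadlineRequired(ctx context.Context, req any,
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
	if _, ok := ctx.Deadline(); !ok {
		// No deadline on the incoming context: fail fast instead of risking a wedged call.
		return nil, status.Error(codes.InvalidArgument, "deadline required")
	}
	return handler(ctx, req)
}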

4. Flow Control and Buffer Sizing

HTTP/2 has per-stream and per-connection flow control windows. The defaults in gRPC-Go (64 KiB initial, dynamically growing) are conservative for fat pipes. On a 10 Gbps internal link with 1 ms RTT, you’ll cap out at ~64 MB/s per stream until the window grows. For batch transfers, set the initial window higher:

srv := grpc.NewServer(
	grpc.InitialWindowSize(1 << 20),       // 1 MiB per stream
	grpc.InitialConnWindowSize(16 << 20),  // 16 MiB per conn
)
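The ~64 MB/s figure is just window size divided by round-trip time: a sender can have at most one full window in flight per RTT. The sketch below works that arithmetic for the default and the raised window. maxThroughput is a hypothetical helper, and the 1 ms RTT is the assumption from the paragraph above.

```go
package main

import (
	"fmt"
	"time"
)

// maxThroughput returns the ceiling, in bytes per second, that a flow-control
// window of windowBytes places on one stream at the given round-trip time.
func maxThroughput(windowBytes int, rtt time.Duration) float64 {
	return float64(windowBytes) / rtt.Seconds()
}

func main() {
	rtt := time.Millisecond
	for _, window := range []int{64 << 10, 1 << 20} {
		fmt.Printf("window %7d B -> %.1f MB/s\n", window, maxThroughput(window, rtt)/1e6)
	}
}
```

At 1 ms RTT, 64 KiB works out to roughly 65.5 MB/s and 1 MiB to roughly 1049 MB/s, which is close to what a 10 Gbps link (1250 MB/s) can carry.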

The new experimental.WithRecvBufferPool (added in gRPC-Go 1.66, hardened in 1.71) lets you reuse receive buffers across RPCs. For a service hot path that allocates 1 MB messages, this cut my GC time by ~30%. It’s still experimental — read the package docs and test under load.

5. Streaming, Carefully

Server-streaming and bidirectional streaming are the features that drew a lot of teams to gRPC, then bit them. I cover streaming patterns in depth in the streaming gRPC guide, but here are the load-bearing rules:

Always provide a way to cancel from both sides. Long-lived streams without cancellation paths are how memory leaks ship.
Backpressure is your job. When the consumer is slow, stream.Send eventually blocks on HTTP/2 flow control, but gRPC does nothing about the queue feeding that Send; unless you bound it, it grows without limit.
Heartbeats inside the stream are cheaper and more reliable than HTTP/2 keepalive for detecting half-open streams.

func (s *Server) Watch(req *pb.WatchRequest, stream pb.Watcher_WatchServer) error {
	// Heartbeat interval; keeps half-open streams detectable.
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()
	events := s.subscribe(req.GetTopic())
	defer s.unsubscribe(events)

	for {
		select {
		case <-stream.Context().Done():
			// Client cancelled or the deadline expired.
			return stream.Context().Err()
		case ev, ok := <-events:
			if !ok {
				return nil // subscription closed; end the stream cleanly
			}
			if err := stream.Send(ev); err != nil {
				return err
			}
		case <-ticker.C:
			if err := stream.Send(&pb.Event{Kind: pb.Event_HEARTBEAT}); err != nil {
				return err
			}
		}
	}
}
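The events channel in Watch is where backpressure has to be decided: whoever publishes into it needs a policy for a subscriber that has fallen behind. Below is a minimal sketch of one such policy, a bounded queue with a non-blocking enqueue. The name offer and the drop-on-full choice are assumptions; coalescing events or disconnecting the slow subscriber are equally valid.

```go
package main

import "fmt"

// offer tries to enqueue ev without blocking. It returns false when the
// subscriber's queue is full, so the publisher can drop, coalesce, or
// disconnect instead of letting memory grow.
func offer[T any](queue chan T, ev T) bool {
	select {
	case queue <- ev:
		return true
	default:
		return false
	}
}

func main() {
	queue := make(chan int, 2) // bounded: at most 2 undelivered events
	dropped := 0
	for ev := 1; ev <= 5; ev++ {
		if !offer(queue, ev) {
			dropped++
		}
	}
	fmt.Println("queued:", len(queue), "dropped:", dropped) // queued: 2 dropped: 3
}
```

Counting drops per subscriber gives you a metric to alert on, which is usually the first sign that a consumer has stalled.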

6. Common Pitfalls

A non-exhaustive list of footguns I’ve stepped on, or watched colleagues step on.

6.1 Treating ClientConn as Throwaway

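A ClientConn is expensive to set up. grpc.NewClient itself is lazy and performs no I/O, but the first RPCs on a fresh channel pay for DNS resolution, sub-connection setup, and the TLS handshake. Creating one per request kills throughput. Build them once at startup, reuse forever, close on shutdown.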

6.2 Forgetting `MaxConnectionAge` Behind an L4 LB

Without this, your shiny new pods stay cold for hours after a deploy. Set it to something between 5 and 30 minutes depending on traffic shape.

6.3 Returning Raw Errors from Handlers

return errors.New("bad thing") becomes code = Unknown on the wire. Always wrap with status.Error(codes.X, msg), or define a small helper. Clients can’t make retry decisions on Unknown.
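The "small helper" usually boils down to a switch over sentinel errors with an explicit default. The sketch below shows that shape with a local Code type standing in for google.golang.org/grpc/codes.Code so it runs without the gRPC module. The error variables and codeFor are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Code is a stand-in for codes.Code from google.golang.org/grpc/codes.
type Code string

const (
	NotFound        Code = "NotFound"
	InvalidArgument Code = "InvalidArgument"
	Internal        Code = "Internal"
)

// Hypothetical domain errors a handler might see.
var (
	ErrOrderMissing = errors.New("order not found")
	ErrBadQuantity  = errors.New("quantity must be positive")
)

// codeFor picks the status code a handler should return for err.
// errors.Is sees through wrapping done with fmt.Errorf and %w.
func codeFor(err error) Code {
	switch {
	case errors.Is(err, ErrOrderMissing):
		return NotFound
	case errors.Is(err, ErrBadQuantity):
		return InvalidArgument
	default:
		return Internal // a deliberate choice, instead of the implicit Unknown
	}
}

func main() {
	wrapped := fmt.Errorf("load order 42: %w", ErrOrderMissing)
	fmt.Println(codeFor(wrapped))                        // NotFound
	fmt.Println(codeFor(errors.New("something else"))) // Internal
}
```

In a real handler the result would feed status.Error, for example return nil, status.Error(codes.NotFound, err.Error()).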

6.4 Mixing TLS Modes in a Mesh

If half your services do mTLS and the other half terminate TLS at a sidecar, your incident on-call rotation will discover this at 3 a.m. Pick one model per environment.

6.5 Trusting Reflection in Production

Server reflection is great for grpcurl in dev. In prod, it leaks your schema to anyone who can reach the port. Gate it behind a build tag.

7. Troubleshooting

When a gRPC service misbehaves, the symptoms cluster into a few buckets.

7.1 p99 Spikes But p50 Is Fine

Almost always head-of-line blocking on a single sub-connection, or GC pauses from large message allocations. Check grpc.io/client/sent_messages_per_rpc distribution, and look at runtime/metrics GC pause histograms. The fix is sub-connection pooling plus WithRecvBufferPool.

7.2 `code = Unavailable desc = transport is closing`

Three causes, in order of likelihood: the server hit MaxConnectionAge and sent GOAWAY (benign, client retries), the keepalive enforcement policy kicked in (fix MinTime mismatch), or the LB dropped the connection (check LB idle timeouts). Running with GRPC_GO_LOG_SEVERITY_LEVEL=info and GRPC_GO_LOG_VERBOSITY_LEVEL=2 makes gRPC-Go log the GOAWAY reason.

7.3 Streams Hang Forever

Either no deadline, or the consumer stopped reading and the send buffer is full. The first is a code review failure. The second needs discipline: every stream.Send error is checked and returned (if err := stream.Send(x); err != nil { return err }), and nobody discards it with _ =.

8. Wrapping Up

gRPC in 2025 is not a new technology, and that’s the point. The boring patterns — deadlines everywhere, connection age limits, sub-connection pooling, status codes — buy you most of the throughput. The exotic ones (custom resolvers, buffer pools, xDS) are worth reaching for when you’ve already done the boring work.

The canonical reference for everything in this post is the gRPC-Go repo, which is now better-documented than it was two years ago. Read the examples/ directory before you write your own interceptors. And if you’re starting fresh, weigh Connect-Go before you commit to vanilla gRPC: the ergonomics are better, and the wire compatibility is real. Next on the list: streaming patterns for real-time data, where the rules change subtly but consequentially.
