Streaming gRPC for Real Time Data, A Hands On Guide
TL;DR — A working streaming gRPC service needs four things you don’t get for free: explicit cancellation, backpressure, heartbeats, and resumability via cursors. Skip any one and the stream will misbehave at scale.
I’ve shipped three streaming gRPC systems in production. The first one was a market-data fanout. The second was a multi-tenant CDC pipeline. The third was a robotics telemetry feed. All three started the same way: a colleague wrote for { stream.Send(...) }, it worked in dev, and it imploded the moment we put a flaky network between client and server.
The thing nobody tells you when you read the gRPC quickstart is that streams are a state machine on top of a transport that’s already a state machine on top of TCP. The interaction surface is large. The default behaviors are reasonable for a happy path that never happens at scale. This post is the playbook I wish I’d had.
It builds on gRPC deep dive 2025 and assumes you’ve got the basics. We’ll cover all three streaming flavors, the patterns that keep them alive in production, and the specific failure modes you’ll hit. Code is gRPC-Go 1.71.
1. The Three Streaming Modes, and When to Use Them
gRPC defines three streaming RPC kinds beyond unary:
server-streaming: client ----req----> server
client <---msg------ server (many)
client-streaming: client ----msg-----> server (many)
client <---resp----- server
bidi-streaming: client <--msg-msg--> server (independently)
- Server-streaming is for “subscribe to a thing”. Use it for event feeds, log tail, watch APIs.
- Client-streaming is for upload aggregation. Useful for batch uploads with a single response.
- Bidi-streaming is for genuinely interactive flows: chat, telemetry with control plane, RTC signaling.
If you’re not sure which one you need, you probably want server-streaming. It’s the easiest to reason about, the easiest to load-balance, and the easiest to resume.
2. Schema for a Resumable Server Stream
The contract matters more than the implementation. Here’s a watch API I’ve used in production with minor variations:
syntax = "proto3";
package events.v1;
option go_package = "example.com/events/gen/events/v1;eventsv1";
service Watcher {
rpc Watch(WatchRequest) returns (stream WatchEvent) {}
}
message WatchRequest {
string topic = 1;
// Resume from a cursor previously emitted by the server.
string cursor = 2;
// Filter on event kinds.
repeated string kinds = 3;
}
message WatchEvent {
oneof payload {
Data data = 1;
Heartbeat heartbeat = 2;
EndOfBacklog end_of_backlog = 3;
}
// Always present, advances monotonically.
string cursor = 10;
}
message Data {
string kind = 1;
bytes body = 2;
int64 emitted_at_unix_micros = 3;
}
message Heartbeat { int64 server_unix_micros = 1; }
message EndOfBacklog {}
Three things matter here:
- The cursor is part of every event, including heartbeats. Clients can resume from any received message.
- Heartbeats are first-class. The client knows the stream is alive even when the underlying topic is quiet.
EndOfBacklogsignals the transition from history replay to live tail. Consumers often want to behave differently in each phase.
3. Server Implementation with Backpressure
package main
import (
"context"
"time"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
pb "example.com/events/gen/events/v1"
)
type Server struct {
pb.UnimplementedWatcherServer
store EventStore
pubsub PubSub
}
func (s *Server) Watch(req *pb.WatchRequest, stream pb.Watcher_WatchServer) error {
ctx := stream.Context()
// 1. Backlog replay from cursor.
if err := s.replay(ctx, req, stream); err != nil {
return err
}
if err := stream.Send(&pb.WatchEvent{Payload: &pb.WatchEvent_EndOfBacklog{
EndOfBacklog: &pb.EndOfBacklog{},
}}); err != nil {
return err
}
// 2. Subscribe to live events.
sub, err := s.pubsub.Subscribe(ctx, req.GetTopic())
if err != nil {
return status.Errorf(codes.Internal, "subscribe: %v", err)
}
defer sub.Close()
hb := time.NewTicker(15 * time.Second)
defer hb.Stop()
for {
select {
case <-ctx.Done():
return ctx.Err()
case ev, ok := <-sub.Events():
if !ok {
return status.Error(codes.Aborted, "subscription closed")
}
if !matches(req.GetKinds(), ev.Kind) {
continue
}
if err := stream.Send(toProto(ev)); err != nil {
return err
}
case <-hb.C:
if err := stream.Send(&pb.WatchEvent{
Payload: &pb.WatchEvent_Heartbeat{
Heartbeat: &pb.Heartbeat{ServerUnixMicros: time.Now().UnixMicro()},
},
Cursor: s.currentCursor(),
}); err != nil {
return err
}
}
}
}
func (s *Server) replay(ctx context.Context, req *pb.WatchRequest, stream pb.Watcher_WatchServer) error {
it := s.store.Scan(req.GetTopic(), req.GetCursor())
for it.Next(ctx) {
ev := it.Value()
if !matches(req.GetKinds(), ev.Kind) {
continue
}
if err := stream.Send(toProto(ev)); err != nil {
return err
}
}
return it.Err()
}
func matches(filters []string, kind string) bool {
if len(filters) == 0 {
return true
}
for _, f := range filters {
if f == kind {
return true
}
}
return false
}
3.1 Where’s the Backpressure?
stream.Send blocks when the underlying HTTP/2 stream’s send window is full. That’s your backpressure. The pubsub channel is bounded — if sub.Events() blocks because the publisher is faster than this subscriber, you’ve correctly slowed down the producer.
The pattern to avoid is decoupling them with an unbounded buffer:
// BAD
buf := make(chan *Event, 1<<20)
go func() { for ev := range sub.Events() { buf <- ev } }()
for ev := range buf { stream.Send(...) }
That converts a slow consumer into an OOM. If you need decoupling, use a small bounded channel and explicitly drop or summarize on overflow.
4. Client Implementation with Resume
package main
import (
"context"
"errors"
"io"
"log/slog"
"time"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
pb "example.com/events/gen/events/v1"
)
func consume(ctx context.Context, conn *grpc.ClientConn, topic string) error {
client := pb.NewWatcherClient(conn)
cursor := ""
for {
err := watchOnce(ctx, client, topic, &cursor)
if errors.Is(err, context.Canceled) {
return err
}
if err == nil {
return nil
}
s, _ := status.FromError(err)
if s.Code() == codes.Unavailable || s.Code() == codes.DeadlineExceeded {
slog.Warn("stream broken, resuming", "cursor", cursor, "err", err)
backoff(ctx)
continue
}
return err
}
}
func watchOnce(ctx context.Context, client pb.WatcherClient, topic string, cursor *string) error {
stream, err := client.Watch(ctx, &pb.WatchRequest{Topic: topic, Cursor: *cursor})
if err != nil {
return err
}
for {
ev, err := stream.Recv()
if err == io.EOF {
return nil
}
if err != nil {
return err
}
switch p := ev.GetPayload().(type) {
case *pb.WatchEvent_Data:
handle(p.Data)
case *pb.WatchEvent_Heartbeat:
// liveness only
case *pb.WatchEvent_EndOfBacklog:
slog.Info("caught up")
}
if ev.GetCursor() != "" {
*cursor = ev.GetCursor()
}
}
}
func backoff(ctx context.Context) {
select {
case <-ctx.Done():
case <-time.After(500 * time.Millisecond):
}
}
func handle(d *pb.Data) { _ = d }
The key invariant: cursor only advances after the client has actually processed the event. If you advance the cursor in Recv() but the handler crashes before persisting, you’ve lost the event on resume.
5. Bidirectional Streams with Control Messages
Bidi is for cases where the client also needs to send messages mid-stream. The shape that works is one goroutine per direction, plus a context that ties them together.
func (s *Server) Channel(stream pb.Chat_ChannelServer) error {
ctx := stream.Context()
errCh := make(chan error, 2)
// Receive loop
go func() {
for {
msg, err := stream.Recv()
if err == io.EOF {
errCh <- nil
return
}
if err != nil {
errCh <- err
return
}
s.dispatch(msg)
}
}()
// Send loop
go func() {
out := s.subscribe(stream.Context())
for {
select {
case <-ctx.Done():
errCh <- ctx.Err()
return
case msg, ok := <-out:
if !ok {
errCh <- nil
return
}
if err := stream.Send(msg); err != nil {
errCh <- err
return
}
}
}
}()
return <-errCh
}
This pattern is verbose but correct. The two common bugs it avoids are: calling stream.Send concurrently from two goroutines (data race in the framing layer), and leaking goroutines when one side closes.
6. Common Pitfalls
6.1 Forgetting That stream.Send Blocks
In a server-streaming handler, every Send may block until the client’s flow-control window opens. If you’ve spawned 1000 of these handlers and 200 clients are slow, you have 200 goroutines blocked in Send. That’s usually fine, but if you’re holding a per-stream mutex during the call, you’ve serialized the world.
6.2 No Heartbeat
Without heartbeats, a network split that drops TCP keepalives but doesn’t reset connections leaves the stream “alive” indefinitely. Clients get stuck. Always send a heartbeat with a known cadence, and have the client time out if it doesn’t see one for 2-3x that cadence.
6.3 Cursor Drift After Filtering
If the server filters events before sending and advances the cursor across filtered-out events, the client’s cursor matches what the server has on resume. If the server only advances on sent events, the client’s view skips. Pick one model and document it. I prefer “cursor advances over all events, client sees only matched ones.”
6.4 Long Streams Behind Short Idle Proxies
Cloud load balancers love killing idle TCP connections. AWS NLB default is 350s. If your stream is quiet for that long, the LB resets it without telling either side. Heartbeats fix this if their cadence is shorter than the LB idle timeout.
6.5 Single-Stream Bottlenecks
One stream per subscriber doesn’t scale past a few thousand subscribers per pod. The HTTP/2 server has a MaxConcurrentStreams cap, and each stream costs goroutines and buffers. Shard subscribers across pods, or use fewer streams with multiplexed payloads (a single bidi stream that carries many logical subscriptions).
7. Troubleshooting
7.1 Client Sees Internal: stream terminated by RST_STREAM
The server closed the stream abruptly. Check server logs around that timestamp. Common causes: an unrecovered panic in the handler, a context deadline exceeded from a deadline set on the server-side context, or the HTTP/2 layer hitting MaxConcurrentStreams.
7.2 Memory Growth on the Server
Almost always a slow consumer with an unbounded buffer in front of stream.Send. Profile with pprof’s goroutine and heap views; look for goroutines parked in channel send or Send. The fix is bounded channels and shedding policy.
7.3 Resume Skips Events
The cursor isn’t monotonic on the server, or the client advanced it before persisting. Verify with a synthetic test that sends events 1..N, drops the connection mid-stream, and checks that the resumed client sees every event exactly once across the join.
8. What’s Next
Streaming gRPC isn’t hard, but it’s not the unary path with a loop on top. The model that works is: explicit cursors, heartbeats, backpressure all the way through, and a client that knows how to resume. With those four in place, the streams just keep running.
The grpc-go streaming examples are a good cross-check, though they don’t cover the production patterns above. Next up, we step outside the service and look at how API gateways like Envoy Gateway and Kong fit into this picture, since most external traffic to your gRPC services will pass through one.