How gRPC Actually Works

REST + JSON's biggest strength is human readability. But for internal microservice traffic, where nobody reads the bytes and latency and throughput matter, the costs of a text format, JSON parsing, and HTTP/1.1 head-of-line blocking start to show. gRPC, open-sourced by Google in 2015, cuts these costs with protobuf's binary wire format, HTTP/2 multiplexed streams, and generated client and server code. This guide covers how gRPC actually works, its four RPC modes, where it beats and loses to REST, and why you still can't call it directly from a browser.

The Big Picture

.proto file (schema)
       │
       │  protoc + plugin
       ▼
   client stub          server skeleton
   (Java/Go/Py/…)       (Java/Go/Py/…)
       │                     ▲
       │ method call         │ method impl
       ▼                     │
   ┌─────────────────────────────┐
   │   gRPC runtime              │
   │ ┌─────────────────────────┐ │
   │ │ protobuf encode/decode  │ │
   │ ├─────────────────────────┤ │
   │ │ HTTP/2 frames           │ │  ← multiplexed streams
   │ ├─────────────────────────┤ │
   │ │ TCP + TLS               │ │
   │ └─────────────────────────┘ │
   └─────────────────────────────┘

Key idea: the developer defines one .proto → stubs are generated for
each target language → calling a stub method looks like "a regular
function" call but is actually an RPC.

.proto and Code Generation

// user.proto
syntax = "proto3";

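package myapp;
option go_package = "example.com/myapp/pb";  // placeholder path; protoc-gen-go needs a Go import path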

service UserService {
  rpc GetUser (GetUserRequest) returns (User);
  rpc ListUsers (ListUsersRequest) returns (stream User);     // server stream
  rpc UpdateProfile (stream ProfilePatch) returns (User);     // client stream
  rpc Chat (stream Message) returns (stream Message);         // bidi
}

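message GetUserRequest { int64 id = 1; }
message User { int64 id = 1; string name = 2; string email = 3; }

// Minimal placeholder definitions so the service above compiles;
// the fields are illustrative only.
message ListUsersRequest { int32 page_size = 1; }
message ProfilePatch { string name = 1; string email = 2; }
message Message { string text = 1; }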

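// Compile (Go needs the protoc-gen-go and protoc-gen-go-grpc plugins on PATH;
// Python uses the grpcio-tools package)
protoc --go_out=. --go-grpc_out=. user.proto
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. user.proto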

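// Use the Go client (pb is the generated package, conn an open *grpc.ClientConn)
client := pb.NewUserServiceClient(conn)
user, err := client.GetUser(ctx, &pb.GetUserRequest{Id: 42})
if err != nil {
    log.Fatalf("GetUser failed: %v", err)  // err carries a gRPC status code
}
fmt.Println(user.GetName())
// → Looks like a normal call, but internally: protobuf encode + HTTP/2 request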

Protobuf — Why It Beats JSON

JSON:
  {"id":42,"name":"jade","email":"x@y.com"}
  → 41 bytes, expensive to parse (tokenizing text, string → number, key matching)

Protobuf (wire format):
  08 2a 12 04 6a 61 64 65 1a 07 78 40 79 2e 63 6f 6d
   │  │  │  │       jade      │       x@y.com
   │  │  │  │                 field 3, length-delimited (7 bytes)
   │  │  field 2, length-delimited (4 bytes)
   │  varint(42) = id value
   field 1, varint (tag = 1<<3 | 0)

17 bytes total, less than half the JSON size.

Pros:
- Compact (less network)
- Fast (direct field-number mapping, no key matching)
- Schema enforced (field-name and type mistakes fail at compile time in generated code)

Cons:
- Not human-readable — need grpcurl / proto reflection for debugging
- Useless without the schema — old binary logs are hard to parse
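
The byte layout above can be reproduced by hand. The following is a minimal sketch, not how generated protobuf code is implemented: it uses only Go's standard library (`binary.AppendUvarint`, Go 1.19+), and the helper names `appendTag`, `appendVarintField`, `appendStringField`, and `encodeUser` are made up for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	wireVarint = 0 // wire type 0: varint
	wireBytes  = 2 // wire type 2: length-delimited
)

// appendTag appends the field key: (field number << 3) | wire type.
func appendTag(b []byte, field, wireType uint64) []byte {
	return binary.AppendUvarint(b, field<<3|wireType)
}

// appendVarintField appends a key followed by a varint-encoded value.
func appendVarintField(b []byte, field, v uint64) []byte {
	b = appendTag(b, field, wireVarint)
	return binary.AppendUvarint(b, v)
}

// appendStringField appends a key, the byte length, then the raw bytes.
func appendStringField(b []byte, field uint64, s string) []byte {
	b = appendTag(b, field, wireBytes)
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// encodeUser hand-encodes the User message from user.proto
// (id = field 1, name = field 2, email = field 3).
func encodeUser(id uint64, name, email string) []byte {
	var b []byte
	b = appendVarintField(b, 1, id)
	b = appendStringField(b, 2, name)
	b = appendStringField(b, 3, email)
	return b
}

func main() {
	wire := encodeUser(42, "jade", "x@y.com")
	jsonText := `{"id":42,"name":"jade","email":"x@y.com"}`
	fmt.Printf("% x\n", wire)
	fmt.Println("protobuf bytes:", len(wire), "JSON bytes:", len(jsonText))
}
```

Running it prints the same 17 bytes as the annotated dump, next to the 41-byte length of the JSON text. Note that field names never appear on the wire, which is why the schema is needed to decode it.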

HTTP/2 — gRPC's Second Foundation

HTTP/1.1:
  One TCP connection = one in-flight request (head-of-line blocking)
  100 requests = 100 connections or 100 sequential trips

HTTP/2:
  Multiplexed streams over one TCP connection — 100 concurrent requests
  Binary framing — no text parsing
  Header compression (HPACK) — repeated header costs ↓
  Server push (rarely used)

gRPC mapping:
  1 RPC = 1 HTTP/2 stream
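  request  = HEADERS frame + DATA frames
  response = HEADERS frame + DATA frames + trailers (a final HEADERS
             frame that carries grpc-status)

  → Many concurrent RPCs share a single connection, up to the peer's
    SETTINGS_MAX_CONCURRENT_STREAMS limit. Connection setup (TCP + TLS
    handshake) happens just once.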

cf. REST + HTTP/1.1: a round trip per request, mitigated by keep-alive
    but no multiplexing. REST over HTTP/2 inherits some of the benefit.
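
Inside the DATA frames, gRPC prefixes every serialized message with five bytes: a one-byte compressed flag and a four-byte big-endian length. A rough sketch of that framing, assuming uncompressed messages only; `frameMessage` and `unframeMessage` are illustrative names, not part of any gRPC library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameMessage wraps one serialized message in length-prefixed framing:
// 1 byte compressed flag (0 = uncompressed) + 4 bytes big-endian length + payload.
func frameMessage(payload []byte) []byte {
	out := make([]byte, 5+len(payload))
	out[0] = 0
	binary.BigEndian.PutUint32(out[1:5], uint32(len(payload)))
	copy(out[5:], payload)
	return out
}

// unframeMessage reads one framed message from the front of buf and returns
// the payload plus whatever bytes follow it (the next message, if any).
func unframeMessage(buf []byte) (payload, rest []byte, err error) {
	if len(buf) < 5 {
		return nil, nil, errors.New("short header")
	}
	if buf[0] != 0 {
		return nil, nil, errors.New("compressed messages not handled in this sketch")
	}
	n := int(binary.BigEndian.Uint32(buf[1:5]))
	if len(buf)-5 < n {
		return nil, nil, errors.New("truncated payload")
	}
	return buf[5 : 5+n], buf[5+n:], nil
}

func main() {
	// Two messages back to back, as they could appear on one stream.
	stream := append(frameMessage([]byte{0x08, 0x2a}), frameMessage([]byte("hi"))...)
	for len(stream) > 0 {
		msg, rest, err := unframeMessage(stream)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("message of %d bytes: % x\n", len(msg), msg)
		stream = rest
	}
}
```

The explicit length is what lets a streaming RPC carry many messages on one HTTP/2 stream: the receiver knows where each message ends regardless of how the bytes were split across DATA frames.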

The Four RPC Modes

# 1. Unary — most common, equivalent to a single REST call
  rpc GetUser (Req) returns (Resp);
  client → 1 request, server → 1 response.

# 2. Server Streaming
  rpc ListUsers (Req) returns (stream User);
  client → 1 request, server → N responses (one stream).
  Use: large result sets, progress updates, server-side push.

# 3. Client Streaming
  rpc Upload (stream Chunk) returns (UploadResult);
  client → N requests (one stream), server → 1 response (at the end).
  Use: file uploads, ingesting sensor data.

# 4. Bidirectional Streaming
  rpc Chat (stream Msg) returns (stream Msg);
  client ↔ server, both sides send independently over one HTTP/2 stream.
  Use: chat, real-time games, collaborative editing.

→ REST over plain HTTP needs SSE / WebSocket / long-poll as separate
  mechanisms for streaming. gRPC handles all four modes in one
  framework.
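
One way to get a feel for the three streaming shapes is to model them with Go channels. This is only an in-process analogy under made-up names (`listUsers`, `upload`, `chat`); real gRPC code would use the generated stream types rather than channels.

```go
package main

import (
	"fmt"
	"strings"
)

// Server streaming shape: one request in, a stream of responses out.
func listUsers(prefix string) <-chan string {
	users := []string{"jade", "jack", "mira"} // stand-in data
	out := make(chan string)
	go func() {
		defer close(out) // closing the channel plays the role of end-of-stream
		for _, u := range users {
			if strings.HasPrefix(u, prefix) {
				out <- u
			}
		}
	}()
	return out
}

// Client streaming shape: a stream of requests in, one response at the end.
func upload(chunks <-chan []byte) int {
	total := 0
	for c := range chunks {
		total += len(c)
	}
	return total
}

// Bidirectional shape: reads and writes interleave on the same call.
func chat(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for msg := range in {
			out <- "echo: " + msg
		}
	}()
	return out
}

func main() {
	for u := range listUsers("ja") {
		fmt.Println("user:", u)
	}

	chunks := make(chan []byte)
	go func() {
		defer close(chunks)
		chunks <- []byte("abc")
		chunks <- []byte("de")
	}()
	fmt.Println("uploaded bytes:", upload(chunks))

	in := make(chan string)
	replies := chat(in)
	in <- "hello"
	fmt.Println(<-replies)
	close(in)
}
```

Unary is the degenerate case: an ordinary function with one argument and one return value.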

Deadlines · Cancellation · Metadata

# Deadline (timeout)
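  ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
  defer cancel()  // release the timer even when the call succeeds
  client.GetUser(ctx, ...)
  → If no response arrives within 200ms, the call fails with
    DEADLINE_EXCEEDED.
  → Propagation: the deadline travels with the request, so if that
    server passes the incoming context to its own gRPC calls, they
    inherit the remaining time → cascading timeouts handled naturally.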

# Cancellation
  Client cancels → the server stream gets a cancel signal too.
  → Halt unnecessary work immediately (long queries, big responses).

# Metadata (equivalent to headers)
  md := metadata.Pairs("authorization", "Bearer …", "trace-id", "abc")
  ctx = metadata.NewOutgoingContext(ctx, md)  // "=" not ":=", ctx already exists
  → Same role as REST headers, key/value pairs.

# Status codes
  gRPC has its own 17 status codes (OK + 16 error codes), separate from HTTP statuses.
  OK / CANCELLED / DEADLINE_EXCEEDED / NOT_FOUND / PERMISSION_DENIED /
  RESOURCE_EXHAUSTED / UNAVAILABLE / INTERNAL …
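
In Go, deadline inheritance and cancellation both ride on `context.Context`. The sketch below uses only the standard library to mimic a two-hop call chain; `handler`, `slowLookup`, and the two `demo` helpers are hypothetical stand-ins for RPC handlers, and no gRPC code is involved.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowLookup stands in for a downstream call that needs `work` to finish.
func slowLookup(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err() // deadline hit or caller cancelled
	}
}

// handler stands in for a server method that calls another service.
func handler(ctx context.Context, work time.Duration) error {
	// Asking for 1s here cannot extend the caller's deadline: a derived
	// context keeps whichever deadline is earlier.
	child, cancel := context.WithTimeout(ctx, time.Second)
	defer cancel()
	return slowLookup(child, work)
}

// demoDeadline reports whether a 50ms caller deadline cut off 500ms of work.
func demoDeadline() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return errors.Is(handler(ctx, 500*time.Millisecond), context.DeadlineExceeded)
}

// demoCancel reports whether an explicit cancel stopped the downstream work.
func demoCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(10 * time.Millisecond)
		cancel() // the client gives up early
	}()
	return errors.Is(handler(ctx, 500*time.Millisecond), context.Canceled)
}

func main() {
	fmt.Println("deadline exceeded downstream:", demoDeadline())
	fmt.Println("cancellation reached downstream:", demoCancel())
}
```

The design point is that the handler never picks its own budget in isolation: it derives from the incoming context, so the tightest deadline in the chain always wins.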

vs REST — When to Use What

Axis	REST + JSON	gRPC + protobuf
Payload size	Large (text)	Small (binary)
Parse cost	High	Low
Schema enforcement	OpenAPI, separate	.proto is the schema
Streaming	SSE / WS, separate	Native, 4 modes
Multiplexing	Needs HTTP/2	Built-in
Human readable	Yes (curl works)	No (grpcurl etc.)
Browser support	Native fetch	No (needs gRPC-Web)
Cache-friendly	HTTP cache works	No (POST-only)
External exposure	Standard (familiar)	Rare (usually via gateway)

Browser Limits — gRPC-Web

Browser fetch / XHR can't reach some HTTP/2 features (trailer headers,
raw frame control). So you can't call pure gRPC directly.

Solution: gRPC-Web
- Browser ↔ proxy (Envoy / grpc-web-proxy) speaks HTTP/1.1 or a
  restricted HTTP/2 subset
- proxy ↔ backend speaks real gRPC
- Some streaming modes (client / bidi) are unsupported or hacky

→ So public-facing APIs are usually REST or GraphQL, and gRPC is
  common only for internal service-to-service traffic.

→ Connect-RPC / Twirp are alternative designs that work directly from
  browsers (HTTP/1.1 + JSON too).

Common Pitfalls

Schema breaking changes — protobuf field numbers are forever. Never reuse them. Mark deletions as reserved. Type changes also break compatibility (int32 → int64 OK, int32 → string not OK).
HTTP status vs gRPC status confusion — a successful gRPC transport is HTTP 200; the actual OK / NOT_FOUND comes back as a gRPC status code in the trailer. Monitoring has to look at both.
Load balancer compatibility — HTTP/2's long-lived multiplexed connections don't play well with L4 load balancers (one connection sticks to one server → unbalanced load). Use an L7 proxy (Envoy, nginx 1.13.10+) or client-side load balancing (xDS).
No deadline — if the client doesn't set a deadline, the call can hang. The standard pattern is to require deadlines on every RPC.
Simplistic error model — the 17 gRPC status codes aren't enough for domain errors. Attach structured errors via google.rpc.Status's details field (a repeated Any).
Auth for external exposure — gRPC supports Bearer tokens via metadata. But IAM / OAuth2 integration is custom code. See oauth2-explained.
Generated-code build burden — every .proto change requires regenerating stubs in every language. Manage with monorepo + buf or similar.
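
For the HTTP-versus-gRPC status pitfall, a monitoring hook has to read the `grpc-status` trailer rather than trust the HTTP status. A minimal sketch, assuming the trailers have already been collected into a map; `classify` and `codeNames` are illustrative names, and only a subset of the codes is listed.

```go
package main

import (
	"fmt"
	"strconv"
)

// codeNames maps a subset of gRPC status codes to their names.
var codeNames = map[int]string{
	0: "OK", 1: "CANCELLED", 4: "DEADLINE_EXCEEDED", 5: "NOT_FOUND",
	7: "PERMISSION_DENIED", 8: "RESOURCE_EXHAUSTED", 13: "INTERNAL", 14: "UNAVAILABLE",
}

// classify returns the outcome a dashboard should record for one RPC.
func classify(httpStatus int, trailers map[string]string) string {
	if httpStatus != 200 {
		// The exchange never completed as gRPC (proxy error, wrong route, ...).
		return "TRANSPORT_ERROR(http " + strconv.Itoa(httpStatus) + ")"
	}
	raw, ok := trailers["grpc-status"]
	if !ok {
		return "MISSING_GRPC_STATUS"
	}
	code, err := strconv.Atoi(raw)
	if err != nil {
		return "INVALID_GRPC_STATUS"
	}
	if name, known := codeNames[code]; known {
		return name
	}
	return "CODE_" + strconv.Itoa(code)
}

func main() {
	fmt.Println(classify(200, map[string]string{"grpc-status": "0"}))
	fmt.Println(classify(200, map[string]string{"grpc-status": "5", "grpc-message": "no such user"}))
	fmt.Println(classify(503, nil))
}
```

The second call is the case that trips up HTTP-only dashboards: the transport reports 200 while the RPC itself failed with NOT_FOUND.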

Wrap-up

gRPC's strength in one line: strong schemas + binary wire + HTTP/2 multiplexing + streaming + multi-language code generation. When the conditions line up (many internal services, polyglot teams, high throughput), gRPC is usually a clearly better fit than REST.

Conversely, for external public APIs, direct browser callers, and curl-heavy debugging, REST + JSON still wins. The practical pattern: gRPC inside, plus an edge gateway that translates to REST/GraphQL for the outside world. Caching, rate limiting, and other HTTP-level concerns (cors-explained, rate-limiting-strategies) live at that gateway.