Underneath every HTTP library, DB driver, and gRPC client is a socket. This guide covers the mechanism: how that socket talks to the OS, why names like epoll and kqueue exist, and how single-threaded Node.js handles 10,000 connections.
Socket — A File Descriptor for the Network
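```c
// C (Linux); server_addr (a filled-in struct sockaddr_in) is assumed to exist
#include <sys/socket.h>
#include <unistd.h>

char buf[4096];
int sock = socket(AF_INET, SOCK_STREAM, 0);                          // create TCP socket: returns an fd
connect(sock, (struct sockaddr *)&server_addr, sizeof(server_addr)); // connect
write(sock, "GET / HTTP/1.1\r\n\r\n", 18);                           // send (a real request also needs a Host header)
read(sock, buf, sizeof(buf));                                        // receive
close(sock);
```

Unix's core abstraction: everything is a file. A socket is just a file descriptor (an integer ID), so the same read / write / close API works on it.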
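A quick way to see that a socket behaves like any other file descriptor is to push bytes through one using nothing but write(), read() and close(). This is a minimal sketch assuming Linux/POSIX; it swaps in socketpair() (a connected pair of local AF_UNIX sockets) so no server is needed, and the helper name `echo_through_socket` is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: sends msg through a connected socket pair using only the
// generic file-descriptor calls write(), read() and close().
// Returns the number of bytes read into out (NUL-terminated), or -1 on error.
ssize_t echo_through_socket(const char *msg, char *out, size_t cap) {
    int sv[2];                                   // two connected socket fds (small ints)
    if (cap == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(sv[0], msg, len) == (ssize_t)len)  // same write() used for files
        n = read(sv[1], out, cap - 1);           // same read() used for files
    if (n >= 0)
        out[n] = '\0';
    close(sv[0]);                                // same close() used for files
    close(sv[1]);
    return n;
}
```

Called as `echo_through_socket("hello", buf, sizeof buf)`, it returns 5 and leaves "hello" in `buf`.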
TCP vs UDP — A Trade-off in Guarantees
| Property | TCP | UDP |
|---|---|---|
| Connection | 3-way handshake | Connectionless (datagrams) |
| Order | Guaranteed | Not guaranteed |
| Delivery | Guaranteed (lost segments retransmitted) | Best-effort, may drop |
| Congestion control | Yes | None (app's responsibility) |
| Header overhead | 20 bytes (minimum; up to 60 with options) | 8 bytes |
| Use case | HTTP, DB, file transfer | DNS, VoIP, games, video |
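HTTP/3 runs over QUIC, which builds reliable, multiplexed streams on top of UDP. A lost packet then stalls only its own stream instead of every stream on the connection, avoiding TCP's head-of-line blocking.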
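The "datagrams" row is easiest to see in code. Real UDP needs network addresses, so this sketch swaps in a local AF_UNIX SOCK_DGRAM socket pair as a stand-in (assuming Linux/POSIX): like UDP, each send() is delivered as one whole message and each recv() returns exactly one message. Unlike UDP, local datagrams are not dropped or reordered. The function name `datagram_boundaries` is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical demo: sends two messages of different sizes over a datagram
// socket pair, then receives twice with a buffer large enough for both.
// Stores each recv() length in lens[0..1]; returns 0 on success, -1 on error.
int datagram_boundaries(ssize_t lens[2]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    const char *msgs[2] = {"abc", "hello"};
    for (int i = 0; i < 2; i++) {
        if (send(sv[0], msgs[i], strlen(msgs[i]), 0) < 0) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }
    char buf[256];
    for (int i = 0; i < 2; i++)
        lens[i] = recv(sv[1], buf, sizeof buf, 0);  // one message per call, never merged
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

With SOCK_STREAM (the TCP-like byte stream) the same two sends may come back from a single recv() as 8 bytes, so a stream protocol has to frame its own messages.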
TCP 3-Way Handshake
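```
Client                          Server
  │                                  │
  │──── SYN (seq=x) ────────────────→│
  │                                  │
  │←──── SYN-ACK (seq=y, ack=x+1) ───│
  │                                  │
  │──── ACK (ack=y+1) ──────────────→│
  │                                  │
  │←────────── data flow ───────────→│
```

The handshake takes 1.5 × RTT end to end, but the client may send its first data along with the final ACK, so a new connection delays the first request by about one RTT (more with TLS on top). Paying that on every request is expensive → keep-alive / connection pools.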
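A back-of-the-envelope model makes the keep-alive argument concrete. This sketch assumes the simplest case: sequential requests, one RTT per request/response, one RTT of handshake per new TCP connection (the client sends data with the final ACK), and no TLS, server processing time, or slow-start effects. The function name is hypothetical.

```c
// Hypothetical model: total time (ms) for n sequential request/response pairs.
// fresh_per_request != 0: every request opens a new TCP connection (+1 RTT each).
// fresh_per_request == 0: one keep-alive connection, handshake paid once.
long total_latency_ms(long n, long rtt_ms, int fresh_per_request) {
    long handshakes = fresh_per_request ? n : 1;
    return handshakes * rtt_ms + n * rtt_ms;   // handshakes + request round trips
}
```

With a 50 ms RTT, 100 requests take 10,000 ms on fresh connections versus 5,050 ms over one reused connection; TLS handshakes would widen the gap.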
The C10K Problem — The 1999 Wall
Can one server handle 10,000 concurrent connections? Dan Kegel's 1999 essay "The C10K problem" made the question famous. The standard server designs of the time couldn't.
Model 1 — Thread per Connection (Traditional)
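```c
while (1) {
    int client = accept(server_sock, ...);
    pthread_t tid;
    // a thread per connection; pass the fd by value, because &client would be
    // overwritten by the next accept() before the new thread reads it
    pthread_create(&tid, NULL, handle, (void *)(intptr_t)client);
    pthread_detach(tid);   // nobody joins these threads, so let them clean up on exit
}
```

- ~8 MB default stack per thread (Linux) → 10,000 threads = ~80 GB of reserved virtual memory 😱 (RAM is committed only for pages actually touched, but it adds up)
- Heavy context-switch overhead
- Scheduler pressure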
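One common mitigation for the stack cost is a smaller per-thread stack. A minimal sketch, assuming Linux with POSIX threads (glibc 2.34+, or compile with `-pthread`): socketpair() stands in for accept(), each "connection" gets its own thread whose stack size is requested via pthread_attr_setstacksize(), and the fd is passed by value. The handler and function names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CONNS 64

// Hypothetical per-connection handler: reads one byte, replies with byte + 1.
static void *handle_conn(void *arg) {
    int fd = (int)(intptr_t)arg;             // fd passed by value, no shared pointer
    unsigned char c;
    if (read(fd, &c, 1) == 1) {
        c++;
        write(fd, &c, 1);                    // best-effort reply
    }
    close(fd);
    return NULL;
}

// Spawns one thread per simulated connection, then talks to each one.
// Returns how many connections answered correctly, or -1 on bad input.
int thread_per_connection_demo(int nconns) {
    if (nconns < 0 || nconns > MAX_CONNS)
        return -1;
    pthread_t tids[MAX_CONNS];
    int client_fds[MAX_CONNS];
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    // Request 64 KiB instead of the ~8 MiB default; if this is below
    // PTHREAD_STACK_MIN the call fails with EINVAL and the default is kept.
    pthread_attr_setstacksize(&attr, 64 * 1024);
    int started = 0;
    for (; started < nconns; started++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            break;
        if (pthread_create(&tids[started], &attr, handle_conn, (void *)(intptr_t)sv[1]) != 0) {
            close(sv[0]);
            close(sv[1]);
            break;
        }
        client_fds[started] = sv[0];
    }
    pthread_attr_destroy(&attr);
    int ok = 0;
    for (int i = 0; i < started; i++) {
        unsigned char out = (unsigned char)i, in = 0;
        if (write(client_fds[i], &out, 1) == 1 &&
            read(client_fds[i], &in, 1) == 1 && in == (unsigned char)(i + 1))
            ok++;
        pthread_join(tids[i], NULL);
        close(client_fds[i]);
    }
    return ok;
}
```

Smaller stacks shrink the memory bill, but the context-switch and scheduler costs in the list above remain, which is what pushed servers toward event loops.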
Model 2 — select() / poll() (1990s)
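```c
fd_set readfds;
FD_ZERO(&readfds);                 // select() overwrites the set, so rebuild it before every call
FD_SET(sock1, &readfds);
FD_SET(sock2, &readfds);
// ... more sockets (every fd number must be < FD_SETSIZE)
select(maxfd + 1, &readfds, NULL, NULL, &timeout);   // maxfd = highest fd in the set
// → tells which fds are readable
for (int i = 0; i <= maxfd; i++) {                    // O(N) scan; <= because maxfd itself is valid
    if (FD_ISSET(i, &readfds)) { /* handle i */ }
}
```

- One thread handles N sockets (event loop)
- But every call copies the whole fd set into the kernel, the kernel checks all N fds, and the app scans them again afterwards: O(N) per call even when only one fd is ready
- select() caps fd numbers at FD_SETSIZE (1024 on glibc); poll() takes an array instead and lifts the cap, but keeps the O(N) cost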
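The select() loop can be exercised without a network. This sketch assumes Linux/POSIX, uses socketpair() pairs as stand-in client connections, and has hypothetical function names: it registers `n` sockets, makes `ready` of them readable, and performs the rebuild-and-scan that select() forces on every call.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_SOCKS 32

// Builds the fd_set from scratch, calls select() once without blocking, then
// scans every fd up to maxfd: O(N) work per call however few are ready.
static int count_readable(const int *fds, int n) {
    fd_set readfds;
    FD_ZERO(&readfds);
    int maxfd = -1;
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &readfds);          // each fd must be < FD_SETSIZE
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    struct timeval timeout = {0, 0};       // zero timeout: check and return
    if (select(maxfd + 1, &readfds, NULL, NULL, &timeout) < 0)
        return -1;
    int count = 0;
    for (int fd = 0; fd <= maxfd; fd++)
        if (FD_ISSET(fd, &readfds))
            count++;
    return count;
}

// Hypothetical demo: n socket pairs, one byte written to the first `ready` of them.
// Returns how many fds select() reports readable, or -1 on error.
int select_demo(int n, int ready) {
    if (n < 0 || n > MAX_SOCKS || ready < 0 || ready > n)
        return -1;
    int rd[MAX_SOCKS], wr[MAX_SOCKS], opened = 0;
    while (opened < n) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            break;
        rd[opened] = sv[0];
        wr[opened] = sv[1];
        opened++;
    }
    int result = -1;
    if (opened == n) {
        for (int i = 0; i < ready; i++)
            if (write(wr[i], "x", 1) != 1)
                break;
        result = count_readable(rd, n);
    }
    for (int i = 0; i < opened; i++) {
        close(rd[i]);
        close(wr[i]);
    }
    return result;
}
```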
epoll — Linux's Answer (2002)
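```c
int epfd = epoll_create1(0);
struct epoll_event ev, events[MAX_EVENTS];
ev.events = EPOLLIN;
ev.data.fd = sock;
epoll_ctl(epfd, EPOLL_CTL_ADD, sock, &ev); // register once
// ... register others too
while (1) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);   // -1: block until something is ready
    // → only ready fds returned (O(ready), not O(N))
    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        // handle fd
    }
}
```

- Register once; the kernel keeps a ready list up to date as fds change state, so epoll_wait costs O(ready), not O(N)
- No FD_SETSIZE cap: limited only by the process fd limit (ulimit -n) and memory
- Two modes: level-triggered (default; an fd is reported on every epoll_wait while data remains) and edge-triggered (EPOLLET; reported once per new arrival, so the app must read until EAGAIN)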
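Both properties can be checked directly. This sketch assumes Linux (epoll is Linux-only), uses socketpair() ends as stand-in connections, and has hypothetical function names. `epoll_ready_count` shows epoll_wait() returning only the ready fds; `second_wait_events` writes one byte that is never read and calls epoll_wait() twice to expose the level- vs edge-triggered difference.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_SOCKS 32

// Registers n socket ends once, writes to `ready` of their peers, and returns
// how many events a single non-blocking epoll_wait() reports (or -1 on error).
int epoll_ready_count(int n, int ready) {
    if (n < 0 || n > MAX_SOCKS || ready < 0 || ready > n)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    int rd[MAX_SOCKS], wr[MAX_SOCKS], opened = 0, registered = 0, result = -1;
    while (opened < n) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
            break;
        rd[opened] = sv[0];
        wr[opened] = sv[1];
        opened++;
        struct epoll_event ev = {.events = EPOLLIN, .data.fd = sv[0]};
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)   // register once
            break;
        registered++;
    }
    if (registered == n) {
        for (int i = 0; i < ready; i++)
            if (write(wr[i], "x", 1) != 1)
                break;
        struct epoll_event events[MAX_SOCKS];
        result = epoll_wait(epfd, events, MAX_SOCKS, 0);   // 0 ms: only ready fds come back
    }
    for (int i = 0; i < opened; i++) {
        close(rd[i]);
        close(wr[i]);
    }
    close(epfd);
    return result;
}

// Returns the number of events in the SECOND epoll_wait(): 1 for level-triggered
// (the byte is still unread), 0 for edge-triggered (no new data has arrived).
int second_wait_events(int edge_triggered) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int epfd = epoll_create1(0);
    int result = -1;
    struct epoll_event ev = {.events = EPOLLIN | (edge_triggered ? EPOLLET : 0),
                             .data.fd = sv[0]};
    struct epoll_event out[1];
    if (epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], "x", 1) == 1 &&
        epoll_wait(epfd, out, 1, 0) == 1)       // first call: readable in both modes
        result = epoll_wait(epfd, out, 1, 0);   // second call, byte still unread
    if (epfd >= 0)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

The edge-triggered result is why EPOLLET code must drain each socket: data left behind after the first notification produces no further events until more data arrives.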
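BSD/macOS have kqueue and Solaris had /dev/poll: the same readiness idea behind different APIs. Windows has IOCP, which is completion-based (the app is told when an operation has finished), closer to io_uring.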
Blocking vs Non-blocking — Two Faces of read
```c
// Blocking (default)
int n = read(sock, buf, 4096);
// thread sleeps until data arrives (or EOF / error)

// Non-blocking: add O_NONBLOCK without clobbering the other file status flags
int flags = fcntl(sock, F_GETFL, 0);
fcntl(sock, F_SETFL, flags | O_NONBLOCK);
n = read(sock, buf, 4096);
if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
    // no data yet: return immediately, do something else
}
```

epoll + non-blocking = one thread serves thousands of connections. Process only the ready fds reported by epoll_wait, then wait again.
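With edge-triggered epoll in particular, a non-blocking socket has to be read until it reports EAGAIN, or leftover bytes sit unnoticed. A minimal sketch of that drain loop, assuming Linux/POSIX, with a socketpair() standing in for a client connection and hypothetical function names:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Adds O_NONBLOCK while preserving the fd's other status flags.
static int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

// Reads until the socket has nothing left. Returns total bytes read, or -1 on a
// real error. Stops at EAGAIN/EWOULDBLOCK ("no data right now") or EOF (read == 0).
static ssize_t drain(int fd) {
    char buf[4];                       // deliberately tiny so several reads are needed
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;                                  // peer closed
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                                  // drained for now
        if (errno == EINTR)
            continue;                                      // interrupted, retry
        return -1;
    }
}

// Hypothetical demo: two writes arrive, one drain() collects both without blocking.
ssize_t nonblocking_demo(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    ssize_t total = -1;
    if (set_nonblocking(sv[0]) == 0 &&
        write(sv[1], "hello", 5) == 5 && write(sv[1], "world", 5) == 5)
        total = drain(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return total;
}
```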
Async I/O — POSIX aio, io_uring (Linux 5.1+)
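```c
// epoll is "readiness notification": the app still does read/write itself.
// io_uring hands the operation itself to the kernel (liburing API).
#include <liburing.h>

struct io_uring ring;
io_uring_queue_init(32, &ring, 0);              // 32-entry submission queue
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buf, 4096, 0);      // read 4096 bytes at file offset 0
io_uring_submit(&ring);
// kernel does the read asynchronously
struct io_uring_cqe *cqe;
io_uring_wait_cqe(&ring, &cqe);                 // result of completed op
// cqe->res = bytes read (negative errno on failure)
io_uring_cqe_seen(&ring, cqe);                  // mark the completion as consumed
io_uring_queue_exit(&ring);
```

Even fewer syscalls than epoll: submissions and completions pass through ring buffers shared with the kernel, so many operations can be batched into a single io_uring_enter() call. The gains are largest for database / disk-heavy workloads, where readiness APIs can't help (select/poll report regular files as always ready, and epoll rejects them). But app code complexity goes up.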
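The completion model (submit an operation, collect its result later) can also be tried with POSIX aio, the other API in this heading, without installing liburing. A minimal sketch, assuming Linux with glibc 2.34 or newer (older versions need `-lrt`) and a hypothetical helper name `aio_read_file`. glibc implements POSIX aio with user-space helper threads rather than in the kernel, so this shows the programming model, not io_uring's performance.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

// Submits one asynchronous read of up to cap bytes from offset 0 of `path`,
// waits for its completion, and returns the byte count (or -1 on error).
ssize_t aio_read_file(const char *path, char *buf, size_t cap) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = cap;
    cb.aio_offset = 0;
    ssize_t result = -1;
    if (aio_read(&cb) == 0) {                        // submit: returns immediately
        // ... the thread is free to do other work here ...
        const struct aiocb *const list[1] = {&cb};
        while (aio_error(&cb) == EINPROGRESS)        // block until the op completes
            aio_suspend(list, 1, NULL);
        int err = aio_error(&cb);
        ssize_t n = aio_return(&cb);                 // reap once; like read()'s return
        result = (err == 0) ? n : -1;
    }
    close(fd);
    return result;
}
```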
Node.js — Single Thread, 10K Connections
```
// Essence of Node.js:
[V8 main thread] ──→ event loop
                         ↑
                         │ (libuv)
                         │
              [epoll / kqueue / IOCP]
```

- App code is single-threaded.
- Socket I/O goes through epoll / kqueue / IOCP; file I/O, dns.lookup() and some crypto / zlib work run on libuv's thread pool (4 threads by default). Either way, the callback runs on the main thread when the operation completes.
- CPU-bound work (big JSON.parse, crypto) → worker_threads.

Single thread = no locks. But one long callback freezes the entire event loop. Push CPU-heavy work to worker_threads / external services.
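The pattern libuv builds on (register an fd with a callback, wait, dispatch) fits in a few dozen lines of C. This is a toy sketch assuming Linux epoll, not libuv's real API: the `watcher` struct and the `loop_watch` / `loop_run_once` names are hypothetical. Callbacks run one at a time on the single loop thread, which is exactly why one slow callback stalls every other connection.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

typedef void (*io_cb)(int fd, void *ctx);

// One registered watcher: the fd plus the callback to run when it is readable.
struct watcher {
    int fd;
    io_cb cb;
    void *ctx;
};

// Registers w with the loop's epoll instance; the watcher pointer rides in data.ptr.
static int loop_watch(int epfd, struct watcher *w) {
    struct epoll_event ev = {.events = EPOLLIN, .data.ptr = w};
    return epoll_ctl(epfd, EPOLL_CTL_ADD, w->fd, &ev);
}

// One loop iteration: wait up to timeout_ms, then run each ready callback in turn
// on this thread. Returns the number of callbacks run, or -1 on error.
static int loop_run_once(int epfd, int timeout_ms) {
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct watcher *w = events[i].data.ptr;
        w->cb(w->fd, w->ctx);          // a slow callback here delays all the others
    }
    return n;
}

// Example callback: consume the pending bytes and add their count to *ctx.
static void on_readable(int fd, void *ctx) {
    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        *(long *)ctx += n;
}

// Hypothetical demo: two connections with data; one loop pass serves both.
long event_loop_demo(void) {
    int a[2], b[2];
    long bytes = 0;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0) {
        close(epfd);
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0) {
        close(a[0]); close(a[1]); close(epfd);
        return -1;
    }
    struct watcher wa = {a[0], on_readable, &bytes}, wb = {b[0], on_readable, &bytes};
    long result = -1;
    if (loop_watch(epfd, &wa) == 0 && loop_watch(epfd, &wb) == 0 &&
        write(a[1], "ping", 4) == 4 && write(b[1], "hello", 5) == 5 &&
        loop_run_once(epfd, 100) == 2)
        result = bytes;                // 4 + 5 bytes handled on one thread
    close(a[0]); close(a[1]); close(b[0]); close(b[1]); close(epfd);
    return result;
}
```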
Common Pitfalls
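- TIME_WAIT buildup: the side that closes first keeps the socket in TIME_WAIT for 2 × MSL (60 s on Linux; RFC 793's values give 4 min). Many short-lived outbound connections to the same server exhaust ephemeral ports: with Linux's default range (32768–60999, about 28,000 ports) and 60 s of TIME_WAIT, that is roughly 470 new connections per second to one destination. Use keep-alive / pools.
- Missing SO_REUSEADDR: restarting a server fails with "Address already in use" while old connections on its port sit in TIME_WAIT. Set SO_REUSEADDR on the listening socket before bind().
- Nagle vs delayed ACK, deadlock-like latency: Nagle's algorithm holds back a small write until earlier data is ACKed, while the peer delays its ACK (up to ~200 ms), so small writes stall. Set TCP_NODELAY.
- Connection pool size: if your DB/Redis client pool has fewer connections than there are backend threads, the extra threads block waiting for a connection.
- SYN flood: an attacker sends SYNs and never completes the handshake with the final ACK, filling the server's queue of half-open connections. Defend with SYN cookies (net.ipv4.tcp_syncookies on Linux).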
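Two of these fixes are one setsockopt() call each. A minimal sketch, assuming Linux/POSIX, that creates a TCP socket with SO_REUSEADDR (set before bind()) and TCP_NODELAY, then reads the options back with getsockopt(); the function names are hypothetical, and a real server would continue with bind() and listen().

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a TCP socket with SO_REUSEADDR (restart without "Address already in use")
// and TCP_NODELAY (disable Nagle so small writes go out immediately).
// Returns the fd, or -1 on error.
int make_tuned_tcp_socket(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) < 0) {
        close(fd);
        return -1;
    }
    return fd;   // next: bind() + listen() for a server, or connect() for a client
}

// Returns 1 if the integer socket option is enabled, 0 if not, -1 on error.
int sockopt_enabled(int fd, int level, int name) {
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, level, name, &val, &len) < 0)
        return -1;
    return val != 0;
}
```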
Wrap-up
The evolution of network programming is the pursuit of more connections per thread: thread-per-conn → select → epoll → io_uring. Each step follows the same pattern: offload more to the kernel so the app can use wait time for other work via callbacks / await / coroutines.
async/await (JS, Rust, C#, Python) rests on the same mechanism: an async function becomes a coroutine (a resumable state machine) running on an event loop over epoll/kqueue; it yields at each await and is resumed when the kernel signals readiness.
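What an async function compiles down to can be hand-written in C as a resumable state machine. A sketch under simplifying assumptions, all hypothetical: the task awaits two 4-byte reads (a "header", then a "body") from a non-blocking socketpair() end, and returning PENDING on EAGAIN plays the role of yielding at await. Each call to `task_step` resumes exactly where the previous one stopped, as an event loop would after a readiness event.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum step_result { PENDING, DONE, FAILED };

// Hypothetical coroutine frame for:
//   header = await read_exact(fd, 4); body = await read_exact(fd, 4);
struct read_task {
    int fd;
    int state;          // 0 = awaiting header, 1 = awaiting body, 2 = finished
    char header[4], body[4];
    size_t got;         // bytes received for the current await
};

// Resumes the task. Returns PENDING when it would block (the "yield" at await),
// DONE once both reads finish, FAILED on error or EOF.
static enum step_result task_step(struct read_task *t) {
    while (t->state < 2) {
        char *dst = t->state == 0 ? t->header : t->body;
        ssize_t n = read(t->fd, dst + t->got, 4 - t->got);
        if (n > 0) {
            t->got += (size_t)n;
            if (t->got == 4) {         // this await completed, move to the next
                t->state++;
                t->got = 0;
            }
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return PENDING;            // suspend; resume when the fd is readable
        } else {
            return FAILED;
        }
    }
    return DONE;
}

// Feeds the task in chunks, resuming it after each one. Returns 1 if every
// suspend/resume point behaved as expected, else 0.
int coroutine_demo(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    int flags = fcntl(sv[0], F_GETFL, 0);
    fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);
    struct read_task t = {.fd = sv[0]};
    int ok = task_step(&t) == PENDING && t.state == 0;           // nothing sent yet
    ok = ok && write(sv[1], "HEADbo", 6) == 6;
    ok = ok && task_step(&t) == PENDING && t.state == 1 && t.got == 2;
    ok = ok && write(sv[1], "dy", 2) == 2;
    ok = ok && task_step(&t) == DONE;                              // both awaits done
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```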